Release history
Version 0.12.3
May 28, 2024
Changelog
Compatibility
Compatibility with scikit-learn 1.5. #1074 and #1084 by Guillaume Lemaitre.
Version 0.12.2
March 31, 2024
Changelog
Bug fixes
Fix the way we check for a specific Python version in the test suite. #1075 by Guillaume Lemaitre.
Version 0.12.1
March 31, 2024
Changelog
Bug fixes
Fix a bug in InstanceHardnessThreshold where estimator could not be a Pipeline object. #1049 by Gonenc Mogol.
Compatibility
Do not use distutils in tests due to its deprecation. #1065 by Michael R. Crusoe.
Fix the scikit-learn import in tests to be compatible with version 1.4.1.post1. #1073 by Guillaume Lemaitre.
Fix tests to be compatible with Python 3.13. #1073 by Guillaume Lemaitre.
Version 0.12.0
January 24, 2024
Changelog
Bug fixes
Fix a bug in SMOTENC where the entries of the one-hot encoding should be divided by sqrt(2) and not 2, taking into account that they are plugged into a Euclidean distance computation. #1014 by Guillaume Lemaitre.
Raise an informative error message when all support vectors are tagged as noise in SVMSMOTE. #1016 by Guillaume Lemaitre.
Fix a bug in SMOTENC where the median of the standard deviation of the continuous features was only computed on the minority class. Now, this statistic is computed for each class that is up-sampled. #1015 by Guillaume Lemaitre.
Fix a bug in SMOTENC so that the case where the median of the standard deviation of the continuous features is null is handled in the multiclass case as well. #1015 by Guillaume Lemaitre.
Fix a bug in BorderlineSMOTE version 2 where samples should be generated from the whole dataset and not only from the minority class. #1023 by Guillaume Lemaitre.
Fix a bug in NeighbourhoodCleaningRule where kind_sel="all" was not working as explained in the literature. #1012 by Guillaume Lemaitre.
Fix a bug in NeighbourhoodCleaningRule where the threshold_cleaning ratio was multiplied by the total number of samples instead of the number of samples in the minority class. #1012 by Guillaume Lemaitre.
Fix a bug in RandomUnderSampler and RandomOverSampler where a column containing only NaT was not handled correctly. #1059 by Guillaume Lemaitre.
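The first SMOTENC entry above (#1014) rests on a short distance argument. Two different categories encoded as one-hot vectors differ in exactly two coordinates, so if the non-zero entry is set to v, their Euclidean distance is sqrt(2)·v. Setting v = median_std / sqrt(2) makes a categorical mismatch cost exactly median_std, while v = median_std / 2 makes it cost only median_std / sqrt(2). The stdlib-only sketch below checks this arithmetic; the helper names and the value of median_std are hypothetical, and this is not imbalanced-learn's implementation.

```python
import math


def one_hot(category, categories, value):
    """One-hot encode `category`, using `value` instead of 1 for the active entry."""
    return [value if c == category else 0.0 for c in categories]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


median_std = 0.8  # hypothetical median of the continuous features' standard deviations
categories = ["red", "green", "blue"]

# Scaling by median_std / sqrt(2): a category mismatch adds exactly median_std.
v_sqrt2 = median_std / math.sqrt(2)
d_sqrt2 = euclidean(one_hot("red", categories, v_sqrt2), one_hot("blue", categories, v_sqrt2))

# Scaling by median_std / 2: the mismatch only adds median_std / sqrt(2).
v_half = median_std / 2
d_half = euclidean(one_hot("red", categories, v_half), one_hot("blue", categories, v_half))

print(f"{d_sqrt2:.4f} {d_half:.4f}")  # 0.8000 0.5657
```

With the sqrt(2) scaling, a categorical mismatch weighs the same as a continuous difference of one median standard deviation, which appears to be the intent behind the fix.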
Compatibility
BalancedRandomForestClassifier now supports missing values and monotonic constraints if scikit-learn >= 1.4 is installed.
Pipeline supports metadata routing if scikit-learn >= 1.4 is installed.
Compatibility with scikit-learn 1.4. #1058 by Guillaume Lemaitre.
Deprecations
Deprecate the fitted attribute estimator_ in favor of estimators_ for the classes CondensedNearestNeighbour and OneSidedSelection. estimator_ will be removed in 0.14. #1011 by Guillaume Lemaitre.
Deprecate kind_sel in … #1012 by Guillaume Lemaitre.
Enhancements
Version 0.11.0
July 8, 2023
Changelog
Bug fixes
Fix a bug in classification_report_imbalanced where the parameter target_names was not taken into account when output_dict=True. #989 by AYY7.
SMOTENC now handles mixed data types such as bool and pd.category by delegating the conversion to the scikit-learn encoder. #1002 by Guillaume Lemaitre.
Handle sparse matrices in SMOTEN and raise a warning since they require a conversion to dense matrices. #1003 by Guillaume Lemaitre.
Remove a spurious warning raised when the minority class gets over-sampled beyond the number of samples in the majority class. #1007 by Guillaume Lemaitre.
Compatibility
Maintenance release for compatibility with scikit-learn >= 1.3.0. #999 by Guillaume Lemaitre.
Deprecation
The fitted attribute ohe_ in SMOTENC is deprecated and will be removed in version 0.13. Use categorical_encoder_ instead. #1000 by Guillaume Lemaitre.
The defaults of the parameters sampling_strategy, bootstrap and replacement will change in BalancedRandomForestClassifier to follow the implementation of the original paper. These changes will take effect in version 0.13. #1006 by Guillaume Lemaitre.
Enhancements
SMOTENC now accepts a parameter categorical_encoder, allowing users to specify a OneHotEncoder with custom parameters. #1000 by Guillaume Lemaitre.
SMOTEN now accepts a parameter categorical_encoder, allowing users to specify an OrdinalEncoder with custom parameters. A new fitted attribute categorical_encoder_ is exposed to access the fitted encoder. #1001 by Guillaume Lemaitre.
RandomUnderSampler and RandomOverSampler (when shrinkage is not None) now accept any data types and will not attempt any data conversion. #1004 by Guillaume Lemaitre.
SMOTENC now supports passing an array-like of str for the categorical_features parameter. #1008 by Guillaume Lemaitre.
SMOTENC now supports automatic categorical inference when categorical_features is set to "auto". #1009 by Guillaume Lemaitre.
Version 0.10.1
December 28, 2022
Changelog
Bug fixes
Fix a regression in over-samplers where the string minority was rejected as an invalid sampling strategy. #964 by Prakhyath Bhandary.
Version 0.10.0
December 9, 2022
Changelog
Bug fixes
Make sure that Substitution works with python -OO, which replaces __doc__ with None. #953 by Guillaume Lemaitre.
Compatibility
Maintenance release to be compatible with scikit-learn >= 1.0.2. #946, #947, #949 by Guillaume Lemaitre.
Add support for automatic parameter validation as in scikit-learn >= 1.2. #955 by Guillaume Lemaitre.
Add support for feature_names_in_ as well as get_feature_names_out for all samplers. #959 by Guillaume Lemaitre.
Deprecation
The parameter n_jobs has been deprecated in the classes ADASYN, BorderlineSMOTE, SMOTE, SMOTENC, SMOTEN, and SVMSMOTE. Instead, pass a nearest-neighbors estimator with n_jobs set. #887 by Guillaume Lemaitre.
The parameter base_estimator is deprecated and will be removed in version 0.12. It impacts the following classes: BalancedBaggingClassifier, EasyEnsembleClassifier, RUSBoostClassifier. #946 by Guillaume Lemaitre.
Enhancements
Accept compatible NearestNeighbors objects through duck-typing only. For instance, this allows cuML instances to be used. #858 by NV-jpt and Guillaume Lemaitre.
Version 0.9.1
May 16, 2022
Changelog
This release provides fixes that make imbalanced-learn work with the latest release (1.1.0) of scikit-learn.
Version 0.9.0
January 11, 2022
Changelog
This release mainly provides fixes that make imbalanced-learn work with the latest release (1.0.2) of scikit-learn.
Version 0.8.1
September 29, 2021
Changelog
Maintenance
Make imbalanced-learn compatible with scikit-learn 1.0. #864 by Guillaume Lemaitre.
Version 0.8.0
February 18, 2021
Changelog
New features
Add the function imblearn.metrics.macro_averaged_mean_absolute_error, which returns the MAE averaged across classes. This metric is used in ordinal classification. #780 by Aurélien Massiot.
Add the class imblearn.metrics.pairwise.ValueDifferenceMetric to compute pairwise distances between samples containing only categorical values. #796 by Guillaume Lemaitre.
Add the class imblearn.over_sampling.SMOTEN to over-sample data containing only categorical features. #802 by Guillaume Lemaitre.
Add the possibility to pass any type of sampler in imblearn.ensemble.BalancedBaggingClassifier, unlocking the implementation of methods based on resampled bagging. #808 by Guillaume Lemaitre.
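The macro-averaged MAE added in #780 computes the mean absolute error within each class and then averages those per-class values, so errors on rare ordinal classes are not drowned out by the majority class. The following stdlib-only sketch spells out that definition, grouping errors by the true class; macro_averaged_mae is a hypothetical name used for illustration, and in practice you would call imblearn.metrics.macro_averaged_mean_absolute_error instead.

```python
from collections import defaultdict


def macro_averaged_mae(y_true, y_pred):
    """MAE computed separately for each true class, then averaged over those classes."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))  # group absolute errors by the true class
    per_class = [sum(e) / len(e) for e in errors.values()]
    return sum(per_class) / len(per_class)


# Ordinal labels 1 < 2 < 3; class 1 dominates the data set.
y_true = [1, 1, 1, 1, 2, 3]
y_pred = [1, 1, 1, 1, 3, 1]

plain_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_mae, macro_averaged_mae(y_true, y_pred))  # 0.5 1.0
```

The plain MAE looks good because the majority class is predicted perfectly, while the macro-averaged value exposes the errors made on the two rare classes.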
Enhancements
Add the option output_dict in imblearn.metrics.classification_report_imbalanced to return a dictionary instead of a string. #770 by Guillaume Lemaitre.
Add an option to generate a smoothed bootstrap in imblearn.over_sampling.RandomOverSampler. It is controlled by the parameter shrinkage. This method is also known as Random Over-Sampling Examples (ROSE). #754 by Andrea Lorenzon and Guillaume Lemaitre.
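The shrinkage option (#754) turns plain random over-sampling into a smoothed bootstrap: a sample is drawn with replacement from the class and then perturbed with Gaussian noise whose spread is scaled by shrinkage. The sketch below illustrates the idea in plain Python. The bandwidth rule (a Silverman-style factor times each feature's standard deviation) is an assumption made for illustration, and smoothed_bootstrap is a hypothetical helper, not claimed to match RandomOverSampler exactly.

```python
import random
import statistics


def smoothed_bootstrap(X_class, n_new, shrinkage, seed=0):
    """Sketch of a smoothed bootstrap (ROSE-style) for the samples of one class.

    Each new sample is a bootstrap draw from X_class jittered with Gaussian noise.
    Assumption: the noise standard deviation for feature j is
    shrinkage * h * std_j, with h a Silverman-style rule-of-thumb factor.
    """
    rng = random.Random(seed)
    n, d = len(X_class), len(X_class[0])
    h = (4 / ((d + 2) * n)) ** (1 / (d + 4))  # assumed bandwidth factor
    stds = [statistics.pstdev([row[j] for row in X_class]) for j in range(d)]
    new_samples = []
    for _ in range(n_new):
        x = rng.choice(X_class)  # plain bootstrap draw, with replacement
        new_samples.append(
            [x[j] + rng.gauss(0.0, shrinkage * h * stds[j]) for j in range(d)]
        )
    return new_samples


X_minority = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.4]]
print(smoothed_bootstrap(X_minority, n_new=4, shrinkage=0.5))
```

With shrinkage=0 the noise vanishes and the sketch reduces to plain random over-sampling with replacement; larger values spread the synthetic samples further around the original ones.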
Bug fixes
Fix a bug in imblearn.under_sampling.ClusterCentroids where voting="hard" could have led to selecting a sample from any class instead of the targeted class. #769 by Guillaume Lemaitre.
Fix a bug in imblearn.FunctionSampler where validation was performed even with validate=False when calling fit. #790 by Guillaume Lemaitre.
Maintenance
Remove requirements files in favour of adding the packages to extras_require in the setup.py file. #816 by Guillaume Lemaitre.
Change the website template to use pydata-sphinx-theme. #801 by Guillaume Lemaitre.
Deprecation
The context manager imblearn.utils.testing.warns is deprecated in 0.8 and will be removed in 1.0. #815 by Guillaume Lemaitre.
Version 0.7.0
June 9, 2020
Changelog
Maintenance
Ensure that imblearn.pipeline.Pipeline works when memory is activated and joblib==0.11. #687 by Christos Aridas.
Refactor common tests to use the dev tools from scikit-learn 0.23. #710 by Guillaume Lemaitre.
Remove the FutureWarning issued by scikit-learn 0.23. #710 by Guillaume Lemaitre.
Impose keyword-only arguments as in scikit-learn. #721 by Guillaume Lemaitre.
Changed models
The following models might give different results due to changes:
Bug fixes
Change the default value of min_samples_leaf to be consistent with scikit-learn. #711 by zerolfx.
Fix a bug due to a change in scikit-learn 0.23 in imblearn.metrics.make_index_balanced_accuracy. The function was unusable. #710 by Guillaume Lemaitre.
Raise a proper error message when only numerical or only categorical features are given in imblearn.over_sampling.SMOTENC. #720 by Guillaume Lemaitre.
Fix a bug when the median of the standard deviation is null in imblearn.over_sampling.SMOTENC. #675 by bganglia.
Enhancements
The classifiers implemented in imbalanced-learn, imblearn.ensemble.BalancedBaggingClassifier, imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, and imblearn.ensemble.RUSBoostClassifier, accept sampling_strategy with the same keys as in y, without the need to encode y in advance. #718 by Guillaume Lemaitre.
Lazily import the keras module when importing imblearn.keras. #719 by Guillaume Lemaitre.
Deprecation
Deprecate the parameter n_jobs in imblearn.under_sampling.ClusterCentroids since it was used by sklearn.cluster.KMeans, which deprecated it. #710 by Guillaume Lemaitre.
Deprecate passing keyword arguments by position, similarly to scikit-learn. #721 by Guillaume Lemaitre.
Version 0.6.2
February 16, 2020
This is a bug-fix release to resolve some issues regarding the handling of the input and output formats of arrays.
Changelog
Allow column vectors to be passed as targets. #673 by Christos Aridas.
Better input/output handling for pandas, numpy and plain lists. #681 by Christos Aridas.
Version 0.6.1
December 7, 2019
This is a bug-fix release to primarily resolve some packaging issues in version 0.6.0. It also includes minor documentation improvements and some bug fixes.
Changelog
Bug fixes
Fix a bug in imblearn.ensemble.BalancedRandomForestClassifier leading to a wrong number of samples used during fitting due to max_samples, and therefore a bad computation of the OOB score. #656 by Guillaume Lemaitre.
Version 0.6.0
December 5, 2019
Changelog
Changed models
The following models might give different sampling results due to changes in scikit-learn:
The following samplers will give different results due to changes linked to the internal usage of the random state:
Bug fixes
imblearn.under_sampling.InstanceHardnessThreshold now takes into account the random_state and gives deterministic results. In addition, cross_val_predict is used to take advantage of parallelism. #599 by Shihab Shahriar Khan.
Fix a bug in imblearn.ensemble.BalancedRandomForestClassifier leading to a wrong computation of the OOB score. #656 by Guillaume Lemaitre.
Maintenance
Update imports from scikit-learn after some modules were made private. The following imports have been changed: sklearn.ensemble._base._set_random_states, sklearn.ensemble._forest._parallel_build_trees, sklearn.metrics._classification._check_targets, sklearn.metrics._classification._prf_divide, sklearn.utils.Bunch, sklearn.utils._safe_indexing, sklearn.utils._testing.assert_allclose, sklearn.utils._testing.assert_array_equal, sklearn.utils._testing.SkipTest. #617 by Guillaume Lemaitre.
Synchronize imblearn.pipeline with sklearn.pipeline. #620 by Guillaume Lemaitre.
Synchronize imblearn.ensemble.BalancedRandomForestClassifier and add the parameters max_samples and ccp_alpha. #621 by Guillaume Lemaitre.
Enhancement
imblearn.under_sampling.RandomUnderSampler, imblearn.over_sampling.RandomOverSampler and imblearn.datasets.make_imbalance accept a pandas DataFrame as input and will output a pandas DataFrame. Similarly, they accept a pandas Series as input and will output a pandas Series. #636 by Guillaume Lemaitre.
imblearn.FunctionSampler accepts a parameter validate to enable or disable the validation of the inputs X and y. #637 by Guillaume Lemaitre.
imblearn.under_sampling.RandomUnderSampler and imblearn.over_sampling.RandomOverSampler can resample when non-finite values are present in X. #643 by Guillaume Lemaitre.
All samplers will output a pandas DataFrame if a pandas DataFrame was given as input. #644 by Guillaume Lemaitre.
The sample generation in imblearn.over_sampling.ADASYN, imblearn.over_sampling.SMOTE, imblearn.over_sampling.BorderlineSMOTE, imblearn.over_sampling.SVMSMOTE, imblearn.over_sampling.KMeansSMOTE and imblearn.over_sampling.SMOTENC is now vectorized, giving an additional speed-up when X is sparse. #596 and #649 by Matt Eding.
Deprecation
The following classes have been removed after 2 deprecation cycles: ensemble.BalanceCascade and ensemble.EasyEnsemble. #617 by Guillaume Lemaitre.
The following functions have been removed after 2 deprecation cycles: utils.check_ratio. #617 by Guillaume Lemaitre.
The parameters ratio and return_indices have been removed from all samplers. #617 by Guillaume Lemaitre.
The parameters m_neighbors, out_step, kind, svm_estimator have been removed from imblearn.over_sampling.SMOTE. #617 by Guillaume Lemaitre.
Version 0.5.0
June 28, 2019
Changelog
Changed models
The following models or functions might give different results even if the data X and y are the same.
imblearn.ensemble.RUSBoostClassifier: the default estimator changed from sklearn.tree.DecisionTreeClassifier with full depth to a decision stump (i.e., a tree with max_depth=1).
Documentation
Correct the definition of the ratio when using a float in sampling strategy for over-sampling and under-sampling. #525 by Ariel Rossanigo.
Add imblearn.over_sampling.BorderlineSMOTE and imblearn.over_sampling.SVMSMOTE to the API documentation. #530 by Guillaume Lemaitre.
Enhancement
Add parallelisation for SMOTEENN and SMOTETomek. #547 by Michael Hsieh.
Add imblearn.utils._show_versions. Updated the contribution guide and issue template showing how to print system and dependency information from the command line. #557 by Alexander L. Hayes.
Add imblearn.over_sampling.KMeansSMOTE, an over-sampler that clusters points before applying SMOTE. #435 by Stephan Heijl.
Maintenance
Make it possible to import imblearn and access submodules. #500 by Guillaume Lemaitre.
Remove support for Python 2, remove deprecation warnings from scikit-learn 0.21. #576 by Guillaume Lemaitre.
Bug fixes
Fix wrong usage of keras.layers.BatchNormalization in the porto_seguro_keras_under_sampling.py example. The batch normalization was moved before the activation function and the bias was removed from the dense layer. #531 by Guillaume Lemaitre.
Fix a bug that converted sparse matrices to COO format when stacking them in imblearn.over_sampling.SMOTENC. This bug only affected old SciPy versions. #539 by Guillaume Lemaitre.
Fix a bug in imblearn.pipeline.Pipeline where None could be the final estimator. #554 by Oliver Rausch.
Fix a bug in imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE where the default value of n_neighbors was not set properly. #578 by Guillaume Lemaitre.
Fix a bug by changing the default depth in imblearn.ensemble.RUSBoostClassifier to get a decision stump as a weak learner, as in the original paper. #545 by Christos Aridas.
Allow importing keras directly from tensorflow in imblearn.keras. #531 by Guillaume Lemaitre.
Version 0.4.2
October 21, 2018
Changelog
Bug fixes
Fix a bug in imblearn.over_sampling.SMOTENC related to using the median of the standard deviation instead of half of the median of the standard deviation. By Guillaume Lemaitre in #491.
Raise an error when passing a target that is not supported, i.e. regression or multilabel targets. Imbalanced-learn does not support these cases. By Guillaume Lemaitre in #490.
Fix a bug in imblearn.over_sampling.SMOTENC in which sparse matrices were densified during inverse_transform. By Guillaume Lemaitre in #495.
Fix a bug in imblearn.over_sampling.SMOTENC in which tie-breaking was sampling incorrectly. By Guillaume Lemaitre in #497.
Version 0.4
October 12, 2018
Warning
Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.
Highlights
This release brings a set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.
As new features, two new modules imblearn.keras and imblearn.tensorflow have been added, in which imbalanced-learn samplers can be used to generate balanced mini-batches.
The module imblearn.ensemble has been consolidated with new classifiers: imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, and imblearn.ensemble.RUSBoostClassifier.
Support for strings has been added in imblearn.over_sampling.RandomOverSampler and imblearn.under_sampling.RandomUnderSampler. In addition, a new class imblearn.over_sampling.SMOTENC allows generating samples for data sets containing both continuous and categorical features.
The imblearn.over_sampling.SMOTE class has been simplified and broken down into two additional classes: imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE.
There are also some API changes: the parameter sampling_strategy has been introduced to replace the ratio parameter. In addition, the return_indices argument has been deprecated and all samplers will expose a sample_indices_ attribute whenever possible.
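The sample_indices_ attribute mentioned above records which rows of the original data a sampler kept. The following plain-Python sketch of random under-sampling shows the idea by returning those indices alongside the resampled data; random_under_sample is a hypothetical helper written for illustration, not the library's RandomUnderSampler.

```python
import random
from collections import defaultdict


def random_under_sample(X, y, seed=0):
    """Randomly keep, for every class, as many rows as the smallest class has.

    Returns the resampled X and y plus the indices of the kept rows, which is the
    kind of information a fitted sample_indices_ attribute exposes.
    """
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    n_keep = min(len(rows) for rows in rows_by_class.values())
    # Sample without replacement inside each class, then restore the original order.
    indices = sorted(
        i for rows in rows_by_class.values() for i in rng.sample(rows, n_keep)
    )
    return [X[i] for i in indices], [y[i] for i in indices], indices


X = [[float(i)] for i in range(10)]
y = ["maj"] * 7 + ["min"] * 3
X_res, y_res, sample_indices = random_under_sample(X, y)
print(sample_indices, y_res)
```

Because the indices refer to rows of the input, they can be reused, for instance, to subset sample weights or other metadata aligned with X.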
Changelog
API
Replace the parameter ratio by sampling_strategy. #411 by Guillaume Lemaitre.
Allow a float to be used for sampling_strategy with binary classification. #411 by Guillaume Lemaitre.
Allow a list to be used in the cleaning methods to specify the classes to sample. #411 by Guillaume Lemaitre.
Replace fit_sample by fit_resample. An alias is still available for backward compatibility. In addition, sample has been removed to avoid resampling on a different set of data. #462 by Guillaume Lemaitre.
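For binary problems, a float sampling_strategy (#411) is read as the desired ratio between the number of minority and majority samples after resampling. The following sketch converts such a float into per-class counts. float_sampling_target is hypothetical, and the truncation with int() as well as the error on a negative count are assumptions about how such a conversion would reasonably behave, not a copy of imbalanced-learn's code.

```python
from collections import Counter


def float_sampling_target(y, ratio, kind):
    """Hypothetical helper: turn a float sampling_strategy into per-class counts.

    Assumption: ratio = N_minority / N_majority after resampling.
    over-sampling  -> number of minority samples to generate
    under-sampling -> number of majority samples to keep
    """
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("a float sampling_strategy only applies to binary targets")
    (minority, n_min), (majority, n_maj) = sorted(counts.items(), key=lambda kv: kv[1])
    if kind == "over-sampling":
        n_generate = int(n_maj * ratio - n_min)
        if n_generate < 0:
            raise ValueError("the requested ratio would require removing minority samples")
        return {minority: n_generate}
    if kind == "under-sampling":
        return {majority: int(n_min / ratio)}
    raise ValueError(f"unknown kind: {kind!r}")


y = [0] * 90 + [1] * 10
print(float_sampling_target(y, 0.5, "over-sampling"))   # {1: 35}: generate 35 minority samples
print(float_sampling_target(y, 0.5, "under-sampling"))  # {0: 20}: keep 20 majority samples
```

In both cases the resampled data end up with 45 minority samples per 90 majority samples (over-sampling) or 10 minority per 20 majority (under-sampling), i.e. the requested ratio of 0.5.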
New features
Add keras and tensorflow modules to create balanced mini-batch generators. #409 by Guillaume Lemaitre.
Add imblearn.ensemble.EasyEnsembleClassifier, which creates a bag of AdaBoost classifiers trained on balanced bootstrap samples. #455 by Guillaume Lemaitre.
Add imblearn.ensemble.BalancedRandomForestClassifier, which balances each bootstrap sample provided to each tree of the forest. #459 by Guillaume Lemaitre.
Add imblearn.ensemble.RUSBoostClassifier, which applies a random under-sampling stage before each boosting iteration of AdaBoost. #469 by Guillaume Lemaitre.
Add imblearn.over_sampling.SMOTENC, which generates synthetic samples on data sets with heterogeneous data types (continuous and categorical features). #412 by Denis Dudnik and Guillaume Lemaitre.
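BalancedRandomForestClassifier and EasyEnsembleClassifier, added above, both train each base learner on a class-balanced resample of the data. One common way to draw such a balanced bootstrap is to take, for every class, as many draws with replacement as the minority class has samples. The sketch below shows that per-learner resampling step in plain Python; balanced_bootstrap is a hypothetical helper illustrating the assumption, not the estimators' actual code.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(y, seed=0):
    """Row indices for one balanced bootstrap: n_min draws with replacement per class."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    n_min = min(len(rows) for rows in rows_by_class.values())
    return [rng.choice(rows) for rows in rows_by_class.values() for _ in range(n_min)]


y = ["neg"] * 8 + ["pos"] * 2
# Each tree of a forest would receive its own balanced resample (here: its own seed).
for tree_id in range(3):
    indices = balanced_bootstrap(y, seed=tree_id)
    print(tree_id, Counter(y[i] for i in indices))
```

Since every learner sees a different balanced resample, the ensemble as a whole still uses most of the majority-class data while no single learner is dominated by it.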
Enhancement
Add a documentation note on creating a balanced random forest from a balanced bagging classifier. #372 by Guillaume Lemaitre.
Document the metrics to evaluate models on imbalanced datasets. #367 by Guillaume Lemaitre.
Add support for one-vs-all encoded targets to support keras. #409 by Guillaume Lemaitre.
Add specific classes for borderline and SVM SMOTE: BorderlineSMOTE and SVMSMOTE. #440 by Guillaume Lemaitre.
Allow imblearn.over_sampling.RandomOverSampler to return indices using the parameter return_indices. #439 by Hugo Gascon and Guillaume Lemaitre.
Allow imblearn.under_sampling.RandomUnderSampler and imblearn.over_sampling.RandomOverSampler to sample object arrays containing strings. #451 by Guillaume Lemaitre.
Bug fixes
Fix a bug in metrics.classification_report_imbalanced in which y_pred and y_true were inverted. #394 by Ole Silvig.
Fix a bug in ADASYN to consider only samples from the current class when generating new samples. #354 by Guillaume Lemaitre.
Fix a bug to allow sorted behavior of the sampling_strategy dictionary and thus obtain deterministic results when using the same random state. #447 by Guillaume Lemaitre.
Force cloning of scikit-learn estimators passed as attributes to samplers. #446 by Guillaume Lemaitre.
Fix a bug where the dtype of X and y was not preserved when generating samples. #450 by Guillaume Lemaitre.
Add the option to pass a Memory object to make_pipeline, as in the pipeline.Pipeline class. #458 by Christos Aridas.
Maintenance
Remove parameters deprecated in 0.2. #331 by Guillaume Lemaitre.
Make some modules private. #452 by Guillaume Lemaitre.
Upgrade requirements to scikit-learn 0.20. #379 by Guillaume Lemaitre.
Catch deprecation warnings in testing. #441 by Guillaume Lemaitre.
Refactor tests and impose the pytest style. #470 by Guillaume Lemaitre.
Documentation
Remove some unnecessary docstrings. #454 by Guillaume Lemaitre.
Fix the documentation of the sampling_strategy parameter when used as a float. #480 by Guillaume Lemaitre.
Deprecation
Deprecate ratio in favor of sampling_strategy. #411 by Guillaume Lemaitre.
Deprecate the use of a dict for cleaning methods. A list should be used instead. #411 by Guillaume Lemaitre.
Deprecate random_state in imblearn.under_sampling.NearMiss, imblearn.under_sampling.EditedNearestNeighbours, imblearn.under_sampling.RepeatedEditedNearestNeighbours, imblearn.under_sampling.AllKNN, imblearn.under_sampling.NeighbourhoodCleaningRule, imblearn.under_sampling.InstanceHardnessThreshold, imblearn.under_sampling.CondensedNearestNeighbour.
Deprecate kind, out_step, svm_estimator, m_neighbors in imblearn.over_sampling.SMOTE. Users should use imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE instead. #440 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.EasyEnsemble in favor of the meta-estimator imblearn.ensemble.EasyEnsembleClassifier, which follows the exact algorithm described in the literature. #455 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.BalanceCascade. #472 by Guillaume Lemaitre.
Deprecate return_indices in all samplers. Instead, an attribute sample_indices_ is created whenever the sampler selects a subset of the original samples. #474 by Guillaume Lemaitre.
Version 0.3
February 22, 2018
Changelog
Pytest is used instead of nosetests. #321 by Joan Massich.
Added a User Guide and extended some examples. #295 by Guillaume Lemaitre.
Fixed a bug in utils.check_ratio such that an error is raised when the number of samples required is negative. #312 by Guillaume Lemaitre.
Fixed a bug in under_sampling.NearMiss version 3. The indices returned were wrong. #312 by Guillaume Lemaitre.
Fixed a bug for ensemble.BalanceCascade, combine.SMOTEENN and SMOTETomek. #295 by Guillaume Lemaitre.
Fixed a bug in check_ratio to be able to pass arguments when ratio is a callable. #307 by Guillaume Lemaitre.
Turn off steps in pipeline.Pipeline using the None object. By Christos Aridas.
Add a fetching function datasets.fetch_datasets in order to get some imbalanced datasets useful for benchmarking. #249 by Guillaume Lemaitre.
All samplers accept sparse matrices, defaulting to the CSR format. #316 by Guillaume Lemaitre.
datasets.make_imbalance takes a ratio similarly to other samplers. It supports multiclass. #312 by Guillaume Lemaitre.
All the unit tests have been factorized and a utils.check_estimators has been derived from scikit-learn. By Guillaume Lemaitre.
Script for automatic building and uploading of conda packages. #242 by Guillaume Lemaitre.
Remove the seaborn dependency and improve the examples. #264 by Guillaume Lemaitre.
Adapt all classes to multi-class resampling. #290 by Guillaume Lemaitre.
__init__ has been removed from base.SamplerMixin to create a real mixin class. #242 by Guillaume Lemaitre.
Create a module exceptions to raise errors consistently. #242 by Guillaume Lemaitre.
Create a module utils.validation to check recurrent patterns. #242 by Guillaume Lemaitre.
Move the under-sampling methods into the prototype_selection and prototype_generation submodules to make a clearer distinction. #277 by Guillaume Lemaitre.
Change ratio so that it can adapt to multiclass problems. #290 by Guillaume Lemaitre.
Deprecate the use of min_c_ in datasets.make_imbalance. #312 by Guillaume Lemaitre.
Deprecate the use of a float in datasets.make_imbalance for the ratio parameter. #290 by Guillaume Lemaitre.
Deprecate the use of a float as ratio in favor of a dictionary, string, or callable. #290 by Guillaume Lemaitre.
Version 0.2
January 1, 2017
Changelog
Fixed a bug in under_sampling.NearMiss which was not picking the right samples during under-sampling for method 3. By Guillaume Lemaitre.
Fixed a bug in ensemble.EasyEnsemble: correction of the random_state generation. By Guillaume Lemaitre and Christos Aridas.
Fixed a bug in under_sampling.RepeatedEditedNearestNeighbours: add an additional stopping criterion to prevent the minority class from becoming a majority class or a class from disappearing. By Guillaume Lemaitre.
Fixed a bug in under_sampling.AllKNN: add stopping criteria to prevent the minority class from becoming a majority class or a class from disappearing. By Guillaume Lemaitre.
Fixed a bug in under_sampling.CondensedNearestNeighbour: correction of the list of indices returned. By Guillaume Lemaitre.
Fixed a bug in ensemble.BalanceCascade: solve the issue of obtaining a single array if desired. By Guillaume Lemaitre.
Fixed a bug in pipeline.Pipeline: solve the issue of embedding a Pipeline in another Pipeline. #231 by Christos Aridas.
Fixed a bug in pipeline.Pipeline: solve the issue of putting two samplers in the same Pipeline. #188 by Christos Aridas.
Fixed a bug in under_sampling.CondensedNearestNeighbour: correction of the shape of sel_x when only one sample is selected. By Aliaksei Halachkin.
Fixed a bug in under_sampling.NeighbourhoodCleaningRule: selecting neighbours instead of misclassified minority class samples. #230 by Aleksandr Loskutov.
Fixed a bug in over_sampling.ADASYN: correction of the creation of a new sample so that the new sample lies between the minority sample and the nearest neighbour. #235 by Rafael Wampfler.
Added the AllKNN under-sampling technique. By Dayvid Oliveira.
Added a module metrics implementing some specific scoring functions for the problem of balancing. #204 by Guillaume Lemaitre and Christos Aridas.
Added support for bumpversion. By Guillaume Lemaitre.
Validate the type of target in binary samplers. A warning is raised for the moment. By Guillaume Lemaitre and Christos Aridas.
Change from the cross_validation module to the model_selection module following the sklearn deprecation cycle. By Dayvid Oliveira and Christos Aridas.
size_ngh has been deprecated in combine.SMOTEENN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.EditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.CondensedNearestNeighbour. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.OneSidedSelection. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.NeighbourhoodCleaningRule. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.RepeatedEditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.AllKNN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
Two base classes BaseBinaryclassSampler and BaseMulticlassSampler have been created to handle the target type and raise a warning in case of an abnormality. By Guillaume Lemaitre and Christos Aridas.
Move random_state to be assigned in the SamplerMixin initialization. By Guillaume Lemaitre.
Provide estimators instead of parameters in combine.SMOTEENN and combine.SMOTETomek. Therefore, the list of parameters has been deprecated. By Guillaume Lemaitre and Christos Aridas.
k has been deprecated in over_sampling.ADASYN. Use n_neighbors instead. #183 by Guillaume Lemaitre.
k and m have been deprecated in over_sampling.SMOTE. Use k_neighbors and m_neighbors instead. #182 by Guillaume Lemaitre.
n_neighbors accepts KNeighborsMixin-based objects for under_sampling.EditedNearestNeighbours, under_sampling.CondensedNearestNeighbour, under_sampling.NeighbourhoodCleaningRule, under_sampling.RepeatedEditedNearestNeighbours, and under_sampling.AllKNN. #109 by Guillaume Lemaitre.
Replace some remaining UnbalancedDataset occurrences. By Francois Magimel.
Added doctest in the documentation. By Guillaume Lemaitre.
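The ADASYN fix (#235) above ensures that a synthetic sample lies between a minority sample and its nearest neighbour. The interpolation step shared by SMOTE-style over-samplers can be sketched as follows; interpolate is a hypothetical helper, and drawing the gap uniformly from [0, 1) is the usual choice rather than a detail confirmed by this changelog.

```python
import random


def interpolate(x, neighbour, rng):
    """Synthetic sample on the segment from x to neighbour: x + u * (neighbour - x)."""
    u = rng.random()  # uniform in [0, 1), so the new point never leaves the segment
    return [xi + u * (ni - xi) for xi, ni in zip(x, neighbour)]


rng = random.Random(42)
x, nearest = [1.0, 1.0], [3.0, 2.0]
new_sample = interpolate(x, nearest, rng)
print(new_sample)
```

Because the same u scales every coordinate, the new sample stays on the straight line through x and its neighbour, inside the region spanned by the two existing minority samples.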
Version 0.1
December 26, 2016
Changelog
First release of the stable API. By Fernando Nogueira, Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- Under-sampling
  - Random majority under-sampling with replacement
  - Extraction of majority-minority Tomek links
  - Under-sampling with Cluster Centroids
  - NearMiss-(1 & 2 & 3)
  - Condensed Nearest Neighbour
  - One-Sided Selection
  - Neighbourhood Cleaning Rule
  - Edited Nearest Neighbours
  - Instance Hardness Threshold
  - Repeated Edited Nearest Neighbours
- Over-sampling
  - Random minority over-sampling with replacement
  - SMOTE - Synthetic Minority Over-sampling Technique
  - bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  - SVM SMOTE - Support Vectors SMOTE
  - ADASYN - Adaptive synthetic sampling approach for imbalanced learning
- Over-sampling followed by under-sampling
  - SMOTE + Tomek links
  - SMOTE + ENN
- Ensemble sampling
  - EasyEnsemble
  - BalanceCascade
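Tomek links, listed above among the under-sampling methods, are pairs of samples from different classes that are each other's nearest neighbour. The brute-force sketch below finds them in plain Python with Euclidean distance (math.dist, Python 3.8+); tomek_links is a hypothetical helper, and breaking distance ties by the lowest index is an arbitrary choice made for illustration.

```python
import math


def tomek_links(X, y):
    """Return pairs (i, j), i < j, of mutual nearest neighbours with different labels."""

    def nearest(i):
        # Brute-force nearest neighbour of sample i, excluding i itself.
        return min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


X = [[0.0], [1.0], [1.2], [5.0], [5.5]]
y = [0, 0, 1, 0, 0]
print(tomek_links(X, y))  # [(1, 2)]
```

Samples 3 and 4 are also mutual nearest neighbours but share a label, so they do not form a link. An under-sampler based on Tomek links would then typically remove the majority-class member of each pair (here sample 1), or both members.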