Version 0.12.0 (Under development)
Fix a bug in NeighbourhoodCleaningRule where the threshold_cleaning ratio was multiplied by the total number of samples instead of the number of samples in the minority class. #1012 by Guillaume Lemaitre.
Fix a bug in SMOTENC where the entries of the one-hot encoding should be divided by sqrt(2) rather than 2, taking into account that they are plugged into a Euclidean distance computation. #1014 by Guillaume Lemaitre.
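To see why sqrt(2) is the right factor, here is a minimal numeric check (illustrative values only, not the library's internal code): one-hot vectors of two different categories differ in exactly two entries, so their Euclidean distance is sqrt(2), not 2.

    import numpy as np

    a = np.array([1.0, 0.0, 0.0])  # one-hot encoding of category A
    b = np.array([0.0, 1.0, 0.0])  # one-hot encoding of category B
    print(np.linalg.norm(a - b))  # 1.4142... == sqrt(2)
    # Scaling the encoding by 1/sqrt(2) makes one category mismatch
    # contribute exactly 1 to the Euclidean distance.
    print(np.linalg.norm(a / np.sqrt(2) - b / np.sqrt(2)))  # 1.0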
Fix a bug in SMOTENC where the median of the standard deviations of the continuous features was only computed on the minority class. This statistic is now computed for each class that is up-sampled. #1015 by Guillaume Lemaitre.
July 8, 2023
The default of the replacement parameter will change in BalancedRandomForestClassifier to follow the implementation of the original paper. This change will take effect in version 0.13. #1006 by Guillaume Lemaitre.
SMOTEN now accepts a categorical_encoder parameter, allowing one to specify an OrdinalEncoder with custom parameters. A new fitted attribute, categorical_encoder_, is exposed to access the fitted encoder. #1001 by Guillaume Lemaitre.
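A minimal usage sketch (the data and the encoder settings below are invented for illustration):

    import numpy as np
    from sklearn.preprocessing import OrdinalEncoder
    from imblearn.over_sampling import SMOTEN

    X = np.array([["red"]] * 4 + [["blue"]] * 2, dtype=object)
    y = np.array([0, 0, 0, 0, 1, 1])

    # Pass a pre-configured OrdinalEncoder instead of the default one.
    sampler = SMOTEN(
        categorical_encoder=OrdinalEncoder(dtype=np.int32),
        k_neighbors=1,
        random_state=0,
    )
    X_res, y_res = sampler.fit_resample(X, y)
    print(sampler.categorical_encoder_)  # the fitted encoder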
December 28, 2022
December 9, 2022
The n_jobs parameter has been deprecated from several classes, including SVMSMOTE. Instead, pass a nearest neighbors estimator where n_jobs is set. #887 by Guillaume Lemaitre.
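The replacement pattern looks roughly like this (parameter values are illustrative):

    from sklearn.neighbors import NearestNeighbors
    from imblearn.over_sampling import SVMSMOTE

    # Instead of SVMSMOTE(..., n_jobs=4), set n_jobs on the nearest
    # neighbors estimator itself; n_neighbors=6 mirrors the default
    # k_neighbors=5, since the query includes the sample itself.
    knn = NearestNeighbors(n_neighbors=6, n_jobs=4)
    sampler = SVMSMOTE(k_neighbors=knn, random_state=0)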
base_estimator is deprecated and will be removed in version 0.12. This affects several classes, including RUSBoostClassifier. #946 by Guillaume Lemaitre.
May 16, 2022
This release provides fixes that make imbalanced-learn work with the latest release of scikit-learn.
January 11, 2022
This release mainly provides fixes that make imbalanced-learn work with the latest release of scikit-learn.
September 29, 2021
February 18, 2021
Add the function imblearn.metrics.macro_averaged_mean_absolute_error, returning the average across classes of the MAE. This metric is used in ordinal classification. #780 by Aurélien Massiot.
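A small worked example (values invented) of what the metric computes:

    from imblearn.metrics import macro_averaged_mean_absolute_error

    y_true = [0, 0, 0, 1, 2]
    y_pred = [0, 1, 0, 1, 1]
    # Per-class MAE: class 0 -> 1/3, class 1 -> 0, class 2 -> 1.
    # Macro average: (1/3 + 0 + 1) / 3 = 4/9 ~ 0.444
    print(macro_averaged_mean_absolute_error(y_true, y_pred))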
Added an option to generate a smoothed bootstrap in imblearn.over_sampling.RandomOverSampler. It is controlled by the shrinkage parameter. This method is also known as Random Over-Sampling Examples (ROSE). #754 by Andrea Lorenzon and Guillaume Lemaitre.
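A minimal sketch of the option (the data set and the shrinkage value are invented for illustration):

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import RandomOverSampler

    X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
    # shrinkage=None (the default) duplicates samples; a positive
    # shrinkage draws the new samples from a smoothed bootstrap (ROSE).
    ros = RandomOverSampler(shrinkage=0.5, random_state=0)
    X_res, y_res = ros.fit_resample(X, y)
    print(Counter(y_res))  # classes are now balanced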
June 9, 2020
The following models might give some different results due to changes:
The classifiers implemented in imbalanced-learn accept sampling_strategy with the same keys as in y, without the need to encode y in advance. #718 by Guillaume Lemaitre.
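For instance (a hedged sketch with invented labels), the keys of the dictionary are the labels of y themselves:

    from collections import Counter
    import numpy as np
    from imblearn.under_sampling import RandomUnderSampler

    X = np.arange(20, dtype=float).reshape(-1, 1)
    y = np.array(["cat"] * 14 + ["dog"] * 6)
    # No label encoding needed: the dict is keyed by the string labels.
    rus = RandomUnderSampler(sampling_strategy={"cat": 6, "dog": 6}, random_state=0)
    X_res, y_res = rus.fit_resample(X, y)
    print(Counter(y_res))  # Counter({'cat': 6, 'dog': 6})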
February 16, 2020
This is a bug-fix release to resolve some issues regarding the handling of the input and output formats of the arrays.
December 7, 2019
This is a bug-fix release to primarily resolve some packaging issues in version 0.6.0. It also includes minor documentation improvements and some bug fixes.
December 5, 2019
The following models might give some different sampling due to changes in scikit-learn:
The following samplers will give different results due to changes linked to the internal usage of the random state:
imblearn.under_sampling.InstanceHardnessThreshold now takes into account the random_state and gives deterministic results. In addition, cross_val_predict is used to take advantage of the parallelism. #599 by Shihab Shahriar Khan.
Update imports from scikit-learn after some of its modules were made private. Among the changed imports is sklearn.utils._testing.SkipTest. #617 by Guillaume Lemaitre.
imblearn.datasets.make_imbalance accepts a pandas DataFrame as input and outputs a pandas DataFrame. Similarly, it accepts a pandas Series as input and outputs a pandas Series. #636 by Guillaume Lemaitre.
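A minimal sketch (using the iris dataset purely for illustration):

    from sklearn.datasets import load_iris
    from imblearn.datasets import make_imbalance

    X, y = load_iris(return_X_y=True, as_frame=True)  # DataFrame, Series
    X_imb, y_imb = make_imbalance(
        X, y, sampling_strategy={0: 25, 1: 50, 2: 50}, random_state=0
    )
    print(type(X_imb).__name__, type(y_imb).__name__)  # DataFrame Series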
The sample generation in imblearn.over_sampling.SMOTENC is now vectorized, giving an additional speed-up when X is sparse. #596 and #649 by Matt Eding.
June 28, 2019
The following models or functions might give different results even if the data X and y are the same.
imblearn.ensemble.RUSBoostClassifier default estimator changed from sklearn.tree.DecisionTreeClassifier with full depth to a decision stump (i.e., a tree with max_depth=1).
Fix wrong usage of batch normalization in the porto_seguro_keras_under_sampling.py example. The batch normalization was moved before the activation function and the bias was removed from the dense layer. #531 by Guillaume Lemaitre.
October 21, 2018
October 12, 2018
Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.
This release brings its set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.
The imblearn.ensemble module has been consolidated with new classifiers.
Support for string labels has been added in imblearn.under_sampling.RandomUnderSampler. In addition, a new class, imblearn.over_sampling.SMOTENC, allows generating samples from data sets containing both continuous and categorical features.
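A minimal sketch of SMOTENC on mixed data (the data set is synthetic and the column choices are invented for illustration):

    from collections import Counter
    import numpy as np
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTENC

    X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=10)
    # Pretend the last two of the 20 columns are integer-coded categories.
    X[:, -2:] = np.random.RandomState(10).randint(0, 4, size=(200, 2))
    sampler = SMOTENC(categorical_features=[18, 19], random_state=0)
    X_res, y_res = sampler.fit_resample(X, y)
    print(Counter(y_res))  # the minority class has been oversampled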
There are also some changes regarding the API: the sampling_strategy parameter has been introduced to replace the ratio parameter. In addition, the return_indices argument has been deprecated, and all samplers will expose a sample_indices_ attribute whenever possible.
fit_sample has been renamed fit_resample. An alias is still available for backward compatibility. In addition, sample has been removed to avoid resampling on a different set of data. #462 by Guillaume Lemaitre.
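A before/after sketch of the renamed API (the data are invented for illustration):

    import numpy as np
    from imblearn.under_sampling import RandomUnderSampler

    X = np.arange(20).reshape(-1, 1)
    y = np.array([0] * 15 + [1] * 5)
    # sampling_strategy replaces ratio; fit_resample replaces fit_sample.
    rus = RandomUnderSampler(sampling_strategy=1.0, random_state=0)
    X_res, y_res = rus.fit_resample(X, y)
    print(rus.sample_indices_)  # indices of the rows kept from X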
The borderline variants of imblearn.over_sampling.SMOTE are deprecated. Users should use imblearn.over_sampling.BorderlineSMOTE instead. #440 by Guillaume Lemaitre.
February 22, 2018
All the unit tests have been factorized and a utils.check_estimators has been derived from scikit-learn. By Guillaume Lemaitre.
January 1, 2017
Fixed a bug in under_sampling.RepeatedEditedNearestNeighbours, adding an additional stopping criterion to avoid that the minority class becomes a majority class or that a class disappears. By Guillaume Lemaitre.
Fixed a bug in under_sampling.CondensedNearestNeighbour, correcting the list of indices returned. By Guillaume Lemaitre.
Fixed a bug in ensemble.BalanceCascade, solving the issue of obtaining a single array if desired. By Guillaume Lemaitre.
Fixed a bug in under_sampling.CondensedNearestNeighbour, correcting the shape of sel_x when only one sample is selected. By Aliaksei Halachkin.
Added the AllKNN under-sampling technique. By Dayvid Oliveira.
Added support for bumpversion. By Guillaume Lemaitre.
Allowed random_state to be assigned in the SamplerMixin initialization. By Guillaume Lemaitre.
Allowed a KNeighborsMixin-based object to be passed for under_sampling.AllKNN. #109 by Guillaume Lemaitre.
December 26, 2016
Random majority under-sampling with replacement
Extraction of majority-minority Tomek links
Under-sampling with Cluster Centroids
NearMiss-(1 & 2 & 3)
Condensed Nearest Neighbour
Neighbourhood Cleaning Rule
Edited Nearest Neighbours
Instance Hardness Threshold
Repeated Edited Nearest Neighbours
Random minority over-sampling with replacement
SMOTE - Synthetic Minority Over-sampling Technique
bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
SVM SMOTE - Support Vectors SMOTE
ADASYN - Adaptive synthetic sampling approach for imbalanced learning
Over-sampling followed by under-sampling
SMOTE + Tomek links
SMOTE + ENN
Ensemble sampling