Release history#
Version 0.14.0#
August 14, 2025
Changelog#
Bug fixes#
Enhancements#
Add InstanceHardnessCV to split data and ensure that samples are distributed in folds based on their instance hardness. #1125 by Frits Hermans.
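The idea behind hardness-aware splitting can be sketched without the library. The round-robin strategy below is an illustrative assumption (not necessarily how InstanceHardnessCV is implemented): sort samples by hardness and deal them across folds so each fold receives a similar mix of easy and hard samples.

```python
def distribute_by_hardness(hardness, n_folds):
    """Assign each sample a fold index so folds get a similar hardness mix."""
    order = sorted(range(len(hardness)), key=lambda i: hardness[i])
    folds = [0] * len(hardness)
    for rank, idx in enumerate(order):
        folds[idx] = rank % n_folds
    return folds

hardness = [0.9, 0.1, 0.5, 0.8, 0.2, 0.4]
folds = distribute_by_hardness(hardness, n_folds=2)
# Each of the two folds receives three samples spanning low to high hardness.
assert sorted(folds) == [0, 0, 0, 1, 1, 1]
```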
Compatibility#
Compatibility with scikit-learn 1.7 #1137, #1145, #1146 by Guillaume Lemaitre.
Deprecations#
Version 0.13.0#
December 20, 2024
Changelog#
Bug fixes#
Fix get_metadata_routing in Pipeline such that one can use a sampler with metadata routing. #1115 by Guillaume Lemaitre.
Compatibility#
Compatibility with scikit-learn 1.6 #1109 by Guillaume Lemaitre.
Deprecations#
Pipeline now uses check_is_fitted instead of check_fitted to check if the pipeline is fitted. In 0.15, it will raise an error instead of a warning. #1109 by Guillaume Lemaitre.
The algorithm parameter in RUSBoostClassifier is now deprecated and will be removed in 0.14. #1109 by Guillaume Lemaitre.
Version 0.12.4#
October 4, 2024
Changelog#
Compatibility#
Compatibility with NumPy 2.0+ #1097 by Guillaume Lemaitre.
Version 0.12.3#
May 28, 2024
Changelog#
Compatibility#
Compatibility with scikit-learn 1.5 #1074 and #1084 by Guillaume Lemaitre.
Version 0.12.2#
March 31, 2024
Changelog#
Bug fixes#
Fix the way we check for a specific Python version in the test suite. #1075 by Guillaume Lemaitre.
Version 0.12.1#
March 31, 2024
Changelog#
Bug fixes#
Fix a bug in InstanceHardnessThreshold where estimator could not be a Pipeline object. #1049 by Gonenc Mogol.
Compatibility#
Do not use distutils in tests due to deprecation. #1065 by Michael R. Crusoe.
Fix the scikit-learn import in tests to be compatible with version 1.4.1.post1. #1073 by Guillaume Lemaitre.
Fix test to be compatible with Python 3.13. #1073 by Guillaume Lemaitre.
Version 0.12.0#
January 24, 2024
Changelog#
Bug fixes#
Fix a bug in SMOTENC where the entries of the one-hot encoding should be divided by sqrt(2) and not 2, taking into account that they are plugged into a Euclidean distance computation. #1014 by Guillaume Lemaitre.
Raise an informative error message when all support vectors are tagged as noise in SVMSMOTE. #1016 by Guillaume Lemaitre.
Fix a bug in SMOTENC where the median of the standard deviation of the continuous features was only computed on the minority class. This statistic is now computed for each class that is up-sampled. #1015 by Guillaume Lemaitre.
Fix a bug in SMOTENC such that the case where the median of the standard deviation of the continuous features is null is handled in the multiclass case as well. #1015 by Guillaume Lemaitre.
Fix a bug in BorderlineSMOTE version 2 where samples should be generated from the whole dataset and not only from the minority class. #1023 by Guillaume Lemaitre.
Fix a bug in NeighbourhoodCleaningRule where kind_sel="all" was not working as explained in the literature. #1012 by Guillaume Lemaitre.
Fix a bug in NeighbourhoodCleaningRule where the threshold_cleaning ratio was multiplied by the total number of samples instead of the number of samples in the minority class. #1012 by Guillaume Lemaitre.
Fix a bug in RandomUnderSampler and RandomOverSampler where a column containing only NaT was not handled correctly. #1059 by Guillaume Lemaitre.
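The sqrt(2) scaling in the SMOTENC fix above follows from how one-hot columns enter a Euclidean distance: two samples with different categories differ in exactly two one-hot positions, so encoding the entries as m / sqrt(2) makes one categorical mismatch contribute exactly m² to the squared distance, while the old m / 2 scaling contributed only m² / 2. A small numeric check, independent of the library:

```python
import math

m = 3.0  # stands in for SMOTE-NC's "median of standard deviations" constant

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# One-hot rows for two different categories, entries scaled by m / sqrt(2).
v = m / math.sqrt(2)
assert math.isclose(squared_distance([v, 0.0], [0.0, v]), m ** 2)

# Dividing by 2 instead (the pre-fix scaling) contributes only m**2 / 2.
w = m / 2
assert math.isclose(squared_distance([w, 0.0], [0.0, w]), m ** 2 / 2)
```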
Compatibility#
BalancedRandomForestClassifier now supports missing values and monotonic constraints if scikit-learn >= 1.4 is installed.
Pipeline supports metadata routing if scikit-learn >= 1.4 is installed.
Compatibility with scikit-learn 1.4. #1058 by Guillaume Lemaitre.
Deprecations#
Deprecate the estimator_ argument in favor of estimators_ for the classes CondensedNearestNeighbour and OneSidedSelection. estimator_ will be removed in 0.14. #1011 by Guillaume Lemaitre.
Deprecate kind_sel in NeighbourhoodCleaningRule. #1012 by Guillaume Lemaitre.
Enhancements#
Version 0.11.0#
July 8, 2023
Changelog#
Bug fixes#
Fix a bug in classification_report_imbalanced where the parameter target_names was not taken into account when output_dict=True. #989 by AYY7.
SMOTENC now handles mixed data types such as bool and pd.category by delegating the conversion to the scikit-learn encoder. #1002 by Guillaume Lemaitre.
Handle sparse matrices in SMOTEN and raise a warning since it requires a conversion to dense matrices. #1003 by Guillaume Lemaitre.
Remove a spurious warning raised when the minority class was over-sampled beyond the number of samples in the majority class. #1007 by Guillaume Lemaitre.
Compatibility#
Maintenance release for being compatible with scikit-learn >= 1.3.0. #999 by Guillaume Lemaitre.
Deprecation#
The fitted attribute ohe_ in SMOTENC is deprecated and will be removed in version 0.13. Use categorical_encoder_ instead. #1000 by Guillaume Lemaitre.
The defaults of the parameters sampling_strategy, bootstrap and replacement will change in BalancedRandomForestClassifier to follow the implementation of the original paper. This change will take effect in version 0.13. #1006 by Guillaume Lemaitre.
Enhancements#
SMOTENC now accepts a parameter categorical_encoder allowing to specify a OneHotEncoder with custom parameters. #1000 by Guillaume Lemaitre.
SMOTEN now accepts a parameter categorical_encoder allowing to specify an OrdinalEncoder with custom parameters. A new fitted attribute categorical_encoder_ is exposed to access the fitted encoder. #1001 by Guillaume Lemaitre.
RandomUnderSampler and RandomOverSampler (when shrinkage is not None) now accept any data type and will not attempt any data conversion. #1004 by Guillaume Lemaitre.
SMOTENC now supports passing an array-like of str for the categorical_features parameter. #1008 by Guillaume Lemaitre.
SMOTENC now supports automatic categorical inference when categorical_features is set to "auto". #1009 by Guillaume Lemaitre.
Version 0.10.1#
December 28, 2022
Changelog#
Bug fixes#
Fix a regression in over-samplers where the string minority was rejected as an invalid sampling strategy. #964 by Prakhyath Bhandary.
Version 0.10.0#
December 9, 2022
Changelog#
Bug fixes#
Make sure that Substitution is working with python -OO, which replaces __doc__ by None. #953 by Guillaume Lemaitre.
Compatibility#
Maintenance release to be compatible with scikit-learn >= 1.0.2. #946, #947, #949 by Guillaume Lemaitre.
Add support for automatic parameters validation as in scikit-learn >= 1.2. #955 by Guillaume Lemaitre.
Add support for feature_names_in_ as well as get_feature_names_out for all samplers. #959 by Guillaume Lemaitre.
Deprecation#
The parameter n_jobs has been deprecated from the classes ADASYN, BorderlineSMOTE, SMOTE, SMOTENC, SMOTEN, and SVMSMOTE. Instead, pass a nearest neighbors estimator where n_jobs is set. #887 by Guillaume Lemaitre.
The parameter base_estimator is deprecated and will be removed in version 0.12. It impacts the following classes: BalancedBaggingClassifier, EasyEnsembleClassifier, RUSBoostClassifier. #946 by Guillaume Lemaitre.
Enhancements#
Add support for compatible NearestNeighbors objects through duck typing only. For instance, it allows accepting cuML instances. #858 by NV-jpt and Guillaume Lemaitre.
Version 0.9.1#
May 16, 2022
Changelog#
This release provides fixes that make imbalanced-learn work with the latest release (1.1.0) of scikit-learn.
Version 0.9.0#
January 11, 2022
Changelog#
This release mainly provides fixes that make imbalanced-learn work with the latest release (1.0.2) of scikit-learn.
Version 0.8.1#
September 29, 2021
Changelog#
Maintenance#
Make imbalanced-learn compatible with scikit-learn 1.0. #864 by Guillaume Lemaitre.
Version 0.8.0#
February 18, 2021
Changelog#
New features#
Add the function imblearn.metrics.macro_averaged_mean_absolute_error returning the average across classes of the MAE. This metric is used in ordinal classification. #780 by Aurélien Massiot.
Add the class imblearn.metrics.pairwise.ValueDifferenceMetric to compute pairwise distances between samples containing only categorical values. #796 by Guillaume Lemaitre.
Add the class imblearn.over_sampling.SMOTEN to over-sample data only containing categorical features. #802 by Guillaume Lemaitre.
Add the possibility to pass any type of sampler in imblearn.ensemble.BalancedBaggingClassifier, unlocking the implementation of methods based on resampled bagging. #808 by Guillaume Lemaitre.
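The macro-averaged MAE mentioned above averages the per-class mean absolute error, so the majority class cannot dominate the score, which is why it suits ordinal classification. A sketch of the metric itself (not the library's implementation):

```python
def macro_averaged_mae(y_true, y_pred):
    """Mean absolute error computed per class, then averaged across classes."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        errors = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(errors) / len(errors))
    return sum(per_class) / len(per_class)

# Class 0 is perfectly predicted; class 2 is off by one on its single sample.
# Plain MAE would be 0.25, but the macro average weighs both classes equally.
assert macro_averaged_mae([0, 0, 0, 2], [0, 0, 0, 1]) == 0.5
```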
Enhancements#
Add option output_dict in imblearn.metrics.classification_report_imbalanced to return a dictionary instead of a string. #770 by Guillaume Lemaitre.
Added an option to generate a smoothed bootstrap in imblearn.over_sampling.RandomOverSampler. It is controlled by the parameter shrinkage. This method is also known as Random Over-Sampling Examples (ROSE). #754 by Andrea Lorenzon and Guillaume Lemaitre.
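A smoothed bootstrap draws a bootstrap sample and then perturbs each drawn point with noise whose scale depends on a bandwidth. The 1-D sketch below is a simplification of the ROSE idea under that assumption, not the library's implementation (function and parameter names are illustrative):

```python
import random

def smoothed_bootstrap(samples, n_out, bandwidth, seed=0):
    """Bootstrap-resample 1-D data, adding Gaussian noise of scale `bandwidth`."""
    rng = random.Random(seed)
    return [rng.choice(samples) + rng.gauss(0.0, bandwidth) for _ in range(n_out)]

minority = [1.0, 1.2, 0.9]
new = smoothed_bootstrap(minority, n_out=5, bandwidth=0.1)
assert len(new) == 5

# bandwidth=0 degenerates to a plain bootstrap (exact copies of the inputs).
copies = smoothed_bootstrap(minority, n_out=5, bandwidth=0.0)
assert all(x in minority for x in copies)
```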
Bug fixes#
Fix a bug in imblearn.under_sampling.ClusterCentroids where voting="hard" could have led to selecting a sample from any class instead of the targeted class. #769 by Guillaume Lemaitre.
Fix a bug in imblearn.FunctionSampler where validation was performed even with validate=False when calling fit. #790 by Guillaume Lemaitre.
Maintenance#
Remove requirements files in favour of adding the packages in the extras_require within the setup.py file. #816 by Guillaume Lemaitre.
Change the website template to use pydata-sphinx-theme. #801 by Guillaume Lemaitre.
Deprecation#
The context manager imblearn.utils.testing.warns is deprecated in 0.8 and will be removed in 1.0. #815 by Guillaume Lemaitre.
Version 0.7.0#
June 9, 2020
Changelog#
Maintenance#
Ensure that imblearn.pipeline.Pipeline is working when memory is activated and joblib==0.11. #687 by Christos Aridas.
Refactor common tests to use the dev tools from scikit-learn 0.23. #710 by Guillaume Lemaitre.
Remove FutureWarning issued by scikit-learn 0.23. #710 by Guillaume Lemaitre.
Impose keyword-only arguments as in scikit-learn. #721 by Guillaume Lemaitre.
Changed models#
The following models might give some different results due to changes:
Bug fixes#
Change the default value of min_samples_leaf to be consistent with scikit-learn. #711 by zerolfx.
Fix a bug due to a change in scikit-learn 0.23 in imblearn.metrics.make_index_balanced_accuracy. The function was unusable. #710 by Guillaume Lemaitre.
Raise a proper error message when only numerical or categorical features are given in imblearn.over_sampling.SMOTENC. #720 by Guillaume Lemaitre.
Fix a bug when the median of the standard deviation is null in imblearn.over_sampling.SMOTENC. #675 by bganglia.
Enhancements#
The classifiers implemented in imbalanced-learn, imblearn.ensemble.BalancedBaggingClassifier, imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, and imblearn.ensemble.RUSBoostClassifier, accept sampling_strategy with the same keys as in y without the need to encode y in advance. #718 by Guillaume Lemaitre.
Lazily import the keras module when importing imblearn.keras. #719 by Guillaume Lemaitre.
Deprecation#
Deprecation of the parameter n_jobs in imblearn.under_sampling.ClusterCentroids since it was used by sklearn.cluster.KMeans, which deprecated it. #710 by Guillaume Lemaitre.
Deprecation of passing keyword arguments by position, similarly to scikit-learn. #721 by Guillaume Lemaitre.
Version 0.6.2#
February 16, 2020
This is a bug-fix release to resolve some issues regarding the handling of the input and output formats of the arrays.
Changelog#
Allow column vectors to be passed as targets. #673 by Christos Aridas.
Better input/output handling for pandas, numpy and plain lists. #681 by Christos Aridas.
Version 0.6.1#
December 7, 2019
This is a bug-fix release to primarily resolve some packaging issues in version 0.6.0. It also includes minor documentation improvements and some bug fixes.
Changelog#
Bug fixes#
Fix a bug in imblearn.ensemble.BalancedRandomForestClassifier leading to a wrong number of samples used during fitting due to max_samples, and therefore a bad computation of the OOB score. #656 by Guillaume Lemaitre.
Version 0.6.0#
December 5, 2019
Changelog#
Changed models#
The following models might give some different sampling due to changes in scikit-learn:
The following samplers will give different results due to change linked to the random state internal usage:
Bug fixes#
imblearn.under_sampling.InstanceHardnessThreshold now takes into account the random_state and will give deterministic results. In addition, cross_val_predict is used to take advantage of the parallelism. #599 by Shihab Shahriar Khan.
Fix a bug in imblearn.ensemble.BalancedRandomForestClassifier leading to a wrong computation of the OOB score. #656 by Guillaume Lemaitre.
Maintenance#
Update imports from scikit-learn after some modules have been made private. The following imports have been changed: sklearn.ensemble._base._set_random_states, sklearn.ensemble._forest._parallel_build_trees, sklearn.metrics._classification._check_targets, sklearn.metrics._classification._prf_divide, sklearn.utils.Bunch, sklearn.utils._safe_indexing, sklearn.utils._testing.assert_allclose, sklearn.utils._testing.assert_array_equal, sklearn.utils._testing.SkipTest. #617 by Guillaume Lemaitre.
Synchronize imblearn.pipeline with sklearn.pipeline. #620 by Guillaume Lemaitre.
Synchronize imblearn.ensemble.BalancedRandomForestClassifier and add parameters max_samples and ccp_alpha. #621 by Guillaume Lemaitre.
Enhancement#
imblearn.under_sampling.RandomUnderSampler, imblearn.over_sampling.RandomOverSampler and imblearn.datasets.make_imbalance accept Pandas DataFrames in input and will output Pandas DataFrames. Similarly, they accept Pandas Series in input and will output Pandas Series. #636 by Guillaume Lemaitre.
imblearn.FunctionSampler accepts a parameter validate allowing to check or not the input X and y. #637 by Guillaume Lemaitre.
imblearn.under_sampling.RandomUnderSampler and imblearn.over_sampling.RandomOverSampler can resample when non-finite values are present in X. #643 by Guillaume Lemaitre.
All samplers will output a Pandas DataFrame if a Pandas DataFrame was given as input. #644 by Guillaume Lemaitre.
The sample generation in imblearn.over_sampling.ADASYN, imblearn.over_sampling.SMOTE, imblearn.over_sampling.BorderlineSMOTE, imblearn.over_sampling.SVMSMOTE, imblearn.over_sampling.KMeansSMOTE and imblearn.over_sampling.SMOTENC is now vectorized, giving an additional speed-up when X is sparse. #596 and #649 by Matt Eding.
Deprecation#
The following classes have been removed after 2 deprecation cycles: ensemble.BalanceCascade and ensemble.EasyEnsemble. #617 by Guillaume Lemaitre.
The following function has been removed after 2 deprecation cycles: utils.check_ratio. #617 by Guillaume Lemaitre.
The parameters ratio and return_indices have been removed from all samplers. #617 by Guillaume Lemaitre.
The parameters m_neighbors, out_step, kind, svm_estimator have been removed from imblearn.over_sampling.SMOTE. #617 by Guillaume Lemaitre.
Version 0.5.0#
June 28, 2019
Changelog#
Changed models#
The following models or functions might give different results even if the same data X and y are used.
imblearn.ensemble.RUSBoostClassifier default estimator changed from sklearn.tree.DecisionTreeClassifier with full depth to a decision stump (i.e., a tree with max_depth=1).
Documentation#
Correct the definition of the ratio when using a float in sampling strategy for the over-sampling and under-sampling. #525 by Ariel Rossanigo.
Add imblearn.over_sampling.BorderlineSMOTE and imblearn.over_sampling.SVMSMOTE in the API documentation. #530 by Guillaume Lemaitre.
Enhancement#
Add Parallelisation for SMOTEENN and SMOTETomek. #547 by Michael Hsieh.
Add imblearn.utils._show_versions. Updated the contribution guide and issue template showing how to print system and dependency information from the command line. #557 by Alexander L. Hayes.
Add imblearn.over_sampling.KMeansSMOTE, an over-sampler clustering points before applying SMOTE. #435 by Stephan Heijl.
Maintenance#
Make it possible to import imblearn and access submodules. #500 by Guillaume Lemaitre.
Remove support for Python 2 and remove deprecation warnings from scikit-learn 0.21. #576 by Guillaume Lemaitre.
Bug#
Fix wrong usage of keras.layers.BatchNormalization in the porto_seguro_keras_under_sampling.py example. The batch normalization was moved before the activation function and the bias was removed from the dense layer. #531 by Guillaume Lemaitre.
Fix a bug which converted sparse matrices to COO format when stacking them in imblearn.over_sampling.SMOTENC. This bug only affected old scipy versions. #539 by Guillaume Lemaitre.
Fix a bug in imblearn.pipeline.Pipeline where None could be the final estimator. #554 by Oliver Rausch.
Fix a bug in imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE where the default parameter n_neighbors was not set properly. #578 by Guillaume Lemaitre.
Fix a bug by changing the default depth in imblearn.ensemble.RUSBoostClassifier to get a decision stump as a weak learner, as in the original paper. #545 by Christos Aridas.
Allow importing keras directly from tensorflow in imblearn.keras. #531 by Guillaume Lemaitre.
Version 0.4.2#
October 21, 2018
Changelog#
Bug fixes#
Fix a bug in imblearn.over_sampling.SMOTENC in which the median of the standard deviation was used instead of half of the median of the standard deviation. By Guillaume Lemaitre in #491.
Raise an error when passing a target which is not supported, i.e. regression targets or multilabel targets. Imbalanced-learn does not support these cases. By Guillaume Lemaitre in #490.
Fix a bug in imblearn.over_sampling.SMOTENC in which sparse matrices were densified during inverse_transform. By Guillaume Lemaitre in #495.
Fix a bug in imblearn.over_sampling.SMOTENC in which the tie breaking was wrongly sampling. By Guillaume Lemaitre in #497.
Version 0.4#
October 12, 2018
Warning
Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.
Highlights#
This release brings its set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.
As new features, 2 new modules, imblearn.keras and imblearn.tensorflow, have been added in which imbalanced-learn samplers can be used to generate balanced mini-batches.
The module imblearn.ensemble has been consolidated with new classifiers: imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, imblearn.ensemble.RUSBoostClassifier.
Support for strings has been added in imblearn.over_sampling.RandomOverSampler and imblearn.under_sampling.RandomUnderSampler. In addition, a new class imblearn.over_sampling.SMOTENC allows generating samples with data sets containing both continuous and categorical features.
The imblearn.over_sampling.SMOTE has been simplified and broken down into 2 additional classes: imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE.
There are also some changes regarding the API: the parameter sampling_strategy has been introduced to replace the ratio parameter. In addition, the return_indices argument has been deprecated and all samplers will expose a sample_indices_ attribute whenever possible.
Changelog#
API#
Replace the parameter ratio by sampling_strategy. #411 by Guillaume Lemaitre.
Enable to use a float with binary classification for sampling_strategy. #411 by Guillaume Lemaitre.
Enable to use a list for the cleaning methods to specify the classes to sample. #411 by Guillaume Lemaitre.
Replace fit_sample by fit_resample. An alias is still available for backward compatibility. In addition, sample has been removed to avoid resampling on a different set of data. #462 by Guillaume Lemaitre.
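For binary problems, a float passed as sampling_strategy is the desired ratio of minority over majority samples after resampling. A minimal sketch of the arithmetic for an over-sampler (the helper name is illustrative, not part of the library):

```python
def target_minority_count(n_minority, n_majority, sampling_strategy):
    """Minority-class size after over-sampling so that
    n_minority_new / n_majority == sampling_strategy (binary case)."""
    return int(sampling_strategy * n_majority)

# With 100 majority and 10 minority samples, a ratio of 0.5 means the
# over-sampler generates minority samples until there are 50 of them.
assert target_minority_count(10, 100, 0.5) == 50
```

Under-samplers use the same ratio but reduce the majority class instead.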
New features#
Add keras and tensorflow modules to create balanced mini-batch generators. #409 by Guillaume Lemaitre.
Add imblearn.ensemble.EasyEnsembleClassifier which creates a bag of AdaBoost classifiers trained on balanced bootstrap samples. #455 by Guillaume Lemaitre.
Add imblearn.ensemble.BalancedRandomForestClassifier which balances each bootstrap provided to each tree of the forest. #459 by Guillaume Lemaitre.
Add imblearn.ensemble.RUSBoostClassifier which applies a random under-sampling stage before each boosting iteration of AdaBoost. #469 by Guillaume Lemaitre.
Add imblearn.over_sampling.SMOTENC which generates synthetic samples on data sets with heterogeneous data types (continuous and categorical features). #412 by Denis Dudnik and Guillaume Lemaitre.
Enhancement#
Add a documentation node to create a balanced random forest from a balanced bagging classifier. #372 by Guillaume Lemaitre.
Document the metrics to evaluate models on imbalanced dataset. #367 by Guillaume Lemaitre.
Add support for one-vs-all encoded target to support keras. #409 by Guillaume Lemaitre.
Add specific classes for borderline and SVM SMOTE using BorderlineSMOTE and SVMSMOTE. #440 by Guillaume Lemaitre.
Allow imblearn.over_sampling.RandomOverSampler to return indices using the attribute return_indices. #439 by Hugo Gascon and Guillaume Lemaitre.
Allow imblearn.under_sampling.RandomUnderSampler and imblearn.over_sampling.RandomOverSampler to sample object arrays containing strings. #451 by Guillaume Lemaitre.
Bug fixes#
Fix a bug in metrics.classification_report_imbalanced for which y_pred and y_true were inverted. #394 by Ole Silvig.
Fix a bug in ADASYN to consider only samples from the current class when generating new samples. #354 by Guillaume Lemaitre.
Fix a bug to allow for sorted behavior of the sampling_strategy dictionary and thus obtain deterministic results when using the same random state. #447 by Guillaume Lemaitre.
Force cloning of scikit-learn estimators passed as attributes to samplers. #446 by Guillaume Lemaitre.
Fix bug which was not preserving the dtype of X and y when generating samples. #450 by Guillaume Lemaitre.
Add the option to pass a Memory object to make_pipeline like in the pipeline.Pipeline class. #458 by Christos Aridas.
Maintenance#
Remove deprecated parameters in 0.2 - #331 by Guillaume Lemaitre.
Make some modules private. #452 by Guillaume Lemaitre.
Upgrade requirements to scikit-learn 0.20. #379 by Guillaume Lemaitre.
Catch deprecation warning in testing. #441 by Guillaume Lemaitre.
Refactor and impose pytest style tests. #470 by Guillaume Lemaitre.
Documentation#
Remove some docstrings which are not necessary. #454 by Guillaume Lemaitre.
Fix the documentation of the sampling_strategy parameter when used as a float. #480 by Guillaume Lemaitre.
Deprecation#
Deprecate ratio in favor of sampling_strategy. #411 by Guillaume Lemaitre.
Deprecate the use of a dict for cleaning methods; a list should be used. #411 by Guillaume Lemaitre.
Deprecate random_state in imblearn.under_sampling.NearMiss, imblearn.under_sampling.EditedNearestNeighbours, imblearn.under_sampling.RepeatedEditedNearestNeighbours, imblearn.under_sampling.AllKNN, imblearn.under_sampling.NeighbourhoodCleaningRule, imblearn.under_sampling.InstanceHardnessThreshold, imblearn.under_sampling.CondensedNearestNeighbour.
Deprecate kind, out_step, svm_estimator, m_neighbors in imblearn.over_sampling.SMOTE. Users should use imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE. #440 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.EasyEnsemble in favor of the meta-estimator imblearn.ensemble.EasyEnsembleClassifier which follows the exact algorithm described in the literature. #455 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.BalanceCascade. #472 by Guillaume Lemaitre.
Deprecate return_indices in all samplers. Instead, an attribute sample_indices_ is created whenever the sampler selects a subset of the original samples. #474 by Guillaume Lemaitre.
Version 0.3#
February 22, 2018
Changelog#
Pytest is used instead of nosetests. #321 by Joan Massich.
Added a User Guide and extended some examples. #295 by Guillaume Lemaitre.
Fixed a bug in utils.check_ratio such that an error is raised when the number of samples required is negative. #312 by Guillaume Lemaitre.
Fixed a bug in under_sampling.NearMiss version 3. The indices returned were wrong. #312 by Guillaume Lemaitre.
Fixed a bug for ensemble.BalanceCascade and combine.SMOTEENN and SMOTETomek. #295 by Guillaume Lemaitre.
Fixed a bug for check_ratio to be able to pass arguments when ratio is a callable. #307 by Guillaume Lemaitre.
Turn off steps in pipeline.Pipeline using the None object. By Christos Aridas.
Add a fetching function datasets.fetch_datasets in order to get some imbalanced datasets useful for benchmarking. #249 by Guillaume Lemaitre.
All samplers accept sparse matrices, defaulting to the CSR type. #316 by Guillaume Lemaitre.
datasets.make_imbalance takes a ratio similarly to other samplers. It supports multiclass. #312 by Guillaume Lemaitre.
All the unit tests have been factorized and a utils.check_estimators has been derived from scikit-learn. By Guillaume Lemaitre.
Script for automatic build of conda packages and uploading. #242 by Guillaume Lemaitre.
Remove seaborn dependence and improve the examples. #264 by Guillaume Lemaitre.
Adapt all classes to multi-class resampling. #290 by Guillaume Lemaitre.
__init__ has been removed from the base.SamplerMixin to create a real mixin class. #242 by Guillaume Lemaitre.
Creation of a module exceptions to handle consistent raising of errors. #242 by Guillaume Lemaitre.
Creation of a module utils.validation to make checking of recurrent patterns. #242 by Guillaume Lemaitre.
Move the under-sampling methods into the prototype_selection and prototype_generation submodules to make a clearer distinction. #277 by Guillaume Lemaitre.
Change ratio such that it can adapt to multiple class problems. #290 by Guillaume Lemaitre.
Deprecation of the use of min_c_ in datasets.make_imbalance. #312 by Guillaume Lemaitre.
Deprecation of the use of float in datasets.make_imbalance for the ratio parameter. #290 by Guillaume Lemaitre.
Deprecate the use of float as ratio in favor of dictionary, string, or callable. #290 by Guillaume Lemaitre.
Version 0.2#
January 1, 2017
Changelog#
Fixed a bug in under_sampling.NearMiss which was not picking the right samples during under-sampling for method 3. By Guillaume Lemaitre.
Fixed a bug in ensemble.EasyEnsemble, correction of the random_state generation. By Guillaume Lemaitre and Christos Aridas.
Fixed a bug in under_sampling.RepeatedEditedNearestNeighbours, adding an additional stopping criterion to avoid that the minority class becomes a majority class or that a class disappears. By Guillaume Lemaitre.
Fixed a bug in under_sampling.AllKNN, adding stopping criteria to avoid that the minority class becomes a majority class or that a class disappears. By Guillaume Lemaitre.
Fixed a bug in under_sampling.CondensedNearestNeighbour, correction of the list of indices returned. By Guillaume Lemaitre.
Fixed a bug in ensemble.BalanceCascade, solving the issue of obtaining a single array if desired. By Guillaume Lemaitre.
Fixed a bug in pipeline.Pipeline, solving the embedding of a Pipeline in another Pipeline. #231 by Christos Aridas.
Fixed a bug in pipeline.Pipeline, solving the issue of putting two samplers in the same Pipeline. #188 by Christos Aridas.
Fixed a bug in under_sampling.CondensedNearestNeighbour, correction of the shape of sel_x when only one sample is selected. By Aliaksei Halachkin.
Fixed a bug in under_sampling.NeighbourhoodCleaningRule, selecting neighbours instead of minority class misclassified samples. #230 by Aleksandr Loskutov.
Fixed a bug in over_sampling.ADASYN, correction of the creation of a new sample so that the new sample lies between the minority sample and the nearest neighbour. #235 by Rafael Wampfler.
Added AllKNN under sampling technique. By Dayvid Oliveira.
Added a module metrics implementing some specific scoring functions for the problem of balancing. #204 by Guillaume Lemaitre and Christos Aridas.
Added support for bumpversion. By Guillaume Lemaitre.
Validate the type of target in binary samplers. A warning is raised for the moment. By Guillaume Lemaitre and Christos Aridas.
Change from the cross_validation module to the model_selection module for the sklearn deprecation cycle. By Dayvid Oliveira and Christos Aridas.
size_ngh has been deprecated in combine.SMOTEENN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.EditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.CondensedNearestNeighbour. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.OneSidedSelection. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.NeighbourhoodCleaningRule. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.RepeatedEditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
size_ngh has been deprecated in under_sampling.AllKNN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
Two base classes BaseBinaryclassSampler and BaseMulticlassSampler have been created to handle the target type and raise a warning in case of abnormality. By Guillaume Lemaitre and Christos Aridas.
Move random_state to be assigned in the SamplerMixin initialization. By Guillaume Lemaitre.
Provide estimators instead of parameters in combine.SMOTEENN and combine.SMOTETomek. Therefore, the list of parameters has been deprecated. By Guillaume Lemaitre and Christos Aridas.
k has been deprecated in over_sampling.ADASYN. Use n_neighbors instead. #183 by Guillaume Lemaitre.
k and m have been deprecated in over_sampling.SMOTE. Use k_neighbors and m_neighbors instead. #182 by Guillaume Lemaitre.
n_neighbors accepts KNeighborsMixin-based objects for under_sampling.EditedNearestNeighbours, under_sampling.CondensedNearestNeighbour, under_sampling.NeighbourhoodCleaningRule, under_sampling.RepeatedEditedNearestNeighbours, and under_sampling.AllKNN. #109 by Guillaume Lemaitre.
Replace some remaining UnbalancedDataset occurrences. By Francois Magimel.
Added doctests in the documentation. By Guillaume Lemaitre.
Version 0.1#
December 26, 2016
Changelog#
First release of the stable API. By Fernando Nogueira, Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- Under-sampling
Random majority under-sampling with replacement
Extraction of majority-minority Tomek links
Under-sampling with Cluster Centroids
NearMiss-(1 & 2 & 3)
Condensed Nearest Neighbour
One-Sided Selection
Neighbourhood Cleaning Rule
Edited Nearest Neighbours
Instance Hardness Threshold
Repeated Edited Nearest Neighbours
- Over-sampling
Random minority over-sampling with replacement
SMOTE - Synthetic Minority Over-sampling Technique
bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
SVM SMOTE - Support Vectors SMOTE
ADASYN - Adaptive synthetic sampling approach for imbalanced learning
- Over-sampling followed by under-sampling
SMOTE + Tomek links
SMOTE + ENN
- Ensemble sampling
EasyEnsemble
BalanceCascade