make_imbalance#
- imblearn.datasets.make_imbalance(X, y, *, sampling_strategy=None, random_state=None, verbose=False, **kwargs)[source]#
Turn a dataset into an imbalanced dataset with a specific sampling strategy.
A simple toy dataset to visualize clustering and classification algorithms.
Read more in the User Guide.
- Parameters:
- X{array-like, dataframe} of shape (n_samples, n_features)
Matrix containing the data to be imbalanced.
- yarray-like of shape (n_samples,)
Corresponding label for each sample in X.
- sampling_strategydict or callable,
Ratio to use for resampling the data set.
When
dict
, the keys correspond to the targeted classes. The values correspond to the desired number of samples for each targeted class.When callable, function taking
y
and returns adict
. The keys correspond to the targeted classes. The values correspond to the desired number of samples for each class.
- random_stateint, RandomState instance or None, default=None
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
- verbosebool, default=False
Show information regarding the sampling.
- **kwargsdict
Dictionary of additional keyword arguments to pass to
sampling_strategy
.
- Returns:
- X_resampled{ndarray, dataframe} of shape (n_samples_new, n_features)
The array containing the imbalanced data.
- y_resampledndarray of shape (n_samples_new)
The corresponding label of
X_resampled
.
Notes
See Multiclass classification with under-sampling, Create an imbalanced dataset, and How to use sampling_strategy in imbalanced-learn.
Examples
>>> from collections import Counter >>> from sklearn.datasets import load_iris >>> from imblearn.datasets import make_imbalance
>>> data = load_iris() >>> X, y = data.data, data.target >>> print(f'Distribution before imbalancing: {Counter(y)}') Distribution before imbalancing: Counter({0: 50, 1: 50, 2: 50}) >>> X_res, y_res = make_imbalance(X, y, ... sampling_strategy={0: 10, 1: 20, 2: 30}, ... random_state=42) >>> print(f'Distribution after imbalancing: {Counter(y_res)}') Distribution after imbalancing: Counter({2: 30, 1: 20, 0: 10})
Examples using imblearn.datasets.make_imbalance
#
How to use sampling_strategy in imbalanced-learn
Multiclass classification with under-sampling
Fitting model on imbalanced datasets and how to fight bias