
class imblearn.keras.BalancedBatchGenerator(X, y, *, sample_weight=None, sampler=None, batch_size=32, keep_sparse=False, random_state=None)

Create balanced batches when training a Keras model.

Create a Keras Sequence that can be passed to fit. The sampler defines the sampling strategy used to balance the dataset before the batches are created. The sampler must expose a sample_indices_ attribute.

Added in version 0.4.

X : ndarray of shape (n_samples, n_features)

Original imbalanced dataset.

y : ndarray of shape (n_samples,) or (n_samples, n_classes)

Associated targets.

sample_weight : ndarray of shape (n_samples,), default=None

Sample weight.

sampler : sampler object, default=None

A sampler instance which has an attribute sample_indices_. By default, the sampler used is a RandomUnderSampler.

batch_size : int, default=32

Number of samples per gradient update.

keep_sparse : bool, default=False

Whether to preserve the sparsity of the input (i.e. X, y, sample_weight). By default, the returned batches are dense.

random_state : int, RandomState instance or None, default=None

Control the randomization of the algorithm:

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.
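The three cases above can be sketched with NumPy alone. The helper name `resolve_random_state` is hypothetical and is not part of imblearn; it only illustrates how an int, a RandomState instance, or None could each be turned into a generator.

```python
import numpy as np


def resolve_random_state(random_state=None):
    """Hypothetical helper mirroring the three cases listed above."""
    if random_state is None:
        # Fall back to the global RandomState instance used by np.random.
        return np.random.mtrand._rand
    if isinstance(random_state, (int, np.integer)):
        # An integer is used as the seed of a fresh generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is.
        return random_state
    raise ValueError(f"Cannot use {random_state!r} as a random state")


# The same integer seed yields the same sequence of draws.
first = resolve_random_state(42).randint(0, 100, size=5)
second = resolve_random_state(42).randint(0, 100, size=5)

# A RandomState instance is passed through unchanged.
shared = np.random.RandomState(0)
```

Passing an int is usually the simplest way to make the balanced batches reproducible from one run to the next.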

sampler_ : sampler object

The sampler used to balance the dataset.

indices_ : ndarray of shape (n_new_samples,)

The indices of the samples selected during sampling.
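To make the idea of selected indices concrete, here is a minimal pure-Python sketch of random under-sampling. The function `undersample_indices` is a hypothetical stand-in, not the imblearn sampler: it assumes every class is reduced to the size of the smallest class and returns the positions of the rows that are kept.

```python
import random
from collections import defaultdict


def undersample_indices(labels, seed=None):
    """Hypothetical stand-in for a sampler: indices of a class-balanced subset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Assumption: every class is reduced to the size of the smallest one.
    n_keep = min(len(class_indices) for class_indices in by_class.values())
    selected = []
    for class_indices in by_class.values():
        # Draw n_keep distinct positions from this class without replacement.
        selected.extend(rng.sample(class_indices, n_keep))
    return selected


# Same class counts as the example on this page: 30, 50 and 40 samples.
labels = [0] * 30 + [1] * 50 + [2] * 40
indices = undersample_indices(labels, seed=42)
counts = {label: sum(labels[i] == label for i in indices) for label in set(labels)}
print(len(indices), counts)
```

Only the positions of the retained samples are stored, so the original X, y and sample_weight arrays can be indexed lazily when a batch is requested.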


>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> from imblearn.datasets import make_imbalance
>>> class_dict = {0: 30, 1: 50, 2: 40}
>>> X, y = make_imbalance(iris.data, iris.target, sampling_strategy=class_dict)
>>> import tensorflow
>>> y = tensorflow.keras.utils.to_categorical(y, 3)
>>> model = tensorflow.keras.models.Sequential()
>>> model.add(
...     tensorflow.keras.layers.Dense(
...         y.shape[1], input_dim=X.shape[1], activation='softmax'
...     )
... )
>>> model.compile(optimizer='sgd', loss='categorical_crossentropy',
...               metrics=['accuracy'])
>>> from imblearn.keras import BalancedBatchGenerator
>>> from imblearn.under_sampling import NearMiss
>>> training_generator = BalancedBatchGenerator(
...     X, y, sampler=NearMiss(), batch_size=10, random_state=42)
>>> callback_history = model.fit(training_generator, epochs=10, verbose=0)



on_epoch_begin()

Method called at the beginning of every epoch.

on_epoch_end()

Method called at the end of every epoch.

property num_batches

Number of batches in the PyDataset.


Returns the number of batches in the PyDataset, or None to indicate that the dataset is infinite.
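As a rough illustration of how a finite batch count relates to batch_size, the sketch below cuts a list of selected indices into batches. The helper `split_into_batches` is hypothetical, and it assumes that only full batches are produced (floor division); check the installed version to see how a trailing partial batch is actually handled.

```python
def split_into_batches(indices, batch_size):
    """Hypothetical helper: cut the selected indices into full batches only."""
    # Assumption: a trailing partial batch is dropped (floor division).
    n_batches = len(indices) // batch_size
    return [
        indices[i * batch_size:(i + 1) * batch_size]
        for i in range(n_batches)
    ]


batches = split_into_batches(list(range(95)), batch_size=10)
print(len(batches))  # 9 full batches; the last 5 indices are unused in this sketch
```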



