imblearn.keras.balanced_batch_generator

imblearn.keras.balanced_batch_generator(X, y, sample_weight=None, sampler=None, batch_size=32, keep_sparse=False, random_state=None)[source]

Create a balanced batch generator to train a Keras model.

Returns a generator, as well as the number of steps per epoch, to be passed to fit_generator. The sampler defines the sampling strategy used to balance the dataset before creating the batches. The sampler should have an attribute sample_indices_.
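Conceptually, the generator balances the dataset by selecting a subset of row indices (playing the role of the sampler's sample_indices_) and then slicing fixed-size batches from that subset forever. A minimal pure-NumPy sketch of the idea (an illustration only, not imblearn's actual implementation; the helper name balanced_batches_sketch is made up):

```python
import numpy as np

def balanced_batches_sketch(X, y, batch_size=32, seed=0):
    """Sketch: undersample every class to the minority class size,
    shuffle, then yield fixed-size (X_batch, y_batch) tuples forever."""
    rng = np.random.RandomState(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Mimic a sampler's sample_indices_: a balanced subset of row indices.
    sample_indices = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), n_min, replace=False)
         for c in classes])
    rng.shuffle(sample_indices)
    steps_per_epoch = int(sample_indices.size // batch_size)

    def generator():
        while True:  # Keras expects the generator to loop indefinitely
            for i in range(steps_per_epoch):
                idx = sample_indices[i * batch_size:(i + 1) * batch_size]
                yield X[idx], y[idx]

    return generator(), steps_per_epoch
```

With 70 samples of class 0 and 30 of class 1, both classes are cut to 30, giving 60 balanced samples and 60 // batch_size steps per epoch.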

Parameters:
X : ndarray, shape (n_samples, n_features)

Original imbalanced dataset.

y : ndarray, shape (n_samples,) or (n_samples, n_classes)

Associated targets.

sample_weight : ndarray, shape (n_samples,), optional (default=None)

Sample weights.

sampler : object or None, optional (default=RandomUnderSampler)

A sampler instance which has an attribute sample_indices_. By default, the sampler used is an imblearn.under_sampling.RandomUnderSampler.

batch_size : int, optional (default=32)

Number of samples per gradient update.

keep_sparse : bool, optional (default=False)

Whether to conserve the sparsity of the input (i.e. X, y, sample_weight). By default, the returned batches will be dense.

random_state : int, RandomState instance or None, optional (default=None)

Control the randomization of the algorithm.

  • If int, random_state is the seed used by the random number generator;
  • If RandomState instance, random_state is the random number generator;
  • If None, the random number generator is the RandomState instance used by np.random.
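These three forms follow the usual scikit-learn seeding convention, which mirrors how NumPy itself handles seeds. A plain NumPy illustration of the behaviour (not imblearn internals):

```python
import numpy as np

# int: seed for a fresh RandomState, so results are reproducible
a = np.random.RandomState(42).permutation(10)
b = np.random.RandomState(42).permutation(10)  # identical to a

# RandomState instance: used directly; its state advances between draws
rng = np.random.RandomState(42)
first = rng.permutation(10)   # same as a: same fresh 42-seeded state
second = rng.permutation(10)  # state has advanced, generally differs

# None: draws come from the global singleton used by np.random
c = np.random.permutation(10)
```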
Returns:
generator : generator of tuple

Generates batches of data. The generated tuples are either (X_batch, y_batch) or (X_batch, y_batch, sample_weight_batch).

steps_per_epoch : int

The number of steps (batches) per epoch. Required by fit_generator in keras.
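Regarding the keep_sparse parameter above, the difference between dense and sparse batches can be sketched with SciPy (an illustration of the two formats involved, not the generator itself):

```python
import numpy as np
from scipy import sparse

# A sparse input matrix, as could be passed as X.
X_sp = sparse.csr_matrix(np.eye(4))

# keep_sparse=False (the default): batches are densified before being yielded.
dense_batch = X_sp[:2].toarray()
assert isinstance(dense_batch, np.ndarray)

# keep_sparse=True: batches keep the sparse format of the input.
sparse_batch = X_sp[:2]
assert sparse.issparse(sparse_batch)
```

Keeping batches sparse only helps if the downstream model accepts sparse input; plain Keras Dense layers expect dense arrays, hence the dense default.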

Examples

>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> from imblearn.datasets import make_imbalance
>>> class_dict = {0: 30, 1: 50, 2: 40}
>>> X, y = make_imbalance(X, y, class_dict)
>>> import keras
>>> y = keras.utils.to_categorical(y, 3)
>>> model = keras.models.Sequential()
>>> model.add(keras.layers.Dense(y.shape[1], input_dim=X.shape[1],
...                              activation='softmax'))
>>> model.compile(optimizer='sgd', loss='categorical_crossentropy',
...               metrics=['accuracy'])
>>> from imblearn.keras import balanced_batch_generator
>>> from imblearn.under_sampling import NearMiss
>>> training_generator, steps_per_epoch = balanced_batch_generator(
...     X, y, sampler=NearMiss(), batch_size=10, random_state=42)
>>> callback_history = model.fit_generator(generator=training_generator,
...                                        steps_per_epoch=steps_per_epoch,
...                                        epochs=10, verbose=0)