balanced_batch_generator
imblearn.keras.balanced_batch_generator(X, y, *, sample_weight=None, sampler=None, batch_size=32, keep_sparse=False, random_state=None)
Create a balanced batch generator to train a keras model.
Returns a generator, as well as the number of steps per epoch, which is given to fit. The sampler defines the sampling strategy used to balance the dataset ahead of creating the batch. The sampler should have an attribute sample_indices_.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
Original imbalanced dataset.
- y : ndarray of shape (n_samples,) or (n_samples, n_classes)
Associated targets.
- sample_weight : ndarray of shape (n_samples,), default=None
Sample weight.
- sampler : sampler object, default=None
A sampler instance which has an attribute sample_indices_. By default, the sampler used is a RandomUnderSampler.
- batch_size : int, default=32
Number of samples per gradient update.
- keep_sparse : bool, default=False
Whether or not to conserve the sparsity of the input (i.e. X, y, sample_weight). By default, the returned batches will be dense.
- random_state : int, RandomState instance or None, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
- Returns:
- generator : generator of tuple
Generates batches of data. The tuples generated are either (X_batch, y_batch) or (X_batch, y_batch, sample_weight_batch).
- steps_per_epoch : int
The number of batches per epoch. Required by fit in keras.
Examples
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> from imblearn.datasets import make_imbalance
>>> class_dict = dict()
>>> class_dict[0] = 30; class_dict[1] = 50; class_dict[2] = 40
>>> X, y = make_imbalance(X, y, sampling_strategy=class_dict)
>>> import tensorflow
>>> y = tensorflow.keras.utils.to_categorical(y, 3)
>>> model = tensorflow.keras.models.Sequential()
>>> model.add(
...     tensorflow.keras.layers.Dense(
...         y.shape[1], input_dim=X.shape[1], activation='softmax'
...     )
... )
>>> model.compile(optimizer='sgd', loss='categorical_crossentropy',
...               metrics=['accuracy'])
>>> from imblearn.keras import balanced_batch_generator
>>> from imblearn.under_sampling import NearMiss
>>> training_generator, steps_per_epoch = balanced_batch_generator(
...     X, y, sampler=NearMiss(), batch_size=10, random_state=42)
>>> callback_history = model.fit(training_generator,
...                              steps_per_epoch=steps_per_epoch,
...                              epochs=10, verbose=0)