imblearn.tensorflow.balanced_batch_generator

imblearn.tensorflow.balanced_batch_generator(X, y, sample_weight=None, sampler=None, batch_size=32, keep_sparse=False, random_state=None)[source]

Create a balanced batch generator to train a TensorFlow model.

Returns a generator, as well as the number of steps per epoch, to iterate over to get the mini-batches. The sampler defines the sampling strategy used to balance the dataset ahead of creating the batch. The sampler should have an attribute sample_indices_.
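For instance, the generator can be consumed with next() in a plain Python loop. A minimal sketch with a small synthetic dataset (the arrays below are illustrative only, not part of the API):

>>> import numpy as np
>>> from imblearn.tensorflow import balanced_batch_generator
>>> X = np.random.randn(100, 2).astype(np.float32)
>>> y = np.array([0] * 90 + [1] * 10)  # 90 majority vs 10 minority samples
>>> generator, steps_per_epoch = balanced_batch_generator(
...     X, y, batch_size=10, random_state=42)
>>> steps_per_epoch  # 20 balanced samples // batch_size of 10
2
>>> X_batch, y_batch = next(generator)
>>> X_batch.shape
(10, 2)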

Parameters:
X : ndarray, shape (n_samples, n_features)

Original imbalanced dataset.

y : ndarray, shape (n_samples,) or (n_samples, n_classes)

Associated targets.

sample_weight : ndarray, shape (n_samples,)

Sample weight.

sampler : object or None, optional (default=RandomUnderSampler)

A sampler instance which has an attribute sample_indices_. By default, the sampler used is an imblearn.under_sampling.RandomUnderSampler.

batch_size : int, optional (default=32)

Number of samples per gradient update.

keep_sparse : bool, optional (default=False)

Whether or not to conserve the sparsity of the input X. By default, the returned batches will be dense.

random_state : int, RandomState instance or None, optional (default=None)

Control the randomization of the algorithm.

  • If int, random_state is the seed used by the random number generator;
  • If RandomState instance, random_state is the random number generator;
  • If None, the random number generator is the RandomState instance used by np.random.
Returns:
generator : generator of tuple

Yields batches of data. The tuples generated are either (X_batch, y_batch) or (X_batch, y_batch, sample_weight_batch).

steps_per_epoch : int

The number of batches (steps) to draw from the generator at each epoch.
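When sample_weight is provided, each yielded tuple carries the matching weights as a third element. A minimal sketch with synthetic data and uniform weights (both chosen purely for illustration):

>>> import numpy as np
>>> from imblearn.tensorflow import balanced_batch_generator
>>> X = np.random.randn(100, 2).astype(np.float32)
>>> y = np.array([0] * 90 + [1] * 10)
>>> weights = np.ones(y.shape[0])  # uniform weights, one per sample
>>> generator, steps_per_epoch = balanced_batch_generator(
...     X, y, sample_weight=weights, batch_size=10, random_state=42)
>>> X_batch, y_batch, w_batch = next(generator)
>>> w_batch.shape
(10,)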

Examples

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> class_dict = {0: 30, 1: 50, 2: 40}
>>> from imblearn.datasets import make_imbalance
>>> X, y = make_imbalance(X, y, class_dict)
>>> X = X.astype(np.float32)
>>> batch_size, learning_rate, epochs = 10, 0.01, 10
>>> from imblearn.tensorflow import balanced_batch_generator
>>> training_generator, steps_per_epoch = balanced_batch_generator(
...     X, y, sample_weight=None, sampler=None,
...     batch_size=batch_size, random_state=42)
>>> input_size, output_size = X.shape[1], 3
>>> import tensorflow as tf  # this example uses the TensorFlow 1.x graph API
>>> def init_weights(shape):
...     return tf.Variable(tf.random_normal(shape, stddev=0.01))
>>> def accuracy(y_true, y_pred):
...     return np.mean(np.argmax(y_pred, axis=1) == y_true)
>>> # input and output
>>> data = tf.placeholder("float32", shape=[None, input_size])
>>> targets = tf.placeholder("int32", shape=[None])
>>> # build the model and weights
>>> W = init_weights([input_size, output_size])
>>> b = init_weights([output_size])
>>> out_act = tf.nn.sigmoid(tf.matmul(data, W) + b)
>>> # build the loss, predict, and train operator
>>> cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
...     logits=out_act, labels=targets)
>>> loss = tf.reduce_sum(cross_entropy)
>>> optimizer = tf.train.GradientDescentOptimizer(learning_rate)
>>> train_op = optimizer.minimize(loss)
>>> predict = tf.nn.softmax(out_act)
>>> # Initialization of all variables in the graph
>>> init = tf.global_variables_initializer()
>>> with tf.Session() as sess:
...     print('Starting training')
...     sess.run(init)
...     for e in range(epochs):
...         for i in range(steps_per_epoch):
...             X_batch, y_batch = next(training_generator)
...             feed_dict = {data: X_batch, targets: y_batch}
...             sess.run([train_op, loss], feed_dict=feed_dict)
...         # after each epoch, evaluate accuracy on the training set
...         feed_dict = {data: X}
...         predicts_train = sess.run(predict, feed_dict=feed_dict)
...         print("epoch: {} train accuracy: {:.3f}"
...               .format(e, accuracy(y, predicts_train)))
... # doctest: +ELLIPSIS
Starting training
[...
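Any sampler exposing a sample_indices_ attribute can replace the default under-sampler. Continuing from the example above, a brief sketch using imblearn.under_sampling.NearMiss (picked here purely for illustration):

>>> from imblearn.under_sampling import NearMiss
>>> training_generator, steps_per_epoch = balanced_batch_generator(
...     X, y, sampler=NearMiss(), batch_size=batch_size, random_state=42)
>>> X_batch, y_batch = next(training_generator)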