Metrics specific to imbalanced learning
Specific metrics have been developed to evaluate classifiers trained on
imbalanced data. imblearn mainly provides two additional metrics that are
not implemented in sklearn: (i) the geometric mean and (ii) the index
balanced accuracy.
    # Authors: Guillaume Lemaitre <firstname.lastname@example.org>
    # License: MIT

    print(__doc__)

    RANDOM_STATE = 42
First, we generate an imbalanced dataset.
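One way to generate such data is scikit-learn's make_classification with a skewed `weights` argument. The sketch below is a minimal example under assumptions: every parameter value is chosen here for illustration (roughly a 10%/90% class split) and is not taken from the original example.

```python
from collections import Counter

from sklearn.datasets import make_classification

RANDOM_STATE = 42

# Hypothetical parameters: the original example's exact settings are not shown.
X, y = make_classification(
    n_samples=5_000,
    n_features=20,
    n_informative=10,
    n_redundant=1,
    n_clusters_per_class=4,
    weights=[0.1, 0.9],  # about 10% minority class (0), 90% majority class (1)
    class_sep=2,
    flip_y=0,  # no label noise, so the class ratio follows the weights
    random_state=RANDOM_STATE,
)

# Class counts confirm the imbalance.
print(Counter(y))
```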
We will split the data into a training and testing set.
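A minimal sketch of the split, assuming scikit-learn's train_test_split with `stratify=y` so that both subsets keep the same minority/majority ratio; the stand-in dataset below uses assumed parameters.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42

# Stand-in imbalanced data (assumed parameters), about 10% minority class.
X, y = make_classification(
    n_samples=5_000, weights=[0.1, 0.9], flip_y=0, random_state=RANDOM_STATE
)

# stratify=y preserves the class proportions in both splits, which matters
# when the minority class is small. The default test_size is 0.25.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=RANDOM_STATE
)

print(len(y_train), len(y_test))
print(Counter(y_train), Counter(y_test))
```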
We will create a pipeline made of a SMOTE over-sampler followed by a
LogisticRegression classifier.
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    from imblearn.over_sampling import SMOTE
Now, we train the model on the training set and predict the labels of the
testing set. Be aware that resampling happens only when calling fit: at
prediction time the sampler is bypassed, so y_pred contains exactly as many
samples as y_test.
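The geometric mean corresponds to the square root of the product of the sensitivity and the specificity. Combining the two rates accounts for class imbalance: the score is high only when the classifier performs well on both classes, and it drops to zero if the positive class (or the negative class) is never predicted correctly.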
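To make the definition concrete, here is a from-scratch sketch for binary labels; `geometric_mean_binary` and `accuracy` are hypothetical helpers written for illustration, not imblearn functions. It compares two toy classifiers that share the same accuracy. (For more than two classes, imblearn's geometric_mean_score generalizes this, as we understand its default `average="multiclass"`, to the geometric mean of the per-class recalls.)

```python
import math


def geometric_mean_binary(y_true, y_pred, positive=1):
    """Hypothetical helper: sqrt(sensitivity * specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # recall of the positive (minority) class
    specificity = tn / (tn + fp)  # recall of the negative (majority) class
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: 8 majority samples (0) and 2 minority samples (1).
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10  # never predicts the minority class
balanced = [0] * 7 + [1] + [1, 0]  # one false alarm, one minority hit

for name, y_pred in [("always_majority", always_majority), ("balanced", balanced)]:
    print(
        f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
        f"G-mean={geometric_mean_binary(y_true, y_pred):.3f}"
    )
```

Both toy classifiers reach an accuracy of 0.80, but the first one has a sensitivity of 0 and therefore a geometric mean of 0, while the second one scores sqrt(0.5 × 0.875) ≈ 0.661.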
The geometric mean is 0.940
The index balanced accuracy (IBA) can turn any metric into one suited to imbalanced learning problems.
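The IBA weights a metric M by a dominance term: IBA_α(M) = (1 + α · Dom) · M, where Dom = sensitivity − specificity measures how much better the classifier does on the positive class than on the negative one, and α controls how strongly that asymmetry changes the score. imblearn's make_index_balanced_accuracy also has a `squared` option (default True) that squares M first. The function below is a hypothetical from-scratch sketch of the formula, not imblearn's implementation.

```python
import math


def index_balanced_accuracy(metric_value, sensitivity, specificity, alpha=0.1, squared=True):
    """Hypothetical helper: IBA_alpha(M) = (1 + alpha * (TPR - TNR)) * M."""
    if squared:
        metric_value = metric_value**2  # mirrors the squared option described above
    dominance = sensitivity - specificity  # > 0 favours the positive class
    return (1.0 + alpha * dominance) * metric_value


# Three (sensitivity, specificity) pairs: negative, positive and zero dominance.
for sensitivity, specificity in [(0.5, 0.875), (0.875, 0.5), (0.94, 0.94)]:
    g_mean = math.sqrt(sensitivity * specificity)
    for alpha in (0.1, 0.5):
        iba = index_balanced_accuracy(g_mean, sensitivity, specificity, alpha=alpha)
        print(f"TPR={sensitivity}, TNR={specificity}, alpha={alpha}: IBA={iba:.3f}")
```

The first two pairs share the same geometric mean, yet the IBA ranks them differently, and a larger α widens the gap. When sensitivity equals specificity the dominance is zero, so α has no effect and the score reduces to G² (0.94² ≈ 0.884).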
The IBA using alpha=0.1 and the geometric mean: 0.884
The IBA using alpha=0.5 and the geometric mean: 0.884