.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/evaluation/plot_metrics.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_evaluation_plot_metrics.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_evaluation_plot_metrics.py:


=======================================
Metrics specific to imbalanced learning
=======================================

Specific metrics have been developed to evaluate classifiers trained on
imbalanced data. :mod:`imblearn` mainly provides two additional metrics which
are not implemented in :mod:`sklearn`: (i) geometric mean and (ii) index
balanced accuracy.

.. GENERATED FROM PYTHON SOURCE LINES 11-15

.. code-block:: Python


    # Authors: Guillaume Lemaitre
    # License: MIT


.. GENERATED FROM PYTHON SOURCE LINES 16-20

.. code-block:: Python


    print(__doc__)

    RANDOM_STATE = 42


.. GENERATED FROM PYTHON SOURCE LINES 21-22

First, we generate an imbalanced dataset.

.. GENERATED FROM PYTHON SOURCE LINES 24-39

.. code-block:: Python


    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_classes=3,
        class_sep=2,
        weights=[0.1, 0.9],
        n_informative=10,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=4,
        n_samples=5000,
        random_state=RANDOM_STATE,
    )


.. GENERATED FROM PYTHON SOURCE LINES 40-41

We split the data into a training set and a testing set.

.. GENERATED FROM PYTHON SOURCE LINES 43-49

.. code-block:: Python


    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=RANDOM_STATE
    )


.. GENERATED FROM PYTHON SOURCE LINES 50-53

We create a pipeline made of a :class:`~sklearn.preprocessing.StandardScaler`,
a :class:`~imblearn.over_sampling.SMOTE` over-sampler, and a
:class:`~sklearn.linear_model.LogisticRegression` classifier.

.. GENERATED FROM PYTHON SOURCE LINES 53-59

.. code-block:: Python


    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    from imblearn.over_sampling import SMOTE


.. GENERATED FROM PYTHON SOURCE LINES 60-68

.. code-block:: Python


    from imblearn.pipeline import make_pipeline

    model = make_pipeline(
        StandardScaler(),
        SMOTE(random_state=RANDOM_STATE),
        LogisticRegression(max_iter=10_000, random_state=RANDOM_STATE),
    )


.. GENERATED FROM PYTHON SOURCE LINES 69-73

Now, we train the model on the training set and get the predictions for the
testing set. Be aware that the resampling happens only when calling `fit`: the
number of samples in `y_pred` is the same as in `y_test`.

.. GENERATED FROM PYTHON SOURCE LINES 75-78

.. code-block:: Python


    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)


.. GENERATED FROM PYTHON SOURCE LINES 79-82

The geometric mean corresponds to the square root of the product of the
sensitivity and specificity. Because it combines the two, a classifier has to
perform well on both the minority and the majority class to obtain a high
score, which accounts for the imbalance of the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 84-88

.. code-block:: Python


    from imblearn.metrics import geometric_mean_score

    print(f"The geometric mean is {geometric_mean_score(y_test, y_pred):.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The geometric mean is 0.940


.. GENERATED FROM PYTHON SOURCE LINES 89-91

The index balanced accuracy (IBA) can wrap any metric so that it can be used
in imbalanced learning problems. The ``alpha`` parameter is the weighting
factor, and ``squared=True`` squares the metric before it is weighted.

.. GENERATED FROM PYTHON SOURCE LINES 93-103

.. code-block:: Python


    from imblearn.metrics import make_index_balanced_accuracy

    alpha = 0.1
    geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(
        geometric_mean_score
    )

    print(
        f"The IBA using alpha={alpha} and the geometric mean: "
        f"{geo_mean(y_test, y_pred):.3f}"
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The IBA using alpha=0.1 and the geometric mean: 0.884


.. GENERATED FROM PYTHON SOURCE LINES 104-111

.. code-block:: Python


    alpha = 0.5
    geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(
        geometric_mean_score
    )

    print(
        f"The IBA using alpha={alpha} and the geometric mean: "
        f"{geo_mean(y_test, y_pred):.3f}"
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The IBA using alpha=0.5 and the geometric mean: 0.884


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.652 seconds)

**Estimated memory usage:** 10 MB


.. _sphx_glr_download_auto_examples_evaluation_plot_metrics.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_metrics.ipynb <plot_metrics.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_metrics.py <plot_metrics.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_