=======================================
Metrics specific to imbalanced learning
=======================================

Specific metrics have been developed to evaluate classifiers that have
been trained on imbalanced data. :mod:`imblearn` provides two main
additional metrics that are not implemented in :mod:`sklearn`: (i) the
geometric mean and (ii) the index balanced accuracy.

.. code-block:: Python

    # Authors: Guillaume Lemaitre
    # License: MIT

.. code-block:: Python

    print(__doc__)

    RANDOM_STATE = 42

First, we will generate an imbalanced dataset.

.. code-block:: Python

    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_classes=3,
        class_sep=2,
        weights=[0.1, 0.9],
        n_informative=10,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=4,
        n_samples=5000,
        random_state=RANDOM_STATE,
    )

We will split the data into a training set and a testing set.

.. code-block:: Python

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=RANDOM_STATE
    )

We will create a pipeline made of a
:class:`~sklearn.preprocessing.StandardScaler`, a
:class:`~imblearn.over_sampling.SMOTE` over-sampler, and a
:class:`~sklearn.linear_model.LogisticRegression` classifier.

.. code-block:: Python

    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    from imblearn.over_sampling import SMOTE

.. code-block:: Python

    from imblearn.pipeline import make_pipeline

    model = make_pipeline(
        StandardScaler(),
        SMOTE(random_state=RANDOM_STATE),
        LogisticRegression(max_iter=10_000, random_state=RANDOM_STATE),
    )

Now, we will train the model on the training set and get the predictions
for the testing set. Be aware that the resampling happens only when calling
``fit``: the number of samples in ``y_pred`` is the same as in ``y_test``.

.. code-block:: Python

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
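
The fit-only resampling described above can be illustrated with a minimal,
library-free sketch. The classes ``ToyOverSampler``, ``ToyMajorityClassifier``
and ``ToyPipeline`` are hypothetical stand-ins written for this illustration;
they are not part of :mod:`imblearn` and only mimic the behaviour in a
simplified way.

.. code-block:: Python

    from collections import Counter


    class ToyOverSampler:
        """Hypothetical sampler: duplicates minority samples until balanced."""

        def fit_resample(self, X, y):
            counts = Counter(y)
            target = max(counts.values())
            X_out, y_out = list(X), list(y)
            for label, count in counts.items():
                members = [x for x, t in zip(X, y) if t == label]
                # Cycle through the class members to add the missing samples.
                for i in range(target - count):
                    X_out.append(members[i % len(members)])
                    y_out.append(label)
            return X_out, y_out


    class ToyMajorityClassifier:
        """Hypothetical classifier: predicts the most frequent training label."""

        def fit(self, X, y):
            self.n_seen_ = len(y)
            self.label_ = Counter(y).most_common(1)[0][0]
            return self

        def predict(self, X):
            return [self.label_ for _ in X]


    class ToyPipeline:
        """Hypothetical pipeline: resamples in ``fit`` only."""

        def __init__(self, sampler, classifier):
            self.sampler = sampler
            self.classifier = classifier

        def fit(self, X, y):
            X_res, y_res = self.sampler.fit_resample(X, y)
            self.classifier.fit(X_res, y_res)
            return self

        def predict(self, X):
            # The sampler is skipped here: one prediction per input sample.
            return self.classifier.predict(X)


    X_toy_train = [[0.0], [0.1], [0.2], [0.3], [1.0]]
    y_toy_train = [0, 0, 0, 0, 1]
    X_toy_test = [[0.05], [0.9], [0.25]]

    toy_model = ToyPipeline(ToyOverSampler(), ToyMajorityClassifier())
    toy_model.fit(X_toy_train, y_toy_train)
    toy_pred = toy_model.predict(X_toy_test)

    print(f"Samples seen by the classifier in fit: {toy_model.classifier.n_seen_}")
    print(f"Number of predictions: {len(toy_pred)}")

In this toy setting the classifier is trained on more samples than were
passed to ``fit``, while ``predict`` returns exactly one label per test
sample.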

The geometric mean corresponds to the square root of the product of the
sensitivity and specificity. Because a low value of either one pulls the
product down, the score stays high only when both are high, which makes it
less sensitive to class imbalance than plain accuracy.

.. code-block:: Python

    from imblearn.metrics import geometric_mean_score

    print(f"The geometric mean is {geometric_mean_score(y_test, y_pred):.3f}")

.. code-block:: none

    The geometric mean is 0.940
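
To make the definition concrete, here is a minimal sketch that computes the
geometric mean by hand on a small binary toy example. The helper names and
the toy labels are assumptions made for this illustration, not library code.
In the binary case, the recall of the negative class is the specificity, so
the geometric mean of the per-class recalls equals the square root of
sensitivity times specificity.

.. code-block:: Python

    import math
    from collections import Counter


    def per_class_recall(y_true, y_pred):
        """Return {label: recall} for every label present in y_true."""
        support = Counter(y_true)
        hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
        return {label: hits[label] / support[label] for label in support}


    def geometric_mean_of_recalls(y_true, y_pred):
        """Return the n-th root of the product of the n per-class recalls."""
        recalls = list(per_class_recall(y_true, y_pred).values())
        return math.prod(recalls) ** (1.0 / len(recalls))


    # Toy labels: 4 samples of class 0 and 6 samples of class 1.
    y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    y_pred_toy = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

    recalls_toy = per_class_recall(y_true_toy, y_pred_toy)
    g_mean_toy = geometric_mean_of_recalls(y_true_toy, y_pred_toy)

    print(f"Per-class recall: {recalls_toy}")
    print(f"Geometric mean: {g_mean_toy:.3f}")

Here the recalls are 3/4 for class 0 and 5/6 for class 1, so the score is the
square root of their product.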

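The index balanced accuracy (IBA) can wrap any metric computed from predicted
labels so that it can be used in imbalanced learning problems. Here, ``alpha``
is the weighting factor of the correction, and ``squared=True`` squares the
metric before it is weighted.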

.. code-block:: Python

    from imblearn.metrics import make_index_balanced_accuracy

    alpha = 0.1
    geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(
        geometric_mean_score
    )

    print(
        f"The IBA using alpha={alpha} and the geometric mean: "
        f"{geo_mean(y_test, y_pred):.3f}"
    )

.. code-block:: none

    The IBA using alpha=0.1 and the geometric mean: 0.884

.. code-block:: Python

    alpha = 0.5
    geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(
        geometric_mean_score
    )

    print(
        f"The IBA using alpha={alpha} and the geometric mean: "
        f"{geo_mean(y_test, y_pred):.3f}"
    )

.. code-block:: none

    The IBA using alpha=0.5 and the geometric mean: 0.884
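
The role of ``alpha`` can be sketched by hand, assuming the usual definition
of the index balanced accuracy, in which a metric ``M`` is weighted as
``(1 + alpha * (sensitivity - specificity)) * M`` and ``M`` is squared first
when ``squared=True``. The function ``index_balanced`` and the input values
below are hypothetical and only illustrate this formula; they are not the
library implementation.

.. code-block:: Python

    def index_balanced(metric_value, sensitivity, specificity, alpha, squared=True):
        """Weight a metric by the dominance (sensitivity - specificity)."""
        score = metric_value**2 if squared else metric_value
        dominance = sensitivity - specificity
        return (1.0 + alpha * dominance) * score


    # Zero dominance (hypothetical values): alpha has no effect on the result.
    iba_balanced = [
        index_balanced(0.94, sensitivity=0.9, specificity=0.9, alpha=alpha)
        for alpha in (0.1, 0.5)
    ]

    # Non-zero dominance (hypothetical values): a larger alpha changes the score.
    iba_skewed = [
        index_balanced(0.94, sensitivity=0.95, specificity=0.85, alpha=alpha)
        for alpha in (0.1, 0.5)
    ]

    print(f"Zero dominance: {iba_balanced}")
    print(f"Non-zero dominance: {iba_skewed}")

Under this formula, the two identical values printed above (0.884, which is
about 0.940 squared) would be what one obtains when the dominance term is
close to zero for these predictions. This reading is an inference from the
printed numbers rather than something the example states.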