.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_validation_curve.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_validation_curve.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_validation_curve.py:


==========================
Plotting Validation Curves
==========================

This example examines the impact of the `k_neighbors` parameter of
:class:`~imblearn.over_sampling.SMOTE`. The plot shows the validation scores
of a SMOTE-CART classifier for different values of `k_neighbors`.

.. GENERATED FROM PYTHON SOURCE LINES 11-16

.. code-block:: Python


    # Authors: Christos Aridas
    #          Guillaume Lemaitre
    # License: MIT


.. GENERATED FROM PYTHON SOURCE LINES 17-26

.. code-block:: Python

    print(__doc__)

    import seaborn as sns

    sns.set_context("poster")

    RANDOM_STATE = 42


.. GENERATED FROM PYTHON SOURCE LINES 27-28

Let's first generate a dataset with an imbalanced class distribution.

.. GENERATED FROM PYTHON SOURCE LINES 30-45

.. code-block:: Python

    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_classes=2,
        class_sep=2,
        weights=[0.1, 0.9],
        n_informative=10,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=4,
        n_samples=5000,
        random_state=RANDOM_STATE,
    )


.. GENERATED FROM PYTHON SOURCE LINES 46-50

We will use the over-sampler :class:`~imblearn.over_sampling.SMOTE` followed
by a :class:`~sklearn.tree.DecisionTreeClassifier`. The aim is to find which
value of `k_neighbors` is best suited to the dataset that we generated.

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: Python

    from sklearn.tree import DecisionTreeClassifier


.. GENERATED FROM PYTHON SOURCE LINES 54-61

.. code-block:: Python

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    model = make_pipeline(
        SMOTE(random_state=RANDOM_STATE), DecisionTreeClassifier(random_state=RANDOM_STATE)
    )


.. GENERATED FROM PYTHON SOURCE LINES 62-66

We can use :func:`~sklearn.model_selection.validation_curve` to inspect the
impact of varying the parameter `k_neighbors`. We also need to provide a
scorer to evaluate the generalization performance during cross-validation;
here we use Cohen's kappa.

.. GENERATED FROM PYTHON SOURCE LINES 68-83

.. code-block:: Python

    from sklearn.metrics import cohen_kappa_score, make_scorer
    from sklearn.model_selection import validation_curve

    scorer = make_scorer(cohen_kappa_score)
    param_range = range(1, 11)
    train_scores, test_scores = validation_curve(
        model,
        X,
        y,
        param_name="smote__k_neighbors",
        param_range=param_range,
        cv=3,
        scoring=scorer,
    )


.. GENERATED FROM PYTHON SOURCE LINES 84-89

.. code-block:: Python

    # Each row holds the scores of one parameter value across the CV folds.
    train_scores_mean = train_scores.mean(axis=1)
    train_scores_std = train_scores.std(axis=1)
    test_scores_mean = test_scores.mean(axis=1)
    test_scores_std = test_scores.std(axis=1)


.. GENERATED FROM PYTHON SOURCE LINES 90-92

We can now plot the cross-validation results for the different parameter
values that we tried.

.. GENERATED FROM PYTHON SOURCE LINES 94-124

.. code-block:: Python

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(7, 7))
    ax.plot(param_range, test_scores_mean, label="SMOTE")
    ax.fill_between(
        param_range,
        test_scores_mean + test_scores_std,
        test_scores_mean - test_scores_std,
        alpha=0.2,
    )
    # Highlight the parameter value with the best mean test score.
    idx_max = test_scores_mean.argmax()
    ax.scatter(
        param_range[idx_max],
        test_scores_mean[idx_max],
        label=r"Cohen Kappa: ${:.2f}\pm{:.2f}$".format(
            test_scores_mean[idx_max], test_scores_std[idx_max]
        ),
    )

    fig.suptitle("Validation Curve with SMOTE-CART")
    ax.set_xlabel("Number of neighbors")
    ax.set_ylabel("Cohen's kappa")

    # Tidy up the plot.
    sns.despine(ax=ax, offset=10)
    ax.set_xlim([1, 10])
    ax.set_ylim([0.4, 0.8])
    ax.legend(loc="lower right", fontsize=16)
    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_validation_curve_001.png
   :alt: Validation Curve with SMOTE-CART
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_validation_curve_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 6.770 seconds)

**Estimated memory usage:** 10 MB


.. _sphx_glr_download_auto_examples_model_selection_plot_validation_curve.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_validation_curve.ipynb <plot_validation_curve.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_validation_curve.py <plot_validation_curve.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
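
As a minimal sketch of how the best `k_neighbors` value is read off the
validation-curve scores, the snippet below applies the same mean, standard
deviation and ``argmax`` steps to a small hypothetical score matrix. It assumes
the layout used in this example, with one row per parameter value and one
column per cross-validation fold. The numbers are made up for illustration and
are not results of this example.

.. code-block:: Python

    import numpy as np

    # Hypothetical test scores: 3 parameter values (rows) x 3 CV folds (columns).
    # These values are illustrative only, not output of the example above.
    param_range = [1, 2, 3]
    test_scores = np.array(
        [
            [0.50, 0.60, 0.55],
            [0.70, 0.65, 0.75],
            [0.60, 0.62, 0.58],
        ]
    )

    # Average over the folds (axis=1) to get one score per parameter value.
    test_scores_mean = test_scores.mean(axis=1)
    test_scores_std = test_scores.std(axis=1)

    # Index of the parameter value with the highest mean score.
    idx_max = int(test_scores_mean.argmax())
    best_k = param_range[idx_max]
    print(
        f"best k_neighbors={best_k}: "
        f"{test_scores_mean[idx_max]:.2f} +/- {test_scores_std[idx_max]:.2f}"
    )

Averaging along ``axis=1`` collapses the folds, so the resulting arrays have
one entry per candidate value and can be indexed with the same position as
``param_range``.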