.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/over-sampling/plot_shrinkage_effect.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_over-sampling_plot_shrinkage_effect.py: ====================================================== Effect of the shrinkage factor in random over-sampling ====================================================== This example shows the effect of the shrinkage factor used to generate the smoothed bootstrap using the :class:`~imblearn.over_sampling.RandomOverSampler`. .. GENERATED FROM PYTHON SOURCE LINES 10-14 .. code-block:: Python # Authors: Guillaume Lemaitre # License: MIT .. GENERATED FROM PYTHON SOURCE LINES 15-21 .. code-block:: Python print(__doc__) import seaborn as sns sns.set_context("poster") .. GENERATED FROM PYTHON SOURCE LINES 22-24 First, we will generate a toy classification dataset with only few samples. The ratio between the classes will be imbalanced. .. GENERATED FROM PYTHON SOURCE LINES 24-38 .. code-block:: Python from collections import Counter from sklearn.datasets import make_classification X, y = make_classification( n_samples=100, n_features=2, n_redundant=0, weights=[0.1, 0.9], random_state=0, ) Counter(y) .. rst-class:: sphx-glr-script-out .. code-block:: none Counter({1: 90, 0: 10}) .. GENERATED FROM PYTHON SOURCE LINES 39-49 .. code-block:: Python import matplotlib.pyplot as plt fig, ax = plt.subplots(figsize=(7, 7)) scatter = plt.scatter(X[:, 0], X[:, 1], c=y, alpha=0.4) class_legend = ax.legend(*scatter.legend_elements(), loc="lower left", title="Classes") ax.add_artist(class_legend) ax.set_xlabel("Feature #1") _ = ax.set_ylabel("Feature #2") plt.tight_layout() .. image-sg:: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_001.png :alt: plot shrinkage effect :srcset: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 50-53 Now, we will use a :class:`~imblearn.over_sampling.RandomOverSampler` to generate a bootstrap for the minority class with as many samples as in the majority class. .. GENERATED FROM PYTHON SOURCE LINES 53-59 .. code-block:: Python from imblearn.over_sampling import RandomOverSampler sampler = RandomOverSampler(random_state=0) X_res, y_res = sampler.fit_resample(X, y) Counter(y_res) .. rst-class:: sphx-glr-script-out .. code-block:: none Counter({1: 90, 0: 90}) .. GENERATED FROM PYTHON SOURCE LINES 60-67 .. code-block:: Python fig, ax = plt.subplots(figsize=(7, 7)) scatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4) class_legend = ax.legend(*scatter.legend_elements(), loc="lower left", title="Classes") ax.add_artist(class_legend) ax.set_xlabel("Feature #1") _ = ax.set_ylabel("Feature #2") plt.tight_layout() .. image-sg:: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_002.png :alt: plot shrinkage effect :srcset: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 68-74 We observe that the minority samples are less transparent than the samples from the majority class. Indeed, it is due to the fact that these samples of the minority class are repeated during the bootstrap generation. We can set `shrinkage` to a floating value to add a small perturbation to the samples created and therefore create a smoothed bootstrap. .. GENERATED FROM PYTHON SOURCE LINES 74-78 .. code-block:: Python sampler = RandomOverSampler(shrinkage=1, random_state=0) X_res, y_res = sampler.fit_resample(X, y) Counter(y_res) .. rst-class:: sphx-glr-script-out .. code-block:: none Counter({1: 90, 0: 90}) .. GENERATED FROM PYTHON SOURCE LINES 79-87 .. code-block:: Python fig, ax = plt.subplots(figsize=(7, 7)) scatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4) class_legend = ax.legend(*scatter.legend_elements(), loc="lower left", title="Classes") ax.add_artist(class_legend) ax.set_xlabel("Feature #1") _ = ax.set_ylabel("Feature #2") plt.tight_layout() .. image-sg:: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_003.png :alt: plot shrinkage effect :srcset: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 88-93 In this case, we see that the samples in the minority class are not overlapping anymore due to the added noise. The parameter `shrinkage` allows to add more or less perturbation. Let's add more perturbation when generating the smoothed bootstrap. .. GENERATED FROM PYTHON SOURCE LINES 93-97 .. code-block:: Python sampler = RandomOverSampler(shrinkage=3, random_state=0) X_res, y_res = sampler.fit_resample(X, y) Counter(y_res) .. rst-class:: sphx-glr-script-out .. code-block:: none Counter({1: 90, 0: 90}) .. GENERATED FROM PYTHON SOURCE LINES 98-106 .. code-block:: Python fig, ax = plt.subplots(figsize=(7, 7)) scatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4) class_legend = ax.legend(*scatter.legend_elements(), loc="lower left", title="Classes") ax.add_artist(class_legend) ax.set_xlabel("Feature #1") _ = ax.set_ylabel("Feature #2") plt.tight_layout() .. image-sg:: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_004.png :alt: plot shrinkage effect :srcset: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 107-109 Increasing the value of `shrinkage` will disperse the new samples. Forcing the shrinkage to 0 will be equivalent to generating a normal bootstrap. .. GENERATED FROM PYTHON SOURCE LINES 109-113 .. code-block:: Python sampler = RandomOverSampler(shrinkage=0, random_state=0) X_res, y_res = sampler.fit_resample(X, y) Counter(y_res) .. rst-class:: sphx-glr-script-out .. code-block:: none Counter({1: 90, 0: 90}) .. GENERATED FROM PYTHON SOURCE LINES 114-122 .. code-block:: Python fig, ax = plt.subplots(figsize=(7, 7)) scatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4) class_legend = ax.legend(*scatter.legend_elements(), loc="lower left", title="Classes") ax.add_artist(class_legend) ax.set_xlabel("Feature #1") _ = ax.set_ylabel("Feature #2") plt.tight_layout() .. image-sg:: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_005.png :alt: plot shrinkage effect :srcset: /auto_examples/over-sampling/images/sphx_glr_plot_shrinkage_effect_005.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 123-125 Therefore, the `shrinkage` is handy to manually tune the dispersion of the new samples. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 4.200 seconds) **Estimated memory usage:** 10 MB .. _sphx_glr_download_auto_examples_over-sampling_plot_shrinkage_effect.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_shrinkage_effect.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_shrinkage_effect.py ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_