.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/under-sampling/plot_illustration_tomek_links.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_under-sampling_plot_illustration_tomek_links.py:


==============================================
Illustration of the definition of a Tomek link
==============================================

This example illustrates what a Tomek link is.

.. GENERATED FROM PYTHON SOURCE LINES 8-12

.. code-block:: Python


    # Authors: Guillaume Lemaitre
    # License: MIT

.. GENERATED FROM PYTHON SOURCE LINES 13-20

.. code-block:: Python


    print(__doc__)

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set_context("poster")

.. GENERATED FROM PYTHON SOURCE LINES 21-22

The following helper function gives all plots a common style: it removes the
top and right spines and offsets the remaining ones, fixes both axis limits to
[0, 3], labels the axes, and adds a legend.

.. GENERATED FROM PYTHON SOURCE LINES 24-35

.. code-block:: Python


    def make_plot_despine(ax):
        sns.despine(ax=ax, offset=10)
        ax.set_xlim([0, 3])
        ax.set_ylim([0, 3])
        ax.set_xlabel(r"$X_1$")
        ax.set_ylabel(r"$X_2$")
        ax.legend(loc="lower right")

.. GENERATED FROM PYTHON SOURCE LINES 36-38

We generate some toy data to illustrate how
:class:`~imblearn.under_sampling.TomekLinks` is used to clean a dataset.

.. GENERATED FROM PYTHON SOURCE LINES 40-54

.. code-block:: Python


    import numpy as np

    X_minority = np.transpose(
        [[1.1, 1.3, 1.15, 0.8, 0.55, 2.1], [1.0, 1.5, 1.7, 2.5, 0.55, 1.9]]
    )
    X_majority = np.transpose(
        [
            [2.1, 2.12, 2.13, 2.14, 2.2, 2.3, 2.5, 2.45],
            [1.5, 2.1, 2.7, 0.9, 1.0, 1.4, 2.4, 2.9],
        ]
    )

.. GENERATED FROM PYTHON SOURCE LINES 55-57

In the figure below, the two samples highlighted in green form a Tomek link:
they belong to different classes and each is the other's nearest neighbor.

.. GENERATED FROM PYTHON SOURCE LINES 57-86

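
The definition can be checked directly on the toy data. The block below is a
minimal sketch, not the neighbor search used inside imbalanced-learn: it
assumes Euclidean distance and no duplicated samples, builds a brute-force
distance matrix with NumPy, and reports every pair of samples from different
classes that are mutual nearest neighbors. The names ``dist``, ``nearest`` and
``links`` are illustrative only.

.. code-block:: Python

    import numpy as np

    # Same toy data as in the example: rows 0-5 are minority (label 0),
    # rows 6-13 are majority (label 1).
    X_minority = np.transpose(
        [[1.1, 1.3, 1.15, 0.8, 0.55, 2.1], [1.0, 1.5, 1.7, 2.5, 0.55, 1.9]]
    )
    X_majority = np.transpose(
        [
            [2.1, 2.12, 2.13, 2.14, 2.2, 2.3, 2.5, 2.45],
            [1.5, 2.1, 2.7, 0.9, 1.0, 1.4, 2.4, 2.9],
        ]
    )
    X = np.vstack((X_minority, X_majority))
    y = np.array([0] * len(X_minority) + [1] * len(X_majority))

    # Brute-force pairwise Euclidean distances; a sample is not its own neighbor.
    dist = np.linalg.norm(X[:, np.newaxis, :] - X[np.newaxis, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)

    # (i, j) is a Tomek link when i and j have different labels and are each
    # other's nearest neighbor; keep i < j so each pair is listed once.
    links = [
        (i, int(j))
        for i, j in enumerate(nearest)
        if y[i] != y[j] and nearest[j] == i and i < j
    ]
    print(links)  # [(5, 7)]

Row 5 is ``X_minority[-1]`` and row 7 is ``X_majority[1]``, so the single pair
found is exactly the one highlighted in the figure.
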
.. code-block:: Python


    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(
        X_minority[:, 0],
        X_minority[:, 1],
        label="Minority class",
        s=200,
        marker="_",
    )
    ax.scatter(
        X_majority[:, 0],
        X_majority[:, 1],
        label="Majority class",
        s=200,
        marker="+",
    )

    # highlight the two samples forming the Tomek link
    ax.scatter(
        [X_minority[-1, 0], X_majority[1, 0]],
        [X_minority[-1, 1], X_majority[1, 1]],
        label="Tomek link",
        s=200,
        alpha=0.3,
    )
    make_plot_despine(ax)
    fig.suptitle("Illustration of a Tomek link")
    fig.tight_layout()



.. image-sg:: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_001.png
   :alt: Illustration of a Tomek link
   :srcset: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 87-91

We can apply :class:`~imblearn.under_sampling.TomekLinks` to remove the
samples that form Tomek links. With `sampling_strategy='auto'`, only the
sample from the majority class is removed; with `sampling_strategy='all'`,
both samples of the link are removed.

.. GENERATED FROM PYTHON SOURCE LINES 93-136

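
The removal rule described above can be sketched in a few lines of NumPy.
``drop_tomek_links`` is a hypothetical helper written for illustration only,
not part of imbalanced-learn. It assumes the Tomek links are already known as
pairs of row indices (here hard-coded to rows 5 and 7, the highlighted pair)
and that the minority class is the least frequent label.

.. code-block:: Python

    import numpy as np

    # Same toy data as in the example: 6 minority (label 0), 8 majority (label 1).
    X_minority = np.transpose(
        [[1.1, 1.3, 1.15, 0.8, 0.55, 2.1], [1.0, 1.5, 1.7, 2.5, 0.55, 1.9]]
    )
    X_majority = np.transpose(
        [
            [2.1, 2.12, 2.13, 2.14, 2.2, 2.3, 2.5, 2.45],
            [1.5, 2.1, 2.7, 0.9, 1.0, 1.4, 2.4, 2.9],
        ]
    )
    X = np.vstack((X_minority, X_majority))
    y = np.array([0] * len(X_minority) + [1] * len(X_majority))

    # Row 5 is X_minority[-1] and row 7 is X_majority[1].
    links = [(5, 7)]


    def drop_tomek_links(X, y, links, remove_all=False):
        """Hypothetical helper, not an imbalanced-learn API.

        By default only the non-minority sample of each link is dropped
        (like sampling_strategy="auto"); remove_all=True drops both samples
        (like sampling_strategy="all").
        """
        minority_label = np.argmin(np.bincount(y))
        to_remove = set()
        for pair in links:
            for idx in pair:
                if remove_all or y[idx] != minority_label:
                    to_remove.add(idx)
        keep = np.ones(len(y), dtype=bool)
        keep[list(to_remove)] = False
        return X[keep], y[keep]


    X_auto, y_auto = drop_tomek_links(X, y, links)
    X_all, y_all = drop_tomek_links(X, y, links, remove_all=True)
    print(np.bincount(y_auto))  # [6 7]: only the majority sample is dropped
    print(np.bincount(y_all))  # [5 7]: both samples of the link are dropped

Keeping the minority sample means the default strategy never shrinks the class
that is already under-represented, while ``remove_all=True`` cleans the class
boundary on both sides.
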
.. code-block:: Python


    from imblearn.under_sampling import TomekLinks

    fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))

    samplers = {
        "Removing only majority samples": TomekLinks(sampling_strategy="auto"),
        "Removing all samples": TomekLinks(sampling_strategy="all"),
    }

    for ax, (title, sampler) in zip(axs, samplers.items()):
        X_res, y_res = sampler.fit_resample(
            np.vstack((X_minority, X_majority)),
            np.array([0] * X_minority.shape[0] + [1] * X_majority.shape[0]),
        )
        ax.scatter(
            X_res[y_res == 0][:, 0],
            X_res[y_res == 0][:, 1],
            label="Minority class",
            s=200,
            marker="_",
        )
        ax.scatter(
            X_res[y_res == 1][:, 0],
            X_res[y_res == 1][:, 1],
            label="Majority class",
            s=200,
            marker="+",
        )

        # mark where the Tomek link was, even if its samples were removed
        ax.scatter(
            [X_minority[-1, 0], X_majority[1, 0]],
            [X_minority[-1, 1], X_majority[1, 1]],
            label="Tomek link",
            s=200,
            alpha=0.3,
        )

        ax.set_title(title)
        make_plot_despine(ax)
    fig.tight_layout()

    plt.show()



.. image-sg:: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_002.png
   :alt: Removing only majority samples, Removing all samples
   :srcset: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.356 seconds)

**Estimated memory usage:** 10 MB

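
To see exactly which rows each strategy discards, the indices kept by the
sampler can be compared with the full index range. This sketch assumes a
recent imbalanced-learn release in which under-samplers expose the fitted
attribute ``sample_indices_`` (the indices of the samples that were kept).
On the toy data it should report row 7 (``X_majority[1]``) for ``"auto"`` and
rows 5 and 7 (the highlighted pair) for ``"all"``.

.. code-block:: Python

    import numpy as np
    from imblearn.under_sampling import TomekLinks

    # Same toy data as in the example: rows 0-5 minority, rows 6-13 majority.
    X_minority = np.transpose(
        [[1.1, 1.3, 1.15, 0.8, 0.55, 2.1], [1.0, 1.5, 1.7, 2.5, 0.55, 1.9]]
    )
    X_majority = np.transpose(
        [
            [2.1, 2.12, 2.13, 2.14, 2.2, 2.3, 2.5, 2.45],
            [1.5, 2.1, 2.7, 0.9, 1.0, 1.4, 2.4, 2.9],
        ]
    )
    X = np.vstack((X_minority, X_majority))
    y = np.array([0] * len(X_minority) + [1] * len(X_majority))

    removed = {}
    for strategy in ("auto", "all"):
        sampler = TomekLinks(sampling_strategy=strategy)
        sampler.fit_resample(X, y)
        # Rows that are not in sample_indices_ were removed by the sampler.
        removed[strategy] = np.setdiff1d(np.arange(len(y)), sampler.sample_indices_)
        print(strategy, removed[strategy])
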