TomekLinks

class imblearn.under_sampling.TomekLinks(*, sampling_strategy='auto', n_jobs=None)

Under-sampling by removing Tomek’s links.

See also

EditedNearestNeighbours: Under-sample by editing samples.
CondensedNearestNeighbour: Under-sample by condensing samples.
RandomUnderSampler: Randomly under-sample the dataset.

Notes

This method is based on [1].

Supports multi-class resampling. A one-vs.-rest scheme is used as originally proposed in [1].

References

[1] I. Tomek, “Two modifications of CNN,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, pp. 769-772, 1976.

Examples

>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> from imblearn.under_sampling import TomekLinks
>>> X, y = make_classification(n_classes=2, class_sep=2,
... weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
... n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
>>> print('Original dataset shape %s' % Counter(y))
Original dataset shape Counter({1: 900, 0: 100})
>>> tl = TomekLinks()
>>> X_res, y_res = tl.fit_resample(X, y)
>>> print('Resampled dataset shape %s' % Counter(y_res))
Resampled dataset shape Counter({1: 897, 0: 100})

Methods

`fit`(X, y, **params)	Check inputs and statistics of the sampler.
`fit_resample`(X, y, **params)	Resample the dataset.
`get_feature_names_out`([input_features])	Get output feature names for transformation.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`is_tomek`(y, nn_index, class_type)	Detect whether samples are part of a Tomek link.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y, **params)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters:

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features): Data array.
y : array-like of shape (n_samples,): Target array.

Returns:

self : object: The instance itself.

fit_resample(X, y, **params)

Resample the dataset.

Parameters:

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features): Matrix containing the data to be sampled.
y : array-like of shape (n_samples,): Corresponding label for each sample in X.

Returns:

X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features): The array containing the resampled data.
y_resampled : array-like of shape (n_samples_new,): The corresponding labels of X_resampled.

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out : ndarray of str objects: Same as input features.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for details on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

static is_tomek(y, nn_index, class_type)

Detect whether samples are part of a Tomek link.

More precisely, it uses the target vector and the first neighbour of every sample point to look for Tomek pairs, and returns a boolean vector with True for majority samples that are Tomek links.

Parameters:

y : ndarray of shape (n_samples,): Target vector of the data set, needed to keep track of whether a sample belongs to the minority class.
nn_index : ndarray of shape (len(y),): The index of the nearest neighbour of each sample point.
class_type : int or str: The label of the minority class.

Returns:

is_tomek : ndarray of shape (len(y),): Boolean vector of length n_samples, with True for majority samples that are Tomek links.
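
The rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation. The helper names `nearest_neighbour_index` and `tomek_majority_mask` and the toy 1-D data are hypothetical. The sketch assumes a Tomek link is a pair of samples with different labels that are each other's nearest neighbour.

```python
def nearest_neighbour_index(points):
    """Hypothetical helper: index of the closest other point (brute force, 1-D)."""
    nn_index = []
    for i, p in enumerate(points):
        candidates = [j for j in range(len(points)) if j != i]
        nn_index.append(min(candidates, key=lambda j: abs(points[j] - p)))
    return nn_index


def tomek_majority_mask(y, nn_index, minority_label):
    """Hypothetical helper: True for non-minority samples that sit in a Tomek link."""
    mask = []
    for i, label in enumerate(y):
        j = nn_index[i]
        mutual = nn_index[j] == i      # i and j are each other's nearest neighbour
        different = y[j] != label      # and they carry different labels
        mask.append(label != minority_label and different and mutual)
    return mask


# Toy data (an assumption for this sketch): positions on a line and their labels.
points = [0.0, 1.0, 1.2, 5.0, 5.3, 9.0]
y = ["maj", "maj", "min", "maj", "maj", "min"]

nn_index = nearest_neighbour_index(points)
mask = tomek_majority_mask(y, nn_index, minority_label="min")
print(nn_index)  # [1, 2, 1, 4, 3, 4]
print(mask)      # [False, True, False, False, False, False]
```

Only the sample at index 1 is flagged: it and the minority sample at index 2 are mutual nearest neighbours with different labels. The pair (3, 4) is mutual but shares a label, and the sample at index 5 has a differently labelled neighbour that does not point back to it.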

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

Examples using `imblearn.under_sampling.TomekLinks`

How to use sampling_strategy in imbalanced-learn

Illustration of the definition of a Tomek link

Compare under-sampling samplers
