imblearn.under_sampling.CondensedNearestNeighbour

class imblearn.under_sampling.CondensedNearestNeighbour(*, sampling_strategy='auto', random_state=None, n_neighbors=None, n_seeds_S=1, n_jobs=None)

Undersample based on the condensed nearest neighbour method.
 Parameters
 sampling_strategy : str, list or callable
Sampling information to sample the data set.
When str, specify the class targeted by the resampling. Note that the number of samples will not be equal in each class. Possible choices are:
'majority': resample only the majority class;
'not minority': resample all classes but the minority class;
'not majority': resample all classes but the majority class;
'all': resample all classes;
'auto': equivalent to 'not minority'.
When list, the list contains the classes targeted by the resampling.
When callable, a function taking y and returning a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples for each class.
 random_state : int, RandomState instance, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
 n_neighbors : int or object, default=KNeighborsClassifier(n_neighbors=1)
If int, size of the neighbourhood to consider to compute the nearest neighbors. If object, an estimator that inherits from sklearn.neighbors.base.KNeighborsMixin that will be used to find the nearest neighbors.
 n_seeds_S : int, default=1
Number of samples to extract in order to build the set S.
 n_jobs : int, default=None
Number of CPU cores used during the cross-validation loop. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
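The string options for sampling_strategy listed above can be read as a mapping from class counts to a set of targeted classes. The sketch below illustrates that mapping in plain Python. The helper name targeted_classes is hypothetical and is not part of imbalanced-learn; the library's own validation logic may differ in detail.

```python
from collections import Counter


def targeted_classes(y, sampling_strategy="auto"):
    """Hypothetical helper: list the classes a strategy would target."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)

    # A list is taken as the explicit set of targeted classes.
    if not isinstance(sampling_strategy, str):
        unknown = set(sampling_strategy) - set(counts)
        if unknown:
            raise ValueError(f"classes not present in y: {sorted(unknown)}")
        return sorted(sampling_strategy)

    # 'auto' is documented as equivalent to 'not minority'.
    if sampling_strategy == "auto":
        sampling_strategy = "not minority"

    if sampling_strategy == "majority":
        return [majority]
    if sampling_strategy == "not minority":
        return sorted(c for c in counts if c != minority)
    if sampling_strategy == "not majority":
        return sorted(c for c in counts if c != majority)
    if sampling_strategy == "all":
        return sorted(counts)
    raise ValueError(f"unknown sampling_strategy: {sampling_strategy!r}")


y = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
print(targeted_classes(y))              # classes targeted by 'auto'
print(targeted_classes(y, "majority"))  # only the largest class
```

With three classes, 'auto' targets every class except the smallest one, which is why the minority class keeps all of its samples under the default setting.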
Notes
The method is based on [1].
Supports multiclass resampling. A one-vs.-rest scheme is used when sampling a class as proposed in [1].
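The idea behind the condensed nearest neighbour rule can be sketched for a two-class problem as follows. This is a simplified illustration in plain Python and not the library's implementation: the function names condense and nearest_label are hypothetical, the store S is assumed to start with every minority sample plus n_seeds random majority samples, and passes are repeated until S stops growing.

```python
import math
import random


def nearest_label(point, store):
    """Return the label of the stored sample closest to `point` (1-NN)."""
    return min(store, key=lambda item: math.dist(point, item[0]))[1]


def condense(X, y, minority_label, n_seeds=1, seed=0):
    """Simplified two-class condensed nearest neighbour (illustrative only).

    Returns the sorted indices of the samples kept in the store S.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_idx = [i for i, label in enumerate(y) if label != minority_label]

    # S starts with every minority sample plus a few random majority seeds.
    kept = set(minority_idx) | set(rng.sample(majority_idx, n_seeds))

    changed = True
    while changed:  # repeat passes until no sample is added to S
        changed = False
        for i in majority_idx:
            if i in kept:
                continue
            store = [(X[j], y[j]) for j in kept]
            if nearest_label(X[i], store) != y[i]:
                kept.add(i)  # S misclassifies this sample, so keep it
                changed = True
    return sorted(kept)


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11)]
y = [0, 0, 0, 0, 0, 1, 1]
print(condense(X, y, minority_label=1))
```

In this toy data the two classes are well separated, so a single majority seed is enough for S to classify every remaining majority sample correctly, and the rest are discarded.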
References
[1] P. Hart, “The condensed nearest neighbor rule,” In Information Theory, IEEE Transactions on, vol. 14(3), pp. 515-516, 1968.
Examples
>>> from collections import Counter
>>> from sklearn.datasets import fetch_mldata
>>> from imblearn.under_sampling import CondensedNearestNeighbour
>>> pima = fetch_mldata('diabetes_scale')
>>> X, y = pima['data'], pima['target']
>>> print('Original dataset shape %s' % Counter(y))
Original dataset shape Counter({1: 500, -1: 268})
>>> cnn = CondensedNearestNeighbour(random_state=42)
>>> X_res, y_res = cnn.fit_resample(X, y)
>>> print('Resampled dataset shape %s' % Counter(y_res))
Resampled dataset shape Counter({-1: 268, 1: 227})
 Attributes
 sample_indices_ : ndarray of shape (n_new_samples,)
Indices of the samples selected.
New in version 0.4.