imblearn.utils.check_sampling_strategy

imblearn.utils.check_sampling_strategy(sampling_strategy, y, sampling_type, **kwargs)

Sampling target validation for samplers.

Checks that sampling_strategy is of a consistent type and returns a dictionary mapping each targeted class to its corresponding number of samples. It is used in imblearn.base.BaseSampler.

Parameters:
sampling_strategy : float, str, dict, list or callable

Sampling information to sample the data set.

  • When float:

    For under-sampling methods, it corresponds to the ratio \alpha_{us} defined by \alpha_{us} = N_{m} / N_{rM}, where N_{m} and N_{rM} are the number of samples in the minority class and the number of samples in the majority class after resampling, respectively;

    For over-sampling methods, it corresponds to the ratio \alpha_{os} defined by \alpha_{os} = N_{rm} / N_{M}, where N_{rm} and N_{M} are the number of samples in the minority class after resampling and the number of samples in the majority class, respectively.

    Warning

    float is only available for binary classification. An error is raised for multi-class classification and with cleaning samplers.

  • When str, specify the class targeted by the resampling. For under- and over-sampling methods, the number of samples in the different classes will be equalized. For cleaning methods, the number of samples will not be equal. Possible choices are:

    'minority': resample only the minority class;

    'majority': resample only the majority class;

    'not minority': resample all classes but the minority class;

    'not majority': resample all classes but the majority class;

    'all': resample all classes;

    'auto': for under-sampling methods, equivalent to 'not minority' and for over-sampling methods, equivalent to 'not majority'.

  • When dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples for each targeted class.

    Warning

    dict is available for both under- and over-sampling methods. An error is raised with cleaning methods. Use a list instead.

  • When list, the list contains the targeted classes. It is used only for cleaning methods.

    Warning

    list is only available for cleaning methods. An error is raised with under- and over-sampling methods.

  • When callable, a function taking y and returning a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples for each class.
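The str and float options above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names targeted_classes and float_to_target are hypothetical, the float branch assumes the ratios are taken as \alpha_{us} = N_{m} / N_{rM} for under-sampling and \alpha_{os} = N_{rm} / N_{M} for over-sampling, and rounding down with int() is also an assumption.

```python
from collections import Counter


def targeted_classes(strategy, y, sampling_type):
    """Return the classes selected by a str strategy (hypothetical helper)."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    if strategy == "auto":
        # 'auto' depends on the sampling type, as described above.
        # Cleaning methods are outside the scope of this sketch.
        strategy = {
            "under-sampling": "not minority",
            "over-sampling": "not majority",
        }[sampling_type]
    selectors = {
        "minority": lambda c: c == minority,
        "majority": lambda c: c == majority,
        "not minority": lambda c: c != minority,
        "not majority": lambda c: c != majority,
        "all": lambda c: True,
    }
    return sorted(c for c in counts if selectors[strategy](c))


def float_to_target(alpha, y, sampling_type):
    """Translate a float ratio into a target count for a binary problem."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("a float ratio is only defined for two classes")
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    if sampling_type == "under-sampling":
        # Assumed convention: alpha = N_m / N_rM, so N_rM = N_m / alpha.
        return {majority: int(counts[minority] / alpha)}
    if sampling_type == "over-sampling":
        # Assumed convention: alpha = N_rm / N_M, so N_rm = alpha * N_M.
        return {minority: int(alpha * counts[majority])}
    raise ValueError("float is not supported for cleaning methods")


# Toy target: 10 samples of class 0 and 90 samples of class 1.
y = [0] * 10 + [1] * 90
print(targeted_classes("auto", y, "under-sampling"))
print(targeted_classes("auto", y, "over-sampling"))
print(float_to_target(0.5, y, "under-sampling"))
print(float_to_target(0.5, y, "over-sampling"))
```

With this toy target the four calls print [1], [0], {1: 20} and {0: 45}: 'auto' selects the majority class for under-sampling and the minority class for over-sampling, and a ratio of 0.5 leaves the majority class at twice the minority count or raises the minority class to half the majority count.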

y : ndarray, shape (n_samples,)

The target array.

sampling_type : str

The type of sampling. Can be either 'over-sampling', 'under-sampling', or 'clean-sampling'.
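The warnings in the sampling_strategy description tie the accepted type of sampling_strategy to sampling_type. A minimal sketch of those rules follows. The helper check_type_compatibility is hypothetical and written only for illustration; it is not a function of imblearn, its error messages are invented, and the n_classes argument is an assumption standing in for the number of distinct classes in y.

```python
def check_type_compatibility(sampling_strategy, sampling_type, n_classes):
    """Raise ValueError when the strategy type does not fit the sampling type."""
    kinds = ("over-sampling", "under-sampling", "clean-sampling")
    if sampling_type not in kinds:
        raise ValueError(f"sampling_type must be one of {kinds}")
    cleaning = sampling_type == "clean-sampling"

    if isinstance(sampling_strategy, float):
        # float: binary classification only, and never with cleaning methods.
        if cleaning or n_classes != 2:
            raise ValueError("float requires a binary, non-cleaning setup")
    elif isinstance(sampling_strategy, dict):
        # dict: under- and over-sampling only.
        if cleaning:
            raise ValueError("dict is not accepted by cleaning methods")
    elif isinstance(sampling_strategy, list):
        # list: cleaning methods only.
        if not cleaning:
            raise ValueError("list is only accepted by cleaning methods")
    # str and callable are not restricted by the warnings, so they pass here.
    return True


print(check_type_compatibility("auto", "clean-sampling", 3))
print(check_type_compatibility({0: 10}, "under-sampling", 2))
print(check_type_compatibility([0, 1], "clean-sampling", 3))
```

All three calls print True. Passing a list with 'over-sampling', a dict with 'clean-sampling', or a float with three classes raises ValueError in this sketch.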

kwargs : dict, optional

Dictionary of additional keyword arguments to pass to sampling_strategy when this is a callable.
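As an illustration of the callable form, the sketch below defines a function that takes y plus one extra keyword argument and returns a dict of desired sample counts. The function name cap_classes and its cap parameter are hypothetical, chosen only for this example.

```python
from collections import Counter


def cap_classes(y, cap=50):
    """Target `cap` samples for every class that currently exceeds `cap`."""
    counts = Counter(y)
    # Keys are the targeted classes, values the desired number of samples.
    return {label: cap for label, n in sorted(counts.items()) if n > cap}


# Toy target: 20, 80 and 200 samples for classes 0, 1 and 2.
y = [0] * 20 + [1] * 80 + [2] * 200
print(cap_classes(y, cap=60))
```

The call prints {1: 60, 2: 60}; class 0 is left untouched because it is already below the cap. Following the parameter descriptions above, such a callable would be passed as sampling_strategy with cap supplied through kwargs, for example check_sampling_strategy(cap_classes, y, 'under-sampling', cap=60).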

Returns:
sampling_strategy_converted : dict

The converted and validated sampling target. Returns a dictionary whose keys are the targeted classes and whose values are the desired number of samples for each of them.