ValueDifferenceMetric

class imblearn.metrics.pairwise.ValueDifferenceMetric(*, n_categories='auto', k=1, r=2)

Class implementing the Value Difference Metric.

This metric computes the distance between samples containing only categorical features. The distance between feature values of two samples is defined as:

\[\delta(x, y) = \sum_{c=1}^{C} |p(c|x_{f}) - p(c|y_{f})|^{k} \ ,\]

where \(x\) and \(y\) are two samples, \(f\) is a given feature, \(C\) is the number of classes, \(p(c|x_{f})\) is the conditional probability that the output class is \(c\) given that feature \(f\) takes the value \(x_{f}\), and \(k\) is an exponent, usually set to 1 or 2.
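As a worked instance of this definition, the sketch below evaluates \(\delta\) for a single feature from two class-probability vectors. The vectors are the class frequencies of "green" and "red" in the Examples section (class 0 first, then class 1). The helper name `feature_delta` is hypothetical and is not part of imblearn.

```python
import numpy as np


def feature_delta(p_x, p_y, k=1):
    """Hypothetical helper: delta(x, y) = sum_c |p(c|x_f) - p(c|y_f)|**k."""
    p_x = np.asarray(p_x, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    # Sum the k-th powers of the absolute per-class differences
    return float(np.sum(np.abs(p_x - p_y) ** k))


# Class-conditional probabilities [p(c=0 | value), p(c=1 | value)]
p_green = [0.2, 0.8]
p_red = [0.3, 0.7]

print(feature_delta(p_green, p_red, k=1))  # |0.2-0.3| + |0.8-0.7|, about 0.2
print(feature_delta(p_green, p_red, k=2))  # 0.1**2 + 0.1**2, about 0.02
```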

The distance for the feature vectors \(X\) and \(Y\) is subsequently defined as:

\[\Delta(X, Y) = \sum_{f=1}^{F} \delta(X_{f}, Y_{f})^{r} \ ,\]

where \(F\) is the number of features and \(r\) is an exponent, usually set to 1 or 2.
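Combining the two definitions, the following sketch computes \(\Delta\) for one pair of samples from their per-feature class-probability vectors. The helper `vdm_distance` is hypothetical and follows the formulas above as written; it is not the library's implementation. The first feature reuses the "green" and "blue" frequencies from the Examples section, while the second feature's probabilities are made up for illustration.

```python
import numpy as np


def vdm_distance(probas_x, probas_y, k=1, r=2):
    """Hypothetical helper: Delta(X, Y) = sum_f delta(X_f, Y_f)**r.

    probas_x[f] and probas_y[f] hold the class-probability vectors
    p(.|X_f) and p(.|Y_f) for feature f.
    """
    total = 0.0
    for p_x, p_y in zip(probas_x, probas_y):
        # Per-feature distance delta, as defined above
        delta = np.sum(np.abs(np.asarray(p_x) - np.asarray(p_y)) ** k)
        total += delta ** r
    return float(total)


# Two features; each entry is [p(c=0 | value), p(c=1 | value)]
X_probas = [[0.2, 0.8], [0.5, 0.5]]
Y_probas = [[0.9, 0.1], [0.5, 0.5]]
print(vdm_distance(X_probas, Y_probas))  # feature 1: 1.4**2, feature 2: 0
```

Only the first feature contributes here, because both samples share the same class distribution on the second feature.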

This distance was proposed in [1].

See also

sklearn.neighbors.DistanceMetric: Interface for fast metric computation.

Notes

The input data X are expected to be encoded by an OrdinalEncoder, and their data type should be np.int32. If other data types are given, X will be converted to np.int32.
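For a single column, NumPy's `np.unique(..., return_inverse=True)` yields the same integer codes as `OrdinalEncoder` with its default `categories="auto"`, since both sort the categories. The NumPy-only sketch below relies on that assumption to show what the expected `np.int32` input looks like. It is an illustration only; the encoder itself should still be used, since it stores the categories for later `transform` calls.

```python
import numpy as np

# One categorical column, as in the Examples section
column = np.array(["green", "red", "blue", "green"])

# np.unique sorts the categories, so the inverse indices match the codes
# OrdinalEncoder assigns with categories="auto" (assumption stated above)
categories, codes = np.unique(column, return_inverse=True)
codes = codes.astype(np.int32).reshape(-1, 1)  # 2D int32 input, one column

print(categories)     # ['blue' 'green' 'red']
print(codes.ravel())  # [1 2 0 1]
```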

References

[1] Stanfill, Craig, and David Waltz. “Toward memory-based reasoning.” Communications of the ACM 29.12 (1986): 1213-1228.

Examples

>>> import numpy as np
>>> X = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10).reshape(-1, 1)
>>> y = [1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1]
>>> from sklearn.preprocessing import OrdinalEncoder
>>> encoder = OrdinalEncoder(dtype=np.int32)
>>> X_encoded = encoder.fit_transform(X)
>>> from imblearn.metrics.pairwise import ValueDifferenceMetric
>>> vdm = ValueDifferenceMetric().fit(X_encoded, y)
>>> pairwise_distance = vdm.pairwise(X_encoded)
>>> pairwise_distance.shape
(30, 30)
>>> X_test = np.array(["green", "red", "blue"]).reshape(-1, 1)
>>> X_test_encoded = encoder.transform(X_test)
>>> vdm.pairwise(X_test_encoded)
array([[0.  ,  0.04,  1.96],
       [0.04,  0.  ,  1.44],
       [1.96,  1.44,  0.  ]])

Methods

fit(X, y): Compute the necessary statistics from the training set.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
pairwise(X[, Y]): Compute the VDM distance pairwise.
set_params(**params): Set the parameters of this estimator.

fit(X, y)

Compute the statistics needed by the metric, namely the class-conditional probabilities \(p(c|x_{f})\) of each feature value, from the training set.

Parameters:

X : ndarray of shape (n_samples, n_features), dtype=np.int32: The input data. The data are expected to be encoded with an OrdinalEncoder.
y : ndarray of shape (n_samples,): The target.

Returns:

self : object: Returns the instance itself.
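A minimal sketch of how these probabilities can be estimated by counting is given below. It assumes ordinal codes 0 to n_categories - 1 and builds one table per feature, with one row per category and one column per class. The function `estimate_class_probas` is hypothetical, and imblearn's internals (attribute names, handling of `n_categories`) may differ.

```python
import numpy as np


def estimate_class_probas(X, y):
    """Hypothetical sketch: one (n_categories, n_classes) table per feature.

    Row v of table f holds p(c | X_f = v); X must contain ordinal codes.
    """
    X = np.asarray(X, dtype=np.int32)
    classes, y_idx = np.unique(y, return_inverse=True)
    tables = []
    for f in range(X.shape[1]):
        n_cat = int(X[:, f].max()) + 1  # assume codes 0..n_cat-1
        counts = np.zeros((n_cat, classes.size))
        np.add.at(counts, (X[:, f], y_idx), 1)  # count (value, class) pairs
        totals = counts.sum(axis=1, keepdims=True)
        # Normalize each row; categories never seen keep an all-zero row
        probas = np.divide(counts, totals,
                           out=np.zeros_like(counts), where=totals > 0)
        tables.append(probas)
    return tables


# Same data as the Examples section, already encoded (blue=0, green=1, red=2)
X = np.repeat([1, 2, 0], 10).reshape(-1, 1)
y = [1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1]
print(estimate_class_probas(X, y)[0])
```

On the Examples data, the rows come out as blue [0.9, 0.1], green [0.2, 0.8] and red [0.3, 0.7] (class 0 first).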

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

pairwise(X, Y=None)

Compute the VDM distance pairwise.

Parameters:

X : ndarray of shape (n_samples_X, n_features), dtype=np.int32: The input data. The data are expected to be encoded with an OrdinalEncoder.
Y : ndarray of shape (n_samples_Y, n_features), dtype=np.int32, default=None: The input data. The data are expected to be encoded with an OrdinalEncoder. If None, the distances are computed between the samples of X.

Returns:

distance_matrix : ndarray of shape (n_samples_X, n_samples_Y): The VDM pairwise distance.
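Given per-feature probability tables, the pairwise distance can be vectorized with NumPy broadcasting. The sketch below follows the \(\delta\) and \(\Delta\) formulas as written; the function `vdm_pairwise` and its `tables` argument are hypothetical, not imblearn's API. With the tables for the Examples data and the default `k=1`, `r=2`, it reproduces the 3 × 3 matrix shown there.

```python
import numpy as np


def vdm_pairwise(tables, X, Y=None, k=1, r=2):
    """Hypothetical sketch of the pairwise VDM from per-feature tables.

    tables[f][v] is the class-probability vector p(.|X_f = v).
    Returns an array of shape (n_samples_X, n_samples_Y).
    """
    X = np.asarray(X, dtype=np.int32)
    Y = X if Y is None else np.asarray(Y, dtype=np.int32)
    distance = np.zeros((X.shape[0], Y.shape[0]))
    for f, table in enumerate(tables):
        p_x = table[X[:, f]]  # (n_samples_X, n_classes)
        p_y = table[Y[:, f]]  # (n_samples_Y, n_classes)
        # Broadcast to (n_samples_X, n_samples_Y, n_classes), sum over classes
        delta = np.sum(np.abs(p_x[:, None, :] - p_y[None, :, :]) ** k, axis=-1)
        distance += delta ** r
    return distance


# Tables for the Examples section (rows: blue=0, green=1, red=2)
tables = [np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])]
X_test = np.array([[1], [2], [0]])  # green, red, blue
print(np.round(vdm_pairwise(tables, X_test), 2))
```

The broadcast builds an intermediate array of shape (n_samples_X, n_samples_Y, n_classes), so for large inputs, processing X in blocks of rows would use less memory.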

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.
