ValueDifferenceMetric
- class imblearn.metrics.pairwise.ValueDifferenceMetric(*, n_categories='auto', k=1, r=2)
- Class implementing the Value Difference Metric.
- This metric computes the distance between samples containing only categorical features. The distance between the values of a given feature in two samples is defined as:
\[\delta(x, y) = \sum_{c=1}^{C} |p(c|x_{f}) - p(c|y_{f})|^{k} \ ,\]
- where \(x\) and \(y\) are two samples, \(f\) is a given feature, \(C\) is the number of classes, \(p(c|x_{f})\) is the conditional probability that the output class is \(c\) given that feature \(f\) takes the value \(x_{f}\), and \(k\) is an exponent usually set to 1 or 2.
- The distance between the feature vectors \(X\) and \(Y\) is then defined as:
\[\Delta(X, Y) = \sum_{f=1}^{F} \delta(X_{f}, Y_{f})^{r} \ ,\]
- where \(F\) is the number of features and \(r\) is an exponent usually set to 1 or 2.
- This distance was proposed in [1].
- Read more in the User Guide.
- Added in version 0.8.
- Parameters:
- n_categories : "auto" or array-like of shape (n_features,), default="auto"
- The number of unique categories per feature. If "auto", the number of categories is computed from X at fit. Otherwise, you can provide an array-like of such counts to avoid the computation. You can use the fitted attribute categories_ of the OrdinalEncoder to deduce these counts.
- k : int, default=1
- Exponent used to compute the distance between feature values.
- r : int, default=2
- Exponent used to compute the distance between feature vectors.
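As the description of n_categories suggests, the per-feature counts can be read off a fitted OrdinalEncoder instead of being recomputed. The sketch below assumes scikit-learn is installed; the toy data is purely illustrative. It relies only on categories_ holding one array of learned categories per feature.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data: two categorical features (colour, size).
X = np.array([["green", "small"], ["red", "large"], ["blue", "small"], ["green", "large"]])

encoder = OrdinalEncoder(dtype=np.int32)
X_encoded = encoder.fit_transform(X)  # int32 codes, one column per feature

# categories_ is a list with one array of categories per feature,
# so the per-feature counts are the lengths of those arrays.
n_categories = np.array([len(cats) for cats in encoder.categories_])
print(n_categories)  # [3 2]
```

The resulting array could then be passed as ValueDifferenceMetric(n_categories=n_categories) so that fit does not have to derive the counts from X.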
 
- Attributes:
- n_categories_ : ndarray of shape (n_features,)
- The number of categories per feature.
- proba_per_class_ : list of ndarray of shape (n_categories, n_classes)
- List of length n_features containing, for each feature, the conditional probability of each class given each category.
- n_features_in_ : int
- Number of features in the input dataset.
- Added in version 0.10.
- feature_names_in_ : ndarray of shape (n_features_in_,)
- Names of features seen during fit. Defined only when X has feature names that are all strings.
- Added in version 0.10.
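The two formulas in the class description can be transcribed directly. The sketch below is a plain NumPy reading of the definitions of \(\delta\) and \(\Delta\), not imblearn's implementation; feature_delta and vdm_distance are hypothetical helper names. It stores \(p(c|x_{f})\) in the (n_categories, n_classes) layout described for proba_per_class_. The table is the one implied by the green/red/blue data in the Examples section (8 of 10 green, 7 of 10 red and 1 of 10 blue samples belong to class 1), with blue, green and red coded 0, 1 and 2 as OrdinalEncoder sorts them.

```python
import numpy as np


def feature_delta(proba, a, b, k=1):
    """delta for one feature: sum over classes of |p(c|a) - p(c|b)|**k.

    proba has shape (n_categories, n_classes); a and b are category codes.
    """
    return np.sum(np.abs(proba[a] - proba[b]) ** k)


def vdm_distance(proba_per_feature, x, y, k=1, r=2):
    """Delta(x, y): sum over features of delta(x_f, y_f)**r."""
    return sum(
        feature_delta(proba, x_f, y_f, k) ** r
        for proba, x_f, y_f in zip(proba_per_feature, x, y)
    )


# One feature (colour) with codes 0=blue, 1=green, 2=red and two classes;
# each row holds [p(class 0 | colour), p(class 1 | colour)].
proba_colour = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])

d = vdm_distance([proba_colour], x=[1], y=[2], k=1, r=2)  # green vs red
print(round(d, 4))  # 0.04
```

With k=1 and r=2 (the defaults), green vs red gives (0.1 + 0.1)^2 = 0.04, green vs blue gives 1.4^2 = 1.96 and red vs blue gives 1.2^2 = 1.44, the same values as the 3x3 matrix in the Examples section.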
 
- See also:
- sklearn.neighbors.DistanceMetric: Interface for fast metric computation.
- Notes
- The input data X are expected to be encoded with an OrdinalEncoder, and their data type should be np.int32. If other data types are given, X will be converted to np.int32.
- References
- [1] Stanfill, Craig, and David Waltz. "Toward memory-based reasoning." Communications of the ACM 29.12 (1986): 1213-1228.
- Examples
>>> import numpy as np
>>> X = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10).reshape(-1, 1)
>>> y = [1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1]
>>> from sklearn.preprocessing import OrdinalEncoder
>>> encoder = OrdinalEncoder(dtype=np.int32)
>>> X_encoded = encoder.fit_transform(X)
>>> from imblearn.metrics.pairwise import ValueDifferenceMetric
>>> vdm = ValueDifferenceMetric().fit(X_encoded, y)
>>> pairwise_distance = vdm.pairwise(X_encoded)
>>> pairwise_distance.shape
(30, 30)
>>> X_test = np.array(["green", "red", "blue"]).reshape(-1, 1)
>>> X_test_encoded = encoder.transform(X_test)
>>> vdm.pairwise(X_test_encoded)
array([[0.  , 0.04, 1.96],
       [0.04, 0.  , 1.44],
       [1.96, 1.44, 0.  ]])
- Methods
- fit(X, y): Compute the necessary statistics from the training set.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- pairwise(X[, Y]): Compute the VDM distance pairwise.
- set_params(**params): Set the parameters of this estimator.
- fit(X, y)
- Compute the necessary statistics from the training set. - Parameters:
- X : ndarray of shape (n_samples, n_features), dtype=np.int32
- The input data. The data are expected to be encoded with an OrdinalEncoder.
- y : ndarray of shape (n_samples,)
- The target. 
 
- Returns:
- self : object
- Return the instance itself. 
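The statistics that fit computes can be approximated by counting, for each feature, how many samples of each class fall into each category, then normalising every row over the classes so it holds \(p(c|x_{f})\). This is a sketch under two stated assumptions: categories are ordinal codes 0 to n-1 (as produced by OrdinalEncoder), and a category never seen during training gets an all-zero row. estimate_proba_per_class is a hypothetical helper, not part of imblearn.

```python
import numpy as np


def estimate_proba_per_class(X, y):
    """Hypothetical helper: per-feature tables of p(class | category).

    X: ordinal-encoded array of shape (n_samples, n_features).
    y: class labels of shape (n_samples,).
    Returns (n_categories, list of (n_categories, n_classes) arrays).
    """
    X = np.asarray(X, dtype=np.int32)
    y = np.asarray(y)
    classes = np.unique(y)
    # Assumption: codes run from 0 to max, so the count is max code + 1.
    n_categories = X.max(axis=0) + 1
    proba_per_class = []
    for f, n_cat in enumerate(n_categories):
        # counts[i, j] = number of samples in category i with label classes[j]
        counts = np.column_stack(
            [np.bincount(X[y == c, f], minlength=int(n_cat)) for c in classes]
        ).astype(np.float64)
        totals = counts.sum(axis=1, keepdims=True)
        # Normalise each row over the classes; unseen categories stay all zero.
        with np.errstate(invalid="ignore", divide="ignore"):
            proba = np.where(totals > 0, counts / totals, 0.0)
        proba_per_class.append(proba)
    return n_categories, proba_per_class


# The Examples data, already encoded: blue=0, green=1, red=2.
codes = np.array([1] * 10 + [2] * 10 + [0] * 10).reshape(-1, 1)
y = [1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1]
n_cat, proba = estimate_proba_per_class(codes, y)
print(n_cat)  # [3]
# proba[0] rows (blue, green, red): [0.9, 0.1], [0.2, 0.8], [0.3, 0.7]
```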
 
 
- get_metadata_routing()
- Get metadata routing of this object.
- Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
- A MetadataRequest encapsulating routing information.
 
 
- get_params(deep=True)
- Get parameters for this estimator. - Parameters:
- deep : bool, default=True
- If True, will return the parameters for this estimator and contained subobjects that are estimators. 
 
- Returns:
- params : dict
- Parameter names mapped to their values. 
 
 
- pairwise(X, Y=None)
- Compute the VDM distance pairwise. - Parameters:
- X : ndarray of shape (n_samples_X, n_features), dtype=np.int32
- The input data. The data are expected to be encoded with an OrdinalEncoder.
- Y : ndarray of shape (n_samples_Y, n_features), dtype=np.int32, default=None
- The input data. The data are expected to be encoded with an OrdinalEncoder. If None, the distances are computed between the samples of X.

- Returns:
- distance_matrix : ndarray of shape (n_samples_X, n_samples_Y)
- The VDM pairwise distance. If Y is None, n_samples_Y equals n_samples_X.
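A pairwise distance matrix follows from the same per-feature tables: look up each sample's row of \(p(c|x_{f})\), broadcast \(\delta\) over all pairs, raise it to r and accumulate over features. vdm_pairwise is a hypothetical helper that transcribes the documented formulas; it assumes the (n_categories, n_classes) layout of proba_per_class_ and ordinal category codes, and is not imblearn's implementation.

```python
import numpy as np


def vdm_pairwise(proba_per_class, X, Y=None, k=1, r=2):
    """Hypothetical helper: VDM distances between the rows of X and Y.

    proba_per_class: list of (n_categories, n_classes) arrays, one per feature.
    X: int array (n_samples_X, n_features); Y defaults to X.
    Returns an array of shape (n_samples_X, n_samples_Y).
    """
    X = np.asarray(X, dtype=np.int32)
    Y = X if Y is None else np.asarray(Y, dtype=np.int32)
    distance = np.zeros((X.shape[0], Y.shape[0]), dtype=np.float64)
    for f, proba in enumerate(proba_per_class):
        # Probability rows for every sample: shapes (n_X, C) and (n_Y, C).
        px, py = proba[X[:, f]], proba[Y[:, f]]
        # delta for every pair: broadcast to (n_X, n_Y, C), sum over classes.
        delta = np.sum(np.abs(px[:, None, :] - py[None, :, :]) ** k, axis=-1)
        distance += delta ** r
    return distance


proba_colour = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])  # blue, green, red
X_test = np.array([[1], [2], [0]])  # green, red, blue
D = vdm_pairwise([proba_colour], X_test)
# D matches the 3x3 matrix in the Examples section (0.04, 1.96, 1.44 off-diagonal).
```

Broadcasting keeps the sketch short but materialises an (n_samples_X, n_samples_Y, n_classes) array per feature, so for large inputs a loop over classes or over row blocks would use less memory.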
 
 
- set_params(**params)
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.
- Parameters:
- **params : dict
- Estimator parameters. 
 
- Returns:
- self : estimator instance
- Estimator instance. 
 
 
 
 
    