Usage of pipeline embedding samplers
An example of the imblearn.pipeline.Pipeline object (or the make_pipeline helper function) working with transformers and resamplers.
# Authors: Christos Aridas
# Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT
print(__doc__)
Let’s first create an imbalanced dataset and split it into two sets.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(
    n_classes=2,
    class_sep=1.25,
    weights=[0.3, 0.7],
    n_informative=3,
    n_redundant=1,
    flip_y=0,
    n_features=5,
    n_clusters_per_class=1,
    n_samples=5000,
    random_state=10,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
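Before building the pipeline, it can help to confirm that the dataset is as imbalanced as intended and that the stratified split preserved the class ratio. The following is a minimal sketch, assuming the same make_classification and train_test_split calls as above; the Counter-based check and the variable names are additions for illustration and are not part of the original example.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same dataset and split as in the example above.
X, y = make_classification(
    n_classes=2, class_sep=1.25, weights=[0.3, 0.7], n_informative=3,
    n_redundant=1, flip_y=0, n_features=5, n_clusters_per_class=1,
    n_samples=5000, random_state=10,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Count the samples per class in the full set and in each split.
full_counts = Counter(y.tolist())
train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())

# Fraction of samples that belong to class 0 (the minority class here).
minority_ratio_train = train_counts[0] / len(y_train)
minority_ratio_test = test_counts[0] / len(y_test)

print("full: ", sorted(full_counts.items()))
print("train:", sorted(train_counts.items()))
print("test: ", sorted(test_counts.items()))
print("minority ratio (train, test):", minority_ratio_train, minority_ratio_test)
```

Because the split is stratified on y, both ratios should stay close to the 0.3 requested through weights.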
Now, we will create each of the individual steps that we would like to combine later.
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours
pca = PCA(n_components=2)
enn = EditedNearestNeighbours()
smote = SMOTE(random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
Now, we can finally create a pipeline that specifies the order in which the different transformers and samplers are executed before the data is passed to the final classifier.
from imblearn.pipeline import make_pipeline
model = make_pipeline(pca, enn, smote, knn)
We can now use the pipeline as a normal classifier: resampling happens when calling fit and is disabled when calling decision_function, predict_proba, or predict.
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       375
           1       1.00      1.00      1.00       875

    accuracy                           0.99      1250
   macro avg       0.99      0.99      0.99      1250
weighted avg       0.99      0.99      0.99      1250