skmultilearn.missing package
The skmultilearn.missing module provides classifiers and methods for dealing with missing labels in multi-label classification problems.
Currently the following algorithm adaptation classification schemes are available in scikit-multilearn:
| Classifier | Description |
|---|---|
| SMiLE | Semi-supervised multi-label classification using incomplete label information. |
- class skmultilearn.missing.SMiLE(s=0.5, alpha=0.35, k=5)
Bases: object
SMiLE algorithm for multi-label classification with missing labels (Semi-supervised multi-label classification using incomplete label information).
Parameters
- s : float, optional, default=0.5
Smoothness parameter for class imbalance.
- alpha : float, optional, default=0.35
Smoothness assumption parameter, which encourages similar instances to have similar predicted outputs. It balances the importance of the two terms of the objective being optimized.
- k : int, optional, default=5
Number of nearest neighbours (kNN) considered for each instance when building the instance weight matrix.
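As a rough illustration of what `k` controls, the sketch below builds a k-nearest-neighbour weight matrix and its graph Laplacian with plain NumPy. The helper names `knn_weight_matrix` and `graph_laplacian` are hypothetical, and the binary, symmetrized edge weights are an assumption; the library may weight neighbours differently.

```python
import numpy as np

def knn_weight_matrix(X, k):
    """Binary, symmetric kNN weight matrix for X of shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of instances.
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # An instance is never its own neighbour.
    np.fill_diagonal(dist2, np.inf)
    # Indices of the k closest instances for every row.
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    # Assumption: keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)

def graph_laplacian(W):
    """Unnormalized Laplacian: diagonal of row sums minus the weight matrix."""
    return np.diag(W.sum(axis=1)) - W

X = np.array([[0.0], [1.0], [10.0], [11.0]])
W = knn_weight_matrix(X, k=1)
M = graph_laplacian(W)
```

A larger `k` connects each instance to more neighbours, so the smoothness term couples more predictions together.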
Attributes
- L : array, [n_labels, n_labels]
Correlation matrix between labels.
- W : array, [n_samples, n_samples]
Weight matrix between instances, built with kNN.
- estimate_matrix : array-like (n_samples, n_labels)
Label estimation matrix: $\tilde{y}_{ic} = y_i^{T} L_{\cdot c}$ if $y_{ic} = 0$, and $\tilde{y}_{ic} = 1$ otherwise.
- H : array-like (n_samples, n_samples)
Diagonal matrix indicating whether each instance of X is labeled.
- diagonal_lambda : array-like (n_samples, n_samples)
Diagonal matrix holding the row sums of the weight matrix W.
- M : array-like (n_samples, n_samples)
Graph Laplacian matrix.
- Hc : array-like (n_samples, n_samples)
$H_c = H - \frac{H \mathbf{1} \mathbf{1}^{T} H^{T}}{N}$
- P : array-like (n_features, n_labels)
$P = (X H_c X^{T} + \alpha X M X^{T})^{-1} X H_c Y_{\text{Pred}}$, with $P \in \mathbb{R}^{d \times c}$
- b : array-like (n_labels)
Label bias, the second term of the equation: $b = \frac{(\text{estimate\_matrix} - P^{T} X) H \mathbf{1}}{N}$
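The formulas above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, `X` is taken as (n_features, n_samples) so that the matrix products conform, the estimation matrix is stored as (n_samples, n_labels) and transposed in the bias formula, and `N` is assumed to be the sum of the diagonal of `H`, since the documentation does not define it.

```python
import numpy as np

def estimate_labels(Y, L):
    """Fill zero entries of Y with y_i^T L[:, c]; keep observed ones as 1."""
    Y = np.asarray(Y, dtype=float)
    return np.where(Y == 0, Y @ L, 1.0)

def solve_projection(X, Y_est, H, M, alpha):
    """Closed-form P and b.

    X: (n_features, n_samples), Y_est: (n_samples, n_labels),
    H: diagonal (n_samples, n_samples), M: graph Laplacian of the same shape.
    """
    n = X.shape[1]
    ones = np.ones((n, 1))
    N = np.trace(H)  # assumption: N = 1^T H 1 for a diagonal H
    Hc = H - (H @ ones @ ones.T @ H.T) / N
    A = X @ Hc @ X.T + alpha * (X @ M @ X.T)
    # Solve the linear system instead of forming an explicit inverse.
    P = np.linalg.solve(A, X @ Hc @ Y_est)
    b = ((Y_est.T - P.T @ X) @ H @ ones).ravel() / N
    return P, b
```

When every instance is labeled (`H` is the identity) and the Laplacian term vanishes, this reduces to ordinary least squares with an intercept, which is a convenient sanity check.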
References
If used, please cite the scikit-multilearn library and the relevant paper:
@article{TAN2017192,
title = {Semi-supervised multi-label classification using incomplete label information},
author = {Qiaoyu Tan and Yanming Yu and Guoxian Yu and Jun Wang},
journal = {Neurocomputing},
volume = {260},
pages = {192-202},
year = {2017},
issn = {0925-2312},
doi = {https://doi.org/10.1016/j.neucom.2017.04.033},
url = {https://www.sciencedirect.com/science/article/pii/S092523121730704X},
}
Examples
An example use case for the SMiLE algorithm:

    from skmultilearn.missing import SMiLE

    # initialize SMiLE algorithm with parameters
    classifier = SMiLE(s=0.6, alpha=0.4, k=8)

    # train
    classifier.fit(X, y)

    # predict
    prediction = classifier.predict(X)
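To try the example without a dataset that has genuinely missing labels, one option is to hide some positive labels in a fully labeled matrix. The sketch below does this with NumPy. `hide_positive_labels` is a hypothetical helper, not part of scikit-multilearn, and it assumes missing labels are encoded as 0, which is consistent with the `estimate_matrix` definition where only zero entries are re-estimated.

```python
import numpy as np

def hide_positive_labels(y, fraction, seed=0):
    """Return a copy of y with a fraction of its positive entries set to 0."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).copy()
    rows, cols = np.nonzero(y)
    n_hide = int(round(fraction * rows.size))
    # Pick distinct positive entries to hide.
    chosen = rng.choice(rows.size, size=n_hide, replace=False)
    y[rows[chosen], cols[chosen]] = 0
    return y

y_full = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 0],
                   [1, 0, 1]])
y_missing = hide_positive_labels(y_full, fraction=0.5)
```

The resulting `y_missing` can then be passed to `classifier.fit(X, y_missing)` in place of `y`.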
- fit(X, y)
Fits the model to training data
Parameters
- X : array-like or sparse matrix, shape=(n_samples, n_features)
Training instances.
- y : array-like, shape=(n_samples, n_labels)
Training labels.
- getParams()
Returns the parameters of this model
Returns
- s : float, optional, default=0.5
Smoothness parameter for class imbalance.
- alpha : float, optional, default=0.35
Smoothness assumption parameter, which encourages similar instances to have similar predicted outputs. It balances the importance of the two terms of the objective being optimized.
- k : int, optional, default=5
Number of nearest neighbours (kNN) considered for each instance when building the instance weight matrix.
- predict(X)
Predicts using the model
Parameters
- X : array-like or sparse matrix, shape=(n_samples, n_features)
Test instances.
Returns
- predictions : array-like, shape=(n_labels, n_samples)
Continuous label predictions for the test instances, as in a regression problem with values in the range [0, 1].
- predictionsNormalized : array-like, shape=(n_labels, n_samples)
Label predictions
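Because the documented return shape is (n_labels, n_samples), the output usually needs transposing before it is compared with a label matrix of shape (n_samples, n_labels). A minimal NumPy sketch follows; the 0.5 threshold and the helper name `hamming_loss_from_scores` are illustrative assumptions, not library behaviour.

```python
import numpy as np

def hamming_loss_from_scores(scores, y_true, threshold=0.5):
    """Fraction of wrong label assignments after thresholding the scores.

    scores: (n_labels, n_samples) continuous values in [0, 1].
    y_true: (n_samples, n_labels) binary ground truth.
    """
    # Transpose to (n_samples, n_labels), then binarize.
    y_pred = (np.asarray(scores).T >= threshold).astype(int)
    return float(np.mean(y_pred != np.asarray(y_true)))

scores = np.array([[0.9, 0.2, 0.7],   # label 0 across 3 samples
                   [0.1, 0.8, 0.4]])  # label 1 across 3 samples
y_true = np.array([[1, 0], [0, 1], [1, 1]])
loss = hamming_loss_from_scores(scores, y_true)
```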
- setParams(s, alpha, k)
Sets the parameters of this model
Parameters
- s : float, optional, default=0.5
Smoothness parameter for class imbalance.
- alpha : float, optional, default=0.35
Smoothness assumption parameter, which encourages similar instances to have similar predicted outputs. It balances the importance of the two terms of the objective being optimized.
- k : int, optional, default=5
Number of nearest neighbours (kNN) considered for each instance when building the instance weight matrix.
Cite us
If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:
@ARTICLE{2017arXiv170201460S,
author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
title = "{A scikit-based Python environment for performing multi-label classification}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1702.01460},
primaryClass = "cs.LG",
keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
year = 2017,
month = feb,
}