skmultilearn.missing package

The skmultilearn.missing module provides classifiers and methods for dealing with missing labels in multi-label classification problems.

Currently, the following algorithm adaptation classification schemes are available in scikit-multilearn:

Classifier    Description
SMiLE         Semi-supervised multi-label classification using incomplete label information.

class skmultilearn.missing.SMiLE(s=0.5, alpha=0.35, k=5)

Bases: object

SMiLE algorithm for multi-label classification with missing labels (Semi-supervised multi-label classification using incomplete label information).

Parameters

s : float, optional, default 0.5

Smoothness parameter for class imbalance.

alpha : float, optional, default 0.35

Smoothness assumption parameter, which ensures that similar instances have similar predicted outputs. It balances the importance of the two terms of the objective being optimized.

k : int, optional, default 5

Neighbours parameter: the number of neighbours used by the k-nearest-neighbour (kNN) step of the algorithm.

Attributes

L : array, [n_labels, n_labels]

Correlation matrix between labels

W : array, [n_samples, n_samples]

Weighted matrix created by kNN for instances

estimate_matrix : array-like (n_samples, n_labels)

Label estimation matrix: $\tilde{y}_{ic} = y_i^{T} L(\cdot, c)$ if $y_{ic} = 0$, and $\tilde{y}_{ic} = 1$ otherwise.

H : array-like (n_samples, n_samples)

Diagonal matrix indicating if an element of X is labeled or not

diagonal_lambda : array-like (n_samples, n_samples)

Diagonal matrix having the sum of weights of the weighted matrix

M : array-like (n_samples, n_samples)

Graph Laplacian matrix

Hc : array-like (n_samples, n_samples)

$H_c = H - \frac{H \mathbf{1} \mathbf{1}^{T} H^{T}}{N}$

P : array-like (n_features, n_labels)

$P = (X H_c X^{T} + \alpha X M X^{T})^{-1} X H_c Y_{\mathrm{pred}}$, with $P \in \mathbb{R}^{d \times c}$

b : array-like (n_labels)

Label bias, the second term of the equation: $b = \frac{(\mathrm{estimate\_matrix} - P^{T} X) H \mathbf{1}}{N}$
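The attribute formulas above can be sketched in NumPy as follows. This is only an illustrative sketch and not the library's implementation. All function names are hypothetical, and several points are assumptions not stated on this page: the kNN weights are binary and symmetrized, M is the unnormalized Laplacian (diagonal_lambda minus W), N is the number of samples, a pseudo-inverse stands in for the matrix inverse, and X is passed as (n_features, n_samples) in the last step so that the matrix products in the P and b formulas have compatible shapes (the estimate matrix is transposed in the b formula for the same reason).

```python
import numpy as np


def estimate_label_matrix(Y, L):
    """Estimate matrix: Y @ L where a label is 0, and 1 where it is set."""
    Y = np.asarray(Y, dtype=float)
    return np.where(Y == 0, Y @ L, 1.0)


def knn_weight_matrix(X, k):
    """Binary, symmetrized kNN weight matrix W (an assumed weighting)."""
    X = np.asarray(X, dtype=float)  # shape (n_samples, n_features)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)   # pairwise squared distances
    np.fill_diagonal(dist, np.inf)      # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(W, W.T)


def graph_laplacian(W):
    """M = diagonal_lambda - W, with diagonal_lambda holding row sums of W."""
    diagonal_lambda = np.diag(W.sum(axis=1))
    return diagonal_lambda - W


def centered_indicator(H):
    """Hc = H - (H 1 1^T H^T) / N, taking N as the number of samples."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    ones = np.ones((n, 1))
    return H - (H @ ones @ ones.T @ H.T) / n


def projection_and_bias(X, H, M, Y_est, alpha):
    """Closed-form P and b; X is (n_features, n_samples), Y_est is (n_samples, n_labels)."""
    n = X.shape[1]
    Hc = centered_indicator(H)
    A = X @ Hc @ X.T + alpha * (X @ M @ X.T)
    P = np.linalg.pinv(A) @ (X @ Hc @ Y_est)        # pseudo-inverse for robustness
    b = (Y_est.T - P.T @ X) @ H @ np.ones(n) / n    # one bias value per label
    return P, b
```

Under these assumptions, P has shape (n_features, n_labels) and b has shape (n_labels,), matching the attribute shapes listed above.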

References

If used, please cite the scikit-multilearn library and the relevant paper:

@article{TAN2017192,
  title = {Semi-supervised multi-label classification using incomplete label information},
  author = {Qiaoyu Tan and Yanming Yu and Guoxian Yu and Jun Wang},
  journal = {Neurocomputing},
  volume = {260},
  pages = {192-202},
  year = {2017},
  issn = {0925-2312},
  doi = {https://doi.org/10.1016/j.neucom.2017.04.033},
  url = {https://www.sciencedirect.com/science/article/pii/S092523121730704X},
}

Examples

An example use case for the SMiLE algorithm:

from skmultilearn.missing import SMiLE

# initialize SMiLE algorithm with parameters
classifier = SMiLE(s=0.6, alpha=0.4, k=8)

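# train
# X: training instances, shape (n_samples, n_features)
# y: training labels, shape (n_samples, n_labels)
classifier.fit(X, y)

# predict
prediction = classifier.predict(X)

fit(X, y)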

Fits the model to training data

Parameters

X : array-like or sparse matrix, shape=(n_samples, n_features)

Training instances.

y : array-like, shape=(n_samples, n_labels)

Training labels.
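When experimenting with a missing-label classifier, one possible setup (an assumption here, not something this page prescribes) is to start from a fully labelled matrix and hide a fraction of the positive labels before calling fit. The helper below, `hide_positive_labels`, is a hypothetical name and not part of scikit-multilearn. It is a minimal NumPy sketch that sets randomly chosen positive entries to 0.

```python
import numpy as np


def hide_positive_labels(y, missing_rate, seed=0):
    """Return a copy of y with a fraction of its positive labels set to 0."""
    rng = np.random.default_rng(seed)
    y_missing = np.asarray(y).copy()
    positives = np.argwhere(y_missing == 1)       # (row, col) of each positive
    n_hide = int(round(missing_rate * len(positives)))
    chosen = rng.choice(len(positives), size=n_hide, replace=False)
    rows, cols = positives[chosen].T
    y_missing[rows, cols] = 0                     # hidden labels look negative
    return y_missing
```

The returned matrix keeps the shape (n_samples, n_labels) and leaves the original array unchanged, so the full labels remain available for evaluation.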

getParams()

Returns the parameters of this model

Returns

s : float, optional, default 0.5

Smoothness parameter for class imbalance.

alpha : float, optional, default 0.35

Smoothness assumption parameter, which ensures that similar instances have similar predicted outputs. It balances the importance of the two terms of the objective being optimized.

k : int, optional, default 5

Neighbours parameter: the number of neighbours used by the k-nearest-neighbour (kNN) step of the algorithm.

predict(X)

Predicts using the model

Parameters

X : array-like or sparse matrix, shape=(n_samples, n_features)

Test instances.

Returns

predictions : array-like, shape=(n_labels, n_samples)

Real-valued label predictions for the test instances, in the range [0, 1], as in a regression problem.

predictionsNormalized : array-like, shape=(n_labels, n_samples)

Label predictions
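Because the documented output shape is (n_labels, n_samples) with scores in [0, 1], downstream code that expects a binary (n_samples, n_labels) indicator matrix needs a transpose and a cut-off. The sketch below shows one way to do that. The function name `to_label_matrix` and the 0.5 threshold are assumptions for illustration, not part of the library.

```python
import numpy as np


def to_label_matrix(predictions, threshold=0.5):
    """Turn (n_labels, n_samples) scores into a binary (n_samples, n_labels) matrix."""
    scores = np.asarray(predictions, dtype=float)
    return (scores.T >= threshold).astype(int)
```

The threshold is a tunable choice. A lower value assigns more labels per instance, and a higher value assigns fewer.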

setParams(s, alpha, k)

Sets the parameters of this model

Parameters

s : float, optional, default 0.5

Smoothness parameter for class imbalance.

alpha : float, optional, default 0.35

Smoothness assumption parameter, which ensures that similar instances have similar predicted outputs. It balances the importance of the two terms of the objective being optimized.

k : int, optional, default 5

Neighbours parameter: the number of neighbours used by the k-nearest-neighbour (kNN) step of the algorithm.


Cite us

If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}