skmultilearn.adapt package

The skmultilearn.adapt module implements algorithm adaptation approaches to multi-label classification.

Algorithm adaptation methods for multi-label classification adapt single-label classification algorithms to the multi-label case, usually by changing their cost or decision functions.

Currently the following algorithm adaptation classification schemes are available in scikit-multilearn:

Classifier          Description
BRkNNaClassifier    a Binary Relevance kNN classifier that assigns a label if at least half of the neighbors are also classified with the label
BRkNNbClassifier    a Binary Relevance kNN classifier that assigns the top m labels of the neighbors, where m is the average number of labels assigned to the neighbors
MLkNN               a multi-label adapted kNN classifier with Bayesian prior corrections
MLARAM              a multi-label Hierarchical ARAM Neural Network
MLTSVM              twin multi-label Support Vector Machines

class skmultilearn.adapt.BRkNNaClassifier(k=10)

Bases: _BinaryRelevanceKNN

Binary Relevance multi-label classifier based on k-Nearest Neighbors method.

This version of the classifier assigns the labels that are assigned to at least half of the neighbors.
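The half-of-the-neighbors rule can be sketched in NumPy as follows. This is a minimal illustration, not the library's implementation: `brknn_a_predict` and `neighbor_labels` are hypothetical names, and the input is assumed to hold the label rows of the k neighbors of a single sample.

```python
import numpy as np

def brknn_a_predict(neighbor_labels):
    """Assign each label carried by at least half of the k neighbors.

    neighbor_labels: (k, n_labels) binary array, one row per neighbor
    (hypothetical input, used here for illustration only).
    """
    neighbor_labels = np.asarray(neighbor_labels)
    # Fraction of neighbors carrying each label, i.e. a per-label confidence.
    confidences = neighbor_labels.mean(axis=0)
    return (confidences >= 0.5).astype(int)

# Three neighbors, four labels.
neighbors = [[1, 0, 1, 0],
             [1, 0, 0, 0],
             [0, 0, 1, 1]]
prediction_a = brknn_a_predict(neighbors)
print(prediction_a)
```

Labels 0 and 2 are each carried by two of the three neighbors, so only those two clear the one-half cutoff.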

Parameters

k : int

number of neighbours

Attributes

knn_ : an instance of sklearn.neighbors.NearestNeighbors

the nearest neighbors single-label classifier used underneath

neighbors_ : array of arrays of int, shape = (n_samples, k)

k neighbors of each sample

confidences_ : matrix of int, shape = (n_samples, n_labels)

label assignment confidences

References

If you use this method please cite the relevant paper:

@inproceedings{EleftheriosSpyromitros2008,
   author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
   booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
   title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
   year = {2008},
   location = {Syros, Greece}
}

Examples

Here’s a very simple example of using BRkNNaClassifier with a fixed number of neighbors:

from skmultilearn.adapt import BRkNNaClassifier

classifier = BRkNNaClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNaClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNaClassifier(), parameters, scoring=score)
clf.fit(X, y)
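For binary indicator targets, the 'f1_macro' scorer averages the per-label F1 scores with equal weight. The sketch below recomputes that quantity by hand in NumPy to show what the grid search optimizes. The function name `macro_f1` is hypothetical, and giving an F1 of 0 to labels with no true or predicted positives is an assumption of this sketch.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores for binary indicator matrices."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Per-label counts, computed column by column.
    tp = (y_true & y_pred).sum(axis=0)
    fp = (~y_true & y_pred).sum(axis=0)
    fn = (y_true & ~y_pred).sum(axis=0)
    denom = 2 * tp + fp + fn
    # Labels with no true or predicted positives get an F1 of 0 here.
    f1 = np.divide(2 * tp, denom, out=np.zeros(len(denom)), where=denom > 0)
    return float(f1.mean())

y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1]]
score_value = macro_f1(y_true, y_pred)
print(score_value)
```

In this toy case the first label has F1 = 2/3 and the second has F1 = 1, so the macro average is 5/6.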

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BRkNNaClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.adapt.BRkNNbClassifier(k=10)

Bases: _BinaryRelevanceKNN

Binary Relevance multi-label classifier based on k-Nearest Neighbors method.

This version of the classifier assigns the most popular m labels of the neighbors, where m is the average number of labels assigned to the object’s neighbors.
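The top-m rule can be sketched in NumPy as follows. This is an illustration rather than the library's code: `brknn_b_predict` and `neighbor_labels` are hypothetical names, and rounding the average label count to the nearest integer is an assumption, since the text above does not say how a fractional m is handled.

```python
import numpy as np

def brknn_b_predict(neighbor_labels):
    """Assign the m most frequent neighbor labels.

    neighbor_labels: (k, n_labels) binary array, one row per neighbor
    (hypothetical input, used here for illustration only).
    """
    neighbor_labels = np.asarray(neighbor_labels)
    n_labels = neighbor_labels.shape[1]
    # m: average number of labels per neighbor (rounding is an assumption).
    m = int(np.rint(neighbor_labels.sum(axis=1).mean()))
    # Votes per label, then the indices of the m labels with the most votes.
    votes = neighbor_labels.sum(axis=0)
    top = np.argsort(-votes, kind="stable")[:m]
    prediction = np.zeros(n_labels, dtype=int)
    prediction[top] = 1
    return prediction

# Neighbors carry 2, 1 and 3 labels, so m = 2.
neighbors = [[1, 0, 1, 0],
             [1, 0, 0, 0],
             [1, 0, 1, 1]]
prediction_b = brknn_b_predict(neighbors)
print(prediction_b)
```

Unlike the half-of-the-neighbors rule, this variant always assigns exactly m labels, even when some of them have few votes.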

Parameters

k : int

number of neighbours

Attributes

knn_ : an instance of sklearn.neighbors.NearestNeighbors

the nearest neighbors single-label classifier used underneath

neighbors_ : array of arrays of int, shape = (n_samples, k)

k neighbors of each sample

confidences_ : matrix of int, shape = (n_samples, n_labels)

label assignment confidences

References

If you use this method please cite the relevant paper:

@inproceedings{EleftheriosSpyromitros2008,
   author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
   booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
   title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
   year = {2008},
   location = {Syros, Greece}
}

Examples

Here’s a very simple example of using BRkNNbClassifier with a fixed number of neighbors:

from skmultilearn.adapt import BRkNNbClassifier

classifier = BRkNNbClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNbClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNbClassifier(), parameters, scoring=score)
clf.fit(X, y)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BRkNNbClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.adapt.MLARAM(vigilance=0.9, threshold=0.02, neurons=None)

Bases: MLClassifierBase

HARAM: A Hierarchical ARAM Neural Network for Large-Scale Text Classification

This method aims at increasing the classification speed by adding an extra ART layer for clustering learned prototypes into large clusters. In this case the activation of all prototypes can be replaced by the activation of a small fraction of them, leading to a significant reduction of the classification time.

Parameters

vigilance : float (default is 0.9)

parameter for adaptive resonance theory networks that controls how large a hyperbox can be: at 1 the hyperboxes are small (no compression), while at 0 a hyperbox may span the whole range. It is dataset dependent and is normally set between 0.8 and 0.999. It governs the creation of the prototypes and therefore the training of the network.

threshold : float (default is 0.02)

controls how many prototypes participate in the prediction; it can be changed for the testing phase.

neurons : list

the neurons in the network
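To give a feel for how vigilance limits hyperbox size, here is a sketch of the standard Fuzzy ART match test with complement coding, on which ARAM-type networks build. Whether MLARAM uses exactly this form internally is an assumption, and all function names below are hypothetical.

```python
import numpy as np

def complement_code(a):
    """Complement coding: a feature vector a in [0, 1] becomes [a, 1 - a]."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

def match_value(coded_input, prototype):
    """|I ^ w| / |I|, with ^ the element-wise minimum and |.| the L1 norm."""
    return np.minimum(coded_input, prototype).sum() / coded_input.sum()

def passes_vigilance(coded_input, prototype, vigilance):
    """A prototype may absorb the input only if the match reaches vigilance."""
    return bool(match_value(coded_input, prototype) >= vigilance)

# Prototype encoding the hyperbox with corners (0.2, 0.8) and (0.4, 0.9):
# lower corner followed by the complement of the upper corner.
prototype = np.concatenate([[0.2, 0.8], 1.0 - np.array([0.4, 0.9])])
sample = complement_code([0.3, 0.85])  # a point inside that box

match = match_value(sample, prototype)
print(match, passes_vigilance(sample, prototype, 0.8),
      passes_vigilance(sample, prototype, 0.9))
```

The match here is about 0.85, so the prototype is accepted at vigilance 0.8 and rejected at 0.9: a higher vigilance only tolerates smaller boxes, which means more prototypes and less compression.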

References

Published work available here.

@INPROCEEDINGS{7395756,
    author={F. Benites and E. Sapozhnikova},
    booktitle={2015 IEEE International Conference on Data Mining Workshop (ICDMW)},
    title={HARAM: A Hierarchical ARAM Neural Network for Large-Scale Text Classification},
    year={2015},
    volume={},
    number={},
    pages={847-854},
    doi={10.1109/ICDMW.2015.14},
    ISSN={2375-9259},
    month={Nov},
}

Examples

Here’s example code with a threshold of 0.05 and a vigilance of 0.95:

from skmultilearn.adapt import MLARAM

classifier = MLARAM(threshold=0.05, vigilance=0.95)
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

fit(X, y)

Fit classifier with training data

Parameters

X : numpy.ndarray or scipy.sparse

input features, can be a dense or sparse matrix of size (n_samples, n_features)

y : numpy.ndarray or scipy.sparse {0,1}

binary indicator matrix with label assignments.

Returns

skmultilearn.MLARAMfast.MLARAM

fitted instance of self
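The binary indicator matrix expected for y has one row per sample and one column per label. A minimal sketch of building one from per-sample label sets follows; the helper `to_indicator_matrix` and the toy data are hypothetical, not part of the library.

```python
import numpy as np

def to_indicator_matrix(label_sets, n_labels):
    """Turn per-sample sets of label indices into a {0,1} matrix."""
    y = np.zeros((len(label_sets), n_labels), dtype=int)
    for row, labels in enumerate(label_sets):
        # An explicit int dtype keeps empty label sets valid as indices.
        y[row, np.array(sorted(labels), dtype=int)] = 1
    return y

# Four samples, three possible labels; the third sample has no labels.
label_sets = [{0, 2}, {1}, set(), {0, 1, 2}]
y_train = to_indicator_matrix(label_sets, n_labels=3)
print(y_train)
```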

predict(X)

Predict labels for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)

Predict probabilities of label assignments for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

array of arrays of float

matrix with label assignment probabilities of shape (n_samples, n_labels)

reset()

Resets the labels and neurons

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLARAM

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.adapt.MLTSVM(c_k=0, sor_omega=1.0, threshold=1e-06, lambda_param=1.0, max_iteration=500)

Bases: MLClassifierBase

Twin multi-Label Support Vector Machines

Parameters

c_k : int

the empirical risk penalty parameter that determines the trade-off between the loss terms

sor_omega : float (default is 1.0)

the smoothing parameter

threshold : float (default is 1e-6)

threshold above which a label should be assigned

lambda_param : float (default is 1.0)

the regularization parameter

max_iteration : int (default is 500)

maximum number of iterations to use in successive overrelaxation
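For readers unfamiliar with successive overrelaxation (SOR), the sketch below applies it to a small linear system to show the roles of a relaxation factor and an iteration cap. It is a generic textbook SOR solver, not the library's internal solver; `sor_solve` and its `tol` stopping tolerance are hypothetical, and `tol` is unrelated to the `threshold` parameter above.

```python
import numpy as np

def sor_solve(A, b, omega=1.0, max_iteration=500, tol=1e-10):
    """Solve A x = b by successive overrelaxation (generic sketch)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(max_iteration):
        x_old = x.copy()
        for i in range(len(b)):
            # Gauss-Seidel value for x[i], using the freshest entries of x.
            sigma = A[i] @ x - A[i, i] * x[i]
            gauss_seidel = (b[i] - sigma) / A[i, i]
            # Blend the old value with the Gauss-Seidel value.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
        # Stop early once an entire sweep barely changes x.
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = sor_solve(A, b, omega=1.1)
print(solution)
```

With omega = 1.0 the update reduces to plain Gauss-Seidel; for a symmetric positive definite matrix such as this one, the iteration converges for any omega strictly between 0 and 2.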

References

If you use this classifier please cite the original paper introducing the method:

@article{chen2016mltsvm,
  title={MLTSVM: a novel twin support vector machine to multi-label learning},
  author={Chen, Wei-Jie and Shao, Yuan-Hai and Li, Chun-Na and Deng, Nai-Yang},
  journal={Pattern Recognition},
  volume={52},
  pages={61--74},
  year={2016},
  publisher={Elsevier}
}

Examples

Here’s a very simple example of using MLTSVM with a fixed value of c_k:

from skmultilearn.adapt import MLTSVM

classifier = MLTSVM(c_k=2**-1)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import MLTSVM
from sklearn.model_selection import GridSearchCV

parameters = {'c_k': [2**i for i in range(-5, 5, 2)]}
score = 'f1_macro'

clf = GridSearchCV(MLTSVM(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)

# output
# {'c_k': 0.03125} 0.347518217573

fit(X, Y)

Abstract method to fit classifier with training data

It must return a fitted instance of self.

Parameters

X : numpy.ndarray or scipy.sparse

input features, can be a dense or sparse matrix of size (n_samples, n_features)

y : numpy.ndarray or scipy.sparse {0,1}

binary indicator matrix with label assignments.

Returns

object

fitted instance of self

Raises

NotImplementedError

this is just an abstract method

predict(X)

Abstract method to predict labels

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)

Raises

NotImplementedError

this is just an abstract method

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLTSVM

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.adapt.MLkNN(k=10, s=1.0, ignore_first_neighbours=0, n_jobs=None)

Bases: MLClassifierBase

kNN classification method adapted for multi-label classification

MLkNN uses k-nearest neighbors to find the training examples nearest to a test instance, then uses Bayesian inference to select the assigned labels.

Parameters

k : int

number of neighbours of each input instance to take into account

s : float (default is 1.0)

the smoothing parameter

ignore_first_neighbours : int (default is 0)

ability to ignore first N neighbours, useful for comparing with other classification software.

n_jobs : int or None, optional (default=None)

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

Attributes

knn_ : an instance of sklearn.neighbors.NearestNeighbors

the nearest neighbors single-label classifier used underneath

Note

If you don’t know what ignore_first_neighbours does, the default is safe. Please see this issue.
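The Bayesian step that MLkNN applies per label can be sketched as below, following the smoothed estimates of the ML-KNN paper cited in the references. This is an illustration for a single label, not the library's code; the function names and the toy counts are hypothetical.

```python
import numpy as np

def label_prior(y_column, s=1.0):
    """Smoothed prior probability that a sample carries the label."""
    y_column = np.asarray(y_column)
    return (s + y_column.sum()) / (2.0 * s + len(y_column))

def label_posterior(prior, c1, c0, count, s=1.0):
    """Probability that the label applies when `count` of k neighbors carry it.

    c1[j]: number of training samples WITH the label that have exactly j
           neighbors carrying it (j = 0..k).
    c0[j]: the same count for training samples WITHOUT the label.
    """
    c1 = np.asarray(c1, dtype=float)
    c0 = np.asarray(c0, dtype=float)
    k = len(c1) - 1
    # Smoothed likelihoods of seeing `count` positive neighbors.
    likelihood1 = (s + c1[count]) / (s * (k + 1) + c1.sum())
    likelihood0 = (s + c0[count]) / (s * (k + 1) + c0.sum())
    score1 = prior * likelihood1
    score0 = (1.0 - prior) * likelihood0
    return score1 / (score1 + score0)

# Toy data: 6 training samples, 2 of which carry the label, and k = 3.
prior = label_prior([1, 1, 0, 0, 0, 0])
p_two = label_posterior(prior, c1=[0, 0, 1, 1], c0=[3, 1, 0, 0], count=2)
p_zero = label_posterior(prior, c1=[0, 0, 1, 1], c0=[3, 1, 0, 0], count=0)
print(prior, p_two, p_zero)
```

The label is assigned when the posterior exceeds 0.5. In this toy case two positive neighbors give 8/13 and the label is assigned, while zero positive neighbors give 1/6 and it is not. A larger s pulls both the prior and the likelihoods toward uniform values.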

References

If you use this classifier please cite the original paper introducing the method:

@article{zhang2007ml,
  title={ML-KNN: A lazy learning approach to multi-label learning},
  author={Zhang, Min-Ling and Zhou, Zhi-Hua},
  journal={Pattern recognition},
  volume={40},
  number={7},
  pages={2038--2048},
  year={2007},
  publisher={Elsevier}
}

Examples

Here’s a very simple example of using MLkNN with a fixed number of neighbors:

from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

clf = GridSearchCV(MLkNN(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)

# output
# {'k': 1, 's': 0.5} 0.78988303374297597

fit(X, y)

Fit classifier with training data

Parameters

X : numpy.ndarray or scipy.sparse

input features, can be a dense or sparse matrix of size (n_samples, n_features)

y : numpy.ndarray or scipy.sparse {0,1}

binary indicator matrix with label assignments.

Returns

self

fitted instance of self

predict(X)

Predict labels for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse matrix of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)

Predict probabilities of label assignments for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse matrix of float

matrix with label assignment probabilities with shape (n_samples, n_labels)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLkNN

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.


Cite us

If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}