skmultilearn.adapt package¶

The skmultilearn.adapt module implements algorithm adaptation approaches to multi-label classification.

Algorithm adaptation methods for multi-label classification adapt single-label classification algorithms to the multi-label case, usually by changing their cost or decision functions.

Currently the following algorithm adaptation classification schemes are available in scikit-multilearn:

Classifier	Description
`BRkNNaClassifier`	a Binary Relevance kNN classifier that assigns a label if at least half of the neighbors are also classified with the label
`BRkNNbClassifier`	a Binary Relevance kNN classifier that assigns the top m labels of the neighbors, where m is the average number of labels assigned to the neighbors
`MLkNN`	a multi-label adapted kNN classifier with Bayesian prior corrections
`MLARAM`	a Multi-Label Hierarchical ARAM Neural Network
`MLTSVM`	Twin Multi-Label Support Vector Machines

class skmultilearn.adapt.BRkNNaClassifier(k=10)¶

Bases: _BinaryRelevanceKNN

Binary Relevance multi-label classifier based on k-Nearest Neighbors method.

This version of the classifier assigns the labels that are assigned to at least half of the neighbors.
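
The decision rule described above can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: the helper name `brknn_a_predict` and the toy neighbor label rows are assumptions made for the example.

```python
def brknn_a_predict(neighbor_labels):
    """Assign each label that at least half of the neighbors carry.

    neighbor_labels: list of k binary rows, one per neighbor,
    each row holding n_labels entries in {0, 1}.
    """
    k = len(neighbor_labels)
    n_labels = len(neighbor_labels[0])
    # Confidence of a label = fraction of neighbors that have it.
    confidences = [
        sum(row[j] for row in neighbor_labels) / k for j in range(n_labels)
    ]
    # "At least half" translates to a confidence threshold of 0.5.
    prediction = [1 if c >= 0.5 else 0 for c in confidences]
    return prediction, confidences


# Hypothetical label rows of the k=4 nearest neighbors of one sample.
neighbors = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
]
prediction, confidences = brknn_a_predict(neighbors)
print(prediction)   # [1, 0, 1]
print(confidences)  # [0.75, 0.25, 0.5]
```

The third label is assigned because exactly half of the neighbors carry it, which is the boundary case of the rule.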

Parameters¶

k (int): number of neighbours

Attributes¶

knn_ (an instance of sklearn.neighbors.NearestNeighbors): the nearest neighbors single-label classifier used underneath
neighbors_ (array of arrays of int, shape = (n_samples, k)): k neighbors of each sample
confidences_ (matrix of int, shape = (n_samples, n_labels)): label assignment confidences

References¶

If you use this method please cite the relevant paper:

@inproceedings{EleftheriosSpyromitros2008,
   author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
   booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
   title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
   year = {2008},
   location = {Syros, Greece}
}

Examples¶

Here’s a very simple example of using BRkNNaClassifier with a fixed number of neighbors:

from skmultilearn.adapt import BRkNNaClassifier

classifier = BRkNNaClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNaClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNaClassifier(), parameters, scoring=score)
clf.fit(X, y)
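
The `f1_macro` scoring string used above asks for macro-averaged F1: F1 is computed for each label separately and the per-label scores are averaged with equal weight. The following plain-Python sketch illustrates that computation on binary indicator matrices. The helper name `macro_f1` and the toy matrices are assumptions for the example, and a label with no true or predicted positives is scored as 0 here.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the columns of two binary indicator matrices."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    # Every label contributes equally, regardless of how frequent it is.
    return sum(scores) / n_labels


# Hypothetical ground truth and predictions: 4 samples, 2 labels.
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 0.7333
```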

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BRkNNaClassifier¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters¶

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in score.

Returns¶

self (object): The updated object.

class skmultilearn.adapt.BRkNNbClassifier(k=10)¶

Bases: _BinaryRelevanceKNN

Binary Relevance multi-label classifier based on k-Nearest Neighbors method.

This version of the classifier assigns the most popular m labels of the neighbors, where m is the average number of labels assigned to the object’s neighbors.
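
As a rough illustration of this rule, the sketch below picks the m most popular labels among the neighbors. It is not the library's code: the helper name `brknn_b_predict`, the rounding of m to the nearest integer, and the tie-breaking by label index are assumptions of the example.

```python
def brknn_b_predict(neighbor_labels):
    """Assign the m most popular labels among the neighbors.

    neighbor_labels: list of k binary rows, one per neighbor.
    """
    k = len(neighbor_labels)
    n_labels = len(neighbor_labels[0])
    # m: average number of labels per neighbor, rounded to an integer
    # (the rounding is an assumption of this sketch).
    m = round(sum(sum(row) for row in neighbor_labels) / k)
    # Popularity of each label = number of neighbors that carry it.
    counts = [sum(row[j] for row in neighbor_labels) for j in range(n_labels)]
    # Indices of the m most popular labels; ties go to the lower index.
    top = sorted(range(n_labels), key=lambda j: (-counts[j], j))[:m]
    return [1 if j in top else 0 for j in range(n_labels)], m


# Hypothetical label rows of the k=4 nearest neighbors of one sample.
neighbors = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]
prediction, m = brknn_b_predict(neighbors)
print(m)           # 2
print(prediction)  # [1, 0, 1, 0]
```

Unlike the "at least half" rule, this variant always returns m labels when m is positive, even if none of them reaches a majority among the neighbors.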

Parameters¶

k (int): number of neighbours

Attributes¶

knn_ (an instance of sklearn.neighbors.NearestNeighbors): the nearest neighbors single-label classifier used underneath
neighbors_ (array of arrays of int, shape = (n_samples, k)): k neighbors of each sample
confidences_ (matrix of int, shape = (n_samples, n_labels)): label assignment confidences

References¶

If you use this method please cite the relevant paper:

@inproceedings{EleftheriosSpyromitros2008,
   author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
   booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
   title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
   year = {2008},
   location = {Syros, Greece}
}

Examples¶

Here’s a very simple example of using BRkNNbClassifier with a fixed number of neighbors:

from skmultilearn.adapt import BRkNNbClassifier

classifier = BRkNNbClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNbClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNbClassifier(), parameters, scoring=score)
clf.fit(X, y)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BRkNNbClassifier¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters¶

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in score.

Returns¶

self (object): The updated object.

class skmultilearn.adapt.MLARAM(vigilance=0.9, threshold=0.02, neurons=None)¶

Bases: MLClassifierBase

HARAM: A Hierarchical ARAM Neural Network for Large-Scale Text Classification

This method aims at increasing the classification speed by adding an extra ART layer for clustering learned prototypes into large clusters. In this case the activation of all prototypes can be replaced by the activation of a small fraction of them, leading to a significant reduction of the classification time.

Parameters¶

vigilance (float, default is 0.9): parameter for adaptive resonance theory networks that controls how large a hyperbox can be. A value of 1 gives small hyperboxes (no compression), while 0 lets a hyperbox cover the whole range. The value is dataset dependent and is normally set between 0.8 and 0.999. It governs the creation of the prototypes and therefore the training of the network.
threshold (float, default is 0.02): controls how many prototypes participate in the prediction; it can be changed for the testing phase.
neurons (list): the neurons in the network

References¶

If you use this classifier please cite the published work:

@INPROCEEDINGS{7395756,
    author={F. Benites and E. Sapozhnikova},
    booktitle={2015 IEEE International Conference on Data Mining Workshop (ICDMW)},
    title={HARAM: A Hierarchical ARAM Neural Network for Large-Scale Text Classification},
    year={2015},
    volume={},
    number={},
    pages={847-854},
    doi={10.1109/ICDMW.2015.14},
    ISSN={2375-9259},
    month={Nov},
}

Examples¶

Here’s an example with a threshold of 0.05 and a vigilance of 0.95:

from skmultilearn.adapt import MLARAM

classifier = MLARAM(threshold=0.05, vigilance=0.95)
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

fit(X, y)¶

Fit classifier with training data

Parameters¶

X (numpy.ndarray or scipy.sparse): input features, can be a dense or sparse matrix of size (n_samples, n_features)
y (numpy.ndarray or scipy.sparse {0,1}): binary indicator matrix with label assignments.
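
To make the expected shape of `y` concrete, here is a small plain-Python sketch that builds a binary indicator matrix from per-sample label sets. The helper name `to_indicator_matrix` and the toy label sets are assumptions for illustration only.

```python
def to_indicator_matrix(label_sets, n_labels):
    """Turn per-sample sets of label indices into a binary indicator matrix.

    Row i, column j is 1 when sample i carries label j, and 0 otherwise.
    """
    return [
        [1 if j in labels else 0 for j in range(n_labels)]
        for labels in label_sets
    ]


# Three hypothetical samples over three labels; the second has no labels.
label_sets = [{0, 2}, set(), {1}]
y = to_indicator_matrix(label_sets, n_labels=3)
print(y)  # [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
```

In practice, `sklearn.preprocessing.MultiLabelBinarizer` produces the same kind of matrix from lists of labels.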

Returns¶

skmultilearn.MLARAMfast.MLARAM: fitted instance of self

predict(X)¶

Predict labels for X

Parameters¶

X (numpy.ndarray or scipy.sparse.csc_matrix): input features of shape (n_samples, n_features)

Returns¶

scipy.sparse of int: binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)¶

Predict probabilities of label assignments for X

Parameters¶

X (numpy.ndarray or scipy.sparse.csc_matrix): input features of shape (n_samples, n_features)

Returns¶

array of arrays of float: matrix with label assignment probabilities of shape (n_samples, n_labels)

reset()¶: Resets the labels and neurons

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLARAM¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters¶

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in score.

Returns¶

self (object): The updated object.

class skmultilearn.adapt.MLTSVM(c_k=0, sor_omega=1.0, threshold=1e-06, lambda_param=1.0, max_iteration=500)¶

Bases: MLClassifierBase

Twin multi-Label Support Vector Machines

Parameters¶

c_k (int): the empirical risk penalty parameter that determines the trade-off between the loss terms
sor_omega (float, default is 1.0): the smoothing parameter
threshold (float, default is 1e-6): threshold above which a label should be assigned
lambda_param (float, default is 1.0): the regularization parameter
max_iteration (int, default is 500): maximum number of iterations to use in successive overrelaxation

References¶

If you use this classifier please cite the original paper introducing the method:

@article{chen2016mltsvm,
  title={MLTSVM: a novel twin support vector machine to multi-label learning},
  author={Chen, Wei-Jie and Shao, Yuan-Hai and Li, Chun-Na and Deng, Nai-Yang},
  journal={Pattern Recognition},
  volume={52},
  pages={61--74},
  year={2016},
  publisher={Elsevier}
}

Examples¶

Here’s a very simple example of using MLTSVM with a fixed penalty parameter c_k:

from skmultilearn.adapt import MLTSVM

classifier = MLTSVM(c_k = 2**-1)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import MLTSVM
from sklearn.model_selection import GridSearchCV

parameters = {'c_k': [2**i for i in range(-5, 5, 2)]}
score = 'f1_macro'

clf = GridSearchCV(MLTSVM(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)

# output
{'c_k': 0.03125} 0.347518217573

fit(X, Y)¶

Fit classifier with training data

Parameters¶

X (numpy.ndarray or scipy.sparse): input features, can be a dense or sparse matrix of size (n_samples, n_features)
Y (numpy.ndarray or scipy.sparse {0,1}): binary indicator matrix with label assignments.

Returns¶

object: fitted instance of self

predict(X)¶

Predict labels for X

Parameters¶

X (numpy.ndarray or scipy.sparse.csc_matrix): input features of shape (n_samples, n_features)

Returns¶

scipy.sparse of int: binary indicator matrix with label assignments with shape (n_samples, n_labels)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLTSVM¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters¶

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in score.

Returns¶

self (object): The updated object.

class skmultilearn.adapt.MLkNN(k=10, s=1.0, ignore_first_neighbours=0, n_jobs=None)¶

Bases: MLClassifierBase

kNN classification method adapted for multi-label classification

MLkNN uses k-nearest neighbors to find the nearest examples to a test instance and uses Bayesian inference to select the assigned labels.
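
The Bayesian step can be sketched for a single label as follows. This is a simplified reading of the ML-kNN rule from the paper cited in the References below, not the library's code: the helper names, the toy counts, and the choice to resolve ties in favour of assigning the label are assumptions of the example.

```python
def smoothed_prior(label_column, s=1.0):
    """Prior probability that a label is present, smoothed by s."""
    n = len(label_column)
    return (s + sum(label_column)) / (2 * s + n)


def map_decision(prior_true, c_true, c_false, j, s=1.0):
    """Decide whether to assign a label when j of the k neighbors carry it.

    c_true[i]:  number of training samples WITH the label that have
                exactly i label-carrying neighbors (i = 0..k).
    c_false[i]: the same count for training samples WITHOUT the label.
    """
    k = len(c_true) - 1
    # Smoothed likelihoods of seeing j label-carrying neighbors.
    like_true = (s + c_true[j]) / (s * (k + 1) + sum(c_true))
    like_false = (s + c_false[j]) / (s * (k + 1) + sum(c_false))
    # Maximum a posteriori rule on the two unnormalized posteriors.
    return prior_true * like_true >= (1 - prior_true) * like_false


# Hypothetical training data for one label: 3 of 8 samples carry it.
prior = smoothed_prior([1, 0, 0, 1, 0, 0, 0, 1], s=1.0)
c_true = [0, 1, 1, 1]   # k = 3
c_false = [3, 1, 1, 0]
print(prior)                                      # 0.4
print(map_decision(prior, c_true, c_false, j=3))  # True
print(map_decision(prior, c_true, c_false, j=2))  # False
```

With two label-carrying neighbors the low prior still outweighs the evidence, which is the kind of correction the smoothing parameter `s` and the prior provide over a plain majority vote.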

Parameters¶

k (int): number of neighbours of each input instance to take into account
s (float, default is 1.0): the smoothing parameter
ignore_first_neighbours (int, default is 0): ability to ignore first N neighbours, useful for comparing with other classification software.
n_jobs (int or None, optional, default=None): The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
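
The effect of ignoring the first neighbours can be illustrated with a small brute-force search. This is only a sketch under assumptions: the helper name `nearest_neighbors`, the Euclidean distance, and the toy points are not taken from the library.

```python
import math


def nearest_neighbors(query, points, k, ignore_first_neighbours=0):
    """Return indices of k nearest points after skipping the closest ones."""
    order = sorted(
        range(len(points)), key=lambda i: math.dist(query, points[i])
    )
    # Rank k + ignore_first_neighbours candidates, then drop the first ones.
    return order[ignore_first_neighbours:ignore_first_neighbours + k]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
# Querying with a training point itself: index 0 sits at distance zero.
print(nearest_neighbors(points[0], points, k=2))                             # [0, 1]
print(nearest_neighbors(points[0], points, k=2, ignore_first_neighbours=1))  # [1, 2]
```

Skipping one neighbour in this way excludes the query point itself when it is part of the searched set, which is one situation where results from different tools can otherwise disagree.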

Attributes¶

knn_ (an instance of sklearn.neighbors.NearestNeighbors): the nearest neighbors single-label classifier used underneath

Note

If you don’t know what ignore_first_neighbours does, the default is safe. Please see this issue.

References¶

If you use this classifier please cite the original paper introducing the method:

@article{zhang2007ml,
  title={ML-KNN: A lazy learning approach to multi-label learning},
  author={Zhang, Min-Ling and Zhou, Zhi-Hua},
  journal={Pattern recognition},
  volume={40},
  number={7},
  pages={2038--2048},
  year={2007},
  publisher={Elsevier}
}

Examples¶

Here’s a very simple example of using MLkNN with a fixed number of neighbors:

from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

clf = GridSearchCV(MLkNN(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)

# output
{'k': 1, 's': 0.5} 0.78988303374297597

fit(X, y)¶

Fit classifier with training data

Parameters¶

X (numpy.ndarray or scipy.sparse): input features, can be a dense or sparse matrix of size (n_samples, n_features)
y (numpy.ndarray or scipy.sparse {0,1}): binary indicator matrix with label assignments.

Returns¶

self: fitted instance of self

predict(X)¶

Predict labels for X

Parameters¶

X (numpy.ndarray or scipy.sparse.csc_matrix): input features of shape (n_samples, n_features)

Returns¶

scipy.sparse matrix of int: binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)¶

Predict probabilities of label assignments for X

Parameters¶

X (numpy.ndarray or scipy.sparse.csc_matrix): input features of shape (n_samples, n_features)

Returns¶

scipy.sparse matrix of float: matrix with label assignment probabilities of shape (n_samples, n_labels)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLkNN¶

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters¶

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in score.

Returns¶

self (object): The updated object.

Cite us

If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}