skmultilearn.adapt package
The skmultilearn.adapt module implements algorithm adaptation approaches to multi-label classification.
Algorithm adaptation methods for multi-label classification adapt single-label classification algorithms to the multi-label case, usually by changing their cost or decision functions.
Currently the following algorithm adaptation classification schemes are available in scikit-multilearn:
| Classifier | Description |
|---|---|
| BRkNNaClassifier | a Binary Relevance kNN classifier that assigns a label if at least half of the neighbors are also classified with the label |
| BRkNNbClassifier | a Binary Relevance kNN classifier that assigns the top m labels of the neighbors, where m is the average number of labels assigned to the neighbors |
| MLkNN | a multi-label adapted kNN classifier with Bayesian prior corrections |
| MLARAM | a multi-label Hierarchical ARAM Neural Network |
| MLTSVM | twin multi-label Support Vector Machines |
- class skmultilearn.adapt.BRkNNaClassifier(k=10)
Bases: _BinaryRelevanceKNN
Binary Relevance multi-label classifier based on the k-Nearest Neighbors method.
This version of the classifier assigns the labels that are assigned to at least half of the neighbors.
Parameters
- k : int
number of neighbours
Attributes
- knn_ : an instance of sklearn.neighbors.NearestNeighbors
the nearest neighbors single-label classifier used underneath
- neighbors_ : array of arrays of int, shape = (n_samples, k)
k neighbors of each sample
- confidences_ : matrix of int, shape = (n_samples, n_labels)
label assignment confidences
References
If you use this method please cite the relevant paper:
@inproceedings{EleftheriosSpyromitros2008,
  author = {Eleftherios Spyromitros, Grigorios Tsoumakas, Ioannis Vlahavas},
  booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
  title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
  year = {2008},
  location = {Syros, Greece}
}
Examples
Here’s a very simple example of using BRkNNaClassifier with a fixed number of neighbors:

    from skmultilearn.adapt import BRkNNaClassifier

    classifier = BRkNNaClassifier(k=3)

    # train
    classifier.fit(X_train, y_train)

    # predict
    predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

    from skmultilearn.adapt import BRkNNaClassifier
    from sklearn.model_selection import GridSearchCV

    parameters = {'k': range(1, 3)}
    score = 'f1_macro'

    clf = GridSearchCV(BRkNNaClassifier(), parameters, scoring=score)
    clf.fit(X, y)
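The "at least half of the neighbors" rule described for BRkNNaClassifier can be illustrated with a minimal pure-Python sketch. The function name `brknn_a_predict` and the toy neighbor matrix are hypothetical and only mirror the rule as stated in the description; this is not the library's implementation.

```python
def brknn_a_predict(neighbor_labels):
    """Apply the majority rule to the label vectors of one sample's k neighbours.

    neighbor_labels: list of k binary label vectors (hypothetical toy input).
    """
    k = len(neighbor_labels)
    n_labels = len(neighbor_labels[0])
    # Confidence of each label = fraction of neighbours that carry it.
    confidences = [
        sum(row[j] for row in neighbor_labels) / k for j in range(n_labels)
    ]
    # Assign a label when at least half of the neighbours have it.
    return [1 if c >= 0.5 else 0 for c in confidences]


# Four neighbours, three labels: label 0 appears 3 times, label 1 once, label 2 twice.
neighbors = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 0]]
prediction = brknn_a_predict(neighbors)
print(prediction)
```

Labels 0 and 2 reach the one-half mark here, while label 1 does not.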
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BRkNNaClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class skmultilearn.adapt.BRkNNbClassifier(k=10)
Bases: _BinaryRelevanceKNN
Binary Relevance multi-label classifier based on the k-Nearest Neighbors method.
This version of the classifier assigns the most popular m labels of the neighbors, where m is the average number of labels assigned to the object’s neighbors.
Parameters
- k : int
number of neighbours
Attributes
- knn_ : an instance of sklearn.neighbors.NearestNeighbors
the nearest neighbors single-label classifier used underneath
- neighbors_ : array of arrays of int, shape = (n_samples, k)
k neighbors of each sample
- confidences_ : matrix of int, shape = (n_samples, n_labels)
label assignment confidences
References
If you use this method please cite the relevant paper:
@inproceedings{EleftheriosSpyromitros2008,
  author = {Eleftherios Spyromitros, Grigorios Tsoumakas, Ioannis Vlahavas},
  booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
  title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
  year = {2008},
  location = {Syros, Greece}
}
Examples
Here’s a very simple example of using BRkNNbClassifier with a fixed number of neighbors:

    from skmultilearn.adapt import BRkNNbClassifier

    classifier = BRkNNbClassifier(k=3)

    # train
    classifier.fit(X_train, y_train)

    # predict
    predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

    from skmultilearn.adapt import BRkNNbClassifier
    from sklearn.model_selection import GridSearchCV

    parameters = {'k': range(1, 3)}
    score = 'f1_macro'

    clf = GridSearchCV(BRkNNbClassifier(), parameters, scoring=score)
    clf.fit(X, y)
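The "top m labels" rule described for BRkNNbClassifier can be sketched in a few lines of pure Python. The function name `brknn_b_predict` is hypothetical, and two details are assumptions not stated in the description: m is rounded to the nearest integer, and ties between equally popular labels are broken by the lower label index.

```python
def brknn_b_predict(neighbor_labels):
    """Assign the m most popular labels among one sample's k neighbours.

    neighbor_labels: list of k binary label vectors (hypothetical toy input).
    """
    k = len(neighbor_labels)
    n_labels = len(neighbor_labels[0])
    # How many neighbours carry each label.
    counts = [sum(row[j] for row in neighbor_labels) for j in range(n_labels)]
    # m = average number of labels per neighbour (rounding is an assumption).
    m = int(round(sum(counts) / k))
    # Rank labels by popularity; ties go to the lower index (an assumption).
    ranked = sorted(range(n_labels), key=lambda j: (-counts[j], j))
    chosen = set(ranked[:m])
    return [1 if j in chosen else 0 for j in range(n_labels)]


# Five neighbours, three labels: counts are [5, 2, 1], so the average is 1.6 labels.
neighbors = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0], [1, 0, 0]]
prediction = brknn_b_predict(neighbors)
print(prediction)
```

In this toy input label 1 is held by only two of five neighbors, so a one-half majority rule would reject it, whereas the top-m rule keeps it because m rounds to 2.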
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BRkNNbClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class skmultilearn.adapt.MLARAM(vigilance=0.9, threshold=0.02, neurons=None)
Bases: MLClassifierBase
HARAM: A Hierarchical ARAM Neural Network for Large-Scale Text Classification
This method aims at increasing the classification speed by adding an extra ART layer for clustering learned prototypes into large clusters. In this case the activation of all prototypes can be replaced by the activation of a small fraction of them, leading to a significant reduction of the classification time.
Parameters
- vigilance : float (default is 0.9)
parameter for adaptive resonance theory networks that controls how large a hyperbox can be: at 1 the hyperboxes are small (no compression), while at 0 a hyperbox may span the whole range. The value is dataset dependent and is normally set between 0.8 and 0.999. It governs the creation of the prototypes and therefore the training of the network.
- threshold : float (default is 0.02)
controls how many prototypes participate in the prediction; it can be changed for the testing phase.
- neurons : list
the neurons in the network
References
Published work available here.
@INPROCEEDINGS{7395756,
  author={F. Benites and E. Sapozhnikova},
  booktitle={2015 IEEE International Conference on Data Mining Workshop (ICDMW)},
  title={HARAM: A Hierarchical ARAM Neural Network for Large-Scale Text Classification},
  year={2015},
  volume={},
  number={},
  pages={847-854},
  doi={10.1109/ICDMW.2015.14},
  ISSN={2375-9259},
  month={Nov},
}
Examples
Here’s an example code with a 5% threshold and vigilance of 0.95:

    from skmultilearn.adapt import MLARAM

    classifier = MLARAM(threshold=0.05, vigilance=0.95)
    classifier.fit(X_train, y_train)
    prediction = classifier.predict(X_test)
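To give some intuition for how a vigilance value limits hyperbox size, here is a minimal sketch of the match test used in generic fuzzy ART networks with complement coding. It is an illustration under that assumption only, not the MLARAM implementation; the helper names `complement_code`, `match_ratio`, and `passes_vigilance` are hypothetical.

```python
def complement_code(features):
    """Complement coding: append 1 - x for every feature x in [0, 1]."""
    return list(features) + [1.0 - x for x in features]


def match_ratio(coded_input, weights):
    """Fuzzy ART match function |I AND w| / |I|, with AND as element-wise min."""
    fuzzy_and = [min(i, w) for i, w in zip(coded_input, weights)]
    return sum(fuzzy_and) / sum(coded_input)


def passes_vigilance(coded_input, weights, vigilance=0.9):
    """A prototype may absorb the input only if the match reaches the vigilance."""
    return match_ratio(coded_input, weights) >= vigilance


# A prototype created from a single point, then tested against two inputs.
prototype = complement_code([0.2, 0.4])
near = complement_code([0.25, 0.45])
far = complement_code([0.8, 0.9])

print(passes_vigilance(near, prototype, vigilance=0.9))
print(passes_vigilance(far, prototype, vigilance=0.9))
```

With a high vigilance only the nearby input can join the existing prototype, so distant inputs force new prototypes (little compression). Lowering the vigilance lets the same prototype grow to cover inputs that are farther away.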
- fit(X, y)
Fit classifier with training data
Parameters
- X : numpy.ndarray or scipy.sparse
input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y : numpy.ndarray or scipy.sparse {0,1}
binary indicator matrix with label assignments.
Returns
- skmultilearn.MLARAMfast.MLARAM
fitted instance of self
- predict(X)
Predict labels for X
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
input features of shape (n_samples, n_features)
Returns
- scipy.sparse of int
binary indicator matrix with label assignments with shape (n_samples, n_labels)
- predict_proba(X)
Predict probabilities of label assignments for X
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
input features of shape (n_samples, n_features)
Returns
- array of arrays of float
matrix with label assignment probabilities of shape (n_samples, n_labels)
- reset()
Resets the labels and neurons
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLARAM
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class skmultilearn.adapt.MLTSVM(c_k=0, sor_omega=1.0, threshold=1e-06, lambda_param=1.0, max_iteration=500)
Bases: MLClassifierBase
Twin multi-Label Support Vector Machines
Parameters
- c_k : int
the empirical risk penalty parameter that determines the trade-off between the loss terms
- sor_omega : float (default is 1.0)
the smoothing parameter
- threshold : float (default is 1e-6)
threshold above which a label should be assigned
- lambda_param : float (default is 1.0)
the regularization parameter
- max_iteration : int (default is 500)
maximum number of iterations to use in successive overrelaxation
References
If you use this classifier please cite the original paper introducing the method:
@article{chen2016mltsvm,
  title={MLTSVM: a novel twin support vector machine to multi-label learning},
  author={Chen, Wei-Jie and Shao, Yuan-Hai and Li, Chun-Na and Deng, Nai-Yang},
  journal={Pattern Recognition},
  volume={52},
  pages={61--74},
  year={2016},
  publisher={Elsevier}
}
Examples
Here’s a very simple example of using MLTSVM with a fixed penalty parameter c_k:

    from skmultilearn.adapt import MLTSVM

    classifier = MLTSVM(c_k=2**-1)

    # train
    classifier.fit(X_train, y_train)

    # predict
    predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

    from skmultilearn.adapt import MLTSVM
    from sklearn.model_selection import GridSearchCV

    parameters = {'c_k': [2**i for i in range(-5, 5, 2)]}
    score = 'f1_macro'

    clf = GridSearchCV(MLTSVM(), parameters, scoring=score)
    clf.fit(X, y)

    print(clf.best_params_, clf.best_score_)

    # output
    # {'c_k': 0.03125} 0.347518217573
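The max_iteration parameter above refers to successive overrelaxation (SOR). As background, the following is a minimal sketch of textbook SOR for a small linear system Ax = b; it is a generic illustration and not the solver used inside MLTSVM. The function name `sor_solve` and the stopping tolerance `tol` are hypothetical, and treating `omega` as the relaxation factor is an assumption suggested by the name sor_omega.

```python
def sor_solve(A, b, omega=1.0, max_iteration=500, tol=1e-10):
    """Solve A x = b by successive overrelaxation (generic textbook version).

    omega = 1.0 reduces to Gauss-Seidel; 1 < omega < 2 over-relaxes each update.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iteration):
        largest_change = 0.0
        for i in range(n):
            # Gauss-Seidel estimate for x[i] using the latest values of the others.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Blend the old value and the new estimate with the relaxation factor.
            new_value = (1.0 - omega) * x[i] + omega * gauss_seidel
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        if largest_change < tol:
            break  # converged before reaching the iteration cap
    return x


# Small symmetric positive definite system; the exact solution is x = 1/11, y = 7/11.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = sor_solve(A, b, omega=1.0)
print(solution)
```

The iteration cap bounds the work per solve: a larger cap allows a tighter solution at the cost of training time.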
- fit(X, Y)
Abstract method to fit classifier with training data
It must return a fitted instance of self.
Parameters
- X : numpy.ndarray or scipy.sparse
input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y : numpy.ndarray or scipy.sparse {0,1}
binary indicator matrix with label assignments.
Returns
- object
fitted instance of self
Raises
- NotImplementedError
this is just an abstract method
- predict(X)
Abstract method to predict labels
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
input features of shape (n_samples, n_features)
Returns
- scipy.sparse of int
binary indicator matrix with label assignments with shape (n_samples, n_labels)
Raises
- NotImplementedError
this is just an abstract method
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLTSVM
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- class skmultilearn.adapt.MLkNN(k=10, s=1.0, ignore_first_neighbours=0, n_jobs=None)
Bases: MLClassifierBase
kNN classification method adapted for multi-label classification
MLkNN uses k-NearestNeighbors to find the nearest examples to a test instance and uses Bayesian inference to select the assigned labels.
Parameters
- k : int
number of neighbours of each input instance to take into account
- s : float (default is 1.0)
the smoothing parameter
- ignore_first_neighbours : int (default is 0)
ability to ignore first N neighbours, useful for comparing with other classification software.
- n_jobs : int or None, optional (default=None)
The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
Attributes
- knn_ : an instance of sklearn.neighbors.NearestNeighbors
the nearest neighbors single-label classifier used underneath
Note
If you don’t know what ignore_first_neighbours does, the default is safe. Please see this issue.
References
If you use this classifier please cite the original paper introducing the method:
@article{zhang2007ml,
  title={ML-KNN: A lazy learning approach to multi-label learning},
  author={Zhang, Min-Ling and Zhou, Zhi-Hua},
  journal={Pattern recognition},
  volume={40},
  number={7},
  pages={2038--2048},
  year={2007},
  publisher={Elsevier}
}
Examples
Here’s a very simple example of using MLkNN with a fixed number of neighbors:

    from skmultilearn.adapt import MLkNN

    classifier = MLkNN(k=3)

    # train
    classifier.fit(X_train, y_train)

    # predict
    predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

    from skmultilearn.adapt import MLkNN
    from sklearn.model_selection import GridSearchCV

    parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}
    score = 'f1_macro'

    clf = GridSearchCV(MLkNN(), parameters, scoring=score)
    clf.fit(X, y)

    print(clf.best_params_, clf.best_score_)

    # output
    # ({'k': 1, 's': 0.5}, 0.78988303374297597)
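To show where the smoothing parameter s enters the Bayesian inference, here is a minimal pure-Python sketch for a single label, assuming the smoothed prior and likelihood estimates of the cited ML-KNN paper. The function names `mlknn_prior` and `mlknn_posterior` and the toy counts are hypothetical; this is not the library's code.

```python
def mlknn_prior(label_column, s=1.0):
    """Smoothed prior that a sample carries the label: (s + count) / (2s + n)."""
    n = len(label_column)
    return (s + sum(label_column)) / (2.0 * s + n)


def mlknn_posterior(prior_true, counts_true, counts_false, c, k, s=1.0):
    """Posterior that a sample has the label, given c of its k neighbours have it.

    counts_true[j]:  training samples WITH the label whose neighbourhood
                     contains exactly j samples with the label (j = 0..k).
    counts_false[j]: the same count for training samples WITHOUT the label.
    """
    like_true = (s + counts_true[c]) / (s * (k + 1) + sum(counts_true))
    like_false = (s + counts_false[c]) / (s * (k + 1) + sum(counts_false))
    numerator = prior_true * like_true
    return numerator / (numerator + (1.0 - prior_true) * like_false)


# Toy data: four training samples, two of which carry the label, and k = 2.
prior = mlknn_prior([1, 0, 1, 0], s=1.0)
posterior = mlknn_posterior(
    prior, counts_true=[0, 1, 1], counts_false=[2, 0, 0], c=2, k=2, s=1.0
)
print(prior, posterior)
```

A label would be assigned when this posterior exceeds one half. Larger values of s pull both the prior and the likelihoods toward uniform, which matters most when the neighbor counts are small.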
- fit(X, y)
Fit classifier with training data
Parameters
- X : numpy.ndarray or scipy.sparse
input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y : numpy.ndarray or scipy.sparse {0,1}
binary indicator matrix with label assignments.
Returns
- self
fitted instance of self
- predict(X)
Predict labels for X
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
input features of shape (n_samples, n_features)
Returns
- scipy.sparse matrix of int
binary indicator matrix with label assignments with shape (n_samples, n_labels)
- predict_proba(X)
Predict probabilities of label assignments for X
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
input features of shape (n_samples, n_features)
Returns
- scipy.sparse matrix of float
matrix with label assignment probabilities with shape (n_samples, n_labels)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MLkNN
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
Cite us
If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:
@ARTICLE{2017arXiv170201460S,
  author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
title = "{A scikit-based Python environment for performing multi-label classification}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1702.01460},
primaryClass = "cs.LG",
keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
year = 2017,
month = feb,
}