skmultilearn.ensemble package
The skmultilearn.ensemble module implements ensemble classification schemes that construct an ensemble of base multi-label classifiers.
Currently the following ensemble classification schemes are available in scikit-multilearn:
Classifier name | Description |
---|---|
RakelD | Distinct RAndom k-labELsets multi-label classifier. |
RakelO | Overlapping RAndom k-labELsets multi-label classifier. |
LabelSpacePartitioningClassifier | A label space partitioning classifier that trains a classifier per label subspace, as clustered using methods from skmultilearn.cluster. |
MajorityVotingClassifier | A label space division classifier that trains a classifier per label subspace, as clustered using methods from skmultilearn.cluster. |
- class skmultilearn.ensemble.LabelSpacePartitioningClassifier(classifier=None, clusterer=None, require_dense=None)
Bases: BinaryRelevance
Partition label space and classify each subspace separately.
This classifier performs classification by:
1. partitioning the label space into separate, smaller multi-label sub problems, using the supplied label space clusterer
2. training an instance of the supplied base multi-label classifier for each label space subset in the partition
3. predicting the result with each of the subclassifiers and returning the sum of their results
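The final step above can be illustrated without the library. The sketch below is plain Python, not the scikit-multilearn implementation: the helper name `combine_partition_predictions` and the sample per-subset outputs are hypothetical. It shows that summing the outputs of sub-classifiers trained on disjoint label subsets amounts to writing each output into its own columns of the full label matrix.

```python
from typing import List, Sequence


def combine_partition_predictions(
    partition: Sequence[Sequence[int]],
    sub_predictions: Sequence[Sequence[Sequence[int]]],
    n_labels: int,
) -> List[List[int]]:
    """Sum per-subset predictions into one (n_samples, n_labels) matrix.

    partition[i] lists the label indexes handled by sub-classifier i;
    sub_predictions[i][s][j] is its 0/1 output for sample s and the
    j-th label of partition[i].
    """
    n_samples = len(sub_predictions[0])
    combined = [[0] * n_labels for _ in range(n_samples)]
    for labels, predictions in zip(partition, sub_predictions):
        for sample, row in enumerate(predictions):
            for position, label in enumerate(labels):
                # Subsets are disjoint, so each cell gets exactly one contribution.
                combined[sample][label] += row[position]
    return combined


# Hypothetical outputs of two sub-classifiers for two samples.
partition = [[1, 3, 4], [0, 2, 5]]
sub_predictions = [
    [[1, 0, 1], [0, 0, 0]],  # outputs for labels 1, 3, 4
    [[0, 1, 1], [1, 0, 0]],  # outputs for labels 0, 2, 5
]
combined = combine_partition_predictions(partition, sub_predictions, n_labels=6)
```

Because no label belongs to two subsets, every cell of `combined` stays in {0, 1}.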
Parameters
- classifier : BaseEstimator
  the base classifier that will be used in a class, will be automatically put under self.classifier.
- clusterer : LabelSpaceClustererBase
  object that partitions the output space, will be automatically put under self.clusterer.
- require_dense : [bool, bool]
  whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.
Attributes
- model_count_ : int
  number of trained models, in this classifier equal to the number of partitions
- partition_ : List[List[int]], shape=(model_count_,)
  list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()
- classifiers : List[BaseEstimator], shape=(model_count_,)
  list of classifiers trained per partition, set in fit()
References
If you use this clusterer please cite the clustering paper:
@Article{datadriven,
author = {Szymański, Piotr and Kajdanowicz, Tomasz and Kersting, Kristian},
title = {How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification?},
journal = {Entropy},
volume = {18},
year = {2016},
number = {8},
article_number = {282},
url = {http://www.mdpi.com/1099-4300/18/8/282},
issn = {1099-4300},
doi = {10.3390/e18080282}
}
Examples
Here’s an example of building a partitioned ensemble of Classifier Chains:

    from skmultilearn.ensemble import LabelSpacePartitioningClassifier
    from skmultilearn.cluster import FixedLabelSpaceClusterer
    from skmultilearn.problem_transform import ClassifierChain
    from sklearn.naive_bayes import GaussianNB

    # Two disjoint label subsets, one Classifier Chain trained per subset
    classifier = LabelSpacePartitioningClassifier(
        clusterer=FixedLabelSpaceClusterer(clusters=[[1, 3, 4], [0, 2, 5]]),
        classifier=ClassifierChain(classifier=GaussianNB())
    )
    classifier.fit(X_train, y_train)
    predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide.
- predict(X)
Predict labels for X
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
  input features of shape (n_samples, n_features)
Returns
- scipy.sparse of int
  binary indicator matrix with label assignments with shape (n_samples, n_labels)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LabelSpacePartitioningClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
  Metadata routing for the sample_weight parameter in score.
Returns
- self : object
  The updated object.
- class skmultilearn.ensemble.MajorityVotingClassifier(classifier=None, clusterer=None, require_dense=None)
Bases: LabelSpacePartitioningClassifier
Majority Voting ensemble classifier
Divides the label space using the provided clusterer, trains the provided base classifier on each subset, and assigns a label to an instance if more than half of the classifiers whose clusters contain that label assigned it to the instance.
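The voting rule described above can be sketched in plain Python. This is an illustration of the rule as stated, not the library's code: the helper name `majority_vote` and the sample outputs are hypothetical, and a label is assigned only on a strict majority of the classifiers that cover it.

```python
from typing import List, Sequence


def majority_vote(
    clusters: Sequence[Sequence[int]],
    sub_predictions: Sequence[Sequence[Sequence[int]]],
    n_labels: int,
) -> List[List[int]]:
    """Assign a label when more than half of the classifiers covering it agree.

    clusters[i] lists the label indexes seen by sub-classifier i;
    sub_predictions[i][s][j] is its 0/1 output for sample s and the
    j-th label of clusters[i].
    """
    n_samples = len(sub_predictions[0])
    votes = [[0] * n_labels for _ in range(n_samples)]
    voters = [0] * n_labels  # number of clusters that contain each label
    for labels, predictions in zip(clusters, sub_predictions):
        for position, label in enumerate(labels):
            voters[label] += 1
            for sample in range(n_samples):
                votes[sample][label] += predictions[sample][position]
    # Strict majority: a tie (exactly half) does not assign the label,
    # and a label covered by no cluster is never assigned.
    return [
        [int(2 * votes[sample][label] > voters[label]) for label in range(n_labels)]
        for sample in range(n_samples)
    ]


# Hypothetical outputs of three sub-classifiers for a single sample.
clusters = [[1, 2, 3], [0, 2, 5], [4, 5]]
sub_predictions = [
    [[1, 1, 0]],  # outputs for labels 1, 2, 3
    [[0, 1, 1]],  # outputs for labels 0, 2, 5
    [[1, 0]],     # outputs for labels 4, 5
]
result = majority_vote(clusters, sub_predictions, n_labels=6)
```

Label 2 is covered by two classifiers that both vote for it, so it is assigned. Label 5 is covered by two classifiers that disagree, so it is not.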
Parameters
- classifier : BaseEstimator
  the base classifier that will be used in a class, will be automatically put under self.classifier.
- clusterer : LabelSpaceClustererBase
  object that partitions the output space, will be automatically put under self.clusterer.
- require_dense : [bool, bool]
  whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.
Attributes
- model_count_ : int
  number of trained models, in this classifier equal to the number of partitions
- partition_ : List[List[int]], shape=(model_count_,)
  list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()
- classifiers : List[BaseEstimator], shape=(model_count_,)
  list of classifiers trained per partition, set in fit()
Examples
Here’s an example of building an overlapping ensemble of chains:

    from skmultilearn.ensemble import MajorityVotingClassifier
    from skmultilearn.cluster import FixedLabelSpaceClusterer
    from skmultilearn.problem_transform import ClassifierChain
    from sklearn.naive_bayes import GaussianNB

    # Labels 2 and 5 each appear in two clusters, so their votes are combined
    classifier = MajorityVotingClassifier(
        clusterer=FixedLabelSpaceClusterer(clusters=[[1, 2, 3], [0, 2, 5], [4, 5]]),
        classifier=ClassifierChain(classifier=GaussianNB())
    )
    classifier.fit(X_train, y_train)
    predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide.
- predict(X)
Predict label assignments for X
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
  input features of shape (n_samples, n_features)
Returns
- scipy.sparse of float
  binary indicator matrix with label assignments with shape (n_samples, n_labels)
- predict_proba(X)
Predict probabilities of label assignments for X
Parameters
- X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)
  input feature matrix
Returns
- scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)
  matrix with label assignment probabilities
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MajorityVotingClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
  Metadata routing for the sample_weight parameter in score.
Returns
- self : object
  The updated object.
- class skmultilearn.ensemble.RakelD(base_classifier=None, labelset_size=3, base_classifier_require_dense=None)
Bases: MLClassifierBase
Distinct RAndom k-labELsets multi-label classifier.
Divides the label space into equal partitions of size k, trains a Label Powerset classifier per partition, and predicts by summing the results of all trained classifiers.
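One simple way to draw such a partition is to shuffle the label indexes and cut them into consecutive chunks of size k. The sketch below assumes that approach and is not the library's clusterer; the helper name `random_disjoint_labelsets` and the `seed` parameter are hypothetical.

```python
import random
from typing import List


def random_disjoint_labelsets(n_labels: int, labelset_size: int, seed: int = 0) -> List[List[int]]:
    """Randomly split label indexes 0..n_labels-1 into disjoint subsets of size k."""
    labels = list(range(n_labels))
    random.Random(seed).shuffle(labels)
    # Consecutive chunks of the shuffled list; the last chunk is smaller
    # when labelset_size does not divide n_labels evenly.
    return [
        labels[start:start + labelset_size]
        for start in range(0, n_labels, labelset_size)
    ]


partition = random_disjoint_labelsets(n_labels=10, labelset_size=4)
```

With 10 labels and k = 4 this yields three subsets: two of four labels and one of two. Every label appears in exactly one subset.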
Parameters
- base_classifier : sklearn.base
  the base classifier that will be used in a class, will be automatically put under self.classifier for future access.
- base_classifier_require_dense : [bool, bool]
  whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.
- labelset_size : int
  the desired size of each of the partitions, parameter k in the paper. Defaults to 3, the value the paper reports as giving the best results.
Attributes
- _label_count : int
  the number of labels the classifier is fit to, set by fit()
- model_count_ : int
  the number of sub classifiers trained, set by fit()
- classifier_ : skmultilearn.ensemble.LabelSpacePartitioningClassifier
  the underlying classifier that performs the label space partitioning using a random clusterer, skmultilearn.cluster.RandomLabelSpaceClusterer
References
If you use this class please cite the paper introducing the method:
@ARTICLE{5567103,
author={G. Tsoumakas and I. Katakis and I. Vlahavas},
journal={IEEE Transactions on Knowledge and Data Engineering},
title={Random k-Labelsets for Multilabel Classification},
year={2011},
volume={23},
number={7},
pages={1079-1089},
doi={10.1109/TKDE.2010.164},
ISSN={1041-4347},
month={July},
}
Examples
Here’s a simple example of how to use this class with a base classifier from scikit-learn to train non-overlapping classifiers, each on at most four labels:

    from sklearn.naive_bayes import GaussianNB
    from skmultilearn.ensemble import RakelD

    classifier = RakelD(
        base_classifier=GaussianNB(),
        base_classifier_require_dense=[True, True],
        labelset_size=4
    )
    classifier.fit(X_train, y_train)
    prediction = classifier.predict(X_test)
- fit(X, y)
Fit classifier to multi-label data
Parameters
- X : numpy.ndarray or scipy.sparse
  input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y : numpy.ndarray or scipy.sparse {0,1}
  binary indicator matrix with label assignments, shape (n_samples, n_labels)
Returns
- fitted instance of self
- predict(X)
Predict label assignments
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
  input features of shape (n_samples, n_features)
Returns
- scipy.sparse of int
  binary indicator matrix with label assignments with shape (n_samples, n_labels)
- predict_proba(X)
Predict label probabilities
Parameters
- X : numpy.ndarray or scipy.sparse.csc_matrix
  input features of shape (n_samples, n_features)
Returns
- scipy.sparse of float
  matrix with the probability of each label assignment, with shape (n_samples, n_labels)
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RakelD
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
  Metadata routing for the sample_weight parameter in score.
Returns
- self : object
  The updated object.
- class skmultilearn.ensemble.RakelO(base_classifier=None, model_count=None, labelset_size=3, base_classifier_require_dense=None)
Bases: MLClassifierBase
Overlapping RAndom k-labELsets multi-label classifier
Divides the label space into m subsets of size k, trains a Label Powerset classifier for each subset, and assigns a label to an instance if more than half of the classifiers whose subsets contain that label assigned it to the instance.
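Drawing m overlapping subsets of size k can be sketched as m independent random samples of the label indexes. This is a simplified illustration, not the library's clusterer: the helper name `random_overlapping_labelsets` and the `seed` parameter are hypothetical, and unlike a full implementation this version does not guarantee that the m labelsets are distinct or that every label is covered.

```python
import random
from typing import List


def random_overlapping_labelsets(
    n_labels: int, labelset_size: int, model_count: int, seed: int = 0
) -> List[List[int]]:
    """Draw model_count random label subsets of size k that may overlap."""
    rng = random.Random(seed)
    return [
        # Sampling without replacement keeps labels distinct inside one
        # subset, while separate draws let subsets share labels.
        sorted(rng.sample(range(n_labels), labelset_size))
        for _ in range(model_count)
    ]


# M = 8 labels, k = 3, and m = 2M = 16 models.
labelsets = random_overlapping_labelsets(n_labels=8, labelset_size=3, model_count=16)
```

Each resulting subset would get its own Label Powerset classifier, and labels shared between subsets are then decided by the majority rule described above.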
Parameters
- base_classifier : BaseEstimator
  scikit-learn compatible base classifier, will be set under self.classifier.classifier.
- base_classifier_require_dense : [bool, bool]
  whether the base classifier requires [input, output] matrices in dense representation. Will be automatically set under self.classifier.require_dense.
- labelset_size : int
  the desired size of each of the partitions, parameter k in the paper. The paper reports 3 as the best value, so it is the default. Will be automatically set under self.labelset_size.
- model_count : int
  the desired number of classifiers, parameter m in the paper. The paper reports 2M as the best value, where M is the number of labels. Will be automatically set under self.model_count_.
Attributes
- classifier : MajorityVotingClassifier
  the voting classifier initialized with LabelPowerset multi-label classifier with base_classifier and RandomLabelSpaceClusterer
References
If you use this class please cite the paper introducing the method:
@ARTICLE{5567103,
author={G. Tsoumakas and I. Katakis and I. Vlahavas},
journal={IEEE Transactions on Knowledge and Data Engineering},
title={Random k-Labelsets for Multilabel Classification},
year={2011},
volume={23},
number={7},
pages={1079-1089},
doi={10.1109/TKDE.2010.164},
ISSN={1041-4347},
month={July},
}
Examples
Here’s a simple example of how to use this class with a base classifier from scikit-learn to train 6 classifiers, each on a quarter of the labels, so the label subsets are sure to overlap:

    from sklearn.naive_bayes import GaussianNB
    from skmultilearn.ensemble import RakelO

    classifier = RakelO(
        base_classifier=GaussianNB(),
        base_classifier_require_dense=[True, True],
        labelset_size=y_train.shape[1] // 4,
        model_count=6  # constructor argument is model_count, per the class signature
    )
    classifier.fit(X_train, y_train)
    prediction = classifier.predict(X_test)
- fit(X, y)
Fits classifier to training data
Parameters
- X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)
  input feature matrix
- y : array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)
  binary indicator matrix with label assignments
Returns
- self
  fitted instance of self
- predict(X)
Predict labels for X
Parameters
- X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)
  input feature matrix
Returns
- scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)
  binary indicator matrix with label assignments
- predict_proba(X)
Predict probabilities of label assignments for X
Parameters
- X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)
  input feature matrix
Returns
- scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)
  matrix with label assignment probabilities
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RakelO
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
  Metadata routing for the sample_weight parameter in score.
Returns
- self : object
  The updated object.
Cite us
If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:
@ARTICLE{2017arXiv170201460S,
author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
title = "{A scikit-based Python environment for performing multi-label classification}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1702.01460},
primaryClass = "cs.LG",
keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
year = 2017,
month = feb,
}