skmultilearn.ensemble package

The skmultilearn.ensemble module implements ensemble classification schemes that construct an ensemble of base multi-label classifiers.

Currently the following ensemble classification schemes are available in scikit-multilearn:

  • RakelD: Distinct RAndom k-labELsets multi-label classifier

  • RakelO: Overlapping RAndom k-labELsets multi-label classifier

  • LabelSpacePartitioningClassifier: a label space partitioning classifier that trains a classifier per label subspace, as clustered using methods from skmultilearn.cluster

  • MajorityVotingClassifier: a label space division classifier that trains a classifier per label subspace, as clustered using methods from skmultilearn.cluster, and assigns a label if the majority of the classifiers that contain the label agree on the assignment
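All four classifiers are exposed at the top level of the module and can be imported directly:

from skmultilearn.ensemble import (
    RakelD,
    RakelO,
    LabelSpacePartitioningClassifier,
    MajorityVotingClassifier,
)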

class skmultilearn.ensemble.LabelSpacePartitioningClassifier(classifier=None, clusterer=None, require_dense=None)

Bases: BinaryRelevance

Partition label space and classify each subspace separately

This classifier performs classification by:

1. partitioning the label space into separate, smaller multi-label sub-problems, using the supplied label space clusterer,

2. training an instance of the supplied base multi-label classifier for each label space subset in the partition,

3. predicting the result with each of the subclassifiers and returning the sum of their results (see the sketch below).
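Step 3 can be pictured with the following sketch; it illustrates the combining idea under stated assumptions and is not the library's internal implementation (classifiers, partition and n_labels stand for the fitted sub-classifiers, the label-index partition and the total number of labels):

import numpy as np
from scipy import sparse

# Illustrative sketch of the combining step, not the library's internal code.
# `classifiers` holds one fitted sub-classifier per label subset and
# `partition` is the corresponding list of label-index lists.
def combine_partition_predictions(classifiers, partition, X, n_labels):
    result = np.zeros((X.shape[0], n_labels), dtype=int)
    for subclassifier, label_subset in zip(classifiers, partition):
        sub_prediction = subclassifier.predict(X)
        if sparse.issparse(sub_prediction):
            sub_prediction = sub_prediction.toarray()  # densify for column assignment
        result[:, label_subset] = sub_prediction
    return sparse.csr_matrix(result)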

Parameters

classifier : BaseEstimator

the base classifier that will be used in a class, will be automatically put under self.classifier.

clusterer : LabelSpaceClustererBase

object that partitions the output space, will be automatically put under self.clusterer.

require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.

Attributes

model_count_ : int

number of trained models, in this classifier equal to the number of partitions

partition_ : List[List[int]], shape=(model_count_,)

list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()

classifiers_ : List[BaseEstimator], shape=(model_count_,)

list of classifiers trained per partition, set in fit()

References

If you use this classifier please cite the clustering paper:

@Article{datadriven,
    author = {Szymański, Piotr and Kajdanowicz, Tomasz and Kersting, Kristian},
    title = {How Is a Data-Driven Approach Better than Random Choice in
    Label Space Division for Multi-Label Classification?},
    journal = {Entropy},
    volume = {18},
    year = {2016},
    number = {8},
    article_number = {282},
    url = {http://www.mdpi.com/1099-4300/18/8/282},
    issn = {1099-4300},
    doi = {10.3390/e18080282}
}

Examples

Here’s an example of building a partitioned ensemble of Classifier Chains

from skmultilearn.ensemble import LabelSpacePartitioningClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB

# X_train, y_train and X_test are assumed to be your own data matrices
classifier = LabelSpacePartitioningClassifier(
    clusterer = FixedLabelSpaceClusterer(clusters = [[1, 3, 4], [0, 2, 5]]),
    classifier = ClassifierChain(classifier=GaussianNB())
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide

predict(X)

Predict labels for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)
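The returned matrix is sparse; if downstream code needs a dense array, it can be converted explicitly (a small usage note, assuming a fitted classifier):

prediction = classifier.predict(X_test)  # scipy.sparse matrix of shape (n_samples, n_labels)
dense = prediction.toarray()             # dense numpy array, only if a dense result is required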

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LabelSpacePartitioningClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.
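A brief usage sketch of the request mechanism (assumes scikit-learn >= 1.3, where metadata routing and this method exist; routing only has an effect when the estimator is wrapped by a meta-estimator such as a Pipeline or a cross-validation helper):

import sklearn
from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import LabelSpacePartitioningClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain

sklearn.set_config(enable_metadata_routing=True)  # routing is disabled by default

classifier = LabelSpacePartitioningClassifier(
    clusterer=FixedLabelSpaceClusterer(clusters=[[0, 1], [2, 3]]),
    classifier=ClassifierChain(classifier=GaussianNB()),
).set_score_request(sample_weight=True)  # route sample_weight to score() when a meta-estimator provides it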

class skmultilearn.ensemble.MajorityVotingClassifier(classifier=None, clusterer=None, require_dense=None)

Bases: LabelSpacePartitioningClassifier

Majority Voting ensemble classifier

Divides the label space using the provided clusterer, trains an instance of the provided base classifier for each subset, and assigns a label to an instance if more than half of the classifiers (a majority) from the clusters that contain that label assigned it to the instance.
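The voting rule can be summarised with the sketch below; it illustrates the idea under stated assumptions and is not the library's internal code (sub_predictions and clusters are hypothetical names for the per-cluster 0/1 prediction arrays and the label-index lists of each cluster):

import numpy as np
from scipy import sparse

def majority_vote(sub_predictions, clusters, n_labels):
    # sub_predictions[i]: dense (n_samples, len(clusters[i])) 0/1 array from the i-th sub-classifier
    n_samples = sub_predictions[0].shape[0]
    votes = np.zeros((n_samples, n_labels))
    voters = np.zeros(n_labels)
    for prediction, label_subset in zip(sub_predictions, clusters):
        votes[:, label_subset] += prediction
        voters[label_subset] += 1
    # assign a label when strictly more than half of the clusters containing it voted for it
    return sparse.csr_matrix((votes > voters / 2.0).astype(int))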

Parameters

classifier : BaseEstimator

the base classifier that will be used in a class, will be automatically put under self.classifier.

clusterer : LabelSpaceClustererBase

object that partitions the output space, will be automatically put under self.clusterer.

require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.

Attributes

model_count_ : int

number of trained models, in this classifier equal to the number of partitions

partition_ : List[List[int]], shape=(model_count_,)

list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()

classifiers_ : List[BaseEstimator], shape=(model_count_,)

list of classifiers trained per partition, set in fit()

Examples

Here’s an example of building an overlapping ensemble of chains

from skmultilearn.ensemble import MajorityVotingClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB


classifier = MajorityVotingClassifier(
    clusterer = FixedLabelSpaceClusterer(clusters = [[1,2,3], [0, 2, 5], [4, 5]]),
    classifier = ClassifierChain(classifier=GaussianNB())
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide

predict(X)

Predict label assignments for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of float

binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)

Predict probabilities of label assignments for X

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns

scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)

matrix with label assignment probabilities

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MajorityVotingClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.ensemble.RakelD(base_classifier=None, labelset_size=3, base_classifier_require_dense=None)

Bases: MLClassifierBase

Distinct RAndom k-labELsets multi-label classifier.

Divides the label space into equal partitions of size k, trains a Label Powerset classifier per partition and predicts by summing the results of all trained classifiers.
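As a quick arithmetic note on how the number of sub-classifiers follows from k (an illustrative sketch, not library code):

import math

n_labels = 20       # illustrative number of labels
labelset_size = 3   # parameter k
model_count = math.ceil(n_labels / labelset_size)
# -> 7 disjoint labelsets: six of size 3 plus one with the remaining 2 labels,
#    and one Label Powerset classifier is trained per labelset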

Parameters

base_classifier : sklearn.base.BaseEstimator

the base classifier that will be used in a class, will be automatically put under self.classifier for future access.

base_classifier_require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense

labelset_size : int

the desired size of each of the partitions, parameter k from the paper; the default is 3, which the paper reports as giving the best results

Attributes

_label_count : int

the number of labels the classifier is fit to, set by fit()

model_count_ : int

the number of sub classifiers trained, set by fit()

classifier_ : skmultilearn.ensemble.LabelSpacePartitioningClassifier

the underlying classifier that performs the label space partitioning using the random clusterer skmultilearn.cluster.RandomLabelSpaceClusterer

References

If you use this class please cite the paper introducing the method:

@ARTICLE{5567103,
    author={G. Tsoumakas and I. Katakis and I. Vlahavas},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    title={Random k-Labelsets for Multilabel Classification},
    year={2011},
    volume={23},
    number={7},
    pages={1079-1089},
    doi={10.1109/TKDE.2010.164},
    ISSN={1041-4347},
    month={July},
}

Examples

Here’s a simple example of how to use this class with a base classifier from scikit-learn to teach non-overlapping classifiers each trained on at most four labels:

from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelD

classifier = RakelD(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],
    labelset_size=4
)

classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)
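Once fitted as above, the documented attributes can be inspected, for instance (a usage sketch relying only on the attributes listed above):

# after classifier.fit(X_train, y_train):
print(classifier.model_count_)             # number of trained Label Powerset sub-classifiers
print(classifier.classifier_.partition_)   # the drawn partition of label indexes
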
fit(X, y)

Fit classifier to multi-label data

Parameters

X : numpy.ndarray or scipy.sparse

input features, can be a dense or sparse matrix of size (n_samples, n_features)

y : numpy.ndarray or scipy.sparse {0,1}

binary indicator matrix with label assignments, shape (n_samples, n_labels)

Returns

fitted instance of self

predict(X)

Predict label assignments

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)

Predict label probabilities

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of float

binary indicator matrix with probability of label assignment with shape (n_samples, n_labels)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RakelD

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.ensemble.RakelO(base_classifier=None, model_count=None, labelset_size=3, base_classifier_require_dense=None)

Bases: MLClassifierBase

Overlapping RAndom k-labELsets multi-label classifier

Divides the label space into m subsets of size k, trains a Label Powerset classifier for each subset, and assigns a label to an instance if more than half of the classifiers (a majority) from the clusters that contain that label assigned it to the instance.
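The paper's recommended parameterization can be written out as a quick sketch (illustrative values only):

n_labels = 10               # M, the number of labels in your dataset (illustrative)
labelset_size = 3           # k; the paper reports 3 as working best
model_count = 2 * n_labels  # m = 2M, the number of overlapping labelsets recommended by the paper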

Parameters

base_classifier : BaseEstimator

scikit-learn compatible base classifier, will be set under self.classifier.classifier.

base_classifier_require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation. Will be automatically set under self.classifier.require_dense

labelset_size : int

the desired size of each of the partitions, parameter k from the paper. The paper reports 3 as the best value, so it is the default. Will be automatically set under self.labelset_size

model_count : int

the desired number of classifiers, parameter m from the paper. The paper reports 2M (where M is the number of labels) as the best value. Will be automatically set under self.model_count_.

Attributes

classifier : MajorityVotingClassifier

the voting classifier, initialized with a LabelPowerset multi-label classifier wrapping base_classifier and a RandomLabelSpaceClusterer

References

If you use this class please cite the paper introducing the method:

@ARTICLE{5567103,
    author={G. Tsoumakas and I. Katakis and I. Vlahavas},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    title={Random k-Labelsets for Multilabel Classification},
    year={2011},
    volume={23},
    number={7},
    pages={1079-1089},
    doi={10.1109/TKDE.2010.164},
    ISSN={1041-4347},
    month={July},
}

Examples

Here’s a simple example of how to use this class with a base classifier from scikit-learn to teach 6 classifiers, each trained on a quarter of the labels, which are sure to overlap:

from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelO

classifier = RakelO(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],
    labelset_size=y_train.shape[1] // 4,
    model_count=6
)

classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

fit(X, y)

Fits classifier to training data

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

y : array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)

binary indicator matrix with label assignments

Returns

self

fitted instance of self

predict(X)

Predict labels for X

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns

scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)

binary indicator matrix with label assignments

predict_proba(X)

Predict probabilities of label assignments for X

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns

scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)

matrix with label assignment probabilities

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RakelO

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.


Cite us

If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}