skmultilearn.ensemble package

The skmultilearn.ensemble module implements ensemble classification schemes that construct an ensemble of base multi-label classifiers.

Currently the following ensemble classification schemes are available in scikit-multilearn:

  • RakelD: Distinct RAndom k-labELsets multi-label classifier

  • RakelO: Overlapping RAndom k-labELsets multi-label classifier

  • LabelSpacePartitioningClassifier: a label space partitioning classifier that trains a classifier per label subspace, as clustered using methods from skmultilearn.cluster

  • MajorityVotingClassifier: a label space division classifier that trains a classifier per label subspace, as clustered using methods from skmultilearn.cluster, and assigns a label if the majority of the classifiers that contain the label agree on the assignment
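All four classifiers are exposed at the top level of the module and can be imported directly:

from skmultilearn.ensemble import (
    RakelD,
    RakelO,
    LabelSpacePartitioningClassifier,
    MajorityVotingClassifier,
)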

class skmultilearn.ensemble.LabelSpacePartitioningClassifier(classifier=None, clusterer=None, require_dense=None)

Bases: BinaryRelevance

Partition label space and classify each subspace separately

This classifier performs classification by:

1. partitioning the label space into separate, smaller multi-label sub-problems, using the supplied label space clusterer,

2. training an instance of the supplied base multi-label classifier for each label space subset in the partition,

3. predicting the result with each of the subclassifiers and returning the sum of their results (see the sketch below).
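Step 3 can be pictured with the following sketch; it illustrates the combining idea under stated assumptions and is not the library's internal implementation (classifiers, partition and n_labels stand for the fitted sub-classifiers, the label-index partition and the total number of labels):

import numpy as np
from scipy import sparse

# Illustrative sketch of the combining step, not the library's internal code.
# `classifiers` holds one fitted sub-classifier per label subset and
# `partition` is the corresponding list of label-index lists.
def combine_partition_predictions(classifiers, partition, X, n_labels):
    result = np.zeros((X.shape[0], n_labels), dtype=int)
    for subclassifier, label_subset in zip(classifiers, partition):
        sub_prediction = subclassifier.predict(X)
        if sparse.issparse(sub_prediction):
            sub_prediction = sub_prediction.toarray()  # densify for column assignment
        result[:, label_subset] = sub_prediction
    return sparse.csr_matrix(result)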

Parameters

classifier : BaseEstimator

the base classifier that will be used in a class, will be automatically put under self.classifier.

clusterer : LabelSpaceClustererBase

object that partitions the output space, will be automatically put under self.clusterer.

require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.

Attributes

model_count_ : int

number of trained models, in this classifier equal to the number of partitions

partition_ : List[List[int]], shape=(model_count_,)

list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()

classifiers_ : List[BaseEstimator], shape=(model_count_,)

list of classifiers trained per partition, set in fit()

References

If you use this classifier please cite the clustering paper:

@Article{datadriven,
    author = {Szymański, Piotr and Kajdanowicz, Tomasz and Kersting, Kristian},
    title = {How Is a Data-Driven Approach Better than Random Choice in
    Label Space Division for Multi-Label Classification?},
    journal = {Entropy},
    volume = {18},
    year = {2016},
    number = {8},
    article_number = {282},
    url = {http://www.mdpi.com/1099-4300/18/8/282},
    issn = {1099-4300},
    doi = {10.3390/e18080282}
}

Examples

Here’s an example of building a partitioned ensemble of Classifier Chains

from skmultilearn.ensemble import LabelSpacePartitioningClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB

# X_train, y_train and X_test are assumed to be your own data matrices
classifier = LabelSpacePartitioningClassifier(
    clusterer = FixedLabelSpaceClusterer(clusters = [[1, 3, 4], [0, 2, 5]]),
    classifier = ClassifierChain(classifier=GaussianNB())
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide

predict(X)

Predict labels for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)
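The returned matrix is sparse; if downstream code needs a dense array, it can be converted explicitly (a small usage note, assuming a fitted classifier):

prediction = classifier.predict(X_test)  # scipy.sparse matrix of shape (n_samples, n_labels)
dense = prediction.toarray()             # dense numpy array, only if a dense result is required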

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LabelSpacePartitioningClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.
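A brief usage sketch of the request mechanism (assumes scikit-learn >= 1.3, where metadata routing and this method exist; routing only has an effect when the estimator is wrapped by a meta-estimator such as a Pipeline or a cross-validation helper):

import sklearn
from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import LabelSpacePartitioningClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain

sklearn.set_config(enable_metadata_routing=True)  # routing is disabled by default

classifier = LabelSpacePartitioningClassifier(
    clusterer=FixedLabelSpaceClusterer(clusters=[[0, 1], [2, 3]]),
    classifier=ClassifierChain(classifier=GaussianNB()),
).set_score_request(sample_weight=True)  # route sample_weight to score() when a meta-estimator provides it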

class skmultilearn.ensemble.MajorityVotingClassifier(classifier=None, clusterer=None, require_dense=None)

Bases: LabelSpacePartitioningClassifier

Majority Voting ensemble classifier

Divides the label space using the provided clusterer, trains an instance of the provided base classifier for each subset, and assigns a label to an instance if more than half of the classifiers (a majority) from the clusters that contain that label assigned it to the instance.
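The voting rule can be summarised with the sketch below; it illustrates the idea under stated assumptions and is not the library's internal code (sub_predictions and clusters are hypothetical names for the per-cluster 0/1 prediction arrays and the label-index lists of each cluster):

import numpy as np
from scipy import sparse

def majority_vote(sub_predictions, clusters, n_labels):
    # sub_predictions[i]: dense (n_samples, len(clusters[i])) 0/1 array from the i-th sub-classifier
    n_samples = sub_predictions[0].shape[0]
    votes = np.zeros((n_samples, n_labels))
    voters = np.zeros(n_labels)
    for prediction, label_subset in zip(sub_predictions, clusters):
        votes[:, label_subset] += prediction
        voters[label_subset] += 1
    # assign a label when strictly more than half of the clusters containing it voted for it
    return sparse.csr_matrix((votes > voters / 2.0).astype(int))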

Parameters

classifier : BaseEstimator

the base classifier that will be used in a class, will be automatically put under self.classifier.

clusterer : LabelSpaceClustererBase

object that partitions the output space, will be automatically put under self.clusterer.

require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.

Attributes

model_count_ : int

number of trained models, in this classifier equal to the number of partitions

partition_ : List[List[int]], shape=(model_count_,)

list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()

classifiers_ : List[BaseEstimator], shape=(model_count_,)

list of classifiers trained per partition, set in fit()

Examples

Here’s an example of building an overlapping ensemble of chains

from skmultilearn.ensemble import MajorityVotingClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB


classifier = MajorityVotingClassifier(
    clusterer = FixedLabelSpaceClusterer(clusters = [[1,2,3], [0, 2, 5], [4, 5]]),
    classifier = ClassifierChain(classifier=GaussianNB())
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide

predict(X)

Predict label assignments for X

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of float

binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)

Predict probabilities of label assignments for X

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns

scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)

matrix with label assignment probabilities

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MajorityVotingClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.ensemble.RakelD(base_classifier=None, labelset_size=3, base_classifier_require_dense=None)

Bases: MLClassifierBase

Distinct RAndom k-labELsets multi-label classifier.

Divides the label space into equal partitions of size k, trains a Label Powerset classifier per partition and predicts by summing the results of all trained classifiers.
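As a quick arithmetic note on how the number of sub-classifiers follows from k (an illustrative sketch, not library code):

import math

n_labels = 20       # illustrative number of labels
labelset_size = 3   # parameter k
model_count = math.ceil(n_labels / labelset_size)
# -> 7 disjoint labelsets: six of size 3 plus one with the remaining 2 labels,
#    and one Label Powerset classifier is trained per labelset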

Parameters

base_classifier : sklearn.base.BaseEstimator

the base classifier that will be used in a class, will be automatically put under self.classifier for future access.

base_classifier_require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense

labelset_size : int

the desired size of each of the partitions, parameter k from the paper; the default is 3, which the paper reports as giving the best results

Attributes

_label_count : int

the number of labels the classifier is fit to, set by fit()

model_count_ : int

the number of sub classifiers trained, set by fit()

classifier_ : skmultilearn.ensemble.LabelSpacePartitioningClassifier

the underlying classifier that performs the label space partitioning using the random clusterer skmultilearn.cluster.RandomLabelSpaceClusterer

References

If you use this class please cite the paper introducing the method:

@ARTICLE{5567103,
    author={G. Tsoumakas and I. Katakis and I. Vlahavas},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    title={Random k-Labelsets for Multilabel Classification},
    year={2011},
    volume={23},
    number={7},
    pages={1079-1089},
    doi={10.1109/TKDE.2010.164},
    ISSN={1041-4347},
    month={July},
}

Examples

Here’s a simple example of how to use this class with a base classifier from scikit-learn to teach non-overlapping classifiers each trained on at most four labels:

from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelD

classifier = RakelD(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],
    labelset_size=4
)

classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)
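Once fitted as above, the documented attributes can be inspected, for instance (a usage sketch relying only on the attributes listed above):

# after classifier.fit(X_train, y_train):
print(classifier.model_count_)             # number of trained Label Powerset sub-classifiers
print(classifier.classifier_.partition_)   # the drawn partition of label indexes
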
fit(X, y)

Fit classifier to multi-label data

Parameters

X : numpy.ndarray or scipy.sparse

input features, can be a dense or sparse matrix of size (n_samples, n_features)

y : numpy.ndarray or scipy.sparse {0,1}

binary indicator matrix with label assignments, shape (n_samples, n_labels)

Returns

fitted instance of self

predict(X)

Predict label assignments

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)

predict_proba(X)

Predict label probabilities

Parameters

X : numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns

scipy.sparse of float

binary indicator matrix with probability of label assignment with shape (n_samples, n_labels)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RakelD

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.

class skmultilearn.ensemble.RakelO(base_classifier=None, model_count=None, labelset_size=3, base_classifier_require_dense=None)

Bases: MLClassifierBase

Overlapping RAndom k-labELsets multi-label classifier

Divides the label space into m subsets of size k, trains a Label Powerset classifier for each subset, and assigns a label to an instance if more than half of the classifiers (a majority) from the clusters that contain that label assigned it to the instance.
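The paper's recommended parameterization can be written out as a quick sketch (illustrative values only):

n_labels = 10               # M, the number of labels in your dataset (illustrative)
labelset_size = 3           # k; the paper reports 3 as working best
model_count = 2 * n_labels  # m = 2M, the number of overlapping labelsets recommended by the paper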

Parameters

base_classifier : BaseEstimator

scikit-learn compatible base classifier, will be set under self.classifier.classifier.

base_classifier_require_dense : [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation. Will be automatically set under self.classifier.require_dense

labelset_size : int

the desired size of each of the partitions, parameter k from the paper. The paper reports 3 as the best value, so it is the default. Will be automatically set under self.labelset_size

model_count : int

the desired number of classifiers, parameter m from the paper. The paper reports 2M (where M is the number of labels) as the best value. Will be automatically set under self.model_count_.

Attributes

classifier : MajorityVotingClassifier

the voting classifier, initialized with a LabelPowerset multi-label classifier wrapping base_classifier and a RandomLabelSpaceClusterer

References

If you use this class please cite the paper introducing the method:

@ARTICLE{5567103,
    author={G. Tsoumakas and I. Katakis and I. Vlahavas},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    title={Random k-Labelsets for Multilabel Classification},
    year={2011},
    volume={23},
    number={7},
    pages={1079-1089},
    doi={10.1109/TKDE.2010.164},
    ISSN={1041-4347},
    month={July},
}

Examples

Here’s a simple example of how to use this class with a base classifier from scikit-learn to teach 6 classifiers, each trained on a quarter of the labels, which are sure to overlap:

from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelO

classifier = RakelO(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],
    labelset_size=y_train.shape[1] // 4,
    model_count=6
)

classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

fit(X, y)

Fits classifier to training data

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

y : array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)

binary indicator matrix with label assignments

Returns

self

fitted instance of self

predict(X)

Predict labels for X

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns

scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)

binary indicator matrix with label assignments

predict_proba(X)

Predict probabilities of label assignments for X

Parameters

X : array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns

scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)

matrix with label assignment probabilities

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → RakelO

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.


Cite us

If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}