skmultilearn.ensemble package

The skmultilearn.ensemble module implements ensemble classification schemes that construct an ensemble of base multi-label classifiers.

Currently the following ensemble classification schemes are available in scikit-multilearn:

Classifier name: Description

RakelD: Distinct RAndom k-labELsets multi-label classifier.

RakelO: Overlapping RAndom k-labELsets multi-label classifier.

LabelSpacePartitioningClassifier: a label space partitioning classifier that trains a classifier per label subspace as clustered using methods from skmultilearn.cluster.

MajorityVotingClassifier: a label space division classifier that trains a classifier per label subspace as clustered using methods from skmultilearn.cluster and assigns a label if the majority of classifiers that contain the label agree on the assignment.

class skmultilearn.ensemble.LabelSpacePartitioningClassifier(classifier=None, clusterer=None, require_dense=None)

Bases: BinaryRelevance

Partition label space and classify each subspace separately

This classifier performs classification by:

1. partitioning the label space into separate, smaller multi-label subproblems, using the supplied label space clusterer

2. training an instance of the supplied base multi-label classifier for each label space subset in the partition

3. predicting the result with each of the subclassifiers and returning the sum of their results
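The three steps above can be sketched in plain Python. This is only an illustration of the partition-and-sum idea, not the library's implementation: `split_labels` and `sum_partition_outputs` are hypothetical helpers, and the label slices stand in for the predictions that trained subclassifiers would return.

```python
def split_labels(y, partition):
    """Return one label sub-matrix per subset: the columns of y listed in it."""
    return [[[row[j] for j in subset] for row in y] for subset in partition]


def sum_partition_outputs(partition, sub_outputs, n_labels):
    """Map each subclassifier's columns back to the original label indices and sum."""
    n_samples = len(sub_outputs[0])
    result = [[0] * n_labels for _ in range(n_samples)]
    for subset, output in zip(partition, sub_outputs):
        for i, row in enumerate(output):
            for local_col, label in enumerate(subset):
                result[i][label] += row[local_col]
    return result


# Step 1: a fixed partition of six labels into two disjoint subsets.
partition = [[1, 3, 4], [0, 2, 5]]

# Step 2: each subclassifier would be trained on its own slice of y.
y_train = [[1, 0, 0, 1, 0, 1],
           [0, 1, 1, 0, 0, 0]]
y_slices = split_labels(y_train, partition)

# Step 3: pretend every subclassifier reproduces its slice, then recombine.
y_pred = sum_partition_outputs(partition, y_slices, n_labels=6)
print(y_pred)
```

Because the subsets are disjoint, every label column receives exactly one contribution, so the sum simply places each subclassifier's output back at its original label index.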



classifier: BaseEstimator

the base multi-label classifier, an instance of which is trained for each label subset; will be automatically put under self.classifier.

clusterer: label space clusterer from skmultilearn.cluster

object that partitions the output space, will be automatically put under self.clusterer.

require_dense: [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.

model_count_: int

number of trained models, in this classifier equal to the number of partitions

partition_: List[List[int]], shape=(model_count_,)

list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()

classifiers_: List[BaseEstimator], shape=(model_count_,)

list of classifiers trained per partition, set in fit()


If you use this clusterer please cite the clustering paper:

    author = {Szymański, Piotr and Kajdanowicz, Tomasz and Kersting, Kristian},
    title = {How Is a Data-Driven Approach Better than Random Choice in
    Label Space Division for Multi-Label Classification?},
    journal = {Entropy},
    volume = {18},
    year = {2016},
    number = {8},
    article_number = {282},
    url = {},
    issn = {1099-4300},
    doi = {10.3390/e18080282}


Here’s an example of building a partitioned ensemble of Classifier Chains

from skmultilearn.ensemble import LabelSpacePartitioningClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB

# X_train, y_train and X_test are assumed to be defined already
classifier = LabelSpacePartitioningClassifier(
    clusterer=FixedLabelSpaceClusterer(clusters=[[1, 3, 4], [0, 2, 5]]),
    classifier=ClassifierChain(classifier=GaussianNB())
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide.


predict(X)

Predict labels for X

X: numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns: scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LabelSpacePartitioningClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.


Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns: self (object)

The updated object.

class skmultilearn.ensemble.MajorityVotingClassifier(classifier=None, clusterer=None, require_dense=None)

Bases: LabelSpacePartitioningClassifier

Majority Voting ensemble classifier

Divides the label space using the provided clusterer, trains an instance of the provided base classifier for each subset, and assigns a label to an instance if more than half of the classifiers (a majority) trained on clusters that contain the label assign it to that instance.
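The voting rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `majority_vote` is a hypothetical helper, and the 0/1 outputs of the three subclassifiers are made up for a single sample.

```python
def majority_vote(partition, sub_outputs, n_labels):
    """Assign a label when more than half of the classifiers covering it predict it."""
    n_samples = len(sub_outputs[0])
    votes = [[0] * n_labels for _ in range(n_samples)]
    voters = [0] * n_labels  # how many subsets contain each label
    for subset, output in zip(partition, sub_outputs):
        for local_col, label in enumerate(subset):
            voters[label] += 1
            for i in range(n_samples):
                votes[i][label] += output[i][local_col]
    return [
        [1 if voters[j] and 2 * votes[i][j] > voters[j] else 0
         for j in range(n_labels)]
        for i in range(n_samples)
    ]


# Overlapping subsets: labels 2 and 5 are each covered by two classifiers.
partition = [[1, 2, 3], [0, 2, 5], [4, 5]]

# Hypothetical 0/1 outputs of the three subclassifiers for one sample.
sub_outputs = [
    [[1, 1, 0]],  # votes for labels 1, 2, 3
    [[0, 1, 1]],  # votes for labels 0, 2, 5
    [[1, 0]],     # votes for labels 4, 5
]

result = majority_vote(partition, sub_outputs, n_labels=6)
print(result)  # [[0, 1, 1, 0, 1, 0]]
```

Label 2 receives two of two votes and is assigned, while label 5 receives only one of two, which is not more than half, so it is left out.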



classifier: BaseEstimator

the base multi-label classifier, an instance of which is trained for each label subset; will be automatically put under self.classifier.

clusterer: label space clusterer from skmultilearn.cluster

object that partitions the output space, will be automatically put under self.clusterer.

require_dense: [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense.

model_count_: int

number of trained models, in this classifier equal to the number of partitions

partition_: List[List[int]], shape=(model_count_,)

list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()

classifiers_: List[BaseEstimator], shape=(model_count_,)

list of classifiers trained per partition, set in fit()


Here’s an example of building an overlapping ensemble of chains

from skmultilearn.ensemble import MajorityVotingClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB

# X_train, y_train and X_test are assumed to be defined already
classifier = MajorityVotingClassifier(
    clusterer=FixedLabelSpaceClusterer(clusters=[[1, 2, 3], [0, 2, 5], [4, 5]]),
    classifier=ClassifierChain(classifier=GaussianNB())
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

More advanced examples can be found in the label relations exploration guide.


predict(X)

Predict label assignments for X

X: numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns: scipy.sparse of float

binary indicator matrix with label assignments with shape (n_samples, n_labels)


predict_proba(X)

Predict probabilities of label assignments for X

X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns: scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)

matrix with label assignment probabilities

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> MajorityVotingClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.


Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns: self (object)

The updated object.

class skmultilearn.ensemble.RakelD(base_classifier=None, labelset_size=3, base_classifier_require_dense=None)

Bases: MLClassifierBase

Distinct RAndom k-labELsets multi-label classifier.

Divides the label space into equal partitions of size k, trains a Label Powerset classifier per partition, and predicts by summing the results of all trained classifiers.
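A minimal sketch of the disjoint division described above, in plain Python. The helper `random_disjoint_labelsets` is hypothetical and only illustrates the idea; it is not how the library builds its partition. In this sketch the last subset is smaller when k does not divide the number of labels.

```python
import math
import random


def random_disjoint_labelsets(n_labels, labelset_size, seed=None):
    """Shuffle the label indices and cut them into chunks of at most labelset_size."""
    labels = list(range(n_labels))
    random.Random(seed).shuffle(labels)
    return [labels[i:i + labelset_size]
            for i in range(0, n_labels, labelset_size)]


n_labels, k = 10, 3
partition = random_disjoint_labelsets(n_labels, k, seed=0)

# One Label Powerset model would be trained per chunk.
model_count = math.ceil(n_labels / k)
print(partition, model_count)
```

Every label lands in exactly one chunk, so summing the per-chunk predictions never counts a label twice.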



base_classifier: BaseEstimator

the base classifier that will be used in a class, will be automatically put under self.classifier for future access.

base_classifier_require_dense: [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense

labelset_size: int

the desired size of each of the partitions, parameter k according to the paper. The default is 3, which the paper reports as giving the best results.



_label_count: int

the number of labels the classifier is fit to, set by fit()


model_count_: int

the number of sub classifiers trained, set by fit()

classifier_: skmultilearn.ensemble.LabelSpacePartitioningClassifier

the underlying classifier that performs the label space partitioning using a random clusterer, skmultilearn.cluster.RandomLabelSpaceClusterer


If you use this class please cite the paper introducing the method:

    author={G. Tsoumakas and I. Katakis and I. Vlahavas},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    title={Random k-Labelsets for Multilabel Classification},


Here’s a simple example of how to use this class with a base classifier from scikit-learn to train non-overlapping classifiers, each on at most four labels:

from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelD

# X_train, y_train and X_test are assumed to be defined already
classifier = RakelD(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],
    labelset_size=4
)
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

fit(X, y)

Fit classifier to multi-label data


X: numpy.ndarray or scipy.sparse

input features, can be a dense or sparse matrix of size (n_samples, n_features)

y: numpy.ndarray or scipy.sparse {0,1}

binary indicator matrix with label assignments, shape (n_samples, n_labels)

Returns: fitted instance of self
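The binary indicator matrix expected for y holds one row per sample and one column per label, with a 1 wherever the label applies. A small sketch in plain Python shows how per-sample label lists map to that layout. The helper `to_indicator_matrix` is hypothetical, and the resulting nested list would still need to be converted to a numpy array or scipy sparse matrix before being passed to fit().

```python
def to_indicator_matrix(label_sets, n_labels):
    """Turn per-sample lists of label indices into rows of 0/1 flags."""
    matrix = [[0] * n_labels for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            matrix[i][label] = 1
    return matrix


# Three samples over four labels; the second sample has no labels at all.
label_sets = [[0, 2], [], [1, 2, 3]]
y = to_indicator_matrix(label_sets, n_labels=4)
print(y)  # [[1, 0, 1, 0], [0, 0, 0, 0], [0, 1, 1, 1]]
```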


predict(X)

Predict label assignments

X: numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns: scipy.sparse of int

binary indicator matrix with label assignments with shape (n_samples, n_labels)


predict_proba(X)

Predict label probabilities

X: numpy.ndarray or scipy.sparse.csc_matrix

input features of shape (n_samples, n_features)

Returns: scipy.sparse of float

matrix with the probability of each label assignment, with shape (n_samples, n_labels)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> RakelD

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.


Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns: self (object)

The updated object.

class skmultilearn.ensemble.RakelO(base_classifier=None, model_count=None, labelset_size=3, base_classifier_require_dense=None)

Bases: MLClassifierBase

Overlapping RAndom k-labELsets multi-label classifier

Divides the label space into m subsets of size k, trains a Label Powerset classifier for each subset, and assigns a label to an instance if more than half of the classifiers (a majority) trained on subsets that contain the label assign it to that instance.
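A minimal sketch of drawing m overlapping subsets of size k, in plain Python. The helper `sample_overlapping_labelsets` is hypothetical and is not the library's clusterer. Falling back to twice the number of labels when no count is given is an assumption of this sketch, borrowed from the 2M guideline quoted in the parameter description, and drawing only distinct subsets is likewise a choice made here.

```python
import math
import random


def sample_overlapping_labelsets(n_labels, labelset_size, model_count=None, seed=None):
    """Draw model_count distinct random label subsets, each of size labelset_size."""
    rng = random.Random(seed)
    if model_count is None:
        model_count = 2 * n_labels  # assumption: the 2M guideline
    # Never ask for more distinct subsets than actually exist.
    model_count = min(model_count, math.comb(n_labels, labelset_size))
    labelsets = set()
    while len(labelsets) < model_count:
        subset = rng.sample(range(n_labels), labelset_size)
        labelsets.add(tuple(sorted(subset)))
    return [list(subset) for subset in sorted(labelsets)]


# Six subsets, each holding a quarter of eight labels.
labelsets = sample_overlapping_labelsets(n_labels=8, labelset_size=2,
                                         model_count=6, seed=0)
print(labelsets)
```

Six subsets of two labels hold twelve entries for only eight labels, so at least one label must appear in more than one subset, which is what makes voting necessary.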


base_classifier: BaseEstimator

scikit-learn compatible base classifier, will be set under self.classifier.classifier.

base_classifier_require_dense: [bool, bool]

whether the base classifier requires [input, output] matrices in dense representation. Will be automatically set under self.classifier.require_dense

labelset_size: int

the desired size of each of the partitions, parameter k according to the paper. According to the paper, the best value is 3, so it is the default. Will be automatically set under self.labelset_size.

model_count: int

the desired number of classifiers, parameter m according to the paper. According to the paper, the best value for this parameter is 2M (M being the number of labels). Will be automatically set under self.model_count_.



the voting classifier initialized with LabelPowerset multi-label classifier with base_classifier and RandomLabelSpaceClusterer


If you use this class please cite the paper introducing the method:

    author={G. Tsoumakas and I. Katakis and I. Vlahavas},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    title={Random k-Labelsets for Multilabel Classification},


Here’s a simple example of how to use this class with a base classifier from scikit-learn to train 6 classifiers, each on a quarter of the labels. Six subsets of that size cannot all be disjoint, so they are sure to overlap:

from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelO

# X_train, y_train and X_test are assumed to be defined already
classifier = RakelO(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],
    labelset_size=y_train.shape[1] // 4,
    model_count=6
)
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

fit(X, y)

Fits classifier to training data


X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

y: array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)

binary indicator matrix with label assignments

Returns: fitted instance of self


predict(X)

Predict labels for X

X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns: scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)

binary indicator matrix with label assignments


predict_proba(X)

Predict probabilities of label assignments for X

X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)

input feature matrix

Returns: scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)

matrix with label assignment probabilities

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> RakelO

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.


Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns: self (object)

The updated object.

Cite us

If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:

    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,