skmultilearn.tree package

The skmultilearn.tree submodule provides tree-based multi-label classifiers.

Available classifiers:

+----------------------------+-------------------------------------------------------------------------+
| Classifier                 | Description                                                             |
+============================+=========================================================================+
| PredictiveClusteringTree   | A predictive clustering tree algorithm for multi-label classification. |
+----------------------------+-------------------------------------------------------------------------+

Available criteria:

+-------------------------+------------------------------+
| Criterion               | Description                  |
+=========================+==============================+
| GiniCriterion           | Gini impurity criterion.     |
+-------------------------+------------------------------+
| EntropyCriterion        | Information gain criterion.  |
+-------------------------+------------------------------+
| CorrelationCriterion    | Correlation criterion.      |
+-------------------------+------------------------------+
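All criteria share the same interface: calculate_impurity scores a label matrix, and calculate_gain scores the impurity reduction achieved by a candidate split. As an illustration only (not the library's exact implementation), a multi-label Gini impurity can be sketched by averaging per-label binary Gini scores:

```python
import numpy as np

def gini_impurity(labels):
    """Mean per-label Gini impurity of a binary label matrix (n_samples, n_labels)."""
    p = labels.mean(axis=0)                  # fraction of positives per label
    return float(np.mean(2 * p * (1 - p)))   # binary Gini: 1 - p^2 - (1-p)^2 = 2p(1-p)

def gini_gain(base_impurity, left_labels, right_labels):
    """Impurity reduction from splitting a node into left/right children."""
    n = len(left_labels) + len(right_labels)
    weighted = (len(left_labels) / n) * gini_impurity(left_labels) \
             + (len(right_labels) / n) * gini_impurity(right_labels)
    return base_impurity - weighted

y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
base = gini_impurity(y)               # each label has p = 0.5, so impurity 0.5
gain = gini_gain(base, y[:2], y[2:])  # this split yields pure children, so gain 0.5
```

EntropyCriterion and CorrelationCriterion follow the same pattern with different per-node impurity functions.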

class skmultilearn.tree.CorrelationCriterion

Bases: SplitCriterion

calculate_gain(base_impurity, left_labels, right_labels)

Calculate the information gain from a split.

calculate_impurity(labels)

Calculate the impurity of a dataset.

class skmultilearn.tree.EntropyCriterion

Bases: SplitCriterion

calculate_gain(base_impurity, left_labels, right_labels)

Calculate the information gain from a split.

calculate_impurity(labels)

Calculate the impurity of a dataset.

class skmultilearn.tree.GiniCriterion

Bases: SplitCriterion

calculate_gain(base_impurity, left_labels, right_labels)

Calculate the information gain from a split.

calculate_impurity(labels)

Calculate the impurity of a dataset.

class skmultilearn.tree.PredictiveClusteringTree(classifier=DecisionTreeClassifier(), criterion=<skmultilearn.tree.pct.GiniCriterion object>, max_depth=5, min_samples_split=2, min_samples_leaf=1)

Bases: BaseEstimator, ClassifierMixin

A predictive clustering tree (PCT) algorithm for multi-label classification that supports multiple split criteria.

This algorithm constructs a decision tree where each leaf node represents a multi-label classifier trained on a subset of the data. It partitions the feature space recursively, aiming to find splits that lead to optimal separation based on a specified impurity criterion, thereby potentially capturing complex label dependencies more effectively.

The flexibility to choose between different splitting criteria (e.g., Gini, entropy, correlation) allows for tailored approaches to handling multi-label data, enabling the algorithm to better accommodate the specific characteristics and correlations present in the labels.

Parameters

classifier : estimator, default=DecisionTreeClassifier()

The base classifier used at each leaf node of the tree. This classifier is trained on the subsets of data determined by the tree splits.

criterion : SplitCriterion instance, default=GiniCriterion()

The criterion used to evaluate splits. Must be an instance of a class that extends the SplitCriterion abstract base class.

max_depth : int, default=5

The maximum depth of the tree. Limits the number of recursive splits to prevent overfitting.

min_samples_split : int, default=2

The minimum number of samples required to consider splitting an internal node. Helps prevent creating nodes with too few samples.

min_samples_leaf : int, default=1

The minimum number of samples a leaf node must have. Ensures that each leaf has a minimum size, impacting the granularity of the model.

Attributes

n_features_in_ : int

The number of features in the input data upon fitting the model.

tree_ : Node

The root node of the decision tree. Each node in the tree represents a decision point or a leaf with an associated classifier.

Methods

fit(X, y):

Fit the predictive clustering tree model to the training data.

predict(X):

Predict multi-label outputs for the input data using the trained tree.

Notes

Note

The tree-building process relies heavily on the chosen split criterion’s ability to evaluate and select the most informative splits. Custom split criteria can be implemented by extending the SplitCriterion abstract base class.
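As a sketch of such an extension, the following uses a stand-in base class (the real SplitCriterion abstract base class ships with skmultilearn) and a hypothetical variance-based criterion; both the base class shape and the criterion are illustrative assumptions, not library code:

```python
import numpy as np
from abc import ABC, abstractmethod

# Stand-in for skmultilearn's SplitCriterion abstract base class,
# mirroring the two methods documented above.
class SplitCriterion(ABC):
    @abstractmethod
    def calculate_impurity(self, labels): ...

    @abstractmethod
    def calculate_gain(self, base_impurity, left_labels, right_labels): ...

class VarianceCriterion(SplitCriterion):
    """Hypothetical criterion: mean per-label variance as node impurity."""

    def calculate_impurity(self, labels):
        return float(np.mean(np.var(labels, axis=0)))

    def calculate_gain(self, base_impurity, left_labels, right_labels):
        # Gain = parent impurity minus the size-weighted child impurities.
        n = len(left_labels) + len(right_labels)
        child = (len(left_labels) * self.calculate_impurity(left_labels)
                 + len(right_labels) * self.calculate_impurity(right_labels)) / n
        return base_impurity - child

crit = VarianceCriterion()
y = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
base = crit.calculate_impurity(y)  # variance 0.25 per label, so impurity 0.25
```

An instance of such a class would then be passed as the criterion parameter of PredictiveClusteringTree.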

Note

Currently, only dense input data is supported.

Examples

from sklearn.datasets import make_multilabel_classification
from skmultilearn.tree import PredictiveClusteringTree, GiniCriterion

X, y = make_multilabel_classification(n_samples=100, n_features=20, n_classes=3, n_labels=2, random_state=42)
pct = PredictiveClusteringTree(criterion=GiniCriterion(), max_depth=4, min_samples_split=2, min_samples_leaf=1)
pct.fit(X, y)
pct.predict(X[0:5])
class Node

Bases: object

Inner class representing a node in the predictive clustering tree.

Attributes

feature_index : int or None

The index of the feature used for splitting at this node.

threshold : float or None

The threshold value for the split.

left : Node or None

The left child node.

right : Node or None

The right child node.

classifier : estimator or None

The classifier associated with the leaf node.

fit(X, y)

Fit the predictive clustering tree to the training data.

Parameters

X : array-like of shape (n_samples, n_features)

The input feature matrix.

y : array-like of shape (n_samples, n_labels)

The binary indicator matrix with label assignments.

Returns

self : object

The fitted instance of the classifier.

predict(X)

Predict multi-label outputs for the input data.

Parameters

X : array-like of shape (n_samples, n_features)

The input feature matrix.

Returns

predictions : array-like of shape (n_samples, n_labels)

The binary indicator matrix with predicted label assignments.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PredictiveClusteringTree

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.


Cite us

If you use scikit-multilearn-ng in published research, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}