skmultilearn.tree package

The skmultilearn.tree submodule provides tree-based multi-label classifiers.

Available classifiers:

+----------------------------+-------------------------------------------------------------------------+
| Classifier                 | Description                                                             |
+============================+=========================================================================+
| PredictiveClusteringTree   | A predictive clustering tree algorithm for multi-label classification. |
+----------------------------+-------------------------------------------------------------------------+

Available criteria:

+-------------------------+------------------------------+
| Criterion               | Description                  |
+=========================+==============================+
| GiniCriterion           | Gini impurity criterion.     |
+-------------------------+------------------------------+
| EntropyCriterion        | Information gain criterion.  |
+-------------------------+------------------------------+
| CorrelationCriterion    | Correlation criterion.      |
+-------------------------+------------------------------+
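All criteria share the same interface: calculate_impurity scores a label matrix, and calculate_gain scores the impurity reduction achieved by a candidate split. As an illustration only (not the library's exact implementation), a multi-label Gini impurity can be sketched by averaging per-label binary Gini scores:

```python
import numpy as np

def gini_impurity(labels):
    """Mean per-label Gini impurity of a binary label matrix (n_samples, n_labels)."""
    p = labels.mean(axis=0)                  # fraction of positives per label
    return float(np.mean(2 * p * (1 - p)))   # binary Gini: 1 - p^2 - (1-p)^2 = 2p(1-p)

def gini_gain(base_impurity, left_labels, right_labels):
    """Impurity reduction from splitting a node into left/right children."""
    n = len(left_labels) + len(right_labels)
    weighted = (len(left_labels) / n) * gini_impurity(left_labels) \
             + (len(right_labels) / n) * gini_impurity(right_labels)
    return base_impurity - weighted

y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
base = gini_impurity(y)               # each label has p = 0.5, so impurity 0.5
gain = gini_gain(base, y[:2], y[2:])  # this split yields pure children, so gain 0.5
```

EntropyCriterion and CorrelationCriterion follow the same pattern with different per-node impurity functions.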

class skmultilearn.tree.CorrelationCriterion

Bases: SplitCriterion

calculate_gain(base_impurity, left_labels, right_labels)

Calculate the information gain from a split.

calculate_impurity(labels)

Calculate the impurity of a dataset.

class skmultilearn.tree.EntropyCriterion

Bases: SplitCriterion

calculate_gain(base_impurity, left_labels, right_labels)

Calculate the information gain from a split.

calculate_impurity(labels)

Calculate the impurity of a dataset.

class skmultilearn.tree.GiniCriterion

Bases: SplitCriterion

calculate_gain(base_impurity, left_labels, right_labels)

Calculate the information gain from a split.

calculate_impurity(labels)

Calculate the impurity of a dataset.

class skmultilearn.tree.PredictiveClusteringTree(classifier=DecisionTreeClassifier(), criterion=<skmultilearn.tree.pct.GiniCriterion object>, max_depth=5, min_samples_split=2, min_samples_leaf=1)

Bases: BaseEstimator, ClassifierMixin

A predictive clustering tree (PCT) algorithm for multi-label classification that supports multiple split criteria.

This algorithm constructs a decision tree where each leaf node represents a multi-label classifier trained on a subset of the data. It partitions the feature space recursively, aiming to find splits that lead to optimal separation based on a specified impurity criterion, thereby potentially capturing complex label dependencies more effectively.

The flexibility to choose between different splitting criteria (e.g., Gini, entropy, correlation) allows for tailored approaches to handling multi-label data, enabling the algorithm to better accommodate the specific characteristics and correlations present in the labels.

Parameters

classifier : estimator, default=DecisionTreeClassifier()

The base classifier used at each leaf node of the tree. This classifier is trained on the subsets of data determined by the tree splits.

criterion : SplitCriterion instance, default=GiniCriterion()

The criterion used to evaluate splits. Must be an instance of a class that extends the SplitCriterion abstract base class.

max_depth : int, default=5

The maximum depth of the tree. Limits the number of recursive splits to prevent overfitting.

min_samples_split : int, default=2

The minimum number of samples required to consider splitting an internal node. Helps prevent creating nodes with too few samples.

min_samples_leaf : int, default=1

The minimum number of samples a leaf node must have. Ensures that each leaf has a minimum size, impacting the granularity of the model.

Attributes

n_features_in_ : int

The number of features in the input data upon fitting the model.

tree_ : Node

The root node of the decision tree. Each node in the tree represents a decision point or a leaf with an associated classifier.

Methods

fit(X, y):

Fit the predictive clustering tree model to the training data.

predict(X):

Predict multi-label outputs for the input data using the trained tree.

Notes

Note

The tree-building process relies heavily on the chosen split criterion’s ability to evaluate and select the most informative splits. Custom split criteria can be implemented by extending the SplitCriterion abstract base class.
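As a sketch of such an extension, the following uses a stand-in base class (the real SplitCriterion abstract base class ships with skmultilearn) and a hypothetical variance-based criterion; both the base class shape and the criterion are illustrative assumptions, not library code:

```python
import numpy as np
from abc import ABC, abstractmethod

# Stand-in for skmultilearn's SplitCriterion abstract base class,
# mirroring the two methods documented above.
class SplitCriterion(ABC):
    @abstractmethod
    def calculate_impurity(self, labels): ...

    @abstractmethod
    def calculate_gain(self, base_impurity, left_labels, right_labels): ...

class VarianceCriterion(SplitCriterion):
    """Hypothetical criterion: mean per-label variance as node impurity."""

    def calculate_impurity(self, labels):
        return float(np.mean(np.var(labels, axis=0)))

    def calculate_gain(self, base_impurity, left_labels, right_labels):
        # Gain = parent impurity minus the size-weighted child impurities.
        n = len(left_labels) + len(right_labels)
        child = (len(left_labels) * self.calculate_impurity(left_labels)
                 + len(right_labels) * self.calculate_impurity(right_labels)) / n
        return base_impurity - child

crit = VarianceCriterion()
y = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
base = crit.calculate_impurity(y)  # variance 0.25 per label, so impurity 0.25
```

An instance of such a class would then be passed as the criterion parameter of PredictiveClusteringTree.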

Note

Currently, only dense input data is supported.

Examples

from sklearn.datasets import make_multilabel_classification
from skmultilearn.tree import PredictiveClusteringTree, GiniCriterion

X, y = make_multilabel_classification(n_samples=100, n_features=20, n_classes=3, n_labels=2, random_state=42)
pct = PredictiveClusteringTree(criterion=GiniCriterion(), max_depth=4, min_samples_split=2, min_samples_leaf=1)
pct.fit(X, y)
pct.predict(X[0:5])
class Node

Bases: object

Inner class representing a node in the predictive clustering tree.

Attributes

feature_index : int or None

The index of the feature used for splitting at this node.

threshold : float or None

The threshold value for the split.

left : Node or None

The left child node.

right : Node or None

The right child node.

classifier : estimator or None

The classifier associated with the leaf node.

fit(X, y)

Fit the predictive clustering tree to the training data.

Parameters

X : array-like of shape (n_samples, n_features)

The input feature matrix.

y : array-like of shape (n_samples, n_labels)

The binary indicator matrix with label assignments.

Returns

self : object

The fitted instance of the classifier.

predict(X)

Predict multi-label outputs for the input data.

Parameters

X : array-like of shape (n_samples, n_features)

The input feature matrix.

Returns

predictions : array-like of shape (n_samples, n_labels)

The binary indicator matrix with predicted label assignments.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PredictiveClusteringTree

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.


Cite us

If you use scikit-multilearn-ng in published research, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}