skmultilearn package¶

Multi-label classification module for Python¶

Scikit-multilearn-ng is a BSD-licensed library for multi-label classification that is built on top of the well-known scikit-learn ecosystem.

Subpackages¶

Submodules¶

skmultilearn.dataset module¶

skmultilearn.dataset.available_data_sets()¶

Lists available data sets and their variants

Returns¶

dict[(set_name, variant_name)] -> [md5, file_name]: the available data sets and their variants, keyed by (set_name, variant_name); each value holds the md5 checksum and the file name on the server
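The shape of the returned mapping can be illustrated with a small stand-alone sketch. The sample dictionary below is hypothetical (the set names, checksums and file names are placeholders, not values from the server), and `variants_by_set` is a helper introduced only for this illustration; it is not part of skmultilearn.

```python
from collections import defaultdict

# Hypothetical sample shaped like the documented return value:
# (set_name, variant_name) -> [md5, file_name].
# All names and checksums here are placeholders.
available = {
    ("set_a", "train"): ["<md5-placeholder-1>", "set_a-train.bz2"],
    ("set_a", "test"): ["<md5-placeholder-2>", "set_a-test.bz2"],
    ("set_b", "undivided"): ["<md5-placeholder-3>", "set_b-undivided.bz2"],
}


def variants_by_set(data_sets):
    """Group variant names under their set name (illustrative helper)."""
    grouped = defaultdict(list)
    for set_name, variant_name in data_sets:
        grouped[set_name].append(variant_name)
    return {name: sorted(variants) for name, variants in grouped.items()}


grouped_variants = variants_by_set(available)
print(grouped_variants)
```

With the real library, the same helper could presumably be applied to the dictionary returned by available_data_sets(), since it only relies on the tuple keys.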

skmultilearn.dataset.clear_data_home(data_home=None)¶

Delete all the content of the data home cache.

Parameters¶

data_home: str (default is None): the path to the directory in which scikit-multilearn data sets are stored.

skmultilearn.dataset.download_dataset(set_name, variant, data_home=None)¶

Downloads a data set

Parameters¶

set_name: str: name of the set, from available_data_sets()
variant: str: variant of the data set, from available_data_sets()
data_home: str (default is None): custom base folder for data; if None, the default is used

Returns¶

str: path to the downloaded data set file on disk

skmultilearn.dataset.get_data_home(data_home=None, subdirectory='')¶

Return the path of the scikit-multilearn data dir.

This folder is used by some large dataset loaders to avoid downloading the data several times.

By default the data_home is set to a folder named 'scikit_ml_learn_data' in the user home folder.

Alternatively, it can be set by the 'SCIKIT_ML_LEARN_DATA' environment variable or programmatically by giving an explicit folder path. The '~' symbol is expanded to the user home folder.

If the folder does not already exist, it is automatically created.

Parameters¶

data_home: str (default is None): the path to the directory in which scikit-multilearn data sets should be stored; if None, the path is generated as stated above
subdirectory: str (default is ''): subdirectory whose path is returned, located under data_home if it is passed, otherwise under the default data home

Returns¶

str: the path to the data home
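The lookup order described above can be sketched with the standard library alone. This is an illustrative re-implementation under the stated rules (explicit path first, then the 'SCIKIT_ML_LEARN_DATA' environment variable, then a folder in the user home), not the library's actual code; `resolve_data_home` is a hypothetical name.

```python
import os
import tempfile


def resolve_data_home(data_home=None, subdirectory=""):
    """Sketch of the documented lookup order (not the library's code)."""
    if data_home is None:
        # Fall back to the environment variable, then to the default folder.
        data_home = os.environ.get(
            "SCIKIT_ML_LEARN_DATA",
            os.path.join("~", "scikit_ml_learn_data"),
        )
    # '~' is expanded to the user home folder.
    data_home = os.path.expanduser(data_home)
    if subdirectory:
        data_home = os.path.join(data_home, subdirectory)
    # The folder is created automatically if it does not exist yet.
    os.makedirs(data_home, exist_ok=True)
    return data_home


# Usage with an explicit folder, kept inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    expected_path = os.path.join(tmp, "cache", "arff")
    resolved_path = resolve_data_home(os.path.join(tmp, "cache"), subdirectory="arff")
    folder_was_created = os.path.isdir(resolved_path)
```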

skmultilearn.dataset.load_dataset(set_name, variant, data_home=None)¶

Loads a selected variant of the given data set

Parameters¶

set_name: str: name of the set, from available_data_sets()
variant: str: variant of the data set
data_home: str (default is None): custom base folder for data; if None, the default is used

Returns¶

dict: the loaded multilabel data set variant in the scikit-multilearn format, see data_sets

skmultilearn.dataset.load_dataset_dump(filename)¶

Loads a compressed data set dump

Parameters¶

filename: str: path to the dump file; if it lacks the .bz2 ending, the .bz2 extension is appended.

Returns¶

X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features): input feature matrix
y: array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels): binary indicator matrix with label assignments
names of attributes: List[str]: list of attribute names for X columns
names of labels: List[str]: list of label names for y columns

skmultilearn.dataset.load_from_arff(filename, label_count, label_location='end', input_feature_type='float', encode_nominal=True, load_sparse=False, return_attribute_definitions=False)¶

Method for loading ARFF files into feature and label matrices

Parameters¶

filename: str: path to ARFF file
label_count: integer: number of labels in the ARFF file
label_location: str {“start”, “end”} (default is “end”): whether the ARFF file contains labels at the beginning of the attributes list (“start”, MEKA format) or at the end (“end”, MULAN format)
input_feature_type: numpy.type as string (default is “float”): the desired type of the contents of the returned ‘X’ array-likes; should be a numpy type, see http://docs.scipy.org/doc/numpy/user/basics.types.html
encode_nominal: bool (default is True): whether to convert categorical data into numeric factors; required for some scikit classifiers that can’t handle non-numeric input features.
load_sparse: boolean (default is False): whether to read the ARFF file in the sparse file format; liac-arff breaks if sparse reading is enabled for non-sparse ARFFs.
return_attribute_definitions: boolean (default is False): whether to return the definitions for each attribute in the dataset

Returns¶

X: scipy.sparse.lil_matrix of input_feature_type, shape=(n_samples, n_features): input feature matrix
y: scipy.sparse.lil_matrix of {0, 1}, shape=(n_samples, n_labels): binary indicator matrix with label assignments
names of attributes: List[str]: list of attribute names from ARFF file
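The effect of label_count and label_location can be sketched on a plain list of attribute names. The helper below is a hypothetical illustration of the “start” (MEKA) versus “end” (MULAN) convention described above; it does not parse ARFF and is not the library's implementation.

```python
def split_attributes(attribute_names, label_count, label_location="end"):
    """Split attribute names into (features, labels) by label position."""
    if label_count < 0 or label_count > len(attribute_names):
        raise ValueError("label_count must be between 0 and the attribute count")
    if label_location == "start":
        # MEKA layout: labels come first.
        labels = attribute_names[:label_count]
        features = attribute_names[label_count:]
    elif label_location == "end":
        # MULAN layout: labels come last.
        split = len(attribute_names) - label_count
        features = attribute_names[:split]
        labels = attribute_names[split:]
    else:
        raise ValueError('label_location must be "start" or "end"')
    return features, labels


# Hypothetical attribute list with two labels.
names = ["f1", "f2", "f3", "label_a", "label_b"]
mulan_features, mulan_labels = split_attributes(names, 2, "end")
meka_features, meka_labels = split_attributes(names, 2, "start")
```

Passing the wrong label_location for a file would silently treat feature columns as labels, which is why the layout of the source file should be checked first.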

skmultilearn.dataset.save_dataset_dump(input_space, labels, feature_names, label_names, filename=None)¶

Saves a compressed data set dump

Parameters¶

input_space: array-like of array-likes: array-like of input feature vectors
labels: array-like of binary label vectors: array-like of labels assigned to each input vector, as a binary indicator vector (i.e. if the 5th position has value 1, then the input vector has label no. 5)
feature_names: array-like, optional: names of features
label_names: array-like, optional: names of labels
filename: str, optional: path to the dump file; if it lacks the .bz2 ending, the .bz2 extension is appended.
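A compressed dump with the extension rule described above can be sketched with the standard library. The sketch assumes a pickled payload inside a bz2 file and a dictionary layout chosen for illustration; both are assumptions, and `save_dump`, `load_dump` and `with_bz2_extension` are hypothetical helpers rather than the library's functions.

```python
import bz2
import os
import pickle
import tempfile


def with_bz2_extension(filename):
    """Append .bz2 unless the name already ends with it."""
    return filename if filename.endswith(".bz2") else filename + ".bz2"


def save_dump(payload, filename):
    """Pickle the payload into a bz2-compressed file and return its path."""
    path = with_bz2_extension(filename)
    with bz2.open(path, "wb") as handle:
        pickle.dump(payload, handle)
    return path


def load_dump(filename):
    """Read back a payload written by save_dump."""
    with bz2.open(with_bz2_extension(filename), "rb") as handle:
        return pickle.load(handle)


# Toy payload: two samples, three features, two labels (assumed layout).
payload = {
    "X": [[0.1, 0.0, 2.5], [1.0, 3.2, 0.0]],
    "y": [[1, 0], [0, 1]],  # binary indicator vectors
    "features": ["f1", "f2", "f3"],
    "labels": ["label_a", "label_b"],
}

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_dump(payload, os.path.join(tmp, "toy_dump"))
    restored = load_dump(os.path.join(tmp, "toy_dump"))
```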

skmultilearn.dataset.save_to_arff(X, y, label_location='end', save_sparse=True, filename=None)¶

Method for dumping data to ARFF files

Parameters¶

X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features): input feature matrix
y: array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels): binary indicator matrix with label assignments
label_location: string {“start”, “end”} (default is “end”): whether the ARFF file will contain labels at the beginning of the attributes list (“start”, MEKA format) or at the end (“end”, MULAN format)
save_sparse: boolean (default is True): whether to save in ARFF’s sparse dictionary-like format instead of listing all zeroes within the file; very useful in multi-label classification.
filename: str or None: path to the ARFF file; if None, the ARFF representation is returned as a string

Returns¶

str or None: the ARFF dump string, if filename is None
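The difference that save_sparse makes can be illustrated on a single data row. In ARFF's sparse notation only the non-zero entries are written, as zero-based `index value` pairs inside braces. The two helpers below are a hypothetical sketch of that row encoding, not the library's writer.

```python
def dense_row(values):
    """Encode a row in dense ARFF style: every value is listed."""
    return ",".join(str(value) for value in values)


def sparse_row(values):
    """Encode a row in sparse ARFF style: only non-zero (index, value) pairs."""
    entries = [
        f"{index} {value}" for index, value in enumerate(values) if value != 0
    ]
    return "{" + ", ".join(entries) + "}"


row = [0, 0, 1, 0, 0, 1]
dense_text = dense_row(row)
sparse_text = sparse_row(row)
print(dense_text)
print(sparse_text)
```

Label matrices in multi-label data are mostly zeroes, so the sparse form tends to be much shorter than the dense one.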

skmultilearn.utils module¶

skmultilearn.utils.get_matrix_in_format(original_matrix, matrix_format)¶

Converts matrix to format

Parameters¶

original_matrix: np.matrix or scipy matrix or np.array of np.arrays: matrix to convert
matrix_format: string: format

Returns¶

matrix: scipy matrix: matrix in the given format

skmultilearn.utils.matrix_creation_function_for_format(sparse_format)¶

skmultilearn.utils.measure_per_label(measure, y_true, y_predicted)¶

Return per label results of a scikit-learn compatible quality measure

Parameters¶

measure: callable: scikit-compatible quality measure function
y_true: sparse matrix: ground truth
y_predicted: sparse matrix: the predicted result

Returns¶

List[int or float]: score from a given measure depending on what the measure returns
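The idea of a per-label score can be sketched without any dependencies: the measure is applied to one label column at a time. The sketch below uses nested lists instead of sparse matrices and a simple accuracy function as the measure; `per_label` and `accuracy` are hypothetical stand-ins, not the library's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the truth."""
    matches = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return matches / len(y_true)


def per_label(measure, y_true, y_predicted):
    """Apply `measure` to each label column and collect the scores."""
    n_labels = len(y_true[0])
    return [
        measure(
            [row[j] for row in y_true],
            [row[j] for row in y_predicted],
        )
        for j in range(n_labels)
    ]


# Four samples, two labels, as binary indicator rows.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_predicted = [[1, 0], [0, 0], [1, 1], [1, 1]]
scores = per_label(accuracy, y_true, y_predicted)
print(scores)
```

Here the first label is predicted correctly for three of the four samples and the second for two of them, so the scores differ per label even though both come from the same measure.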

Cite us

If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:

@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}
