skmultilearn package¶
Multi-label classification module for Python¶
Scikit-multilearn-ng is a BSD-licensed library for multi-label classification that is built on top of the well-known scikit-learn ecosystem.
Subpackages¶
- skmultilearn.adapt package
 - skmultilearn.base package
 - skmultilearn.cluster package
 - skmultilearn.embedding package
 - skmultilearn.ensemble package
 - skmultilearn.ext package
 - skmultilearn.missing package
 - skmultilearn.model_selection package
 - skmultilearn.problem_transform package
 - skmultilearn.tree package
 
Submodules¶
skmultilearn.dataset module¶
- skmultilearn.dataset.available_data_sets()¶
 Lists available data sets and their variants
Returns¶
- dict[(set_name, variant_name)] -> [md5, file_name]
 available data sets and their variants; each key is a (set_name, variant_name) tuple and each value holds the md5 checksum and the file name on the server
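Because the mapping is keyed by (set_name, variant_name) tuples, listing the variants of one set amounts to filtering the keys. The sketch below assumes a dictionary of the documented shape. The helper `variants_for` and every entry in `catalog` are hypothetical placeholders, not real server data.

```python
def variants_for(catalog, set_name):
    """Return the sorted variant names available for one data set."""
    return sorted(variant for (name, variant) in catalog if name == set_name)


# Hypothetical stand-in for the result of available_data_sets();
# the set names, checksums and file names are placeholders.
catalog = {
    ("example_set", "train"): ["<md5>", "example_set-train.bz2"],
    ("example_set", "test"): ["<md5>", "example_set-test.bz2"],
    ("other_set", "undivided"): ["<md5>", "other_set-undivided.bz2"],
}

print(variants_for(catalog, "example_set"))  # ['test', 'train']
```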
- skmultilearn.dataset.clear_data_home(data_home=None)¶
 Delete all the content of the data home cache.
Parameters¶
- data_home: str (default is None)
 the path to the directory in which scikit-multilearn data sets should be stored.
- skmultilearn.dataset.download_dataset(set_name, variant, data_home=None)¶
 Downloads a data set
Parameters¶
- set_name: str
 name of the set from available_data_sets()
- variant: str
 variant of the data set from available_data_sets()
- data_home: str (default is None)
 custom base folder for data; if None, the default is used
Returns¶
- str
 path to the downloaded data set file on disk
- skmultilearn.dataset.get_data_home(data_home=None, subdirectory='')¶
 Return the path of the scikit-multilearn data dir.
This folder is used by some large dataset loaders to avoid downloading the data several times.
By default the data_home is set to a folder named 'scikit_ml_learn_data' in the user home folder.
Alternatively, it can be set by the 'SCIKIT_ML_LEARN_DATA' environment variable or programmatically by giving an explicit folder path. The '~' symbol is expanded to the user home folder.
If the folder does not already exist, it is automatically created.
Parameters¶
- data_home: str (default is None)
 the path to the directory in which scikit-multilearn data sets should be stored; if None, the path is generated as stated above
- subdirectory: str (default is '')
 subdirectory of the returned path, located under data_home if data_home is passed, or under the default folder otherwise
Returns¶
- str
 the path to the data home
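The lookup order described for get_data_home (explicit argument, then the environment variable, then the folder in the user home) can be sketched with the standard library alone. This is a minimal illustration of the documented behaviour, not the library's own code. The name `resolve_data_home` is hypothetical, and the way the subdirectory is joined is an assumption based on the parameter description.

```python
import os
import tempfile


def resolve_data_home(data_home=None, subdirectory=""):
    """Resolve, create and return a data directory (illustrative sketch)."""
    if data_home is None:
        # Fall back to the environment variable, then to the home folder.
        data_home = os.environ.get(
            "SCIKIT_ML_LEARN_DATA", os.path.join("~", "scikit_ml_learn_data")
        )
    path = os.path.expanduser(data_home)  # expands a leading '~'
    if subdirectory:
        # Assumption: the subdirectory is appended to the resolved base path.
        path = os.path.join(path, subdirectory)
    os.makedirs(path, exist_ok=True)  # create the folder if it is missing
    return path


if __name__ == "__main__":
    # Use a temporary base folder so the demo leaves the home folder untouched.
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_data_home(os.path.join(tmp, "cache"), "arff"))
```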
- skmultilearn.dataset.load_dataset(set_name, variant, data_home=None)¶
 Loads a selected variant of the given data set
Parameters¶
- set_name: str
 name of the set from available_data_sets()
- variant: str
 variant of the data set
- data_home: str (default is None)
 custom base folder for data; if None, the default is used
Returns¶
- dict
 the loaded multilabel data set variant in the scikit-multilearn format, see data_sets
- skmultilearn.dataset.load_dataset_dump(filename)¶
 Loads a compressed data set dump
Parameters¶
- filename: str
 path to the dump file; if it does not end with .bz2, the .bz2 extension is appended.
Returns¶
- X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)
 input feature matrix
- y: array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)
 binary indicator matrix with label assignments
- names of attributes: List[str]
 list of attribute names for X columns
- names of labels: List[str]
 list of label names for y columns
- skmultilearn.dataset.load_from_arff(filename, label_count, label_location='end', input_feature_type='float', encode_nominal=True, load_sparse=False, return_attribute_definitions=False)¶
 Method for loading ARFF files as numpy array
Parameters¶
- filename: str
 path to ARFF file
- label_count: integer
 number of labels in the ARFF file
- label_location: str {“start”, “end”} (default is “end”)
 whether the ARFF file contains labels at the beginning of the attributes list (“start”, MEKA format) or at the end (“end”, MULAN format)
- input_feature_type: numpy.type as string (default is “float”)
 the desired type of the contents of the returned ‘X’ array-likes; should be a numpy type, see http://docs.scipy.org/doc/numpy/user/basics.types.html
- encode_nominal: bool (default is True)
 whether to convert categorical data into numeric factors; required for some scikit classifiers that can’t handle non-numeric input features.
- load_sparse: boolean (default is False)
 whether to read the ARFF file in the sparse file format; liac-arff breaks if sparse reading is enabled for non-sparse ARFFs.
- return_attribute_definitions: boolean (default is False)
 whether to return the definitions for each attribute in the dataset
Returns¶
- X: scipy.sparse.lil_matrix of input_feature_type, shape=(n_samples, n_features)
 input feature matrix
- y: scipy.sparse.lil_matrix of {0, 1}, shape=(n_samples, n_labels)
 binary indicator matrix with label assignments
- names of attributes: List[str]
 list of attribute names from ARFF file
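The effect of label_count and label_location can be illustrated on a single data row: with “start” (MEKA format) the first label_count columns are labels, with “end” (MULAN format) the last label_count columns are. The helper `split_columns` below is a hypothetical sketch operating on plain lists, not part of skmultilearn.

```python
def split_columns(row, label_count, label_location="end"):
    """Split one data row into (features, labels)."""
    if label_location == "start":
        # MEKA layout: labels come first.
        return row[label_count:], row[:label_count]
    if label_location == "end":
        # MULAN layout: labels come last.
        split = len(row) - label_count
        return row[:split], row[split:]
    raise ValueError("label_location must be 'start' or 'end'")


row = [0.5, 1.2, 3.3, 1, 0]
features, labels = split_columns(row, label_count=2, label_location="end")
print(features, labels)  # [0.5, 1.2, 3.3] [1, 0]
```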
- skmultilearn.dataset.save_dataset_dump(input_space, labels, feature_names, label_names, filename=None)¶
 Saves a compressed data set dump
Parameters¶
- input_space: array-like of array-likes
 Input space array-like of input feature vectors
- labels: array-like of binary label vectors
 Array-like of labels assigned to each input vector, as a binary indicator vector (i.e. if 5th position has value 1 then the input vector has label no. 5)
- feature_names: array-like, optional
 names of features
- label_names: array-like, optional
 names of labels
- filename: str, optional
 Path to dump file, if without .bz2, the .bz2 extension will be appended.
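The .bz2 suffix rule shared by save_dataset_dump and load_dataset_dump can be sketched as a round trip. The on-disk format is not stated in this reference, so the example assumes a bz2-compressed pickle. The helper names and the dictionary keys are hypothetical.

```python
import bz2
import os
import pickle
import tempfile


def with_bz2_suffix(filename):
    """Append .bz2 unless the name already ends with it."""
    return filename if filename.endswith(".bz2") else filename + ".bz2"


def dump_compressed(payload, filename):
    """Write payload as a bz2-compressed pickle and return the final path."""
    path = with_bz2_suffix(filename)
    with bz2.open(path, "wb") as handle:
        pickle.dump(payload, handle)
    return path


def load_compressed(filename):
    """Read a payload written by dump_compressed."""
    with bz2.open(with_bz2_suffix(filename), "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Assumed payload layout: two samples, two features, two labels.
    payload = {"X": [[0.1, 0.2], [0.3, 0.4]], "y": [[1, 0], [0, 1]]}
    with tempfile.TemporaryDirectory() as tmp:
        path = dump_compressed(payload, os.path.join(tmp, "toy"))
        print(os.path.basename(path))            # toy.bz2
        print(load_compressed(path) == payload)  # True
```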
- skmultilearn.dataset.save_to_arff(X, y, label_location='end', save_sparse=True, filename=None)¶
 Method for dumping data to ARFF files
Parameters¶
- X: array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)
 input feature matrix
- y: array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)
 binary indicator matrix with label assignments
- label_location: string {“start”, “end”} (default is “end”)
 whether the ARFF file will contain labels at the beginning of the attributes list (“start”, MEKA format) or at the end (“end”, MULAN format)
- save_sparse: boolean
 Whether to save in ARFF’s sparse dictionary-like format instead of listing all zeroes within file, very useful in multi-label classification.
- filename: str or None
 Path to ARFF file, if None, the ARFF representation is returned as string
Returns¶
- str or None
 the ARFF dump string, if filename is None
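To see why save_sparse helps with multi-label data, compare how one mostly-zero row is written in the two ARFF data layouts: the dense form lists every value, while the sparse form lists only the non-zero entries as zero-based index and value pairs in braces. The two helpers below are hypothetical and only render a single data row; they are not how skmultilearn writes files.

```python
def dense_row(values):
    """Render a row in the dense ARFF layout: every value, comma-separated."""
    return ",".join(str(v) for v in values)


def sparse_row(values):
    """Render a row in the sparse ARFF layout: '{index value, ...}'."""
    pairs = [f"{i} {v}" for i, v in enumerate(values) if v != 0]
    return "{" + ", ".join(pairs) + "}"


row = [0, 0, 1, 0, 0, 0, 1, 0]
print(dense_row(row))   # 0,0,1,0,0,0,1,0
print(sparse_row(row))  # {2 1, 6 1}
```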
 
skmultilearn.utils module¶
- skmultilearn.utils.get_matrix_in_format(original_matrix, matrix_format)¶
 Converts matrix to format
Parameters¶
- original_matrix: np.matrix or scipy matrix or np.array of np.arrays
 matrix to convert
- matrix_format: string
 format
Returns¶
- matrix: scipy matrix
 matrix in given format
- skmultilearn.utils.matrix_creation_function_for_format(sparse_format)¶
 
- skmultilearn.utils.measure_per_label(measure, y_true, y_predicted)¶
 Return per label results of a scikit-learn compatible quality measure
Parameters¶
- measure: callable
 scikit-compatible quality measure function
- y_true: sparse matrix
 ground truth
- y_predicted: sparse matrix
 the predicted result
Returns¶
- List[int or float]
 score from a given measure depending on what the measure returns
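The idea behind measure_per_label is to apply the measure to one label column at a time. The sketch below shows that on plain nested lists rather than sparse matrices. The names `per_label` and `accuracy` are hypothetical stand-ins for the library function and for a scikit-compatible measure.

```python
def per_label(measure, y_true, y_predicted):
    """Apply measure to each label column and collect the scores."""
    n_labels = len(y_true[0])
    return [
        measure([row[j] for row in y_true], [row[j] for row in y_predicted])
        for j in range(n_labels)
    ]


def accuracy(true_column, predicted_column):
    """Fraction of positions where the two columns agree."""
    matches = sum(t == p for t, p in zip(true_column, predicted_column))
    return matches / len(true_column)


y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_predicted = [[1, 0], [0, 0], [1, 1], [0, 1]]
print(per_label(accuracy, y_true, y_predicted))  # [1.0, 0.5]
```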
Cite us
If you use scikit-multilearn-ng in your research and publish it, please consider citing scikit-multilearn:
@ARTICLE{2017arXiv170201460S,
    author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
    title = "{A scikit-based Python environment for performing multi-label classification}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1702.01460},
    primaryClass = "cs.LG",
    keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
    year = 2017,
    month = feb,
}