Feature

Tools for handling features and feature labels during classification, data preparation, and evaluation.

best.feature.augment_features(x, feature_names=None, feature_indexes=[], operation=None, mutual=False, operation_str='')

Augments the feature set by applying an operation to existing features. Mutual operations combine two features (e.g. *, /, +, -); non-mutual operations transform a single feature (e.g. log, exp, power).

Parameters

x (numpy ndarray) – shape[n_samples, n_features]
feature_names (list or numpy array of strings, optional) – names of features
feature_indexes (list or numpy array) – indexes of the features to augment
operation (function) – callable applied to the existing features
mutual (bool) – whether operation takes a single feature (e.g. np.log10) or two features (e.g. np.divide). If mutual=True, the operation is applied to all combinations of the features specified in feature_indexes.

Returns

numpy ndarray – shape[n_samples, n_features]
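
The difference between the two modes can be illustrated with plain NumPy. This is a minimal sketch of the idea, not the implementation in best.feature: the helper name augment_sketch is hypothetical, and it assumes that new features are appended as extra columns and that "all combinations" means unordered pairs, neither of which is confirmed by this page.

```python
from itertools import combinations

import numpy as np


def augment_sketch(x, feature_indexes, operation, mutual=False):
    """Hypothetical helper: append operation results as new columns of x."""
    new_columns = []
    if mutual:
        # Apply a two-argument operation to every unordered pair of selected features.
        for i, j in combinations(feature_indexes, 2):
            new_columns.append(operation(x[:, i], x[:, j]))
    else:
        # Apply a one-argument operation to each selected feature.
        for i in feature_indexes:
            new_columns.append(operation(x[:, i]))
    if not new_columns:
        return x
    return np.column_stack([x] + new_columns)


x = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0]])

x_log = augment_sketch(x, [0, 1], np.log10)                   # one new column per feature
x_div = augment_sketch(x, [0, 1, 2], np.divide, mutual=True)  # one new column per pair
```

With three selected features, the mutual mode in this sketch produces three new columns (pairs 0/1, 0/2 and 1/2), while the non-mutual mode produces one new column per selected feature.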

best.feature.balance_classes(x, y, std_factor=0.0)

Balances unbalanced classes in a dataset by extending the sample array with duplicated samples, optionally with added noise. The classes and the number of samples per class are detected from y, and samples from classes with fewer samples are duplicated. std_factor sets the level of noise added to the duplicated samples, relative to the standard deviation of each feature within the given category.

Parameters

x (numpy ndarray) – shape[n_samples, n_features]
y (list or numpy array) – string or int category label for each sample
std_factor (float) – amount of noise added to duplicated samples, relative to the std of each feature within a category

Returns

numpy ndarray – x - samples
list – y - categories
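
The oversampling described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the library's code: balance_sketch and its seed argument are hypothetical, the sketch assumes every class is filled up to the size of the largest class, and it assumes Gaussian noise.

```python
import numpy as np


def balance_sketch(x, y, std_factor=0.0, seed=0):
    """Hypothetical helper: oversample smaller classes up to the largest class."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    x_parts, y_parts = [x], [y]
    for cls, count in zip(classes, counts):
        missing = target - count
        if missing == 0:
            continue
        x_cls = x[y == cls]
        # Draw existing samples of this class with replacement.
        picked = x_cls[rng.integers(0, count, size=missing)]
        # Scale the noise by the per-feature std within this class.
        noise = rng.normal(size=picked.shape) * x_cls.std(axis=0) * std_factor
        x_parts.append(picked + noise)
        y_parts.append(np.full(missing, cls))
    return np.vstack(x_parts), np.concatenate(y_parts)


x = np.array([[0.0, 1.0], [0.2, 1.1], [0.1, 0.9], [5.0, 5.0]])
y = np.array(["a", "a", "a", "b"])
x_bal, y_bal = balance_sketch(x, y, std_factor=0.1)
```

In this example class "b" has a single sample, so its within-class std is zero and the duplicates are exact copies regardless of std_factor.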

best.feature.find_category_outliers(x, y=None)

Finds outliers for each category within the data. For details, see the scikit-learn LocalOutlierFactor documentation: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html

Parameters

x (numpy ndarray) – shape[n_samples, n_features]
y (list or numpy array) – string or int category label for each sample

Returns

List of position indexes of the detected outliers

Return type

list
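
Per-category outlier detection with the linked scikit-learn estimator might look like the sketch below. It assumes the estimator is fitted separately on the samples of each category; the helper name category_outliers_sketch and the n_neighbors argument are hypothetical and are not part of best.feature.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def category_outliers_sketch(x, y, n_neighbors=5):
    """Hypothetical helper: run LocalOutlierFactor separately within each category."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    outliers = []
    for cls in np.unique(y):
        positions = np.flatnonzero(y == cls)
        # fit_predict labels inliers as 1 and outliers as -1.
        labels = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(x[positions])
        outliers.extend(positions[labels == -1].tolist())
    return sorted(outliers)


rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),   # category 0
               rng.normal(10.0, 1.0, size=(30, 2)),  # category 1
               [[0.0, 60.0]]])                        # labeled 1, far from its category
y = np.array([0] * 30 + [1] * 31)
outlier_positions = category_outliers_sketch(x, y)
```

The returned positions refer to rows of the full x array, not to positions within a single category, so they can be used directly to index or delete rows.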

best.feature.get_classification_scores(Y, YY, labels=None)

Returns a classification report. All values are already formatted as strings.

best.feature.print_classification_scores(Y, YY, N_merge=False)

Prints a classification report for sleep scoring labels.

best.feature.remove_features(x, feature_names=None, to_del=None)

Removes features

Parameters

x (numpy ndarray) – shape[n_samples, n_features]
feature_names (list or numpy array, optional) – names of features
to_del –

best.feature.remove_samples(x, y=None, to_del=None)

Removes samples

Parameters

x (numpy ndarray / list / pd.DataFrame) – shape[n_samples, n_features]
y (list or numpy array, optional) – category reference for each sample
to_del –
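
The effect of removing samples can be sketched with np.delete. Since to_del is not documented here, this example assumes it holds row positions (for instance, positions of detected outliers); that is an assumption, not the documented behavior.

```python
import numpy as np

x = np.array([[0.1, 1.0], [0.2, 1.1], [9.0, 9.0], [0.3, 0.9]])
y = np.array(["N2", "N2", "N2", "REM"])
to_del = [2]  # assumed here to be row positions

# Delete the same rows from x and y so samples and labels stay aligned.
x_clean = np.delete(x, to_del, axis=0)
y_clean = np.delete(y, to_del)
```

Deleting from x and y in the same step keeps each remaining sample paired with its category.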

best.feature.replace_annotations(Y, old_key=None, new_key=None)

Replaces annotation names in a numpy array or list.
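
A label replacement of this kind can be sketched with a boolean mask. The labels below are illustrative only, and the snippet shows the general idea, not the library's implementation.

```python
import numpy as np

Y = np.array(["N1", "N2", "N3", "N2", "REM"])
old_key, new_key = "N3", "N2"  # illustrative labels, not taken from this page

# A boolean mask selects every occurrence of the old label.
Y_new = Y.copy()
Y_new[Y_new == old_key] = new_key
```

Working on a copy leaves the original annotations untouched.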

best.feature.zscore(x)

Calculates Z-score

Parameters

x (numpy ndarray) – shape[n_samples, n_features]

Returns

normalized_features

Return type

numpy ndarray
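
For reference, a Z-score over a feature matrix is commonly computed per feature (column). The sketch below assumes column-wise normalization with the population standard deviation; this page does not state which axis or which std definition the function uses.

```python
import numpy as np

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Standardize each feature (column) to zero mean and unit standard deviation.
normalized = (x - x.mean(axis=0)) / x.std(axis=0)
```

A feature with zero variance would cause a division by zero in this sketch, so constant columns need separate handling.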