Feature

Tools for handling features and feature labels during classification, data preparation, and evaluation.

best.feature.augment_features(x, feature_names=None, feature_indexes=[], operation=None, mutual=False, operation_str='')

Augments the feature set by applying a given operation to existing features. Mutual operations combine two features (e.g. *, /, +, -, …); non-mutual operations act on a single feature (e.g. log, exp, power, …).

Parameters
  • x (numpy ndarray) – shape[n_samples, n_features]

  • feature_names (list or numpy array of strings, optional) – names of features

  • feature_indexes (list or numpy array) – indexes of the features to augment

  • operation (function) – callable applied to the selected features

  • mutual (bool) – indicates whether the operation takes a single feature (e.g. np.log10) or two features (e.g. np.divide). If mutual = True, the operation is applied to all combinations of the features specified in feature_indexes.

Returns

Return type

numpy ndarray -> shape[n_samples, n_features]
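A minimal sketch of the augmentation idea described above, using only NumPy. The function name augment_features_sketch is hypothetical and this is not the library's implementation; the naming scheme for new columns, the use of unordered pairs (itertools.combinations) in mutual mode, and returning the extended name list alongside the array are all assumptions.

```python
import itertools

import numpy as np


def augment_features_sketch(x, feature_names=None, feature_indexes=(), operation=None,
                            mutual=False, operation_str=""):
    # Hypothetical helper, not the library's code: appends new columns to x.
    x = np.asarray(x, dtype=float)
    base = list(feature_names) if feature_names is not None else [f"f{i}" for i in range(x.shape[1])]
    names = list(base)
    new_cols = []
    if mutual:
        # Assumption: unordered pairs, so np.divide yields x_i / x_j but not x_j / x_i.
        for i, j in itertools.combinations(feature_indexes, 2):
            new_cols.append(operation(x[:, i], x[:, j]))
            names.append(f"{base[i]}{operation_str}{base[j]}")
    else:
        # Single-feature operation (e.g. np.log10) applied to each selected column.
        for i in feature_indexes:
            new_cols.append(operation(x[:, i]))
            names.append(f"{operation_str}({base[i]})")
    if new_cols:
        # 1-D results are stacked as extra columns after the original features.
        x = np.column_stack([x, *new_cols])
    return x, names
```

For example, with three features and np.divide in mutual mode, three ratio columns (a/b, a/c, b/c) are appended, so the output has six columns.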

best.feature.balance_classes(x, y, std_factor=0.0)

Balances unbalanced classes in a dataset by extending the sample array with duplicated samples, optionally with added noise. The classes and the number of samples per class are detected from y, and samples from the under-represented classes are duplicated. std_factor sets the level of noise added to the duplicated samples, relative to the standard deviation of each feature within the given class.

Parameters
  • x (numpy ndarray) – shape[n_samples, n_features]

  • y (list or numpy array) – string or int category label for each sample

  • std_factor (float) – amount of noise added to the duplicated samples, relative to the std of each feature within a category

Returns

  • numpy ndarray – x - samples

  • list – y - categories
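A minimal oversampling sketch of the behaviour described above, assuming that minority classes are filled up to the size of the largest class by drawing random duplicates with replacement, and that the noise is Gaussian. The function name balance_classes_sketch and the seed parameter are hypothetical; this is not the library's code.

```python
import numpy as np


def balance_classes_sketch(x, y, std_factor=0.0, seed=None):
    # Hypothetical helper: oversample each minority class up to the largest class size.
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    x_parts, y_parts = [x], [y]
    for cls, count in zip(classes, counts):
        missing = target - count
        if missing == 0:
            continue
        x_cls = x[y == cls]
        # Assumption: duplicates are drawn at random, with replacement.
        picks = rng.integers(0, count, size=missing)
        dup = x_cls[picks].copy()
        if std_factor > 0:
            # Assumption: Gaussian noise scaled by the per-class std of each feature.
            dup += rng.normal(size=dup.shape) * std_factor * x_cls.std(axis=0)
        x_parts.append(dup)
        y_parts.append(np.full(missing, cls))
    return np.concatenate(x_parts), list(np.concatenate(y_parts))
```

With std_factor=0.0 the added rows are exact copies of existing samples; a positive value jitters them so a classifier does not see identical points repeatedly.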

best.feature.find_category_outliers(x, y=None)

Finds outliers within each category of the data. For background on the method, see the scikit-learn LocalOutlierFactor documentation: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html

Parameters
  • x (numpy ndarray) – shape[n_samples, n_features]

  • y (list or numpy array) – string or int category label for each sample

Returns

list of position indexes of the detected outliers

Return type

list
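A sketch of per-category outlier detection with scikit-learn's LocalOutlierFactor, assuming scikit-learn is installed and that each category is fitted separately. The function name find_category_outliers_sketch, the n_neighbors default, and the rule of skipping categories with fewer than three samples are assumptions, not the library's documented behaviour.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def find_category_outliers_sketch(x, y=None, n_neighbors=20):
    # Hypothetical helper: run LOF separately inside each category.
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x)) if y is None else np.asarray(y)  # no labels -> one category
    outliers = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)        # positions of this category in x
        if len(idx) < 3:                      # assumption: too few samples to estimate density
            continue
        lof = LocalOutlierFactor(n_neighbors=min(n_neighbors, len(idx) - 1))
        labels = lof.fit_predict(x[idx])      # -1 marks an outlier, 1 an inlier
        outliers.extend(idx[labels == -1].tolist())
    return sorted(outliers)
```

Fitting per category means a sample is judged against the density of its own class, so a point that is typical for one class but sits inside another class's region is not flagged merely for that reason.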

best.feature.get_classification_scores(Y, YY, labels=None)

Returns a classification report with all values already formatted as a string.
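A minimal sketch of a preformatted per-class report (precision, recall, F1, support) computed with NumPy. It assumes Y holds the reference labels and YY the predicted labels; the function name classification_scores_sketch and the column layout are hypothetical, and the library's actual report format may differ.

```python
import numpy as np


def classification_scores_sketch(Y, YY, labels=None):
    # Assumption: Y = reference labels, YY = predicted labels.
    Y, YY = np.asarray(Y), np.asarray(YY)
    if labels is None:
        labels = np.unique(np.concatenate([Y, YY]))
    rows = [f"{'label':>8} {'precision':>10} {'recall':>8} {'f1':>8} {'support':>8}"]
    for lab in labels:
        tp = int(np.sum((Y == lab) & (YY == lab)))  # correctly predicted as lab
        n_pred = int(np.sum(YY == lab))             # all samples predicted as lab
        n_true = int(np.sum(Y == lab))              # all samples truly lab (support)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append(f"{str(lab):>8} {precision:10.2f} {recall:8.2f} {f1:8.2f} {n_true:8d}")
    return "\n".join(rows)
```

Returning one string rather than a dictionary matches the description above: the caller can print or log the report directly.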

best.feature.print_classification_scores(Y, YY, N_merge=False)

Prints a classification report for sleep-scoring labels.

best.feature.remove_features(x, feature_names=None, to_del=None)

Removes the features specified by to_del.

Parameters
  • x (numpy ndarray) – shape[n_samples, n_features]

  • feature_names (list or numpy array, optional) – names of features

  • to_del
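A minimal sketch of feature removal with numpy.delete, assuming to_del holds integer column indexes (the parameter is not documented above). The function name remove_features_sketch and its tuple return value are hypothetical.

```python
import numpy as np


def remove_features_sketch(x, feature_names=None, to_del=None):
    # Assumption: to_del is a list of integer column indexes to drop.
    x = np.asarray(x)
    if to_del is None:
        return x, feature_names
    x_out = np.delete(x, to_del, axis=1)  # drop the selected columns
    if feature_names is not None:
        # Keep the names aligned with the remaining columns.
        feature_names = np.delete(np.asarray(feature_names), to_del)
    return x_out, feature_names
```

numpy.delete returns a new array, so the caller's original x and feature_names are left unchanged.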

best.feature.remove_samples(x, y=None, to_del=None)

Removes the samples specified by to_del.

Parameters
  • x (numpy ndarray / list / pd.DataFrame) – shape[n_samples, n_features]

  • y (list or numpy array, optional) – category reference for each sample

  • to_del
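A minimal sketch of sample removal that accepts the three input types listed above, assuming to_del holds integer row positions (the parameter is not documented above). The function name remove_samples_sketch is hypothetical; a pandas DataFrame is detected by its iloc attribute so pandas itself need not be imported.

```python
import numpy as np


def remove_samples_sketch(x, y=None, to_del=None):
    # Assumption: to_del is a list of integer row positions to drop.
    if to_del is None:
        return x, y
    keep = np.ones(len(x), dtype=bool)
    keep[np.asarray(to_del, dtype=int)] = False
    if hasattr(x, "iloc"):        # pandas DataFrame: keep its columns and index
        x_out = x.iloc[keep]
    else:                         # numpy array or list of samples
        x_out = np.asarray(x)[keep]
    y_out = None if y is None else np.asarray(y)[keep]  # keep labels aligned with samples
    return x_out, y_out
```

Using one boolean mask for both x and y guarantees that each remaining sample keeps its own category label.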

best.feature.replace_annotations(Y, old_key=None, new_key=None)

Replaces annotation names in a numpy array or list
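A minimal sketch of the replacement, assuming old_key and new_key are single label values and that the input type (list or numpy array) should be preserved. The function name replace_annotations_sketch is hypothetical.

```python
import numpy as np


def replace_annotations_sketch(Y, old_key=None, new_key=None):
    # Hypothetical helper: every label equal to old_key becomes new_key.
    is_array = isinstance(Y, np.ndarray)
    out = [new_key if label == old_key else label for label in Y]
    return np.array(out) if is_array else out
```

Building a new list instead of assigning into a NumPy string array avoids silent truncation when new_key is longer than the array's fixed string width.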

best.feature.zscore(x)

Calculates the Z-score.

Parameters
  • x (numpy ndarray) – shape[n_samples, n_features]

Returns

normalized_features

Return type

numpy ndarray
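A minimal sketch of a column-wise Z-score, z_ij = (x_ij - mean_j) / std_j, where j indexes the features. The use of the population standard deviation (NumPy's default ddof=0) and the handling of constant features (left at 0 rather than dividing by zero) are assumptions; the function name zscore_sketch is hypothetical.

```python
import numpy as np


def zscore_sketch(x):
    # Hypothetical helper: standardize each feature (column) independently.
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)                   # population std (ddof=0), an assumption
    std = np.where(std == 0, 1.0, std)    # constant features map to 0, not NaN
    return (x - mean) / std
```

After this transform every non-constant feature has mean 0 and standard deviation 1, so features measured in different units contribute on a comparable scale.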