Preprocessing package

The pre-processing package is responsible for analysing the raw energy data provided by the NILMTK API. It contains a pre-processing sub-module that defines the data transformations applied to the input data (e.g., data normalisation). It focuses on the input data; the output (target) data is handled in the data loader, because some models require state generation.

Pre_processing module

deep_nilmtk.preprocessing.pre_processing.data_preprocessing(aggregate, targets=None, feature_type='mains', alpha=0.1, normalize=None, main_mu=329, main_std=450, q_filter={'q': 50, 'w': 10}, main_min=0, main_max=1500)

Default pre-processing function. It normalizes the input but leaves normalization of the target output to the data loader, as some loaders also need to generate the states from the original data.

Parameters
  • aggregate (list of DataFrames) -- The aggregate power

  • targets (list of DataFrames, optional) -- The target power, defaults to None

  • feature_type (str, optional) -- The type of input features to derive from the aggregate power, defaults to 'mains'

  • alpha (float, optional) -- reflection rate, defaults to 0.1

  • normalize ([type], optional) -- normalization type, defaults to None

  • main_mu (int, optional) -- the mean of the aggregate power data, defaults to 329

  • main_std (int, optional) -- the std of the aggregate power data, defaults to 450

  • q_filter (dict, optional) -- quantile filters, defaults to {"q":50, "w":10}

  • main_min (int, optional) -- the min of the aggregate power data, defaults to 0

  • main_max (int, optional) -- the max of the aggregate power data, defaults to 1500

Returns

The aggregate power, the submetered data combined in one DataFrame, and the submetered data as separate DataFrames.

Return type

tuple
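As a rough illustration of the normalization step, the sketch below applies z-score and min-max scaling using the default statistics from the signature (main_mu=329, main_std=450, main_min=0, main_max=1500). The helpers z_score and min_max are hypothetical names and are not part of deep_nilmtk; which scaling the function applies for a given value of normalize is not stated in this reference.

```python
import numpy as np

def z_score(power, mu=329.0, std=450.0):
    """Standardise power readings: (x - mean) / std."""
    return (np.asarray(power, dtype=float) - mu) / std

def min_max(power, lo=0.0, hi=1500.0):
    """Rescale power readings so that lo maps to 0 and hi maps to 1."""
    return (np.asarray(power, dtype=float) - lo) / (hi - lo)

mains = np.array([329.0, 779.0, 1500.0])  # made-up aggregate readings in watts
z = z_score(mains)    # first two entries are 0.0 and 1.0
mm = min_max(mains)   # last entry is 1.0
```

The statistics should come from the training data; the defaults above are simply the values shown in the signature.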

deep_nilmtk.preprocessing.pre_processing.get_differential_power(data)

Computes the differences between consecutive elements of an array.

Parameters

data (np.array) -- the input data

Returns

The differences.

Return type

np.array
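For intuition, consecutive differences can be computed with plain NumPy as sketched below. This is not necessarily how get_differential_power is implemented; in particular, whether the library pads the result to keep the input length is an assumption left open here.

```python
import numpy as np

power = np.array([100.0, 100.0, 2100.0, 2050.0, 90.0])  # made-up readings in watts

# Element i holds power[i + 1] - power[i], so the result is one sample shorter.
diff = np.diff(power)
```

Large positive and negative values in diff mark the moments where an appliance switches on or off.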

deep_nilmtk.preprocessing.pre_processing.get_percentile(data, p=50)

Calculates the percentile p of the data

Parameters
  • data (np.array) -- The power data

  • p (int, optional) -- The percentile to compute, defaults to 50

Returns

The percentile values of the power data

Return type

np.array
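A percentile over windows of power data might be computed as in the following sketch. The choice of axis=1 (one value per row) is an assumption made for illustration; the reference does not say along which axis get_percentile operates.

```python
import numpy as np

# Two made-up windows of four readings each.
windows = np.array([[0.0, 10.0, 20.0, 30.0],
                    [5.0, 5.0, 5.0, 100.0]])

# p=50 is the median; the spike of 100 in the second window is ignored.
median_per_window = np.percentile(windows, 50, axis=1)
```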

deep_nilmtk.preprocessing.pre_processing.get_temporal_info(data)

Generates the temporal information related to power consumption.

Parameters

data (list(DatetimeIndex)) -- a list of temporal information

Returns

Temporal contextual information of the energy data

Return type

np.array

deep_nilmtk.preprocessing.pre_processing.get_variant_power(data, alpha=0.1)

Generates the variant power, which reduces noise that may negatively affect pattern identification.

Parameters
  • data (np.array) -- power signal

  • alpha (float, optional) -- reflection rate, defaults to 0.1

Returns

The variant power generated

Return type

np.array
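The reference does not spell out the formula. One plausible reading, stated here as an assumption rather than as the library's implementation, is an exponentially weighted running value of the consecutive differences, where alpha controls how strongly each new difference is reflected in the result. The function name variant_power_sketch is hypothetical.

```python
import numpy as np

def variant_power_sketch(data, alpha=0.1):
    """Exponentially smoothed first differences (assumed formula)."""
    data = np.asarray(data, dtype=float)
    out = np.zeros(len(data))
    for i in range(1, len(data)):
        delta = data[i] - data[i - 1]
        # Keep (1 - alpha) of the previous value and mix in alpha of the new change.
        out[i] = (1 - alpha) * out[i - 1] + alpha * delta
    return out

signal = np.array([0.0, 100.0, 100.0, 100.0])  # a single step of 100 W
vp = variant_power_sketch(signal, alpha=0.1)
```

Under this reading, a step change produces a small response that decays gradually, so isolated spikes are damped.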

deep_nilmtk.preprocessing.pre_processing.over_lapping_sliding_window(data, seq_len=4, step_size=1)

Generates overlapping sequences using the sliding-window approach.

Parameters
  • data (np.array) -- Power data

  • seq_len (int, optional) -- The length of the sequences. Defaults to 4.

  • step_size (int, optional) -- The step size. Defaults to 1.

Returns

An array of the generated sequences.

Return type

np.array
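The sliding-window idea can be sketched with index arithmetic as follows. The helper sliding_windows is a hypothetical stand-in; the library function may handle edges differently (for example by padding), which this sketch does not attempt to reproduce.

```python
import numpy as np

def sliding_windows(data, seq_len=4, step_size=1):
    """Return every window of seq_len samples, advancing step_size samples each time."""
    data = np.asarray(data)
    n_windows = max((len(data) - seq_len) // step_size + 1, 0)
    starts = step_size * np.arange(n_windows)
    # Row i of idx holds the sample positions of window i.
    idx = starts[:, None] + np.arange(seq_len)[None, :]
    return data[idx]

windows = sliding_windows(np.arange(6), seq_len=4, step_size=1)
# windows has shape (3, 4): [0 1 2 3], [1 2 3 4], [2 3 4 5]
```

With step_size equal to seq_len the windows no longer overlap; smaller steps trade more training sequences for more redundancy.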

deep_nilmtk.preprocessing.pre_processing.quantile_filter(data: numpy.array, sequence_length: int = 10, p: int = 50)

Applies a quantile filter to the input data.

Parameters
  • data (np.array) -- The input power data.

  • sequence_length (int, optional) -- The length of sequence, defaults to 10

  • p (int, optional) -- The percentile. Defaults to 50.

Returns

Array of values for the corresponding percentile

Return type

np.array
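A quantile filter of this kind can be thought of as a rolling percentile: split the signal into overlapping windows of sequence_length samples and take percentile p of each. The sketch below does this with NumPy's sliding_window_view; rolling_percentile is a hypothetical name, and the absence of edge padding (the output is shorter than the input) is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_percentile(data, sequence_length=10, p=50):
    """Percentile p of every window of sequence_length consecutive samples."""
    windows = sliding_window_view(np.asarray(data, dtype=float), sequence_length)
    return np.percentile(windows, p, axis=1)

noisy = np.array([10.0, 10.0, 500.0, 10.0, 10.0, 10.0])  # one spurious spike
smoothed = rolling_percentile(noisy, sequence_length=3, p=50)
```

With p=50 this is a median filter, so the isolated 500 W spike disappears from smoothed.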

States module

deep_nilmtk.preprocessing.states.compute_status(appliances, thresholds=None, min_off=None, min_on=None, threshold_std=True, return_means=False, appliances_labels=[], threshold_method='at')

Calculates the operational status of appliances using the specified thresholding method

Parameters
  • appliances (np.array) -- Power consumption of target appliances

  • thresholds (np.array, optional) -- Threshold of each appliance, defaults to None

  • min_off (np.array, optional) -- Minimum off duration, defaults to None

  • min_on (np.array, optional) -- Minimum on duration, defaults to None

  • threshold_std (bool, optional) -- Whether to use the standard deviation when calculating the thresholds, defaults to True

  • return_means (bool, optional) -- Specifies whether the mean consumption of each status is required, defaults to False

  • appliances_labels (list, optional) -- Labels of the considered appliances, defaults to []

  • threshold_method (str, optional) -- The thresholding method to be used for status derivation, defaults to 'at'

Returns

The operational states, together with the thresholds used and the power consumption of each state

Return type

tuple

deep_nilmtk.preprocessing.states.get_status(ser, thresholds)

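Derives the binary ON/OFF status of each meter by comparing its power consumption with the corresponding threshold.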

Parameters
  • ser (np.array) -- Target power consumption with shape = (num_series, series_len, num_meters)

  • thresholds (np.array) -- Thresholds of target power with shape = (num_meters,)

Returns

An array (num_series, series_len, num_meters) with binary values indicating ON (1) and OFF (0) states.

Return type

np.array
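Given the documented shapes, per-meter thresholding can be sketched with NumPy broadcasting as below. Whether the library treats a reading exactly equal to the threshold as ON is an assumption (an inclusive comparison is used here).

```python
import numpy as np

# Shape (num_series=1, series_len=3, num_meters=2); made-up readings in watts.
ser = np.array([[[5.0, 0.0],
                 [60.0, 2100.0],
                 [55.0, 10.0]]])
thresholds = np.array([50.0, 2000.0])  # one threshold per meter

# thresholds has shape (num_meters,), so it broadcasts over the last axis.
status = (ser >= thresholds).astype(int)
```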

deep_nilmtk.preprocessing.states.get_status_by_duration(ser, thresholds, min_off, min_on)

Calculates the operational status of multiple meters using power thresholds and minimum ON/OFF durations.

Parameters
  • ser (np.array) -- Power consumption with shape = (num_series, series_len, num_meters), where num_series is the number of time series, series_len is the length of each time series, and num_meters is the number of meters contained in the array.

  • thresholds (np.array) -- Thresholds of power consumption shape = (num_meters,)

  • min_off (np.array) -- Minimum off duration with shape = (num_meters,)

  • min_on (np.array) -- Minimum on duration with shape = (num_meters,)

Returns

Operational status with binary values indicating ON (1) and OFF (0) states with shape (num_series, series_len, num_meters).

Return type

np.array
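To illustrate how duration constraints can refine a thresholded status, the sketch below handles a single meter and a single series. It first closes OFF gaps shorter than min_off and then drops ON periods shorter than min_on. The helper names, the order of the two steps, and the use of sample counts as durations are all assumptions; the library function works on 3-D arrays and may differ in detail.

```python
import numpy as np

def runs(mask):
    """Return (start, stop) index pairs of consecutive True values in a 1-D mask."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2], edges[1::2]))

def status_by_duration(power, threshold, min_off, min_on):
    on = np.asarray(power) >= threshold
    # 1) Close OFF gaps shorter than min_off that lie between two ON periods.
    for start, stop in runs(~on):
        if start > 0 and stop < len(on) and stop - start < min_off:
            on[start:stop] = True
    # 2) Drop ON periods shorter than min_on.
    for start, stop in runs(on):
        if stop - start < min_on:
            on[start:stop] = False
    return on.astype(int)

power = np.array([0, 100, 100, 0, 100, 100, 0, 0, 0, 100, 0])
status = status_by_duration(power, threshold=50, min_off=2, min_on=2)
```

In this example the one-sample dip at index 3 is bridged, while the one-sample blip at index 9 is discarded.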

deep_nilmtk.preprocessing.states.get_status_means(ser, status)

Get means of ON/OFF status.

Parameters
  • ser (np.array) -- Power data

  • status (np.array) -- The operational status of the target power

Returns

Mean power consumption of each state

Return type

np.array
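For a single meter, the mean consumption of each state can be sketched with boolean masks as follows. This is an illustration only; the layout of the array returned by get_status_means is not described in this reference.

```python
import numpy as np

# Single meter for brevity; made-up readings in watts.
power = np.array([2.0, 4.0, 100.0, 120.0, 3.0])
status = np.array([0, 0, 1, 1, 0])

mean_off = power[status == 0].mean()  # average standby consumption
mean_on = power[status == 1].mean()   # average consumption while running
```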

Threshold module

deep_nilmtk.preprocessing.threshold.get_threshold_params(appliances, threshold_method='at')

Given the method name and a list of appliances, this function returns the arguments needed to apply the method in ukdale_data.load_ukdale_meter.

Parameters
  • appliances (list) -- List of appliances

  • threshold_method (str, optional) -- Thresholding method, defaults to 'at'

Raises
  • ValueError -- Wrong thresholding method

  • ValueError -- Missing parameters of an appliance

Returns

thresholds, min_off, min_on, threshold_std

Return type

tuple

deep_nilmtk.preprocessing.threshold.get_thresholds(ser, use_std=True, return_mean=False)

Returns the estimated thresholds that split the ON and OFF states of the appliances.

Parameters
  • ser (np.array) -- An array with shape = (num_series, series_len, num_meters), where num_series is the number of time series, series_len is the length of each time series, and num_meters is the number of meters contained in the array.

  • use_std (bool, optional) -- Consider the standard deviation of each cluster when computing the threshold. If False, the threshold is set at the midpoint between the cluster centroids. Defaults to True

  • return_mean (bool, optional) -- If True, return the means as the second element of the returned tuple. Defaults to False

Returns

thresholds and mean consumption for each appliance

Return type

tuple

Note

The mean values are only returned when return_mean is True (default False).
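The clustering-based threshold can be illustrated for a single meter as sketched below. The readings are split into an OFF and an ON cluster with a tiny two-means loop, and the threshold is placed between the two centroids: at the midpoint when use_std is False, or shifted towards the tighter cluster when it is True. The helper names, the clustering routine, and the exact std weighting are assumptions made for illustration and are not taken from the library source. The sketch also assumes the readings are not all identical.

```python
import numpy as np

def two_means(x, n_iter=20):
    """Tiny 1-D two-cluster k-means; returns (means, stds) ordered OFF, ON."""
    x = np.asarray(x, dtype=float)
    centers = np.array([x.min(), x.max()])
    for _ in range(n_iter):
        # Assign each reading to its nearest centroid, then update the centroids.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([x[labels == k].mean() for k in (0, 1)])
    stds = np.array([x[labels == k].std() for k in (0, 1)])
    return centers, stds

def threshold_sketch(x, use_std=True):
    (mean_off, mean_on), (std_off, std_on) = two_means(x)
    if use_std and (std_off + std_on) > 0:
        # Assumed weighting: a tight OFF cluster pulls the threshold towards OFF.
        sigma = std_off / (std_off + std_on)
    else:
        sigma = 0.5  # midpoint between the two centroids
    return mean_off + sigma * (mean_on - mean_off)

power = np.array([0.0, 2.0, 4.0, 90.0, 100.0, 110.0])  # made-up readings in watts
mid_threshold = threshold_sketch(power, use_std=False)
std_threshold = threshold_sketch(power, use_std=True)
```

Here the OFF cluster is much tighter than the ON cluster, so std_threshold lies well below mid_threshold.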