timesead.data.preprocessing.common

Functions

update_statistics_increment(→ Tuple[numpy.ndarray, ...)

Incrementally compute feature-wise mean, minimum, and maximum values for a dataset consisting of multiple DataFrames.

save_statistics(frame, path)

Compute feature-wise mean, standard deviation, minimum, and maximum values for a dataset consisting of a single DataFrame and save them as a .npz file.

minmax_scaler(→ pandas.DataFrame)

Scale features in a dataset independently to the [0, 1] range using pre-computed minimum and maximum values.

standard_scaler(→ pandas.DataFrame)

Scale features in a dataset such that they have zero mean and unit standard deviation using pre-computed mean and standard deviation values.

Module Contents

timesead.data.preprocessing.common.update_statistics_increment(frame: pandas.DataFrame, mean: numpy.ndarray = None, min_val: numpy.ndarray = None, max_val: numpy.ndarray = None, old_n: int = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, int]

Incrementally compute feature-wise mean, minimum, and maximum values for a dataset consisting of multiple DataFrames.

Parameters:
  • frame (pandas.DataFrame) – The current data with which the statistics should be updated.

  • mean (numpy.ndarray) – Mean value returned by previous calls to the function or None if this is the first call.

  • min_val (numpy.ndarray) – Minimum value returned by previous calls to the function or None if this is the first call.

  • max_val (numpy.ndarray) – Maximum value returned by previous calls to the function or None if this is the first call.

  • old_n (int) – Total number of data points processed in earlier calls to the function or None if this is the first call.

Returns:

A tuple (mean, min_val, max_val, old_n + n) that contains the updated statistics and the total number of processed data points, where n is the number of data points in frame.

Return type:

Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, int]
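
The incremental update described above can be illustrated with a minimal sketch. It assumes the running mean is combined as a weighted average of the previous mean and the new chunk; update_stats_sketch is a hypothetical stand-in written for this example with plain NumPy arrays, not TimeSeAD's actual implementation.

```python
import numpy as np


def update_stats_sketch(chunk, mean=None, min_val=None, max_val=None, old_n=None):
    """Hypothetical stand-in for illustration; not TimeSeAD's code."""
    chunk = np.asarray(chunk, dtype=float)  # shape (n, D)
    n = chunk.shape[0]
    if old_n is None:
        # First call: the statistics come from this chunk alone.
        return chunk.mean(axis=0), chunk.min(axis=0), chunk.max(axis=0), n
    total = old_n + n
    # Weighted combination of the previous mean and the sum of the new chunk.
    new_mean = (old_n * mean + chunk.sum(axis=0)) / total
    new_min = np.minimum(min_val, chunk.min(axis=0))
    new_max = np.maximum(max_val, chunk.max(axis=0))
    return new_mean, new_min, new_max, total


# Feed the output of each call back into the next one.
chunks = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[5.0, 6.0]])]
stats = (None, None, None, None)
for chunk in chunks:
    stats = update_stats_sketch(chunk, *stats)
mean, min_val, max_val, n_total = stats
```

The calling pattern mirrors the documented signature: the four returned values are passed back as mean, min_val, max_val, and old_n on the next call.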

timesead.data.preprocessing.common.save_statistics(frame: pandas.DataFrame, path: str)

Compute feature-wise mean, standard deviation, minimum, and maximum values for a dataset consisting of a single DataFrame and save them as a .npz file.

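Parameters:
  • frame (pandas.DataFrame) – The DataFrame for which the statistics are computed.

  • path (str) – Path of the .npz file to which the statistics are saved.
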
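
As a rough sketch of what saving such statistics to a .npz file could look like, the example below uses numpy.savez and reads the archive back with numpy.load. The helper save_stats_sketch, the key names 'min' and 'max', and the use of the population standard deviation (NumPy's default) are assumptions made for this example; only the keys 'mean' and 'std' are named elsewhere on this page.

```python
import os
import tempfile

import numpy as np


def save_stats_sketch(data, path):
    """Hypothetical stand-in for illustration; not TimeSeAD's code."""
    data = np.asarray(data, dtype=float)  # shape (N, D)
    # Key names 'min' and 'max' are assumed; std uses NumPy's default ddof=0.
    np.savez(
        path,
        mean=data.mean(axis=0),
        std=data.std(axis=0),
        min=data.min(axis=0),
        max=data.max(axis=0),
    )


data = np.array([[1.0, 10.0], [3.0, 30.0]])
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stats.npz")
    save_stats_sketch(data, path)
    # Load everything into a plain dict before the temporary file is removed.
    with np.load(path) as archive:
        loaded = {key: archive[key] for key in archive.files}
```

A dictionary loaded this way has the Dict[str, numpy.ndarray] shape that the scaler functions on this page accept as stats.
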
timesead.data.preprocessing.common.minmax_scaler(frame: pandas.DataFrame | numpy.ndarray, stats: Dict[str, numpy.ndarray]) → pandas.DataFrame

Scale features in a dataset independently to the [0, 1] range using pre-computed minimum and maximum values.

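Parameters:
  • frame (Union[pandas.DataFrame, numpy.ndarray]) – The dataset to scale.

  • stats (Dict[str, numpy.ndarray]) – A dictionary that contains the pre-computed feature-wise minimum and maximum values.

Returns: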

The scaled DataFrame or ndarray. Note that this will usually be a copy, as this function does not modify any values in place.

Return type:

pandas.DataFrame
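
The usual min-max formula, (x - min) / (max - min) applied per feature, can be sketched as follows. The helper minmax_scale_sketch and the dictionary keys 'min' and 'max' are assumptions for this example rather than TimeSeAD's confirmed behavior, and the sketch does not guard against constant features, where max equals min.

```python
import numpy as np


def minmax_scale_sketch(data, stats):
    """Hypothetical stand-in for illustration; not TimeSeAD's code."""
    data = np.asarray(data, dtype=float)  # shape (N, D)
    # Broadcasting applies the per-feature (D,) statistics to every row.
    # The result is a new array; the input is left untouched.
    return (data - stats["min"]) / (stats["max"] - stats["min"])


# Key names 'min' and 'max' are assumed for this example.
stats = {"min": np.array([0.0, 10.0]), "max": np.array([4.0, 30.0])}
data = np.array([[0.0, 10.0], [1.0, 20.0], [4.0, 30.0]])
scaled = minmax_scale_sketch(data, stats)
```

Because the statistics are pre-computed, values outside the recorded minimum and maximum (for example in unseen test data) would fall outside [0, 1] under this formula.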

timesead.data.preprocessing.common.standard_scaler(frame: pandas.DataFrame | numpy.ndarray, stats: Dict[str, numpy.ndarray]) → pandas.DataFrame

Scale features in a dataset such that they have zero mean and unit standard deviation using pre-computed mean and standard deviation values.

Parameters:
  • frame (Union[pandas.DataFrame, numpy.ndarray]) – The dataset to scale. This should be either a DataFrame or an ndarray of shape (N*, D).

  • stats (Dict[str, numpy.ndarray]) – A dictionary that contains pre-computed feature-wise mean and standard deviation values as ndarrays of shape (D,) in the keys ‘mean’ and ‘std’, respectively.

Returns:

The scaled DataFrame or ndarray. Note that this will usually be a copy, as this function does not modify any values in place.

Return type:

pandas.DataFrame
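
Standardization with pre-computed statistics amounts to (x - mean) / std per feature. The sketch below uses the documented 'mean' and 'std' keys; standard_scale_sketch itself is a hypothetical helper written for this example, not TimeSeAD's implementation, and it assumes every entry of std is non-zero.

```python
import numpy as np


def standard_scale_sketch(data, stats):
    """Hypothetical stand-in for illustration; not TimeSeAD's code."""
    data = np.asarray(data, dtype=float)  # shape (N, D)
    # Subtract the per-feature mean, then divide by the per-feature std.
    # The result is a new array; the input is left untouched.
    return (data - stats["mean"]) / stats["std"]


stats = {"mean": np.array([2.0, 20.0]), "std": np.array([1.0, 10.0])}
data = np.array([[1.0, 10.0], [3.0, 30.0]])
scaled = standard_scale_sketch(data, stats)
```

When the statistics were computed on the same data, each scaled feature has zero mean and unit standard deviation; with statistics from a separate training set, this holds only approximately.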