timesead.data.preprocessing

This package contains helper code to set up datasets for experiments.

Submodules

Functions

`update_statistics_increment`(→ Tuple[numpy.ndarray, ...)	Incrementally compute feature-wise mean, minimum, and maximum values for a dataset consisting of multiple DataFrames.
`save_statistics`(frame, path)	Compute feature-wise mean, standard deviation, minimum, and maximum values for a dataset consisting of a single DataFrame and save them as a .npz file.
`minmax_scaler`(→ pandas.DataFrame)	Scale features in a dataset independently to the [0, 1] range using pre-computed minimum and maximum values.
`standard_scaler`(→ pandas.DataFrame)	Scale features in a dataset such that they have zero mean and unit standard deviation using pre-computed mean and standard deviation values.

Package Contents

timesead.data.preprocessing.update_statistics_increment(frame: pandas.DataFrame, mean: numpy.ndarray = None, min_val: numpy.ndarray = None, max_val: numpy.ndarray = None, old_n: int = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, int]

Incrementally compute feature-wise mean, minimum, and maximum values for a dataset consisting of multiple DataFrames.

Parameters:

frame (pandas.DataFrame) – The current data with which the statistics should be updated.
mean (numpy.ndarray) – Mean value returned by previous calls to the function or None if this is the first call.
min_val (numpy.ndarray) – Minimum value returned by previous calls to the function or None if this is the first call.
max_val (numpy.ndarray) – Maximum value returned by previous calls to the function or None if this is the first call.
old_n (int) – Total number of data points processed in earlier calls to the function or None if this is the first call.

Returns:

A tuple (mean, min_val, max_val, old_n + n) that contains the updated statistics and the total number of processed data points, where n is the number of data points in frame.

Return type:

Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, int]
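The incremental update described above can be illustrated with a minimal NumPy sketch. The function `update_stats_sketch` is a hypothetical stand-in that mirrors the documented call pattern; it is not the library's implementation, and it accepts plain arrays rather than DataFrames to stay self-contained.

```python
import numpy as np


def update_stats_sketch(chunk, mean=None, min_val=None, max_val=None, old_n=None):
    """Hypothetical helper: fold one chunk of shape (n, D) into running statistics."""
    chunk = np.asarray(chunk, dtype=float)
    n = chunk.shape[0]
    chunk_mean = chunk.mean(axis=0)
    chunk_min = chunk.min(axis=0)
    chunk_max = chunk.max(axis=0)

    # First call: nothing to merge with, so the chunk statistics are the result.
    if old_n is None:
        return chunk_mean, chunk_min, chunk_max, n

    total = old_n + n
    # Weighted update of the running mean by the share of new points.
    new_mean = mean + (chunk_mean - mean) * (n / total)
    new_min = np.minimum(min_val, chunk_min)
    new_max = np.maximum(max_val, chunk_max)
    return new_mean, new_min, new_max, total


# Two chunks that together form one dataset with D = 2 features.
chunks = [
    np.array([[1.0, 10.0], [3.0, 30.0]]),
    np.array([[5.0, 20.0]]),
]

mean = min_val = max_val = n_seen = None
for chunk in chunks:
    mean, min_val, max_val, n_seen = update_stats_sketch(
        chunk, mean, min_val, max_val, n_seen
    )
```

Passing the returned values back in on each call is what lets the statistics cover a dataset that does not fit into a single DataFrame.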

timesead.data.preprocessing.save_statistics(frame: pandas.DataFrame, path: str)

Compute feature-wise mean, standard deviation, minimum, and maximum values for a dataset consisting of a single DataFrame and save them as a .npz file.

Parameters:

frame (pandas.DataFrame) – The dataset for which to compute and save statistics.
path (str) – Path to save the statistics via numpy.savez().
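As a rough sketch of what such a statistics file might look like, the following computes the four statistics with NumPy and round-trips them through `numpy.savez()`. The key names `mean`, `std`, `min`, and `max` are an assumption based on the keys the scalers in this package expect, and the population standard deviation (`ddof=0`) is likewise an assumption; the library may differ on both.

```python
import os
import tempfile

import numpy as np

# Toy dataset with N = 3 data points and D = 2 features.
data = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])

# Write to a temporary location; the file name is arbitrary.
path = os.path.join(tempfile.mkdtemp(), "stats.npz")

# Assumed key names; each value is an array of shape (D,).
np.savez(
    path,
    mean=data.mean(axis=0),
    std=data.std(axis=0),  # ddof=0 is an assumption
    min=data.min(axis=0),
    max=data.max(axis=0),
)

# Load the archive back into a plain dictionary of arrays.
with np.load(path) as archive:
    stats = {key: archive[key] for key in archive.files}
```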

timesead.data.preprocessing.minmax_scaler(frame: pandas.DataFrame | numpy.ndarray, stats: Dict[str, numpy.ndarray]) → pandas.DataFrame

Scale features in a dataset independently to the [0, 1] range using pre-computed minimum and maximum values.

Parameters:

frame (Union[pandas.DataFrame, numpy.ndarray]) – The dataset to scale. This should be either a DataFrame or an ndarray of shape (N*, D).
stats (Dict[str, numpy.ndarray]) – A dictionary that contains pre-computed feature-wise minimum and maximum values as ndarrays of shape (D,) in the keys ‘min’ and ‘max’, respectively.

Returns:

The scaled DataFrame or ndarray. Note that this will usually be a copy, as this function does not modify any values in place.

Return type:

pandas.DataFrame
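The scaling itself is the usual min-max formula, (x - min) / (max - min), applied per feature. Below is a minimal NumPy sketch of that formula; `minmax_scale_sketch` is a hypothetical helper, not the library function, and it assumes that max is strictly greater than min for every feature.

```python
import numpy as np


def minmax_scale_sketch(values, stats):
    """Hypothetical helper: scale each feature to [0, 1] with given min and max."""
    values = np.asarray(values, dtype=float)
    # Broadcasting applies the (D,) statistics to every row of the (N, D) input.
    return (values - stats["min"]) / (stats["max"] - stats["min"])


data = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
stats = {"min": data.min(axis=0), "max": data.max(axis=0)}

scaled = minmax_scale_sketch(data, stats)
```

Values that fall outside the pre-computed minimum and maximum, for example in a test set, would land outside [0, 1] under this formula.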

timesead.data.preprocessing.standard_scaler(frame: pandas.DataFrame | numpy.ndarray, stats: Dict[str, numpy.ndarray]) → pandas.DataFrame

Scale features in a dataset such that they have zero mean and unit standard deviation using pre-computed mean and standard deviation values.

Parameters:

frame (Union[pandas.DataFrame, numpy.ndarray]) – The dataset to scale. This should be either a DataFrame or an ndarray of shape (N*, D).
stats (Dict[str, numpy.ndarray]) – A dictionary that contains pre-computed feature-wise mean and standard deviation values as ndarrays of shape (D,) in the keys ‘mean’ and ‘std’, respectively.

Returns:

The scaled DataFrame or ndarray. Note that this will usually be a copy, as this function does not modify any values in place.

Return type:

pandas.DataFrame
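Standardization follows the formula (x - mean) / std per feature. The sketch below shows that formula in plain NumPy; `standard_scale_sketch` is a hypothetical helper rather than the library function, and it assumes a non-zero standard deviation for every feature.

```python
import numpy as np


def standard_scale_sketch(values, stats):
    """Hypothetical helper: shift and scale each feature with given mean and std."""
    values = np.asarray(values, dtype=float)
    # Broadcasting applies the (D,) statistics to every row of the (N, D) input.
    return (values - stats["mean"]) / stats["std"]


data = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
stats = {"mean": data.mean(axis=0), "std": data.std(axis=0)}

scaled = standard_scale_sketch(data, stats)
```

When the statistics come from the same data that is being scaled, as here, each column of `scaled` has zero mean and unit standard deviation; with statistics from a separate training set this holds only approximately.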