timesead.data.preprocessing
This package contains helper code to set up datasets for experiments.
Submodules
Functions
update_statistics_increment – Incrementally compute feature-wise mean, minimum, and maximum values for a dataset consisting of multiple DataFrames.
save_statistics – Compute feature-wise mean, standard deviation, minimum, and maximum values for a dataset consisting of a single DataFrame and save them as a .npz file.
minmax_scaler – Scale features in a dataset independently to the [0, 1] range using pre-computed minimum and maximum values.
standard_scaler – Scale features in a dataset such that they have zero mean and unit standard deviation using pre-computed mean and standard deviation values.
Package Contents
- timesead.data.preprocessing.update_statistics_increment(frame: pandas.DataFrame, mean: numpy.ndarray = None, min_val: numpy.ndarray = None, max_val: numpy.ndarray = None, old_n: int = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, int]
Incrementally compute feature-wise mean, minimum, and maximum values for a dataset consisting of multiple DataFrames.
- Parameters:
frame (pandas.DataFrame) – The current data with which the statistics should be updated.
mean (numpy.ndarray) – Mean value returned by previous calls to the function or None if this is the first call.
min_val (numpy.ndarray) – Minimum value returned by previous calls to the function or None if this is the first call.
max_val (numpy.ndarray) – Maximum value returned by previous calls to the function or None if this is the first call.
old_n (int) – Total number of data points processed in earlier calls to the function or None if this is the first call.
- Returns:
A tuple (mean, min_val, max_val, old_n + n) that contains the updated statistics and the total number of processed data points.
- Return type:
Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, int]
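The running update described above can be illustrated with a small stand-alone sketch. This is not the library's implementation: update_stats_sketch is a hypothetical helper written for this example, and it assumes that "number of data points" means the number of rows in each DataFrame. The mean is combined as a weighted average of the previous mean and the mean of the new chunk.

```python
from typing import Optional, Tuple

import numpy as np
import pandas as pd


def update_stats_sketch(
    frame: pd.DataFrame,
    mean: Optional[np.ndarray] = None,
    min_val: Optional[np.ndarray] = None,
    max_val: Optional[np.ndarray] = None,
    old_n: Optional[int] = None,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, int]:
    """Hypothetical re-implementation of the idea, for illustration only."""
    values = frame.to_numpy(dtype=float)
    n = values.shape[0]  # assumption: one data point per row
    batch_mean = values.mean(axis=0)
    batch_min = values.min(axis=0)
    batch_max = values.max(axis=0)

    if mean is None:
        # First call: the statistics of this chunk are the running statistics.
        return batch_mean, batch_min, batch_max, n

    total = old_n + n
    # Weighted average of the previous mean and the mean of the new chunk.
    new_mean = (old_n * mean + n * batch_mean) / total
    return new_mean, np.minimum(min_val, batch_min), np.maximum(max_val, batch_max), total


# Two chunks of the same dataset, processed one after the other.
chunks = [
    pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]}),
    pd.DataFrame({"a": [3.0, 4.0, 5.0], "b": [30.0, 0.0, 50.0]}),
]

mean = min_val = max_val = n = None
for chunk in chunks:
    mean, min_val, max_val, n = update_stats_sketch(chunk, mean, min_val, max_val, n)
```

Passing None for all statistics on the first call and feeding the returned tuple back in on later calls mirrors the calling pattern that the parameter descriptions suggest.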
- timesead.data.preprocessing.save_statistics(frame: pandas.DataFrame, path: str)
Compute feature-wise mean, standard deviation, minimum, and maximum values for a dataset consisting of a single DataFrame and save them as a .npz file.
- Parameters:
frame (pandas.DataFrame) – The dataset for which to compute and save statistics.
path (str) – Path to save the statistics via numpy.savez().
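As a rough illustration of what such a statistics file might look like, the sketch below computes the four statistics with NumPy and writes them with numpy.savez(). The key names 'mean', 'std', 'min', and 'max' are an assumption based on the keys the scaler functions expect, and the use of the population standard deviation (ddof=0) is also an assumption; neither is confirmed for save_statistics itself.

```python
import os
import tempfile

import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})
values = frame.to_numpy(dtype=float)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stats.npz")

    # Assumed key names; ddof=0 (NumPy's default) is also an assumption.
    np.savez(
        path,
        mean=values.mean(axis=0),
        std=values.std(axis=0),
        min=values.min(axis=0),
        max=values.max(axis=0),
    )

    # Load the archive back into a plain dictionary of arrays of shape (D,).
    with np.load(path) as data:
        stats = {key: data[key] for key in data.files}
```

A dictionary loaded this way has the Dict[str, numpy.ndarray] shape that the stats parameter of the scaler functions describes.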
- timesead.data.preprocessing.minmax_scaler(frame: pandas.DataFrame | numpy.ndarray, stats: Dict[str, numpy.ndarray]) → pandas.DataFrame
Scale features in a dataset independently to the [0, 1] range using pre-computed minimum and maximum values.
- Parameters:
frame (Union[pandas.DataFrame, numpy.ndarray]) – The dataset to scale. This should either be a DataFrame or an ndarray of shape (N*, D).
stats (Dict[str, numpy.ndarray]) – A dictionary that contains pre-computed feature-wise minimum and maximum values as ndarrays of shape (D,) in the keys ‘min’ and ‘max’, respectively.
- Returns:
The scaled DataFrame or ndarray. Note that this will usually be a copy, as this function does not modify any values in place.
- Return type:
pandas.DataFrame
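The usual min-max formula, (x - min) / (max - min) applied per feature, can be sketched with plain NumPy as follows. minmax_scale_sketch is a hypothetical helper for illustration, not the library function; in particular, how the library treats constant features (where max equals min) is not stated here, so the example avoids that case.

```python
from typing import Dict

import numpy as np


def minmax_scale_sketch(values: np.ndarray, stats: Dict[str, np.ndarray]) -> np.ndarray:
    """Hypothetical min-max scaling; returns a new array and leaves the input untouched."""
    # Broadcasting applies the (D,)-shaped statistics to every row.
    return (values - stats["min"]) / (stats["max"] - stats["min"])


values = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
original = values.copy()

# Statistics are computed once up front and then reused for scaling.
stats = {"min": values.min(axis=0), "max": values.max(axis=0)}
scaled = minmax_scale_sketch(values, stats)
```

When the statistics come from the same data that is being scaled, every feature ends up in the [0, 1] range; data scaled with statistics from a different split can fall outside that range.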
- timesead.data.preprocessing.standard_scaler(frame: pandas.DataFrame | numpy.ndarray, stats: Dict[str, numpy.ndarray]) → pandas.DataFrame
Scale features in a dataset such that they have zero mean and unit standard deviation using pre-computed mean and standard deviation values.
- Parameters:
frame (Union[pandas.DataFrame, numpy.ndarray]) – The dataset to scale. This should either be a DataFrame or an ndarray of shape (N*, D).
stats (Dict[str, numpy.ndarray]) – A dictionary that contains pre-computed feature-wise mean and standard deviation values as ndarrays of shape (D,) in the keys ‘mean’ and ‘std’, respectively.
- Returns:
The scaled DataFrame or ndarray. Note that this will usually be a copy, as this function does not modify any values in place.
- Return type:
pandas.DataFrame
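Standardization with pre-computed statistics amounts to (x - mean) / std per feature. The sketch below is a hypothetical stand-alone version, not the library function. It reads the shape (N*, D) as "any number of leading dimensions followed by D features", which is an assumption, and relies on NumPy broadcasting over the last axis.

```python
from typing import Dict

import numpy as np


def standard_scale_sketch(values: np.ndarray, stats: Dict[str, np.ndarray]) -> np.ndarray:
    """Hypothetical standardization; returns a new array."""
    # The (D,)-shaped statistics broadcast over all leading dimensions.
    return (values - stats["mean"]) / stats["std"]


# Two sequences of three time steps with D = 2 features each.
values = np.arange(12, dtype=float).reshape(2, 3, 2)

# Flatten the leading dimensions to compute feature-wise statistics.
flat = values.reshape(-1, values.shape[-1])
stats = {"mean": flat.mean(axis=0), "std": flat.std(axis=0)}

scaled = standard_scale_sketch(values, stats)
scaled_flat = scaled.reshape(-1, scaled.shape[-1])
```

After scaling, each feature has a mean of approximately zero and a standard deviation of approximately one over the data from which the statistics were computed.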