timesead.data.transforms
Submodules
Classes
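Transform | Base class for all transforms.
SubsampleTransform | Subsample sequences by a specified factor.
CacheTransform | Caches the results from a previous Transform in memory.
LimitTransform | Limits the number of data points returned.
ReconstructionTargetTransform | Adds the current inputs as targets for reconstruction objectives.
OneVsRestTargetTransform | Transforms multi-class labels into binary labels for anomaly detection.
PredictionTargetTransform | Adds the last prediction_horizon points from the current inputs as targets for prediction objectives.
OverlapPredictionTargetTransform | Adds the sequence shifted by offset as the target.
WindowTransform | This Transform produces sliding windows from input sequences.
DatasetSource | This acts as a source Transform (meaning it has no parent) that simply returns sequences from a given dataset.
PipelineDataset | Dataset that can be used with a torch.utils.data.DataLoader and executes a pipeline of transforms to retrieve its datapoints.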
Functions
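make_dataset_split | Create DatasetSources for different parts of a given dataset.
make_pipe_from_dict | Instantiates a PipelineDataset from a given DatasetSource and a pipeline specification.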
Package Contents
- class timesead.data.transforms.Transform(parent: Transform | None)
Bases: abc.ABC
Base class for all transforms. A Transform processes one (or several) data points and outputs them. Transforms can be chained in a pull-based pipeline.
- Parameters:
parent (Optional[Transform]) – Another Transform which is used as the data source for this Transform. Can be None in the case of a source.
- parent
- get_datapoint(item: int) Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]
Returns a datapoint (in our case this is a sequence) from this transform.
- Parameters:
item (int) – Index of the datapoint to return. Must satisfy 0 <= item < len(self).
- Returns:
A datapoint of the form (inputs, targets), where inputs and targets are tuples of tensors.
- Return type:
Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]
- __len__() int | None
This should return the number of available sequences after the transformation.
- Return type:
Optional[int]
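The pull-based chaining described above can be illustrated with a minimal sketch. The classes SketchTransform, ListSource and ScaleTransform are hypothetical stand-ins written for this example, not part of timesead, and plain Python lists replace torch.Tensor so the snippet runs without dependencies.

```python
import abc
from typing import List, Optional, Tuple

# Plain lists stand in for torch.Tensor to keep the sketch dependency-free.
Datapoint = Tuple[Tuple[list, ...], Tuple[list, ...]]


class SketchTransform(abc.ABC):
    """Hypothetical stand-in for Transform; not the library's implementation."""

    def __init__(self, parent: Optional["SketchTransform"]):
        self.parent = parent

    @abc.abstractmethod
    def get_datapoint(self, item: int) -> Datapoint:
        ...

    def __len__(self) -> int:
        # Assumption: by default a transform yields as many sequences as its parent.
        return len(self.parent)


class ListSource(SketchTransform):
    """Source transform: has no parent and reads from an in-memory list."""

    def __init__(self, sequences: List[list]):
        super().__init__(parent=None)
        self.sequences = sequences

    def get_datapoint(self, item: int) -> Datapoint:
        return (self.sequences[item],), ()

    def __len__(self) -> int:
        return len(self.sequences)


class ScaleTransform(SketchTransform):
    """Pulls a datapoint from its parent and scales every input value."""

    def __init__(self, parent: SketchTransform, factor: float):
        super().__init__(parent)
        self.factor = factor

    def get_datapoint(self, item: int) -> Datapoint:
        inputs, targets = self.parent.get_datapoint(item)
        scaled = tuple([x * self.factor for x in seq] for seq in inputs)
        return scaled, targets


source = ListSource([[1.0, 2.0], [3.0, 4.0]])
pipeline = ScaleTransform(source, factor=10.0)
first_inputs, first_targets = pipeline.get_datapoint(0)
```

Requesting a datapoint from the last transform triggers a request to its parent, and so on up to the source, which is what "pull-based" means here.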
- class timesead.data.transforms.SubsampleTransform(parent: timesead.data.transforms.transform_base.Transform, subsampling_factor: int, aggregation: str = 'first')
Bases: timesead.data.transforms.transform_base.Transform
Subsample sequences by a specified factor. subsampling_factor consecutive datapoints in a sequence will be aggregated into one point using the aggregation function.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
subsampling_factor (int) – This specifies the number of consecutive data points that will be aggregated.
aggregation (str) – The function that should be applied to aggregate a window of data points. Can be either ‘mean’, ‘last’ or ‘first’.
- subsampling_factor
- property seq_len
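As a rough illustration of the three aggregation modes, the following sketch subsamples a plain list. The function subsample is hypothetical, and dropping a trailing incomplete group is an assumption made for this example; the library may handle the remainder differently.

```python
from statistics import fmean
from typing import Callable, Dict, List


def subsample(sequence: List[float], subsampling_factor: int,
              aggregation: str = "first") -> List[float]:
    """Aggregate each run of `subsampling_factor` consecutive points into one."""
    aggregators: Dict[str, Callable[[List[float]], float]] = {
        "first": lambda window: window[0],
        "last": lambda window: window[-1],
        "mean": fmean,
    }
    if aggregation not in aggregators:
        raise ValueError(f"Unknown aggregation: {aggregation!r}")
    aggregate = aggregators[aggregation]
    # Assumption: a trailing group with fewer points than the factor is dropped.
    n_groups = len(sequence) // subsampling_factor
    return [
        aggregate(sequence[i * subsampling_factor:(i + 1) * subsampling_factor])
        for i in range(n_groups)
    ]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
by_first = subsample(series, 3, "first")
by_last = subsample(series, 3, "last")
by_mean = subsample(series, 3, "mean")
```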
- class timesead.data.transforms.CacheTransform(parent: timesead.data.transforms.transform_base.Transform)
Bases: timesead.data.transforms.transform_base.Transform
Caches the results from a previous Transform in memory so that expensive calculations do not have to be recomputed.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
- cache
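The caching idea can be sketched as a dictionary keyed by the requested index. SlowSource and CachingWrapper are hypothetical names for this example; how the library's cache attribute is actually structured is not stated in this reference.

```python
class SlowSource:
    """Hypothetical parent that counts how often it is queried."""

    def __init__(self, sequences):
        self.sequences = sequences
        self.calls = 0

    def get_datapoint(self, item):
        self.calls += 1
        return (self.sequences[item],), ()

    def __len__(self):
        return len(self.sequences)


class CachingWrapper:
    """Stores each datapoint the first time it is requested."""

    def __init__(self, parent):
        self.parent = parent
        self.cache = {}  # Assumption: a dict keyed by index; illustrative only.

    def get_datapoint(self, item):
        if item not in self.cache:
            self.cache[item] = self.parent.get_datapoint(item)
        return self.cache[item]

    def __len__(self):
        return len(self.parent)


source = SlowSource([[1, 2], [3, 4]])
cached = CachingWrapper(source)
first = cached.get_datapoint(1)
second = cached.get_datapoint(1)  # Served from the cache; parent is not queried again.
```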
- class timesead.data.transforms.LimitTransform(parent: timesead.data.transforms.transform_base.Transform, count: int)
Bases: timesead.data.transforms.transform_base.Transform
Limits the number of data points returned.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
count (int) – The maximum number of sequences that should be returned by this Transform.
- max_count
- __len__()
- class timesead.data.transforms.ReconstructionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, replace_labels: bool = False)
Bases: timesead.data.transforms.transform_base.Transform
Adds the current inputs as targets for reconstruction objectives.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
replace_labels (bool) – Whether the original labels should be replaced by the reconstruction target. If False, the reconstruction target will be added to the tuple of original labels.
- replace_labels = False
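The effect of replace_labels can be sketched on the (inputs, targets) tuple structure. The function add_reconstruction_target is hypothetical, and appending the new target after the original labels is an assumption about ordering.

```python
from typing import Tuple


def add_reconstruction_target(inputs: Tuple[list, ...], targets: Tuple[list, ...],
                              replace_labels: bool = False):
    """Use the inputs themselves as targets, optionally keeping the old labels."""
    if replace_labels:
        return inputs, inputs
    # Assumption: the reconstruction target is appended after the original labels.
    return inputs, targets + inputs


x = ([0.1, 0.2, 0.3],)
y = ([0, 0, 1],)
kept = add_reconstruction_target(x, y, replace_labels=False)
replaced = add_reconstruction_target(x, y, replace_labels=True)
```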
- class timesead.data.transforms.OneVsRestTargetTransform(parent: timesead.data.transforms.transform_base.Transform, normal_class: Any | None = None, anomalous_class: Any | None = None, replace_labels: bool = False)
Bases: timesead.data.transforms.transform_base.Transform
Transforms multi-class labels into binary labels for anomaly detection. “Normal” data points will have label 0, others will have label 1.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
normal_class (Optional[Any]) – The input class label that should be considered normal and will have label 0 in the output.
anomalous_class (Optional[Any]) – You can also specify an anomalous class that will have label 1. All other labels will be transformed to 0. Note that you cannot specify both normal_class and anomalous_class.
replace_labels (bool) – Whether the original labels should be replaced by the binary labels produced by this Transform. If False, the additional labels will be added to the tuple of original labels.
- replace_labels = False
- normal_class = None
- anomalous_class = None
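The two labelling modes can be sketched as follows. The function one_vs_rest is hypothetical, and raising an error when neither class is given is an assumption; the reference only states that both cannot be specified together.

```python
from typing import Any, List, Optional


def one_vs_rest(labels: List[Any], normal_class: Optional[Any] = None,
                anomalous_class: Optional[Any] = None) -> List[int]:
    """Map multi-class labels to 0 (normal) and 1 (anomalous)."""
    # Assumption: exactly one of the two arguments must be provided.
    if (normal_class is None) == (anomalous_class is None):
        raise ValueError("Specify exactly one of normal_class or anomalous_class.")
    if normal_class is not None:
        # Everything except the normal class counts as anomalous.
        return [0 if label == normal_class else 1 for label in labels]
    # Only the anomalous class is labelled 1; all other labels become 0.
    return [1 if label == anomalous_class else 0 for label in labels]


labels = ["walk", "run", "fall", "walk"]
with_normal = one_vs_rest(labels, normal_class="walk")
with_anomalous = one_vs_rest(labels, anomalous_class="fall")
```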
- class timesead.data.transforms.PredictionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, window_size: int, prediction_horizon: int, replace_labels: bool = False, step_size: int = 1, reverse: bool = False)
Bases: timesead.data.transforms.window_transform.WindowTransform
Adds the last prediction_horizon points from the current inputs as targets for prediction objectives.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
prediction_horizon (int) – Number of datapoints that should be predicted.
replace_labels (bool) – Whether the original labels should be replaced by the prediction target. If False, the prediction target will be added to the tuple of original labels.
window_size (int)
step_size (int)
reverse (bool)
- input_window_size
- prediction_horizon
- replace_labels = False
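One possible reading of this transform is sketched below: each window covers window_size input points followed by prediction_horizon target points. This interpretation, suggested by the input_window_size attribute, is an assumption, and prediction_pairs is a hypothetical helper rather than the library's code.

```python
from typing import List, Tuple


def prediction_pairs(sequence: List[float], window_size: int,
                     prediction_horizon: int,
                     step_size: int = 1) -> List[Tuple[List[float], List[float]]]:
    """Split each sliding window into an input part and a prediction target."""
    # Assumption: the full window spans the inputs plus the points to predict.
    total = window_size + prediction_horizon
    pairs = []
    for start in range(0, len(sequence) - total + 1, step_size):
        window = sequence[start:start + total]
        pairs.append((window[:window_size], window[window_size:]))
    return pairs


pairs = prediction_pairs([0, 1, 2, 3, 4, 5], window_size=3, prediction_horizon=2)
```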
- class timesead.data.transforms.OverlapPredictionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, offset: int, replace_labels: bool = False)
Bases: timesead.data.transforms.transform_base.Transform
Adds the sequence shifted by offset as the target.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
offset (int) – Number of steps ahead that should be predicted.
replace_labels (bool) – Whether the original labels should be replaced by the prediction target. If False, the prediction target will be added to the tuple of original labels.
- offset
- replace_labels = False
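A shifted target can be sketched as two overlapping slices of the same sequence. The helper overlap_pair is hypothetical, and trimming the input so that both parts have equal length is an assumption about how the ends are handled.

```python
from typing import List, Tuple


def overlap_pair(sequence: List[float], offset: int) -> Tuple[List[float], List[float]]:
    """Return (inputs, target) where the target is the sequence shifted by offset."""
    if not 0 < offset < len(sequence):
        raise ValueError("offset must be between 1 and len(sequence) - 1")
    # Assumption: the last `offset` inputs are dropped because they have no target.
    return sequence[:-offset], sequence[offset:]


inputs, target = overlap_pair([10, 11, 12, 13, 14], offset=2)
```

With this layout, target[t] is the value observed offset steps after inputs[t].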
- class timesead.data.transforms.WindowTransform(parent: timesead.data.transforms.transform_base.Transform, window_size: int, step_size: int = 1, reverse: bool = False)
Bases: timesead.data.transforms.transform_base.Transform
This Transform produces sliding windows from input sequences. Incomplete windows (that can appear if step_size>1) will not be returned.
- Parameters:
parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
window_size (int) – The size of each window.
step_size (int) – The step size at which the sliding window is moved along the sequence.
reverse (bool) – If this is True, start the sliding window at the end of a sequence, instead of the start. Note that this will not reverse the order of sequences in the dataset and only applies within a single sequence.
- window_size
- step_size = 1
- reverse = False
- __len__()
- property seq_len
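The windowing rules can be sketched on a single sequence. The function sliding_windows is hypothetical; in particular, the order in which windows are returned when reverse is True is an assumption.

```python
from typing import List


def sliding_windows(sequence: List[int], window_size: int, step_size: int = 1,
                    reverse: bool = False) -> List[List[int]]:
    """Return all complete windows; incomplete ones are never produced."""
    n = len(sequence)
    if n < window_size:
        return []
    n_windows = (n - window_size) // step_size + 1
    if reverse:
        # Anchor the first window at the end of the sequence and step backwards.
        starts = [n - window_size - i * step_size for i in range(n_windows)]
    else:
        starts = [i * step_size for i in range(n_windows)]
    return [sequence[s:s + window_size] for s in starts]


data = [0, 1, 2, 3, 4, 5, 6]
forward = sliding_windows(data, window_size=3, step_size=3)
backward = sliding_windows(data, window_size=3, step_size=3, reverse=True)
```

With seven points, a window of three and a step of three, the forward pass leaves the final point unused, while the reverse pass leaves the first point unused.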
- class timesead.data.transforms.DatasetSource(dataset: timesead.data.dataset.BaseTSDataset, start: int | List[int] = None, end: int | List[int] = None, axis: str = 'batch')
Bases: timesead.data.transforms.transform_base.Transform
This acts as a source Transform (meaning it has no parent) that simply returns sequences from a given dataset. It can be constrained to return only a specific part of the data.
- Parameters:
dataset (timesead.data.dataset.BaseTSDataset) – The dataset from which to take points.
start (Union[int, List[int]]) – Start index for this dataset. Please see below for a more detailed explanation.
end (Union[int, List[int]]) – End index for this dataset (exclusive). Please see below for a more detailed explanation.
axis (str) – Can be either ‘batch’ or ‘time’. In ‘batch’ mode, this simply returns only the sequences indexed from start to end. ‘time’ mode is used for datasets that contain only one long time series. That time series will be cut according to start and end.
- dataset
- axis = 'batch'
- __len__()
This should return the number of available sequences after the transformation.
- property seq_len
This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.
- property num_features
Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.
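The difference between the two axis modes can be sketched with nested lists. The helper select_part is hypothetical and handles only integer start and end values; how lists of indices are interpreted is not covered here.

```python
from typing import List, Optional


def select_part(sequences: List[list], start: Optional[int] = None,
                end: Optional[int] = None, axis: str = "batch") -> List[list]:
    """Restrict a dataset either across sequences or along time."""
    if axis == "batch":
        # Keep only the sequences indexed from start to end (exclusive).
        return sequences[start:end]
    if axis == "time":
        # Assumption: 'time' mode expects exactly one long time series.
        if len(sequences) != 1:
            raise ValueError("'time' mode expects a single time series")
        return [sequences[0][start:end]]
    raise ValueError(f"Unknown axis: {axis!r}")


many = [[1, 2], [3, 4], [5, 6], [7, 8]]
one_long = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
batch_part = select_part(many, start=1, end=3, axis="batch")
time_part = select_part(one_long, start=2, end=6, axis="time")
```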
- timesead.data.transforms.make_dataset_split(dataset: timesead.data.dataset.BaseTSDataset, *splits: float, axis: str = 'batch')
Create DatasetSources for different parts of a given dataset.
- Parameters:
dataset (timesead.data.dataset.BaseTSDataset) – The dataset for which the split should be done.
splits (float) – This should be the percentages of the dataset in each split. Will be normalized to 100%.
axis (str) – The axis along which to split the dataset. Please see DatasetSource for a more detailed explanation.
- Returns:
This will return a generator that yields DatasetSources according to the specified splits.
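The normalization of split percentages can be sketched as a generator of (start, end) index pairs. The function split_bounds is hypothetical and the rounding rule is an assumption; the library may round boundaries differently.

```python
from typing import Iterator, Tuple


def split_bounds(n_items: int, *splits: float) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs, with end exclusive, for each split."""
    if not splits or any(share <= 0 for share in splits):
        raise ValueError("Provide at least one positive split size.")
    total = sum(splits)  # Splits are normalized by their sum.
    start = 0
    cumulative = 0.0
    for share in splits:
        cumulative += share
        # Assumption: boundaries are rounded to the nearest index.
        end = round(n_items * cumulative / total)
        yield start, end
        start = end


train_val_test = list(split_bounds(10, 60, 20, 20))
unnormalized = list(split_bounds(5, 8, 2))
```

Each pair could then serve as the start and end arguments of one source, with the second call showing that the shares do not need to sum to 100.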
- class timesead.data.transforms.PipelineDataset(sink_transform: timesead.data.transforms.transform_base.Transform)
Bases: timesead.data.dataset.BaseTSDataset
Dataset that can be used with a torch.utils.data.DataLoader and executes a pipeline of transforms to retrieve its datapoints.
- Parameters:
sink_transform (timesead.data.transforms.transform_base.Transform) – The last Transform in the pipeline that should be queried for data points.
- sink_transform
- __iter__()
- __getitem__(item) Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]
Access the time series at position item and its corresponding label sequence. A call to this function should return a single time series that was sampled independently of the other time series in this dataset.
- Parameters:
item – The zero-based index of the time series to retrieve.
- Returns:
A tuple (inputs, targets), where inputs is again a tuple of Tensors with shape (T, D*), where D* can vary between the tensors. targets contains labels for the time series as tensors of shape (T,).
- Return type:
Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]
- __len__()
This should return the number of independent time series in the dataset
- property seq_len: int | List[int]
This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.
- property num_features: int | Tuple[int, Ellipsis]
Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.
- static get_default_pipeline() Dict[str, Dict[str, Any]]
Return the default pipeline for this dataset that is used if the user does not specify a different pipeline. This must be a dict of the form:
{ '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}}, ... }
- static get_feature_names() List[str]
- Abstractmethod:
Return names for the features in the order they are present in the data tensors.
- Returns:
A list of strings with names for each feature.
- Return type:
List[str]
- save(path: str, chunk_size: int = 0, batch_dim: int = 0)
Save this dataset as it would be returned after all processing by its transforms is done.
- Parameters:
path (str) – The folder in which to save the dataset.
chunk_size (int) – The maximum number of data points that should be saved in one file. If there are more data points than this value, multiple files will be created. Set this to 0 to save the entire dataset in one file.
batch_dim (int) – All (or chunk_size) datapoints will be stacked along this axis in a new tensor that is then saved to disk.
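The chunk_size rule can be sketched as a computation of index ranges, one per output file. The helper chunk_ranges is hypothetical and says nothing about the file names or on-disk format used by save.

```python
from typing import List, Tuple


def chunk_ranges(n_points: int, chunk_size: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, with end exclusive, one per file."""
    if chunk_size <= 0:
        # A chunk size of 0 means the whole dataset goes into one file.
        return [(0, n_points)]
    return [
        (start, min(start + chunk_size, n_points))
        for start in range(0, n_points, chunk_size)
    ]


single_file = chunk_ranges(10)
several_files = chunk_ranges(10, chunk_size=4)
```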
- timesead.data.transforms.make_pipe_from_dict(pipeline: Dict[str, Dict[str, Any]], data_source: timesead.data.transforms.dataset_source.DatasetSource) PipelineDataset
Instantiates a PipelineDataset from a given DatasetSource and a pipeline specification.
Warning
In case the specification of a Transform in the pipeline is incomplete or its instantiation fails for some other reason, this function simply prints a warning and continues with the next Transform instead of raising the exception.
- Parameters:
pipeline (Dict[str, Dict[str, Any]]) –
Specification of the pipeline as a dict in the following format:
{ '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}}, ... }
The function respects the order of transforms specified in the dict. That is, the first transform specified in the dict will be the first transform added to the pipeline and so on.
data_source (timesead.data.transforms.dataset_source.DatasetSource) – The DatasetSource that acts as a source transform for the pipeline.
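The dict format and the warn-and-continue behaviour can be sketched with a small registry of stand-in classes. Everything here (ValueSource, Scale, Shift, REGISTRY, build_pipeline) is hypothetical; the sketch uses warnings.warn and returns the last transform rather than a PipelineDataset, and the way the library resolves class names is not shown in this reference.

```python
import warnings
from typing import Any, Dict


class ValueSource:
    """Hypothetical source holding plain numbers instead of sequences."""

    def __init__(self, values):
        self.values = values

    def get_datapoint(self, item):
        return self.values[item]


class Scale:
    def __init__(self, parent, factor):
        self.parent, self.factor = parent, factor

    def get_datapoint(self, item):
        return self.parent.get_datapoint(item) * self.factor


class Shift:
    def __init__(self, parent, amount):
        self.parent, self.amount = parent, amount

    def get_datapoint(self, item):
        return self.parent.get_datapoint(item) + self.amount


# Assumption: class names are looked up in a simple registry.
REGISTRY = {"Scale": Scale, "Shift": Shift}


def build_pipeline(spec: Dict[str, Dict[str, Any]], source):
    """Chain transforms in dict order, skipping entries that fail to build."""
    current = source
    for name, entry in spec.items():
        try:
            transform_class = REGISTRY[entry["class"]]
            current = transform_class(current, **entry.get("args", {}))
        except Exception as exc:
            # Mirror the documented behaviour: warn and move on to the next entry.
            warnings.warn(f"Skipping transform {name!r}: {exc}")
    return current


spec = {
    "scale": {"class": "Scale", "args": {"factor": 2}},
    "broken": {"class": "Scale", "args": {}},  # Missing 'factor': skipped.
    "shift": {"class": "Shift", "args": {"amount": 1}},
}
sink = build_pipeline(spec, ValueSource([3, 4]))
```

Because a failed entry is skipped silently apart from the warning, a typo in a specification yields a shorter pipeline rather than an error, so checking the resulting chain is advisable.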
- Returns:
A PipelineDataset that retrieves data from the given DatasetSource and then executes the specified pipeline.
- Return type: