timesead.data.transforms

Classes

`Transform`	Base class for all transforms.
`SubsampleTransform`	Subsample sequences by a specified factor.
`CacheTransform`	Caches the results from a previous `Transform` in memory so that expensive calculations do not have to be recomputed.
`LimitTransform`	Limits the amount of data points returned.
`ReconstructionTargetTransform`	Adds the current inputs as targets for reconstruction objectives.
`OneVsRestTargetTransform`	Transforms multi-class labels into binary labels for anomaly detection.
`PredictionTargetTransform`	Adds the last prediction_horizon points from the current inputs as targets for prediction objectives.
`OverlapPredictionTargetTransform`	Adds the sequence shifted by offset as the target.
`WindowTransform`	This `Transform` produces sliding windows from input sequences.
`DatasetSource`	This acts as a source `Transform` (meaning it has no parent) that simply returns sequences from a given dataset.
`PipelineDataset`	Dataset that can be used with a `torch.utils.data.DataLoader` and executes a pipeline of transforms to retrieve its datapoints.

Functions

`make_dataset_split`(dataset, *splits[, axis])	Create `DatasetSource`s for different parts of a given dataset.
`make_pipe_from_dict`(pipeline, data_source)	Instantiates a `PipelineDataset` from a given `DatasetSource` and a pipeline specification.

Package Contents

class timesead.data.transforms.Transform(parent: Transform | None)

Bases: abc.ABC

Base class for all transforms. A Transform processes one (or several) data points and outputs them. Transforms can be chained in a pull-based pipeline.

Parameters: parent (Optional[Transform]) – Another Transform which is used as the data source for this Transform. Can be None in the case of a source.

parent

get_datapoint(item: int) → Tuple[Tuple[torch.Tensor, ...], Tuple[torch.Tensor, ...]]

Returns a datapoint (in our case this is a sequence) from this transform.

Parameters: item (int) – Index of the datapoint. Must satisfy 0 <= item < len(self).
Returns: A datapoint of the form (inputs, targets), where inputs and targets are tuples of tensors.
Return type: Tuple[Tuple[torch.Tensor, ...], Tuple[torch.Tensor, ...]]

__len__() → int | None

This should return the number of available sequences after the transformation.

Return type: Optional[int]

property seq_len: int | List[int]

This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.

Return type: Union[int, List[int]]

property num_features: int | Tuple[int, ...]

Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

Return type: Union[int, Tuple[int, ...]]
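The pull-based chaining described for Transform can be illustrated with a small stand-alone sketch. The classes below (`SketchTransform`, `ListSource`, `ScaleTransform`) are hypothetical stand-ins written for this example, not part of timesead, and plain Python lists are used in place of tensors so that the snippet has no dependencies.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

# Plain lists stand in for torch tensors (an assumption made for this sketch).
Datapoint = Tuple[Tuple[List[float], ...], Tuple[List[float], ...]]


class SketchTransform(ABC):
    """Hypothetical stand-in for Transform; not the library class."""

    def __init__(self, parent: Optional["SketchTransform"]):
        self.parent = parent

    @abstractmethod
    def get_datapoint(self, item: int) -> Datapoint:
        ...

    def __len__(self) -> int:
        # By default a transform offers as many sequences as its parent.
        return len(self.parent)


class ListSource(SketchTransform):
    """Source transform: it has no parent and serves sequences from a list."""

    def __init__(self, sequences: List[List[float]]):
        super().__init__(parent=None)
        self.sequences = sequences

    def get_datapoint(self, item: int) -> Datapoint:
        return (self.sequences[item],), ()

    def __len__(self) -> int:
        return len(self.sequences)


class ScaleTransform(SketchTransform):
    """Pulls a datapoint from its parent and scales every input value."""

    def __init__(self, parent: SketchTransform, factor: float):
        super().__init__(parent)
        self.factor = factor

    def get_datapoint(self, item: int) -> Datapoint:
        inputs, targets = self.parent.get_datapoint(item)
        scaled = tuple([x * self.factor for x in seq] for seq in inputs)
        return scaled, targets


# The last transform is queried; it pulls from its parent on demand.
pipeline = ScaleTransform(ListSource([[1.0, 2.0], [3.0, 4.0]]), factor=10.0)
inputs, targets = pipeline.get_datapoint(1)
```

Nothing is computed until `get_datapoint` is called on the last transform, which is what makes the pipeline pull-based.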

class timesead.data.transforms.SubsampleTransform(parent: timesead.data.transforms.transform_base.Transform, subsampling_factor: int, aggregation: str = 'first')

Bases: timesead.data.transforms.transform_base.Transform

Subsample sequences by a specified factor. subsampling_factor consecutive datapoints in a sequence will be aggregated into one point using the aggregation function.

Parameters:

parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
subsampling_factor (int) – This specifies the number of consecutive data points that will be aggregated.
aggregation (str) – The function that should be applied to aggregate a window of data points. Can be either ‘mean’, ‘last’ or ‘first’.

subsampling_factor

property seq_len
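As a rough illustration of the three aggregation modes, the sketch below subsamples a plain Python list. The function `subsample` is a hypothetical helper, not the library implementation. How the library treats a trailing group with fewer than subsampling_factor points is not stated here; the sketch simply aggregates whatever is left, which is an assumption.

```python
from statistics import fmean
from typing import Callable, Dict, List

# Maps the documented aggregation names to functions over one group of points.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "first": lambda group: group[0],
    "last": lambda group: group[-1],
    "mean": fmean,
}


def subsample(sequence: List[float], subsampling_factor: int,
              aggregation: str = "first") -> List[float]:
    """Aggregate each run of subsampling_factor consecutive points into one."""
    aggregate = AGGREGATIONS[aggregation]
    return [
        aggregate(sequence[start:start + subsampling_factor])
        for start in range(0, len(sequence), subsampling_factor)
    ]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
by_first = subsample(series, 3, "first")
by_last = subsample(series, 3, "last")
by_mean = subsample(series, 3, "mean")
```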

class timesead.data.transforms.CacheTransform(parent: timesead.data.transforms.transform_base.Transform)

Bases: timesead.data.transforms.transform_base.Transform

Caches the results from a previous Transform in memory so that expensive calculations do not have to be recomputed.

Parameters: parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.

cache
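The caching idea can be sketched as simple memoization keyed by the item index. `CountingSource` and `CacheSketch` are hypothetical classes for this example; the actual layout of the cache attribute in the library may differ.

```python
class CountingSource:
    """Hypothetical parent that counts how often it is queried."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    def get_datapoint(self, item):
        self.calls += 1
        return self.data[item]

    def __len__(self):
        return len(self.data)


class CacheSketch:
    """Stores each result from the parent the first time it is requested."""

    def __init__(self, parent):
        self.parent = parent
        self.cache = {}  # item index -> datapoint (assumed layout)

    def get_datapoint(self, item):
        if item not in self.cache:
            self.cache[item] = self.parent.get_datapoint(item)
        return self.cache[item]

    def __len__(self):
        return len(self.parent)


source = CountingSource(["a", "b", "c"])
cached = CacheSketch(source)
first = cached.get_datapoint(1)
second = cached.get_datapoint(1)  # served from the cache, parent not queried
```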

class timesead.data.transforms.LimitTransform(parent: timesead.data.transforms.transform_base.Transform, count: int)

Bases: timesead.data.transforms.transform_base.Transform

Limits the amount of data points returned.

Parameters:

parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
count (int) – The max number of sequences that should be returned by this Transform.

max_count

__len__()

class timesead.data.transforms.ReconstructionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, replace_labels: bool = False)

Bases: timesead.data.transforms.transform_base.Transform

Adds the current inputs as targets for reconstruction objectives.

Parameters:

parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this transform.
replace_labels (bool) – Whether the original labels should be replaced by the reconstruction target. If False, the reconstruction target will be added to the tuple of original labels.

replace_labels = False
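The effect of replace_labels can be sketched on a plain (inputs, targets) tuple. The helper `add_reconstruction_target` is hypothetical, and appending the new target after the original labels is an assumption about the ordering.

```python
def add_reconstruction_target(datapoint, replace_labels=False):
    """Use the inputs as reconstruction targets (sketch, not the library code)."""
    inputs, targets = datapoint
    if replace_labels:
        # The original labels are dropped; only the reconstruction target remains.
        return inputs, inputs
    # Assumed ordering: original labels first, reconstruction target after them.
    return inputs, targets + inputs


point = (([1.0, 2.0, 3.0],), ([0, 0, 1],))
replaced = add_reconstruction_target(point, replace_labels=True)
appended = add_reconstruction_target(point, replace_labels=False)
```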

class timesead.data.transforms.OneVsRestTargetTransform(parent: timesead.data.transforms.transform_base.Transform, normal_class: Any | None = None, anomalous_class: Any | None = None, replace_labels: bool = False)

Bases: timesead.data.transforms.transform_base.Transform

Transforms multi-class labels into binary labels for anomaly detection. “Normal” data points will have label 0, others will have label 1.

Parameters:

parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
normal_class (Optional[Any]) – The input class label that should be considered normal and will have label 0 in the output.
anomalous_class (Optional[Any]) – You can also specify an anomalous class that will have label 1. All other labels will be transformed to 0. Note that you cannot specify both normal_class and anomalous_class.
replace_labels (bool) – Whether the original labels should be replaced by the Transform. If False, the additional labels will be added to the tuple of original labels.

replace_labels = False

normal_class = None

anomalous_class = None
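The label mapping can be sketched as follows. `one_vs_rest` is a hypothetical helper operating on a plain list of labels. Raising an error when both or neither class is given is an assumption made for the sketch; the text above only states that the two options cannot be combined.

```python
def one_vs_rest(labels, normal_class=None, anomalous_class=None):
    """Map multi-class labels to 0 (normal) and 1 (anomalous)."""
    if (normal_class is None) == (anomalous_class is None):
        # Assumed behavior: exactly one of the two must be specified.
        raise ValueError("Specify exactly one of normal_class and anomalous_class.")
    if normal_class is not None:
        return [0 if label == normal_class else 1 for label in labels]
    return [1 if label == anomalous_class else 0 for label in labels]


labels = ["walk", "run", "fall", "walk"]
with_normal = one_vs_rest(labels, normal_class="walk")
with_anomalous = one_vs_rest(labels, anomalous_class="fall")
```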

class timesead.data.transforms.PredictionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, window_size: int, prediction_horizon: int, replace_labels: bool = False, step_size: int = 1, reverse: bool = False)

Bases: timesead.data.transforms.window_transform.WindowTransform

Adds the last prediction_horizon points from the current inputs as targets for prediction objectives.

Parameters:

parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
prediction_horizon (int) – Number of datapoints that should be predicted.
replace_labels (bool) – Whether the original labels should be replaced by the prediction target. If False, the prediction target will be added to the tuple of original labels.
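window_size (int) – The window size; see WindowTransform.
step_size (int) – The step size at which the sliding window is moved along the sequence; see WindowTransform.
reverse (bool) – Whether the sliding window starts at the end of a sequence instead of the start; see WindowTransform.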

input_window_size

prediction_horizon

replace_labels = False

property seq_len: int | List[int]

This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.

Return type: Union[int, List[int]]
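One plausible reading of this transform is sketched below: each sliding window spans window_size input points followed by prediction_horizon target points. That split is an assumption, and `prediction_windows` is a hypothetical helper working on a plain list rather than the library implementation.

```python
def prediction_windows(sequence, window_size, prediction_horizon, step_size=1):
    """Return (input, target) pairs cut from one sequence."""
    # Assumption: a full window covers the inputs plus the points to predict.
    total = window_size + prediction_horizon
    pairs = []
    for start in range(0, len(sequence) - total + 1, step_size):
        window = sequence[start:start + total]
        pairs.append((window[:window_size], window[window_size:]))
    return pairs


pairs = prediction_windows([0, 1, 2, 3, 4, 5], window_size=3, prediction_horizon=2)
```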

class timesead.data.transforms.OverlapPredictionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, offset: int, replace_labels: bool = False)

Bases: timesead.data.transforms.transform_base.Transform

Adds the sequence shifted by offset as the target.

Parameters:

parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
offset (int) – Number of steps ahead that should be predicted.
replace_labels (bool) – Whether the original labels should be replaced by the prediction target. If False, the prediction target will be added to the tuple of original labels.

offset

replace_labels = False

property seq_len: int | List[int]

This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.

Return type: Union[int, List[int]]
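A minimal sketch of an offset-shifted target, assuming the input drops its last offset points and the target drops its first offset points so that both have equal length. `overlap_prediction_pair` is a hypothetical helper; the library may trim the sequences differently.

```python
def overlap_prediction_pair(sequence, offset):
    """Pair a sequence with a copy of itself shifted offset steps ahead."""
    if not 0 < offset < len(sequence):
        raise ValueError("offset must be positive and smaller than the sequence length")
    # Input at position t is paired with the value at position t + offset.
    return sequence[:-offset], sequence[offset:]


inputs, target = overlap_prediction_pair([10, 11, 12, 13, 14], offset=2)
```

Under this assumption the resulting sequences are offset points shorter than the original.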

class timesead.data.transforms.WindowTransform(parent: timesead.data.transforms.transform_base.Transform, window_size: int, step_size: int = 1, reverse: bool = False)

Bases: timesead.data.transforms.transform_base.Transform

This Transform produces sliding windows from input sequences. Incomplete windows (which can appear if step_size>1) will not be returned.

Parameters:

parent (timesead.data.transforms.transform_base.Transform) – Another Transform which is used as the data source for this Transform.
window_size (int) – The size of each window.
step_size (int) – The step size at which the sliding window is moved along the sequence.
reverse (bool) – If this is True, start the sliding window at the end of a sequence, instead of the start. Note that this will not reverse the order of sequences in the dataset and only applies within a single sequence.

window_size

step_size = 1

reverse = False

inverse_transform_index(item) → Tuple[int, int]

Return type: Tuple[int, int]

__len__()

property seq_len
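The windowing rules above can be sketched on a plain list. `sliding_windows` is a hypothetical helper; in particular, the order in which windows are returned when reverse is True is an assumption.

```python
def sliding_windows(sequence, window_size, step_size=1, reverse=False):
    """Cut complete windows of window_size points from one sequence."""
    # Only complete windows are counted; leftover points are dropped.
    n_windows = max((len(sequence) - window_size) // step_size + 1, 0)
    if reverse:
        # Anchor the first window at the end of the sequence and step backwards.
        last_start = len(sequence) - window_size
        starts = [last_start - i * step_size for i in range(n_windows)]
    else:
        starts = [i * step_size for i in range(n_windows)]
    return [sequence[start:start + window_size] for start in starts]


series = list(range(8))
forward = sliding_windows(series, window_size=3, step_size=2)
backward = sliding_windows(series, window_size=3, step_size=2, reverse=True)
```

With eight points, a window of three and a step of two, the forward pass leaves the final point unused, while the reversed pass leaves the first point unused.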

class timesead.data.transforms.DatasetSource(dataset: timesead.data.dataset.BaseTSDataset, start: int | List[int] = None, end: int | List[int] = None, axis: str = 'batch')

Bases: timesead.data.transforms.transform_base.Transform

This acts as a source Transform (meaning it has no parent) that simply returns sequences from a given dataset. It can be constrained to return only a specific part of the data.

Parameters:

dataset (timesead.data.dataset.BaseTSDataset) – The dataset from which to take points.
start (Union[int, List[int]]) – Start index for this dataset. Please see below for a more detailed explanation.
end (Union[int, List[int]]) – End index for this dataset (exclusive). Please see below for a more detailed explanation.
axis (str) – Can be either ‘batch’ or ‘time’. In ‘batch’ mode, this simply returns only the sequences indexed from start to end. ‘time’ mode is used for datasets that contain only one long time series. That time series will be cut according to start and end.

dataset

axis = 'batch'

__len__(): This should return the number of available sequences after the transformation.

property seq_len: This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.

property num_features: Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

timesead.data.transforms.make_dataset_split(dataset: timesead.data.dataset.BaseTSDataset, *splits: float, axis: str = 'batch')

Create DatasetSources for different parts of a given dataset.

Parameters:

dataset (timesead.data.dataset.BaseTSDataset) – The dataset, for which the split should be done.
splits (float) – This should be the percentages of the dataset in each split. Will be normalized to 100%.
axis (str) – The axis along which to split the dataset. Please see DatasetSource for a more detailed explanation.

Returns:

This will return a generator that yields DatasetSources according to the specified splits.
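The normalization of split sizes can be sketched as a computation of index boundaries. `split_boundaries` is a hypothetical helper, and rounding each cumulative boundary to the nearest integer is an assumption about how fractional boundaries are resolved. Unlike the documented function, it returns a list of (start, end) pairs instead of a generator of DatasetSources.

```python
def split_boundaries(n_items, *splits):
    """Turn relative split sizes into consecutive (start, end) index ranges."""
    total = sum(splits)  # splits are normalized by their sum
    boundaries = []
    start = 0
    cumulative = 0.0
    for share in splits:
        cumulative += share
        end = round(n_items * cumulative / total)
        boundaries.append((start, end))
        start = end
    return boundaries


percent_split = split_boundaries(100, 60, 20, 20)
fraction_split = split_boundaries(10, 0.8, 0.2)
```

Each pair could then serve as the start and end arguments of a DatasetSource along the chosen axis.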

class timesead.data.transforms.PipelineDataset(sink_transform: timesead.data.transforms.transform_base.Transform)

Bases: timesead.data.dataset.BaseTSDataset

Dataset that can be used with a torch.utils.data.DataLoader and executes a pipeline of transforms to retrieve its datapoints.

Parameters: sink_transform (timesead.data.transforms.transform_base.Transform) – The last Transform in the pipeline that should be queried for data points.

sink_transform

__iter__()

__getitem__(item) → Tuple[Tuple[torch.Tensor, ...], Tuple[torch.Tensor, ...]]

Access the time series at position item and its corresponding label sequence. A call to this function should return a single time series that was sampled independently of the other time series in this dataset.

Parameters: item – The zero-based index of the time series to retrieve.
Returns: A tuple (inputs, targets), where inputs is again a tuple of Tensors with shape (T, D*), where D* can vary between the tensors. targets contains labels for the time series as tensors of shape (T,).
Return type: Tuple[Tuple[torch.Tensor, ...], Tuple[torch.Tensor, ...]]

__len__(): This should return the number of independent time series in the dataset

property seq_len: int | List[int]

This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.

Return type: Union[int, List[int]]

property num_features: int | Tuple[int, ...]

Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

Return type: Union[int, Tuple[int, ...]]

static get_default_pipeline() → Dict[str, Dict[str, Any]]

Return the default pipeline for this dataset that is used if the user does not specify a different pipeline. This must be a dict of the form:

{
    '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}},
    ...
}

Return type: Dict[str, Dict[str, Any]]

static get_feature_names() → List[str]

This is an abstract method.

Return names for the features in the order they are present in the data tensors.

Returns: A list of strings with names for each feature.
Return type: List[str]

save(path: str, chunk_size: int = 0, batch_dim: int = 0)

Save this dataset as it would be returned after all processing by its transforms is done.

Parameters:

path (str) – The folder in which to save the dataset.
chunk_size (int) – The maximum number of data points that should be saved in one file. If there are more data points than this value, multiple files will be created. Set this to 0 to save the entire dataset in one file.
batch_dim (int) – All (or chunk_size) datapoints will be stacked along this axis in a new tensor that is then saved to disk.
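The meaning of chunk_size can be sketched by computing which data points would go into which file. `chunk_ranges` is a hypothetical helper; file naming and the on-disk format used by save are not covered.

```python
def chunk_ranges(n_points, chunk_size=0):
    """Return the (start, end) index range of data points for each saved file."""
    if chunk_size <= 0:
        # 0 means the entire dataset goes into a single file.
        return [(0, n_points)]
    return [
        (start, min(start + chunk_size, n_points))
        for start in range(0, n_points, chunk_size)
    ]


single_file = chunk_ranges(10)
three_files = chunk_ranges(10, chunk_size=4)
```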

timesead.data.transforms.make_pipe_from_dict(pipeline: Dict[str, Dict[str, Any]], data_source: timesead.data.transforms.dataset_source.DatasetSource) → PipelineDataset

Instantiates a PipelineDataset from a given DatasetSource and a pipeline specification.

Warning

In case the specification of a Transform in the pipeline is incomplete or its instantiation fails for some other reason, this function simply prints a warning and continues with the next Transform instead of raising the exception.

Parameters:

pipeline (Dict[str, Dict[str, Any]]) –
Specification of the pipeline as a dict in the following format:
```
{
    '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}},
    ...
}
```
The function respects the order of transforms specified in the dict. That is, the first transform specified in the dict will be the first transform added to the pipeline and so on.
data_source (timesead.data.transforms.dataset_source.DatasetSource) – The DatasetSource that acts as a source transform for the pipeline.

Returns:

A PipelineDataset that retrieves data from the given DatasetSource and then executes the specified pipeline.

Return type:

PipelineDataset
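The following stand-alone sketch mimics the documented behavior: transforms are chained in dict order, and an entry that cannot be instantiated only triggers a warning. All classes, the `REGISTRY` lookup table, and `build_pipeline` are hypothetical; how the library resolves class names is not shown here, and the sketch returns the last transform rather than wrapping it in a PipelineDataset.

```python
import warnings
from typing import Any, Dict, List


class ListSource:
    """Hypothetical source that serves single numbers."""

    def __init__(self, values: List[float]):
        self.values = values

    def get_datapoint(self, item: int) -> float:
        return self.values[item]


class Scale:
    def __init__(self, parent, factor: float):
        self.parent, self.factor = parent, factor

    def get_datapoint(self, item: int) -> float:
        return self.parent.get_datapoint(item) * self.factor


class Shift:
    def __init__(self, parent, amount: float):
        self.parent, self.amount = parent, amount

    def get_datapoint(self, item: int) -> float:
        return self.parent.get_datapoint(item) + self.amount


# Assumed name-to-class lookup for this sketch only.
REGISTRY = {"Scale": Scale, "Shift": Shift}


def build_pipeline(pipeline: Dict[str, Dict[str, Any]], data_source):
    """Chain transforms in dict order; warn and continue when one fails."""
    sink = data_source
    for name, spec in pipeline.items():
        try:
            transform_class = REGISTRY[spec["class"]]
            sink = transform_class(sink, **spec.get("args", {}))
        except Exception as error:
            warnings.warn(f"Skipping transform {name!r}: {error!r}")
    return sink


spec = {
    "scale": {"class": "Scale", "args": {"factor": 2.0}},
    "broken": {"args": {"factor": 3.0}},  # no 'class' entry, so it is skipped
    "shift": {"class": "Shift", "args": {"amount": 1.0}},
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sink = build_pipeline(spec, ListSource([1.0, 2.0]))
```

Because a faulty entry is skipped silently apart from the warning, it is worth checking the warnings when a pipeline produces unexpected data.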