timesead.data.transforms.pipeline_dataset

Classes

PipelineDataset

Dataset that can be used with a torch.utils.data.DataLoader and executes a pipeline of transforms to retrieve its datapoints.

Functions

make_pipe_from_dict(→ PipelineDataset)

Instantiates a PipelineDataset from a given DatasetSource and a pipeline specification.

Module Contents

timesead.data.transforms.pipeline_dataset.make_pipe_from_dict(pipeline: Dict[str, Dict[str, Any]], data_source: timesead.data.transforms.dataset_source.DatasetSource) → PipelineDataset

Instantiates a PipelineDataset from a given DatasetSource and a pipeline specification.

Warning

In case the specification of a Transform in the pipeline is incomplete or its instantiation fails for some other reason, this function simply prints a warning and continues with the next Transform instead of raising the exception.

Parameters:
  • pipeline (Dict[str, Dict[str, Any]]) –

    Specification of the pipeline as a dict in the following format:

    {
        '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}},
        ...
    }
    

    The function respects the order of transforms specified in the dict. That is, the first transform specified in the dict will be the first transform added to the pipeline and so on.

  • data_source (timesead.data.transforms.dataset_source.DatasetSource) – The DatasetSource that acts as a source transform for the pipeline.

Returns:

A PipelineDataset that retrieves data from the given DatasetSource and then executes the specified pipeline.

Return type:

PipelineDataset
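
The following is a minimal sketch of how such a specification could be turned into a chain of transforms, including the warn-and-continue behavior described in the warning above. It does not use TimeSeAD itself: ToySource, Scale, Offset, TOY_REGISTRY, and build_chain are hypothetical stand-ins, and the constructor convention (parent first, then keyword arguments) is an assumption made for illustration.

```python
import warnings
from typing import Any, Dict, List


class ToySource:
    """Hypothetical stand-in for a DatasetSource: serves raw values by index."""

    def __init__(self, values: List[float]):
        self.values = values

    def get(self, index: int) -> float:
        return self.values[index]


class Scale:
    """Hypothetical transform that multiplies its parent's output."""

    def __init__(self, parent, factor: float):
        self.parent = parent
        self.factor = factor

    def get(self, index: int) -> float:
        return self.parent.get(index) * self.factor


class Offset:
    """Hypothetical transform that adds a constant to its parent's output."""

    def __init__(self, parent, amount: float):
        self.parent = parent
        self.amount = amount

    def get(self, index: int) -> float:
        return self.parent.get(index) + self.amount


# Assumption: class names in the spec are resolved through a lookup table.
TOY_REGISTRY = {"Scale": Scale, "Offset": Offset}


def build_chain(pipeline: Dict[str, Dict[str, Any]], source):
    """Chain transforms in dict order; warn and skip entries that fail."""
    current = source
    for name, spec in pipeline.items():  # dicts preserve insertion order
        try:
            transform_class = TOY_REGISTRY[spec["class"]]
            current = transform_class(current, **spec.get("args", {}))
        except Exception as exc:
            warnings.warn(f"Skipping transform '{name}': {exc}")
    return current


spec = {
    "scale": {"class": "Scale", "args": {"factor": 2.0}},
    "broken": {"class": "Scale", "args": {}},  # missing 'factor', gets skipped
    "offset": {"class": "Offset", "args": {"amount": 1.0}},
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sink = build_chain(spec, ToySource([1.0, 2.0, 3.0]))

result = [sink.get(i) for i in range(3)]
```

Because a failing entry is skipped with only a warning, the resulting pipeline can differ from the one that was specified, so it is worth checking for warnings when a pipeline behaves unexpectedly.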

class timesead.data.transforms.pipeline_dataset.PipelineDataset(sink_transform: timesead.data.transforms.transform_base.Transform)

Bases: timesead.data.dataset.BaseTSDataset

Dataset that can be used with a torch.utils.data.DataLoader and executes a pipeline of transforms to retrieve its datapoints.

Parameters:

sink_transform (timesead.data.transforms.transform_base.Transform) – The last Transform in the pipeline that should be queried for data points.
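
As an illustration of this pull-based design, the sketch below shows a dataset that only holds a reference to the last element of a chain and forwards every query to it. The classes ToySource, AddOne, and ToyPipelineDataset, as well as the get_datapoint and __len__ protocol they use, are hypothetical and do not reflect the actual Transform interface; plain lists stand in for tensors.

```python
from typing import List, Tuple

Series = List[float]


class ToySource:
    """Hypothetical source: holds (inputs, targets) pairs in memory."""

    def __init__(self, data: List[Tuple[Series, Series]]):
        self.data = data

    def __len__(self) -> int:
        return len(self.data)

    def get_datapoint(self, index: int) -> Tuple[Series, Series]:
        return self.data[index]


class AddOne:
    """Hypothetical transform: asks its parent for data, then modifies it."""

    def __init__(self, parent):
        self.parent = parent

    def __len__(self) -> int:
        return len(self.parent)

    def get_datapoint(self, index: int) -> Tuple[Series, Series]:
        inputs, targets = self.parent.get_datapoint(index)
        return [x + 1.0 for x in inputs], targets


class ToyPipelineDataset:
    """Map-style dataset that delegates every query to the sink transform."""

    def __init__(self, sink_transform):
        self.sink_transform = sink_transform

    def __len__(self) -> int:
        return len(self.sink_transform)

    def __getitem__(self, item: int) -> Tuple[Series, Series]:
        if not 0 <= item < len(self):
            raise IndexError(item)
        return self.sink_transform.get_datapoint(item)

    def __iter__(self):
        return (self[i] for i in range(len(self)))


source = ToySource([([0.0, 1.0], [0.0, 0.0]), ([5.0, 6.0], [0.0, 1.0])])
dataset = ToyPipelineDataset(AddOne(AddOne(source)))
first_inputs, first_targets = dataset[0]
all_items = list(dataset)
```

The dataset never touches the source directly: each request travels from the sink back through its parents, and the data flows forward through the transforms on the way out.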

sink_transform

__iter__()

__getitem__(item) → Tuple[Tuple[torch.Tensor, ...], Tuple[torch.Tensor, ...]]

Access the time series at position item and its corresponding label sequence. A call to this function should return a single time series that was sampled independently of the other time series in this dataset.

Parameters:

item – The zero-based index of the time series to retrieve.

Returns:

A tuple (inputs, targets), where inputs is again a tuple of Tensors with shape (T, D*), where D* can vary between the tensors. targets contains labels for the time series as tensors of shape (T,).

Return type:

Tuple[Tuple[torch.Tensor, ...], Tuple[torch.Tensor, ...]]

__len__()

This should return the number of independent time series in the dataset.

property seq_len: int | List[int]

This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.

Return type:

Union[int, List[int]]
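
The int-or-list convention can be sketched as follows. The helper summarize_seq_len is hypothetical and only illustrates the return contract described above, not how any concrete dataset computes its value.

```python
from typing import List, Union


def summarize_seq_len(lengths: List[int]) -> Union[int, List[int]]:
    """Return one int if all series share a length, else the full list."""
    if lengths and all(n == lengths[0] for n in lengths):
        return lengths[0]
    return list(lengths)


equal = summarize_seq_len([100, 100, 100])
mixed = summarize_seq_len([100, 80, 120])
```

Callers therefore need to handle both cases, for example by checking isinstance(seq_len, int) before using the value.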

property num_features: int | Tuple[int, ...]

Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

Return type:

Union[int, Tuple[int, ...]]

static get_default_pipeline() → Dict[str, Dict[str, Any]]

Return the default pipeline for this dataset that is used if the user does not specify a different pipeline. This must be a dict of the form:

{
    '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}},
    ...
}

Return type:

Dict[str, Dict[str, Any]]
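
A quick structural check of a dict in this format could look like the sketch below. find_spec_problems is a hypothetical helper, not part of TimeSeAD, and the class name 'MyTransform' is a placeholder. Treating 'args' as optional is also an assumption.

```python
from typing import Any, Dict, List


def find_spec_problems(pipeline: Dict[str, Dict[str, Any]]) -> List[str]:
    """Collect human-readable problems instead of raising on the first one."""
    problems = []
    for name, spec in pipeline.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: entry must be a dict")
            continue
        if not isinstance(spec.get("class"), str):
            problems.append(f"{name}: 'class' must be a string")
        # Assumption: 'args' may be omitted, but if given it must be a dict.
        if not isinstance(spec.get("args", {}), dict):
            problems.append(f"{name}: 'args' must be a dict")
    return problems


good = {"window": {"class": "MyTransform", "args": {"size": 10}}}
bad = {"window": {"class": "MyTransform", "args": {"size"}}}  # a set, not a dict

good_problems = find_spec_problems(good)
bad_problems = find_spec_problems(bad)
```

The second example shows an easy mistake: writing the 'args' value without key-value pairs produces a Python set rather than a dict of constructor arguments.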

static get_feature_names() → List[str]

Abstract method.

Return names for the features in the order they are present in the data tensors.

Returns:

A list of strings with names for each feature.

Return type:

List[str]

save(path: str, chunk_size: int = 0, batch_dim: int = 0)

Save this dataset as it would be returned after all processing by its transforms is done.

Parameters:
  • path (str) – The folder in which to save the dataset.

  • chunk_size (int) – The maximum number of data points that should be saved in one file. If there are more data points than this value, multiple files will be created. Set this to 0 to save the entire dataset in one file.

  • batch_dim (int) – All (or chunk_size) datapoints will be stacked along this axis in a new tensor that is then saved to disk.
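
The effect of chunk_size can be sketched with a small helper that only computes which data point indices would end up in which file. plan_chunks is hypothetical; the file naming and on-disk format used by save are not described here and are deliberately left out.

```python
from typing import List


def plan_chunks(num_datapoints: int, chunk_size: int = 0) -> List[range]:
    """Split indices 0..num_datapoints-1 into consecutive chunks.

    A chunk_size of 0 means a single chunk that covers everything.
    """
    if chunk_size < 0:
        raise ValueError("chunk_size must be non-negative")
    if chunk_size == 0 or chunk_size >= num_datapoints:
        return [range(num_datapoints)]
    return [
        range(start, min(start + chunk_size, num_datapoints))
        for start in range(0, num_datapoints, chunk_size)
    ]


single_file = plan_chunks(10)
three_files = plan_chunks(10, chunk_size=4)
```

With ten data points and chunk_size=4, the last chunk holds only the two remaining data points. The data points of each chunk are what gets stacked along batch_dim before being written.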