timesead.data.transforms.pipeline_dataset
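Classes
PipelineDataset | Dataset that can be used with a torch.utils.data.DataLoader and executes a pipeline of transforms to retrieve its datapoints.
Functions
make_pipe_from_dict | Instantiates a PipelineDataset from a given DatasetSource and a pipeline specification.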
Module Contents
- timesead.data.transforms.pipeline_dataset.make_pipe_from_dict(pipeline: Dict[str, Dict[str, Any]], data_source: timesead.data.transforms.dataset_source.DatasetSource) → PipelineDataset
Instantiates a PipelineDataset from a given DatasetSource and a pipeline specification.
Warning
If the specification of a Transform in the pipeline is incomplete or its instantiation fails for some other reason, this function simply prints a warning and continues with the next Transform instead of raising the exception.
- Parameters:
pipeline (Dict[str, Dict[str, Any]]) –
Specification of the pipeline as a dict in the following format:
{ '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}}, ... }
The function respects the order of transforms specified in the dict. That is, the first transform specified in the dict will be the first transform added to the pipeline and so on.
data_source (timesead.data.transforms.dataset_source.DatasetSource) – The DatasetSource that acts as a source transform for the pipeline.
- Returns:
A PipelineDataset that retrieves data from the given DatasetSource and then executes the specified pipeline.
- Return type:
PipelineDataset
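The two behaviours described above (transforms are chained in dict order, and a failing entry produces a warning and is skipped) can be illustrated with a small stand-alone sketch. It does not import TimeSeAD: `build_chain`, `REGISTRY`, `Source`, `Scale`, and `Shift` are hypothetical stand-ins, and the use of `warnings.warn` is an assumption. The real function may resolve class names and report problems differently.

```python
import warnings
from typing import Any, Callable, Dict


class Source:
    """Hypothetical stand-in for a DatasetSource: serves raw values."""

    def __init__(self, values):
        self.values = list(values)

    def get(self, index):
        return self.values[index]


class Scale:
    """Hypothetical transform: multiplies the parent's output by a factor."""

    def __init__(self, parent, factor):
        self.parent = parent
        self.factor = factor

    def get(self, index):
        return self.parent.get(index) * self.factor


class Shift:
    """Hypothetical transform: adds an offset to the parent's output."""

    def __init__(self, parent, offset):
        self.parent = parent
        self.offset = offset

    def get(self, index):
        return self.parent.get(index) + self.offset


# Assumed name-to-class lookup; the library's own lookup is not shown in these docs.
REGISTRY: Dict[str, Callable[..., Any]] = {"Scale": Scale, "Shift": Shift}


def build_chain(pipeline: Dict[str, Dict[str, Any]], source: Any) -> Any:
    """Chain transforms in dict order; warn and skip entries that fail."""
    current = source
    for name, spec in pipeline.items():  # dicts keep insertion order (Python 3.7+)
        try:
            cls = REGISTRY[spec["class"]]
            current = cls(current, **spec.get("args", {}))
        except Exception as exc:
            warnings.warn(f"Skipping transform {name!r}: {exc}")
    return current


spec = {
    "scale": {"class": "Scale", "args": {"factor": 2}},
    "broken": {"class": "Scale"},  # missing 'factor', so this entry is skipped
    "shift": {"class": "Shift", "args": {"offset": 1}},
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sink = build_chain(spec, Source([1, 2, 3]))
```

Because a skipped entry does not stop construction, the resulting pipeline can silently differ from the specification, so it is worth checking the warnings when a pipeline behaves unexpectedly.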
- class timesead.data.transforms.pipeline_dataset.PipelineDataset(sink_transform: timesead.data.transforms.transform_base.Transform)
Bases: timesead.data.dataset.BaseTSDataset
Dataset that can be used with a torch.utils.data.DataLoader and executes a pipeline of transforms to retrieve its datapoints.
- Parameters:
sink_transform (timesead.data.transforms.transform_base.Transform) – The last Transform in the pipeline that should be queried for data points.
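The idea of a dataset that only queries the last transform can be sketched without TimeSeAD or PyTorch. Everything below is hypothetical: `ListSource`, `NegateInputs`, `SinkBackedDataset`, and the method names `size` and `fetch` are placeholders for illustration, not the library's Transform interface.

```python
class ListSource:
    """Hypothetical source: serves (inputs, targets) pairs from memory."""

    def __init__(self, samples):
        self.samples = list(samples)

    def size(self):
        return len(self.samples)

    def fetch(self, index):
        return self.samples[index]


class NegateInputs:
    """Hypothetical transform: negates every input value of its parent."""

    def __init__(self, parent):
        self.parent = parent

    def size(self):
        return self.parent.size()

    def fetch(self, index):
        inputs, targets = self.parent.fetch(index)
        return [-x for x in inputs], targets


class SinkBackedDataset:
    """Map-style dataset that only talks to the last transform (the sink)."""

    def __init__(self, sink_transform):
        self.sink_transform = sink_transform

    def __len__(self):
        return self.sink_transform.size()

    def __getitem__(self, item):
        # The sink pulls from its parent, which in turn pulls from the source.
        return self.sink_transform.fetch(item)


source = ListSource([([1, 2], [0, 1]), ([3, 4], [0, 0])])
dataset = SinkBackedDataset(NegateInputs(source))
first_inputs, first_targets = dataset[0]
```

Objects that implement `__getitem__` and `__len__` in this way follow the map-style dataset protocol that `torch.utils.data.DataLoader` expects.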
- sink_transform
- __iter__()
- __getitem__(item) → Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]
Access the time series at position item and its corresponding label sequence. A call to this function should return a single time series that was sampled independently of the other time series in this dataset.
- Parameters:
item – The zero-based index of the time series to retrieve.
- Returns:
A tuple (inputs, targets), where inputs is again a tuple of Tensors with shape (T, D*), where D* can vary between the tensors. targets contains labels for the time series as tensors of shape (T,).
- Return type:
Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]
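The (inputs, targets) contract above implies that every input and every target shares the same time dimension T, while the feature width may differ per input. A minimal check of that contract might look as follows. Nested lists stand in for tensors so that the sketch runs without PyTorch, and `check_sample` is a hypothetical helper, not part of the library.

```python
from typing import List, Sequence, Tuple

Series = List[List[float]]  # stand-in for a tensor of shape (T, D)
Labels = List[int]          # stand-in for a tensor of shape (T,)


def check_sample(sample: Tuple[Sequence[Series], Sequence[Labels]]) -> int:
    """Return the shared length T, or raise ValueError if lengths disagree."""
    inputs, targets = sample
    lengths = {len(x) for x in inputs} | {len(y) for y in targets}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent time dimension: {sorted(lengths)}")
    return lengths.pop()


# Two inputs with T = 3 but different feature widths (2 and 1), plus one label sequence.
sample = (
    ([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [[1.0], [2.0], [3.0]]),
    ([0, 0, 1],),
)
T = check_sample(sample)
```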
- __len__()
This should return the number of independent time series in the dataset.
- property seq_len: int | List[int]
This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.
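Since seq_len may be either an int or a list, callers have to handle both cases. The sketch below shows one way to produce and consume such a value. `summarize_seq_len` and `total_steps` are hypothetical helpers written for illustration only.

```python
from typing import List, Sequence, Union


def summarize_seq_len(lengths: Sequence[int]) -> Union[int, List[int]]:
    """Collapse per-series lengths to one int when they are all equal."""
    lengths = list(lengths)
    if lengths and all(n == lengths[0] for n in lengths):
        return lengths[0]
    return lengths


def total_steps(seq_len: Union[int, List[int]], num_series: int) -> int:
    """Consumer side: handle both possible return types of seq_len."""
    if isinstance(seq_len, int):
        return seq_len * num_series
    return sum(seq_len)


equal = summarize_seq_len([100, 100, 100])
ragged = summarize_seq_len([100, 80])
```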
- property num_features: int | Tuple[int, Ellipsis]
Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.
- static get_default_pipeline() → Dict[str, Dict[str, Any]]
Return the default pipeline for this dataset that is used if the user does not specify a different pipeline. This must be a dict of the form:
{ '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}}, ... }
- static get_feature_names() → List[str]
- Abstractmethod:
Return names for the features in the order they are present in the data tensors.
- Returns:
A list of strings with names for each feature.
- Return type:
List[str]
- save(path: str, chunk_size: int = 0, batch_dim: int = 0)
Save this dataset as it would be returned after all processing by its transforms is done.
- Parameters:
path (str) – The folder in which to save the dataset.
chunk_size (int) – The maximum number of data points that should be saved in one file. If there are more data points than this value, multiple files will be created. Set this to 0 to save the entire dataset in one file.
batch_dim (int) – All (or chunk_size) datapoints will be stacked along this axis in a new tensor that is then saved to disk.
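The effect of chunk_size on the number of files can be sketched as a pure index computation. `chunk_bounds` is a hypothetical helper. The actual file names, the on-disk format, and the handling of negative values are not described here, so treating any non-positive value like 0 is an assumption.

```python
from typing import List, Tuple


def chunk_bounds(num_points: int, chunk_size: int = 0) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs, one pair per output file."""
    if chunk_size <= 0 or chunk_size >= num_points:
        # chunk_size == 0 means "everything in a single file".
        return [(0, num_points)]
    return [
        (start, min(start + chunk_size, num_points))
        for start in range(0, num_points, chunk_size)
    ]


single_file = chunk_bounds(10)
three_files = chunk_bounds(10, 4)
```

With 10 data points and chunk_size=4, the last chunk holds only the remaining 2 points. Each chunk would then be stacked along batch_dim before being written.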