timesead.data.transforms.pipeline_dataset
=========================================

.. py:module:: timesead.data.transforms.pipeline_dataset


Classes
-------

.. autoapisummary::

   timesead.data.transforms.pipeline_dataset.PipelineDataset


Functions
---------

.. autoapisummary::

   timesead.data.transforms.pipeline_dataset.make_pipe_from_dict


Module Contents
---------------

.. py:function:: make_pipe_from_dict(pipeline: Dict[str, Dict[str, Any]], data_source: timesead.data.transforms.dataset_source.DatasetSource) -> PipelineDataset

   Instantiates a :class:`PipelineDataset` from a given :class:`~timesead.data.transforms.DatasetSource` and a
   pipeline specification.

   .. warning::
      If the specification of a :class:`~timesead.data.transforms.Transform` in the pipeline is incomplete, or its
      instantiation fails for some other reason, this function does not raise the exception. It prints a warning
      and continues with the next :class:`~timesead.data.transforms.Transform`.

   :param pipeline: Specification of the pipeline as a dict in the following format::

                        {
                            '': {'class': '', 'args': {'': , ...}},
                            ...
                        }

                    The function respects the order of transforms specified in the dict. That is, the first
                    transform specified in the dict will be the first transform added to the pipeline, and so on.
   :param data_source: The :class:`~timesead.data.transforms.DatasetSource` that acts as the source transform for
                       the pipeline.
   :return: A :class:`PipelineDataset` that retrieves data from the given :class:`DatasetSource` and then executes
            the specified pipeline.


.. py:class:: PipelineDataset(sink_transform: timesead.data.transforms.transform_base.Transform)

   Bases: :py:obj:`timesead.data.dataset.BaseTSDataset`

   Dataset that can be used with a :class:`torch.utils.data.DataLoader` and executes a pipeline of transforms to
   retrieve its datapoints.

   :param sink_transform: The last :class:`~timesead.data.transforms.Transform` in the pipeline, which is queried
                          for data points.

   .. py:attribute:: sink_transform

   .. py:method:: __iter__()

   .. py:method:: __getitem__(item) -> Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]

      Access the time series at position `item` and its corresponding label sequence.
      A call to this function should return a single time series that was sampled independently of the other
      time series in this dataset.

      :param item: The zero-based index of the time series to retrieve.
      :return: A tuple `(inputs, targets)`, where `inputs` is itself a tuple of :class:`~torch.Tensor`\s with
               shape `(T, D*)`, where `D*` can vary between the tensors. `targets` contains labels for the time
               series as tensors of shape `(T,)`.

   .. py:method:: __len__()

      This should return the number of independent time series in the dataset.

   .. py:property:: seq_len
      :type: Union[int, List[int]]

      This should return the length of each time series. If the time series have different lengths, the return
      value should be a list that contains the length of each sequence. If all sequences are of equal length,
      this should return an int.

   .. py:property:: num_features
      :type: Union[int, Tuple[int, Ellipsis]]

      Number of features of each datapoint. This can also be a tuple if the data has more than one feature
      dimension.

   .. py:method:: get_default_pipeline() -> Dict[str, Dict[str, Any]]
      :staticmethod:

      Return the default pipeline for this dataset, which is used if the user does not specify a different
      pipeline. This must be a dict of the form::

          {
              '': {'class': '', 'args': {'', ...}},
              ...
          }

   .. py:method:: get_feature_names() -> List[str]
      :staticmethod:
      :abstractmethod:

      Return names for the features in the order they are present in the data tensors.

      :return: A list of strings with names for each feature.

   .. py:method:: save(path: str, chunk_size: int = 0, batch_dim: int = 0)

      Save this dataset as it would be returned after all processing by its transforms is done.

      :param path: The folder in which to save the dataset.
      :param chunk_size: The maximum number of data points that should be saved in one file. If there are more
                         data points than this value, multiple files will be created. Set this to 0 to save the
                         entire dataset in one file.
      :param batch_dim: All (or `chunk_size`) datapoints will be stacked along this axis in a new tensor that is
                        then saved to disk.
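
The ``chunk_size`` rule documented for ``save`` can be illustrated with a small, library-independent sketch. The helper ``chunk_bounds`` below is hypothetical and not part of ``timesead``. It only shows how a number of data points might be split into per-file index ranges under the stated rule (``0`` means a single file). How ``save`` actually names and writes its files is not specified here.

```python
from typing import List, Tuple


def chunk_bounds(num_points: int, chunk_size: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) index ranges, one per output file.

    Hypothetical helper for illustration only. It mirrors the documented
    rule that ``chunk_size=0`` puts every data point into one file.
    """
    if num_points < 0 or chunk_size < 0:
        raise ValueError("num_points and chunk_size must be non-negative")
    if num_points == 0:
        # Nothing to save, so no files are needed.
        return []
    if chunk_size == 0 or chunk_size >= num_points:
        # A single file holds the entire dataset.
        return [(0, num_points)]
    # Full chunks first; the last range may be shorter than chunk_size.
    return [
        (start, min(start + chunk_size, num_points))
        for start in range(0, num_points, chunk_size)
    ]
```

Under these assumptions, ten data points with ``chunk_size=4`` would be split into three ranges, the last of which holds only two data points. The data points within each range would then be stacked along ``batch_dim`` before being written.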