timesead.data.transforms.pipeline_dataset
=========================================

.. py:module:: timesead.data.transforms.pipeline_dataset


Classes
-------

.. autoapisummary::

   timesead.data.transforms.pipeline_dataset.PipelineDataset


Functions
---------

.. autoapisummary::

   timesead.data.transforms.pipeline_dataset.make_pipe_from_dict


Module Contents
---------------

.. py:function:: make_pipe_from_dict(pipeline: Dict[str, Dict[str, Any]], data_source: timesead.data.transforms.dataset_source.DatasetSource) -> PipelineDataset

   Instantiates a :class:`PipelineDataset` from a given :class:`~timesead.data.transforms.DatasetSource` and a
   pipeline specification.

   .. warning::
       If the specification of a :class:`~timesead.data.transforms.Transform` in the pipeline is incomplete, or its
       instantiation fails for any other reason, this function prints a warning and skips to the next
       :class:`~timesead.data.transforms.Transform` instead of raising the exception.

   :param pipeline: Specification of the pipeline as a dict in the following format::

           {
               '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}},
               ...
           }

       The function respects the order of transforms specified in the dict. That is, the first transform specified in
       the dict will be the first transform added to the pipeline and so on.
   :param data_source: The :class:`~timesead.data.transforms.DatasetSource` that acts as a source transform for the
       pipeline.
   :return: A :class:`PipelineDataset` that retrieves data from the given
       :class:`~timesead.data.transforms.DatasetSource` and then executes the specified pipeline.

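The ordering and warn-and-continue behaviour described for ``make_pipe_from_dict`` can be illustrated with a minimal,
self-contained sketch. It does not use timesead: ``Source``, ``Scale``, ``Shift``, ``REGISTRY`` and ``build_chain`` are
hypothetical stand-ins, and the way the real function looks up transform classes or reports the warning may differ
(this sketch uses the ``warnings`` module).

```python
import warnings
from typing import Any, Dict


class Source:
    """Hypothetical source: serves raw values by index."""

    def __init__(self, data):
        self.data = list(data)

    def get(self, i):
        return self.data[i]

    def __len__(self):
        return len(self.data)


class Scale:
    """Hypothetical transform: multiplies the parent's output by a factor."""

    def __init__(self, parent, factor):
        self.parent, self.factor = parent, factor

    def get(self, i):
        return self.parent.get(i) * self.factor

    def __len__(self):
        return len(self.parent)


class Shift:
    """Hypothetical transform: adds an offset to the parent's output."""

    def __init__(self, parent, offset):
        self.parent, self.offset = parent, offset

    def get(self, i):
        return self.parent.get(i) + self.offset

    def __len__(self):
        return len(self.parent)


# Assumed lookup table from class name to class; not part of timesead.
REGISTRY = {'Scale': Scale, 'Shift': Shift}


def build_chain(pipeline: Dict[str, Dict[str, Any]], source):
    """Chain transforms in dict order; warn and skip any that fail to build."""
    current = source
    for name, spec in pipeline.items():
        try:
            cls = REGISTRY[spec['class']]
            current = cls(current, **spec.get('args', {}))
        except Exception as exc:
            # Mirror the documented behaviour: report the problem and move on.
            warnings.warn(f"Skipping transform {name!r}: {exc!r}")
    return current


spec = {
    'scale': {'class': 'Scale', 'args': {'factor': 2}},
    'broken': {'class': 'Shift', 'args': {}},  # incomplete: 'offset' is missing
    'shift': {'class': 'Shift', 'args': {'offset': 1}},
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    sink = build_chain(spec, Source([1, 2, 3]))

result = [sink.get(i) for i in range(len(sink))]
```

In this sketch the incomplete ``'broken'`` entry is skipped with a warning, so the resulting chain applies ``Scale``
first and ``Shift`` second, following the order of the dict. A silently skipped transform changes the data the pipeline
produces, so the warnings are worth checking.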

.. py:class:: PipelineDataset(sink_transform: timesead.data.transforms.transform_base.Transform)

   Bases: :py:obj:`timesead.data.dataset.BaseTSDataset`


   Dataset that can be used with a :class:`torch.utils.data.DataLoader` and executes a pipeline of transforms to
   retrieve its datapoints.

   :param sink_transform: The last :class:`~timesead.data.transforms.Transform` in the pipeline that should be
       queried for data points.


   .. py:attribute:: sink_transform


   .. py:method:: __iter__()


   .. py:method:: __getitem__(item) -> Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]

      Access the time series at position `item` and its corresponding label sequence. A call to this function should
      return a single time series that was sampled independently of the other time series in this dataset.

      :param item: The zero-based index of the time series to retrieve.
      :return: A tuple `(inputs, targets)`, where `inputs` is itself a tuple of :class:`~torch.Tensor`\s with shape
          `(T, D*)`, where `D*` can vary between the tensors. `targets` contains labels for the time series as tensors
          of shape `(T,)`.


   .. py:method:: __len__()

      This should return the number of independent time series in the dataset.


   .. py:property:: seq_len
      :type: Union[int, List[int]]


      This should return the length of each time series. If the time series have different lengths, the return
      value should be a list that contains the length of each sequence. If all sequences are of equal length,
      this should return an int.


   .. py:property:: num_features
      :type: Union[int, Tuple[int, Ellipsis]]


      Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.


   .. py:method:: get_default_pipeline() -> Dict[str, Dict[str, Any]]
      :staticmethod:


      Return the default pipeline for this dataset that is used if the user does not specify a different pipeline.
      This must be a dict of the form::

          {
              '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>': <value>, ...}},
              ...
          }


   .. py:method:: get_feature_names() -> List[str]
      :staticmethod:
      :abstractmethod:


      Return names for the features in the order they are present in the data tensors.

      :return: A list of strings with names for each feature.


   .. py:method:: save(path: str, chunk_size: int = 0, batch_dim: int = 0)

      Save this dataset as it would be returned after all processing by its transforms is done.

      :param path: The folder in which to save the dataset.
      :param chunk_size: The maximum number of data points that should be saved in one file. If there are more data
          points than this value, multiple files will be created. Set this to 0 to save the entire dataset in one file.
      :param batch_dim: All (or `chunk_size`) datapoints will be stacked along this axis in a new tensor that is
         then saved to disk.
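
The ``chunk_size`` semantics of ``save`` can be sketched in plain Python. This is an illustration under stated
assumptions, not the actual implementation: ``chunk_ranges`` is a hypothetical helper, and it assumes that chunks are
contiguous index ranges and that the last chunk may be smaller than ``chunk_size``.

```python
from typing import List, Tuple


def chunk_ranges(num_points: int, chunk_size: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) index ranges, one per output file."""
    if chunk_size <= 0:
        # chunk_size == 0 means: everything goes into a single file.
        return [(0, num_points)]
    return [
        (start, min(start + chunk_size, num_points))
        for start in range(0, num_points, chunk_size)
    ]


ranges_all = chunk_ranges(10)         # one range covering all 10 data points
ranges_chunked = chunk_ranges(10, 4)  # three ranges, the last one shorter
```

Under these assumptions, each range corresponds to one saved file, and the data points in a range are the ones that
would be stacked along ``batch_dim``.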