timesead.data.wadi_dataset
==========================

.. py:module:: timesead.data.wadi_dataset


Classes
-------

.. autoapisummary::

   timesead.data.wadi_dataset.WADIDataset


Module Contents
---------------

.. py:class:: WADIDataset(path: str = os.path.join(DATA_DIRECTORY, 'wadi', 'WADI.A2_19 Nov 2019'), training: bool = True, standardize: Union[bool, Callable[[pandas.DataFrame, Dict], pandas.DataFrame]] = True, remove_startup: bool = True, split: bool = True, preprocess: bool = True)

   Bases: :py:obj:`timesead.data.dataset.BaseTSDataset`


   Implementation of the WAter DIstribution (WADI) dataset [Ahmed2017]_.
   This dataset was recorded from a miniature water distribution network over the course of several weeks.
   The training and test sets each consist of a single long time series; see the ``split`` parameter for the case in
   which the data is returned as two time series. During the test period, several attacks (cyber and physical) were
   carried out against the plant.

   .. note::
      Due to licensing issues, we cannot offer an automatic download option for this dataset. Please visit
      https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ and fill in the form to request a download link.
      The required files are in the folder ``WADI.A2_19 Nov 2019``.

   .. [Ahmed2017] Ahmed, Chuadhry Mujeeb, Venkata Reddy Palleti, and Aditya P. Mathur.
       "WADI: a water distribution testbed for research in the design of secure cyber physical systems."
       Proceedings of the 3rd international workshop on cyber-physical systems for smart water networks. 2017.

   :param path: Folder from which to load the dataset.
   :param training: Whether to load the training or the test set.
   :param standardize: Either a bool that decides whether to apply the dataset-dependent default
       standardization, or a function with signature ``(dataframe, stats) -> dataframe``, where ``stats`` is a
       dictionary of common statistics computed on the training set (e.g., mean, std, and median of each feature).
   :param remove_startup: Whether to remove the first 5 hours of the training set, during which the plant is starting up.
   :param split: The authors removed some data points in v2 of the training set, so there is a clear split
       at index 335998. If ``True``, two time series split at this location are returned; otherwise, one long
       time series is returned.
   :param preprocess: Whether to set up the dataset for experiments.
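
   A minimal sketch of a custom ``standardize`` callable with the documented ``(dataframe, stats) -> dataframe``
   signature. It assumes that ``stats`` stores one value per feature, in column order, under the keys ``'median'``
   and ``'std'``; these key names and that layout are assumptions, and ``robust_standardize`` is a hypothetical
   name, not part of TimeSeAD.

   .. code-block:: python

      import numpy as np
      import pandas as pd


      def robust_standardize(df: pd.DataFrame, stats: dict) -> pd.DataFrame:
          """Center each feature on its training median and scale by its training std.

          Assumption: stats['median'] and stats['std'] hold one value per column,
          in the same order as df.columns.
          """
          # Build column-aligned Series positionally from the (assumed) per-feature stats.
          median = pd.Series(np.asarray(stats['median'], dtype=float), index=df.columns)
          std = pd.Series(np.asarray(stats['std'], dtype=float), index=df.columns)
          # Constant features have std == 0; divide by 1 instead to avoid inf/NaN.
          std = std.where(std > 0, 1.0)
          # DataFrame - Series aligns the Series index with the DataFrame columns.
          return (df - median) / std

   Such a function would be passed as ``WADIDataset(standardize=robust_standardize)``. Centering on the median
   and scaling by the standard deviation only keeps the sketch short; any per-feature statistics available in
   ``stats`` could be used instead.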


   .. py:attribute:: path


   .. py:attribute:: processed_dir


   .. py:attribute:: training
      :value: True


   .. py:attribute:: remove_startup
      :value: True


   .. py:attribute:: split
      :value: True


   .. py:attribute:: startup_remove_amount
      :value: 18000


   .. py:attribute:: split_index
      :value: 335999
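
      The following sketch illustrates how ``remove_startup`` and ``split`` could act on the raw training array.
      It assumes a sampling rate of 1 Hz (under which ``startup_remove_amount = 18000`` samples corresponds to the
      documented 5 hours) and that the second series starts at ``split_index``. ``split_training_series`` is a
      hypothetical helper, not the library implementation.

      .. code-block:: python

         import numpy as np

         # Values mirror the documented class attributes; reading them as seconds
         # assumes a 1 Hz sampling rate.
         STARTUP_REMOVE_AMOUNT = 18000  # 18000 s = 5 h at 1 Hz (assumed)
         SPLIT_INDEX = 335999


         def split_training_series(data, remove_startup=True, split=True,
                                   startup=STARTUP_REMOVE_AMOUNT, split_index=SPLIT_INDEX):
             """Return a list of time series (hypothetical helper, not TimeSeAD code)."""
             # Split first, using indices of the original, untrimmed recording.
             parts = [data[:split_index], data[split_index:]] if split else [data]
             if remove_startup:
                 # The start-up phase is at the beginning of the recording, i.e. in the first part.
                 parts[0] = parts[0][startup:]
             return parts

      Splitting before trimming keeps ``split_index`` valid as an index into the original recording.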


   .. py:attribute:: inputs
      :value: None


   .. py:attribute:: targets
      :value: None


   .. py:method:: load_data() -> Tuple[numpy.ndarray, numpy.ndarray]


   .. py:method:: __getitem__(item: int) -> Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

      This should return the time series of the dataset. For example, if the dataset has 5 independent time series,
      passing 0, ..., 4 as ``item`` should return these time series. The format is ``(inputs, targets)``, where
      ``inputs`` and ``targets`` are tuples of ``torch.Tensor`` objects.

      :param item: Index of the time series to return.
      :return: A tuple ``(inputs, targets)`` of tuples of tensors for the requested time series.
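
      A minimal sketch of this indexing contract, using a hypothetical ``ToySeriesDataset``. NumPy arrays stand in
      for ``torch.Tensor`` so that the sketch runs without PyTorch; ``torch.from_numpy`` would convert them to
      tensors.

      .. code-block:: python

         from typing import List, Tuple

         import numpy as np


         class ToySeriesDataset:
             """Hypothetical stand-in showing the (inputs, targets) indexing contract."""

             def __init__(self, series: List[np.ndarray], labels: List[np.ndarray]):
                 self.series = series
                 self.labels = labels

             def __len__(self) -> int:
                 # Number of independent time series, not number of time steps.
                 return len(self.series)

             def __getitem__(self, item: int) -> Tuple[Tuple[np.ndarray], Tuple[np.ndarray]]:
                 # One-element tuples: a single input array and a single target array.
                 return (self.series[item],), (self.labels[item],)

      With two series, valid indices are 0 and 1, and each item unpacks as ``(x,), (y,) = dataset[i]``.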


   .. py:method:: __len__() -> Optional[int]

      This should return the number of independent time series in the dataset.


   .. py:property:: seq_len
      :type: Optional[int]


      This should return the length of each time series. If the time series have different lengths, the return
      value should be a list that contains the length of each sequence. If all sequences are of equal length,
      this should return an int.
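
      The int-or-list rule can be sketched with a hypothetical helper ``sequence_lengths``, assuming each series is
      an array whose first axis is time:

      .. code-block:: python

         from typing import List, Sequence, Union

         import numpy as np


         def sequence_lengths(series: Sequence[np.ndarray]) -> Union[int, List[int]]:
             """Return one int if all series share a length, else a list of per-series lengths."""
             lengths = [len(s) for s in series]  # len() of an array is the size of its first (time) axis
             if len(set(lengths)) == 1:
                 return lengths[0]
             return lengths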


   .. py:property:: num_features
      :type: int


      Number of features of each data point. This can also be a tuple if the data has more than one feature dimension.
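
      A sketch of the int-or-tuple convention, assuming arrays of shape ``(time, feature_dims...)``;
      ``feature_shape`` is a hypothetical helper.

      .. code-block:: python

         from typing import Tuple, Union

         import numpy as np


         def feature_shape(series: np.ndarray) -> Union[int, Tuple[int, ...]]:
             """Return an int for a single feature dimension, else a tuple of feature dimensions."""
             dims = series.shape[1:]  # drop the leading time axis
             return dims[0] if len(dims) == 1 else dims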


   .. py:method:: get_default_pipeline() -> Dict[str, Dict[str, Any]]
      :staticmethod:


      Return the default pipeline for this dataset that is used if the user does not specify a different pipeline.
      This must be a dict of the form::

          {
              '<name>': {'class': '<name-of-transform-class>', 'args': {'<arg-name>': <arg-value>, ...}},
              ...
          }
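
      A hypothetical pipeline that follows this form, together with a small validator. The class names
      ``FirstTransform`` and ``SecondTransform`` and their arguments are placeholders, not transforms shipped with
      TimeSeAD, and ``check_pipeline`` is a hypothetical helper that only checks the outer structure. Note that
      ``'args'`` maps argument names to values, so it must be a dict, not a set.

      .. code-block:: python

         from typing import Any, Dict

         # Placeholder step names, transform classes, and constructor arguments.
         example_pipeline: Dict[str, Dict[str, Any]] = {
             'first': {'class': 'FirstTransform', 'args': {'factor': 5}},
             'second': {'class': 'SecondTransform', 'args': {}},
         }


         def check_pipeline(pipeline: Dict[str, Dict[str, Any]]) -> None:
             """Raise ValueError if a step lacks a string 'class' or a dict 'args'."""
             for name, spec in pipeline.items():
                 if not isinstance(spec, dict) or not {'class', 'args'} <= set(spec):
                     raise ValueError(f"step {name!r} needs the keys 'class' and 'args'")
                 if not isinstance(spec['class'], str):
                     raise ValueError(f"step {name!r}: 'class' must be a string")
                 if not isinstance(spec['args'], dict):
                     raise ValueError(f"step {name!r}: 'args' must map argument names to values")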


   .. py:method:: get_feature_names()
      :staticmethod:


      Return names for the features in the order they are present in the data tensors.

      :return: A list of strings with names for each feature.
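
      Because the names are returned in column order, they can be used to select features by name. A minimal
      sketch with a hypothetical helper ``select_features``, assuming a ``(time, features)`` NumPy array:

      .. code-block:: python

         from typing import List, Sequence

         import numpy as np


         def select_features(data: np.ndarray, names: Sequence[str], wanted: List[str]) -> np.ndarray:
             """Return the columns of ``data`` whose names are in ``wanted``, in that order."""
             index = {name: i for i, name in enumerate(names)}  # name -> column position
             return data[:, [index[w] for w in wanted]]  # raises KeyError for unknown names

      For this dataset, ``names`` would come from ``WADIDataset.get_feature_names()``.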