timesead.data.wadi_dataset
==========================

.. py:module:: timesead.data.wadi_dataset


Classes
-------

.. autoapisummary::

   timesead.data.wadi_dataset.WADIDataset


Module Contents
---------------

.. py:class:: WADIDataset(path: str = os.path.join(DATA_DIRECTORY, 'wadi', 'WADI.A2_19 Nov 2019'), training: bool = True, standardize: Union[bool, Callable[[pandas.DataFrame, Dict], pandas.DataFrame]] = True, remove_startup: bool = True, split: bool = True, preprocess: bool = True)

   Bases: :py:obj:`timesead.data.dataset.BaseTSDataset`

   Implementation of the WAter DIstribution (WADI) dataset [Ahmed2017]_.

   This dataset was recorded from a miniature water distribution network over the course of several weeks.
   Both the training and the test set consist of either a single long time series or two time series;
   see the ``split`` parameter for details. During the recording of the test set, several attacks
   (cyber and physical) were carried out against the plant.

   .. note::

      Due to licensing issues, we cannot offer an automatic download option for this dataset.
      Please visit https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ and fill in the form
      to request a download link. The required files are in the folder ``WADI.A2_19 Nov 2019``.

   .. [Ahmed2017] Ahmed, Chuadhry Mujeeb, Venkata Reddy Palleti, and Aditya P. Mathur.
      "WADI: a water distribution testbed for research in the design of secure cyber physical systems."
      Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks. 2017.

   :param path: Folder from which to load the dataset.
   :param training: Whether to load the training set (``True``) or the test set (``False``).
   :param standardize: Either a bool that decides whether to apply the dataset-dependent default
      standardization, or a function with signature ``(dataframe, stats) -> dataframe``, where ``stats``
      is a dictionary of common statistics on the training dataset (e.g., mean, std, median) for each feature.
   :param remove_startup: Whether to remove the first 5 hours of the training set, during which the plant
      is starting up.
   :param split: The authors removed some data points in v2 of the training dataset, so there is a clear
      split at index 335998. If ``True``, two time series split at this location are returned.
      Otherwise, one long time series is returned.
   :param preprocess: Whether to set up the dataset for experiments.


   .. py:attribute:: path


   .. py:attribute:: processed_dir


   .. py:attribute:: training
      :value: True


   .. py:attribute:: remove_startup
      :value: True


   .. py:attribute:: split
      :value: True


   .. py:attribute:: startup_remove_amount
      :value: 18000


   .. py:attribute:: split_index
      :value: 335999


   .. py:attribute:: inputs
      :value: None


   .. py:attribute:: targets
      :value: None


   .. py:method:: load_data() -> Tuple[numpy.ndarray, numpy.ndarray]


   .. py:method:: __getitem__(item: int) -> Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

      Return one time series of the dataset. For example, if the dataset has 5 independent time series,
      passing 0, ..., 4 as ``item`` returns these time series. The format is ``(inputs, targets)``,
      where ``inputs`` and ``targets`` are tuples of ``torch.Tensor`` objects.

      :param item: Index of the time series to return.
      :return: A tuple ``(inputs, targets)``, each of which is a tuple of ``torch.Tensor`` objects.


   .. py:method:: __len__() -> Optional[int]

      Return the number of independent time series in the dataset.


   .. py:property:: seq_len
      :type: Optional[int]

      Length of each time series. If the time series have different lengths, this is a list that
      contains the length of each sequence. If all sequences have the same length, this is an int.


   .. py:property:: num_features
      :type: int

      Number of features of each data point. This can also be a tuple if the data has more than one
      feature dimension.


   .. py:method:: get_default_pipeline() -> Dict[str, Dict[str, Any]]
      :staticmethod:

      Return the default pipeline for this dataset, which is used if the user does not specify a
      different pipeline. This must be a dict of the form::

         {
             '': {'class': '', 'args': {'', ...}},
             ...
         }


   .. py:method:: get_feature_names()
      :staticmethod:

      Return names for the features in the order in which they appear in the data tensors.

      :return: A list of strings with a name for each feature.
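
Besides a bool, the ``standardize`` parameter accepts a callable with signature
``(dataframe, stats) -> dataframe``. The following is a minimal sketch of such a callable, not code
from TimeSeAD: ``zscore_standardize`` is a hypothetical name, and the sketch assumes that ``stats``
stores per-feature values under the keys ``'mean'`` and ``'std'`` that can be aligned with the
DataFrame columns. Check the actual keys of the statistics dictionary before relying on it.

.. code-block:: python

   from typing import Dict

   import pandas as pd


   def zscore_standardize(df: pd.DataFrame, stats: Dict) -> pd.DataFrame:
       """Hypothetical standardize callable: z-score each feature with training statistics.

       ASSUMPTION: stats['mean'] and stats['std'] hold one value per feature and are either
       Series indexed by column name, dicts keyed by column name, or sequences in column order.
       """
       # Align the statistics with the DataFrame columns.
       mean = pd.Series(stats['mean'], index=df.columns, dtype=float)
       std = pd.Series(stats['std'], index=df.columns, dtype=float)
       # Constant features have std == 0; scale them by 1 so they are only centered.
       std = std.mask(std == 0, 1.0)
       # Column-wise broadcasting: each column is shifted and scaled by its own statistics.
       return (df - mean) / std

Under these assumptions, the function would be passed as
``WADIDataset(training=False, standardize=zscore_standardize)``. Replacing a zero standard deviation
with 1 keeps constant features finite instead of producing infinities or NaNs.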
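
The ``remove_startup`` and ``split`` options amount to slicing the training series. The sketch below
is a hypothetical re-implementation for illustration only: ``trim_and_split`` is not a TimeSeAD
function, and it assumes a sampling interval of one second (so that ``startup_remove_amount = 18000``
rows correspond to the documented 5 hours) and that ``split_index`` refers to row positions before the
startup rows are removed. The real ``WADIDataset`` may apply these steps differently.

.. code-block:: python

   from typing import List

   import numpy as np

   # Values taken from the WADIDataset attributes documented above.
   STARTUP_REMOVE_AMOUNT = 18000  # ASSUMPTION: 1 row per second -> 18000 s = 5 h
   SPLIT_INDEX = 335999


   def trim_and_split(data: np.ndarray, remove_startup: bool = True,
                      split: bool = True) -> List[np.ndarray]:
       """Hypothetical sketch of the remove_startup/split options on a (time, features) array.

       ASSUMPTION: SPLIT_INDEX is a position in the untrimmed series, so it is shifted
       by the number of removed startup rows.
       """
       split_at = SPLIT_INDEX
       if remove_startup:
           # Drop the startup phase at the beginning of the training series.
           data = data[STARTUP_REMOVE_AMOUNT:]
           split_at -= STARTUP_REMOVE_AMOUNT
       if split:
           # Two separate time series on either side of the gap in the v2 training data.
           return [data[:split_at], data[split_at:]]
       # One long time series.
       return [data]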