timesead.data.wadi_dataset

Classes

WADIDataset

Implementation of the WAter DIstribution Dataset [Ahmed2017].

Module Contents

class timesead.data.wadi_dataset.WADIDataset(path: str = os.path.join(DATA_DIRECTORY, 'wadi', 'WADI.A2_19 Nov 2019'), training: bool = True, standardize: bool | Callable[[pandas.DataFrame, Dict], pandas.DataFrame] = True, remove_startup: bool = True, split: bool = True, preprocess: bool = True)

Bases: timesead.data.dataset.BaseTSDataset

Implementation of the WAter DIstribution Dataset [Ahmed2017]. This dataset was recorded from a miniature water distribution network over the course of several weeks. The test set consists of a single long time series. The training set consists of either one long time series or two time series, depending on the split parameter. During testing, several attacks (cyber and physical) were carried out against the plant.

Note

Due to licensing issues, we cannot offer an automatic download option for this dataset. Please visit https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ and fill in the form to request a download link. The required files are in the folder WADI.A2_19 Nov 2019.

[Ahmed2017]

Ahmed, Chuadhry Mujeeb, Venkata Reddy Palleti, and Aditya P. Mathur. “WADI: a water distribution testbed for research in the design of secure cyber physical systems.” Proceedings of the 3rd international workshop on cyber-physical systems for smart water networks. 2017.

Parameters:
  • path (str) – Folder from which to load the dataset.

  • training (bool) – Whether to load the training or the test set.

  • standardize (Union[bool, Callable[[pandas.DataFrame, Dict], pandas.DataFrame]]) – Either a bool that decides whether to apply the dataset-dependent default standardization, or a function with signature (dataframe, stats) -> dataframe, where stats is a dictionary of common statistics on the training dataset (e.g., mean, std, and median for each feature).

  • remove_startup (bool) – Whether to remove the first 5 hours of the training set, during which the plant is starting up.

  • split (bool) – The authors removed some data points in v2 of the training dataset, so there is a clear split at index 335998. If True, two time series are returned, split at this location. Otherwise, one long time series is returned.

  • preprocess (bool) – Whether to set up the dataset for experiments.
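
A custom standardize function only needs to follow the (dataframe, stats) -> dataframe signature described above. The following is a minimal sketch, not the dataset's default standardization. It assumes that stats contains the keys "mean" and "std" with one value per feature; the function name, the clipping range, and the toy column names are made up for illustration.

```python
from typing import Dict

import pandas as pd


def clipped_standardize(dataframe: pd.DataFrame, stats: Dict) -> pd.DataFrame:
    """Z-score each feature with training statistics, then clip outliers."""
    # Assumption: stats["mean"] and stats["std"] hold one value per feature column.
    mean = pd.Series(stats["mean"])
    # Constant sensors have zero std; replace it by 1 to avoid division by zero.
    std = pd.Series(stats["std"]).replace(0, 1.0)
    return ((dataframe - mean) / std).clip(lower=-5.0, upper=5.0)


# Toy stand-in for the training data; the column names are hypothetical.
train = pd.DataFrame({"flow": [1.0, 2.0, 3.0], "valve": [0.0, 0.0, 0.0]})
stats = {"mean": train.mean(), "std": train.std()}
scaled = clipped_standardize(train, stats)
```

Such a function would then be passed to the constructor as standardize=clipped_standardize instead of a bool.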

path
processed_dir
training = True
remove_startup = True
split = True
startup_remove_amount = 18000
split_index = 335999
inputs = None
targets = None
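
The values of startup_remove_amount and split_index hint at how the training recording is trimmed and divided: 18000 samples correspond to 5 hours at one sample per second. The index arithmetic can be sketched as below. This is an illustration only: it assumes that the split position is counted on the untrimmed recording, which the class may handle differently, and n_samples and training_segments are hypothetical names.

```python
from typing import List, Tuple

STARTUP_REMOVE_AMOUNT = 18000  # value listed above: 5 h * 3600 samples per hour
SPLIT_INDEX = 335999           # value listed above


def training_segments(n_samples: int, remove_startup: bool = True,
                      split: bool = True) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) index ranges of the resulting time series."""
    start = STARTUP_REMOVE_AMOUNT if remove_startup else 0
    if split:
        # Assumption: the split position refers to the untrimmed recording.
        return [(start, SPLIT_INDEX), (SPLIT_INDEX, n_samples)]
    return [(start, n_samples)]
```
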
load_data() → Tuple[numpy.ndarray, numpy.ndarray]
Return type:

Tuple[numpy.ndarray, numpy.ndarray]

__getitem__(item: int) → Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

This should return the time series of the dataset. For example, if the dataset has 5 independent time series, passing 0, …, 4 as item should return these time series. The format is (inputs, targets), where inputs and targets are tuples of torch.Tensors.

Parameters:

item (int) – Index of the time series to return.

Returns:

Return type:

Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

__len__() → int | None

This should return the number of independent time series in the dataset.

Return type:

Optional[int]

property seq_len: int | None

This should return the length of each time series. If the time series have different lengths, the return value should be a list that contains the length of each sequence. If all sequences are of equal length, this should return an int.

Return type:

Optional[int]
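
The rule for seq_len (an int when all series share one length, a list otherwise) can be sketched as a small helper. This is not the library's implementation; describe_seq_len is a hypothetical name, and returning None for an empty dataset is an assumption based only on the Optional return type.

```python
from typing import List, Optional, Sequence, Union


def describe_seq_len(lengths: Sequence[int]) -> Optional[Union[int, List[int]]]:
    """Collapse per-series lengths into an int if they are all equal."""
    if not lengths:
        # Assumption: no time series means no meaningful length.
        return None
    if all(n == lengths[0] for n in lengths):
        return lengths[0]
    return list(lengths)
```
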

property num_features: int

Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

Return type:

int

static get_default_pipeline() → Dict[str, Dict[str, Any]]

Return the default pipeline for this dataset that is used if the user does not specify a different pipeline. This must be a dict of the form:

{
    '<name>': {'class': '<name-of-transform-class>', 'args': {'<args-for-constructor>', ...}},
    ...
}
Return type:

Dict[str, Dict[str, Any]]
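
A quick structural check of such a pipeline dict might look like the sketch below. It assumes that 'args' maps constructor argument names to values, which the template above does not spell out. The check_pipeline helper, the transform class name, and its argument are hypothetical and do not refer to real transforms.

```python
from typing import Any, Dict


def check_pipeline(pipeline: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if every entry has a string 'class' and a dict of 'args'."""
    for name, step in pipeline.items():
        if not isinstance(name, str) or not isinstance(step, dict):
            return False
        if not isinstance(step.get("class"), str):
            return False
        # Assumption: 'args' holds keyword arguments for the transform constructor.
        if not isinstance(step.get("args", {}), dict):
            return False
    return True


# Hypothetical pipeline with a single, made-up transform.
example_pipeline = {
    "window": {"class": "HypotheticalWindowTransform", "args": {"window_size": 100}},
}
```
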

static get_feature_names()

Return names for the features in the order they are present in the data tensors.

Returns:

A list of strings with names for each feature.