timesead.data.swat_dataset

Classes

SWaTDataset

Implementation of the Secure WAter Treatment Dataset [Goh2016].

Module Contents

class timesead.data.swat_dataset.SWaTDataset(path: str = os.path.join(DATA_DIRECTORY, 'SWaT', 'SWaT.A1 & A2_Dec 2015', 'Physical'), training: bool = True, standardize: bool | Callable = True, remove_startup: bool = True, preprocess: bool = True)

Bases: timesead.data.dataset.BaseTSDataset

Implementation of the Secure WAter Treatment Dataset [Goh2016]. This dataset was recorded from a miniature water treatment plant over the course of several weeks. The training set and the test set each consist of a single long time series. While the test set was being recorded, several attacks (cyber and physical) were carried out against the plant.

Note

Due to licensing issues, we cannot offer an automatic download option for this dataset. Please visit https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/ and fill in the form to request a download link. The required files are in the folder SWaT.A1 & A2_Dec 2015/Physical.
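Since the files have to be placed by hand, it can help to check the folder before constructing the dataset. The following is a minimal sketch, not part of the library: the helper name missing_swat_files is hypothetical, and the two file names are taken from the description of the path parameter.

```python
from pathlib import Path

# File names as listed in the description of the `path` parameter.
REQUIRED_FILES = ("SWaT_Dataset_Normal_v1.csv", "SWaT_Dataset_Attack_v0.csv")


def missing_swat_files(folder):
    """Return the required CSV file names that are not present in `folder`.

    Hypothetical helper for illustration; the library does not provide it.
    """
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means both files were found in the given folder.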

Warning

This dataset requires the raw data to be preprocessed before use. To run preprocessing, set the preprocess argument to True. Without preprocessing, the class raises an error.
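As a rough usage sketch, the training and test splits could be created as shown below. It uses only the constructor arguments documented on this page. The wrapper function load_swat_splits is a hypothetical name introduced for illustration, and the import is done lazily so the snippet can be read without timesead installed.

```python
def load_swat_splits(path):
    """Build the training and test datasets from the folder at `path`.

    Hypothetical convenience wrapper; only the SWaTDataset constructor
    arguments documented on this page are used.
    """
    # Imported inside the function so defining it does not require timesead.
    from timesead.data.swat_dataset import SWaTDataset

    # preprocess=True is the documented default; it is spelled out here
    # because the class raises an error when the data was never preprocessed.
    train = SWaTDataset(path=path, training=True, preprocess=True)
    test = SWaTDataset(path=path, training=False, preprocess=True)
    return train, test
```

Calling the function requires the manually downloaded files to be present in the folder passed as path.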

[Goh2016]

Goh, Jonathan, et al. “A dataset to support research in the design of secure water treatment systems.” Critical Information Infrastructures Security: 11th International Conference, CRITIS 2016, Paris, France, October 10–12, 2016, Revised Selected Papers 11. Springer International Publishing, 2017.

Parameters:

path (str) – Path where the files “SWaT_Dataset_Normal_v1.csv” and “SWaT_Dataset_Attack_v0.csv” are located.
training (bool) – If True, load the training set, which consists only of normal samples. Otherwise, load the test set, which includes attacks.
standardize (Union[bool, Callable]) – If True, apply min-max scaling (based on the training set). This can also be a function that accepts a DataFrame as its positional argument and a keyword argument stats: a dictionary of training data statistics.
remove_startup (bool) – If True, remove the startup phase from the training set, during which the system was starting from an empty state. The startup phase covers the first 5 hours of the recording, but only 4.5 hours are removed here because the first 30 minutes were already removed in v1 of the dataset.
preprocess (bool) – If True, set up the dataset so that experiments can be run on it.
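To illustrate what min-max scaling "based on the training set" means for the standardize option, here is a minimal sketch in plain Python. It is not the library's implementation: the function names and the "min"/"max" keys of the statistics dictionary are assumptions made for this example, and the actual layout of the stats argument is not documented on this page.

```python
def fit_min_max(train_column):
    """Collect the statistics needed for min-max scaling from training data."""
    # The key names "min" and "max" are assumptions for this sketch.
    return {"min": min(train_column), "max": max(train_column)}


def apply_min_max(column, stats):
    """Scale values with training-set statistics; constant columns map to 0."""
    span = stats["max"] - stats["min"]
    if span == 0:
        return [0.0 for _ in column]
    return [(value - stats["min"]) / span for value in column]


train = [2.0, 4.0, 6.0]
test = [3.0, 8.0]

stats = fit_min_max(train)                  # statistics come from training data only
scaled_train = apply_min_max(train, stats)  # [0.0, 0.5, 1.0]
scaled_test = apply_min_max(test, stats)    # [0.25, 1.5]
```

Because the statistics come from the training set alone, test values can fall outside the range [0, 1], as the last value does here.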

path

processed_dir

training = True

remove_startup = True
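The effect of remove_startup on the length of the training series can be estimated with a short calculation. This sketch assumes one recorded row per second, which is not stated on this page, and the library may determine the cut-off differently.

```python
from datetime import timedelta

SAMPLING_INTERVAL = timedelta(seconds=1)        # assumption: one row per second
STARTUP_TOTAL = timedelta(hours=5)              # startup phase from the parameter description
ALREADY_REMOVED_IN_V1 = timedelta(minutes=30)   # part already missing from the v1 file

# Time that still has to be cut from the start of the training series.
remaining = STARTUP_TOTAL - ALREADY_REMOVED_IN_V1

# Number of leading rows this corresponds to at the assumed sampling rate.
rows_to_drop = remaining // SAMPLING_INTERVAL
```

Under the one-row-per-second assumption, rows_to_drop evaluates to 16200.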

inputs = None

targets = None

load_data() → Tuple[numpy.ndarray, numpy.ndarray]

Return type: Tuple[numpy.ndarray, numpy.ndarray]

__getitem__(item: int) → Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

Parameters: item (int)
Return type: Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

__len__() → int | None

Return type: Optional[int]

property seq_len: int | None

Return type: Optional[int]

property num_features: int

Return type: int

static get_default_pipeline() → Dict[str, Dict[str, Any]]

Return type: Dict[str, Dict[str, Any]]

static get_feature_names()