timesead.data.smd_dataset

Attributes

FILENAMES

TRAIN_LENS

TEST_LENS

Classes

SMDDataset

Implementation of the Server Machine Dataset [Su2019].

Module Contents

timesead.data.smd_dataset.FILENAMES = ['machine-1-1.txt', 'machine-1-2.txt', 'machine-1-3.txt', 'machine-1-4.txt', 'machine-1-5.txt',...
timesead.data.smd_dataset.TRAIN_LENS = [28479, 23694, 23702, 23706, 23705, 23688, 23697, 23698, 23693, 23699, 23688, 23689, 23688,...
timesead.data.smd_dataset.TEST_LENS = [28479, 23694, 23703, 23707, 23706, 23689, 23697, 23699, 23694, 23700, 23689, 23689, 23689,...
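The three module-level lists appear to be parallel tables indexed by `server_id`. The sketch below assumes that position `i` in each list describes the same machine. The helper `server_metadata` is hypothetical and not part of `timesead`. Because the lists are truncated above, the sample data uses only their first five entries.

```python
from typing import List, Tuple

# First five entries of each table as listed above (the full lists are truncated here).
SAMPLE_FILENAMES = ['machine-1-1.txt', 'machine-1-2.txt', 'machine-1-3.txt',
                    'machine-1-4.txt', 'machine-1-5.txt']
SAMPLE_TRAIN_LENS = [28479, 23694, 23702, 23706, 23705]
SAMPLE_TEST_LENS = [28479, 23694, 23703, 23707, 23706]


def server_metadata(server_id: int,
                    filenames: List[str],
                    train_lens: List[int],
                    test_lens: List[int]) -> Tuple[str, int, int]:
    """Hypothetical helper: look up one machine's file name and split lengths.

    Assumes the three lists are parallel, i.e. index ``server_id`` refers to
    the same machine in each of them.
    """
    if not (len(filenames) == len(train_lens) == len(test_lens)):
        raise ValueError('metadata tables must have the same length')
    if not 0 <= server_id < len(filenames):
        raise IndexError(f'server_id must be in [0, {len(filenames) - 1}], got {server_id}')
    return filenames[server_id], train_lens[server_id], test_lens[server_id]


print(server_metadata(2, SAMPLE_FILENAMES, SAMPLE_TRAIN_LENS, SAMPLE_TEST_LENS))
# ('machine-1-3.txt', 23702, 23703)
```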
class timesead.data.smd_dataset.SMDDataset(server_id: int, path: str = os.path.join(DATA_DIRECTORY, 'smd'), training: bool = True, standardize: bool | Callable = True, download: bool = True, preprocess: bool = True)

Bases: timesead.data.dataset.BaseTSDataset

Implementation of the Server Machine Dataset [Su2019]. The data consist of traces from 28 different servers, recorded over several weeks. Each server's trace is treated as a separate dataset.

Note

Automatically downloading the dataset currently requires git to be installed on your system.
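Because automatic download depends on git, it can help to check for it before constructing the dataset with `download=True`. A minimal sketch using only the standard library. `git_version` is a hypothetical helper, not part of `timesead`.

```python
import shutil
import subprocess
from typing import Optional


def git_version() -> Optional[str]:
    """Return the output of ``git --version``, or None if git is unavailable."""
    git = shutil.which('git')  # full path to the git executable, or None
    if git is None:
        return None
    result = subprocess.run([git, '--version'], capture_output=True, text=True, check=False)
    return result.stdout.strip() if result.returncode == 0 else None


version = git_version()
if version is None:
    # Presumably the alternative is download=False with the data already placed under `path`.
    print('git not found; install it, or pass download=False and provide the data yourself')
else:
    print('found', version)
```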

[Su2019] Y. Su, Y. Zhao, C. Niu, R. Liu, W. Sun, and D. Pei. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019 Jul 25, pp. 2828-2837.

Parameters:
  • server_id (int) – Index of the machine whose data to load. Must be in [0, …, 27].

  • path (str) – Folder from which to load the dataset.

  • training (bool) – Whether to load the training set (True) or the test set (False).

  • standardize (Union[bool, Callable]) – Either a bool that decides whether to apply the dataset-dependent default standardization, or a function with signature (dataframe, stats) -> dataframe, where stats is a dictionary of common statistics computed on the training set (e.g., the mean, std, and median of each feature).

  • download (bool) – Whether to download the dataset if it doesn’t exist.

  • preprocess (bool) – Whether to set up the dataset for experiments.
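A minimal sketch of a custom `standardize` callable with the (dataframe, stats) -> dataframe signature. Beyond what is documented above, it assumes that `stats['mean']` and `stats['std']` each hold one value per dataframe column. `zscore_standardize` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def zscore_standardize(df: pd.DataFrame, stats: dict) -> pd.DataFrame:
    """Hypothetical z-score standardization using training-set statistics.

    Assumes stats['mean'] and stats['std'] are per-column values aligned
    with the columns of ``df``.
    """
    mean = np.asarray(stats['mean'], dtype=float)
    std = np.asarray(stats['std'], dtype=float)
    # Constant features have std == 0; divide by 1 instead to avoid inf/NaN.
    std = np.where(std > 0, std, 1.0)
    # A 1-D array of length n_columns broadcasts across rows, one value per column.
    return (df - mean) / std


df = pd.DataFrame({'cpu': [0.0, 2.0, 4.0], 'mem': [1.0, 1.0, 1.0]})
stats = {'mean': [2.0, 1.0], 'std': [2.0, 0.0]}
print(zscore_standardize(df, stats))
```

Such a function would be passed in place of the boolean, e.g. `SMDDataset(server_id=0, standardize=zscore_standardize)`.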

server_id
path
processed_dir
training = True
standardize = True
inputs = None
targets = None
load_data() -> Tuple[numpy.ndarray, numpy.ndarray]
Return type:

Tuple[numpy.ndarray, numpy.ndarray]
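`load_data` returns a pair of arrays, presumably the inputs and the anomaly labels. The sketch below illustrates that contract under two assumptions: each machine's measurements are stored as comma-separated rows with one time step per line, and test labels are stored as one 0/1 value per line. `parse_smd_split` is hypothetical, not the library's implementation. It reads from in-memory strings, so it runs without the dataset.

```python
import io
from typing import Optional, Tuple

import numpy as np


def parse_smd_split(values_txt: str, labels_txt: Optional[str] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical parser returning (inputs, targets).

    inputs:  float32 array of shape (T, F), one row per time step.
    targets: int64 array of shape (T,); all zeros when no label text is given
             (assumed here for the training split).
    """
    inputs = np.loadtxt(io.StringIO(values_txt), delimiter=',', dtype=np.float32, ndmin=2)
    if labels_txt is None:
        targets = np.zeros(inputs.shape[0], dtype=np.int64)
    else:
        targets = np.loadtxt(io.StringIO(labels_txt), dtype=np.int64, ndmin=1)
    if targets.shape[0] != inputs.shape[0]:
        raise ValueError('values and labels have different lengths')
    return inputs, targets


x, y = parse_smd_split('0.1,0.5\n0.2,0.4\n0.9,0.0\n', '0\n0\n1\n')
print(x.shape, y.tolist())  # (3, 2) [0, 0, 1]
```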

__getitem__(item: int) -> Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]
Parameters:

item (int)

Return type:

Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

__len__() -> int | None
Return type:

Optional[int]
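Three details suggest a lazily loaded dataset whose single item is the whole trace: the `inputs = None` and `targets = None` attributes, the nested return type of `__getitem__`, and the fact that each trace is its own dataset. The toy class below sketches that contract under those assumptions. `ToyTraceDataset` is hypothetical, and it returns NumPy arrays in place of `torch.Tensor` so that it runs without PyTorch.

```python
from typing import Callable, Optional, Tuple

import numpy as np


class ToyTraceDataset:
    """Hypothetical stand-in mimicking the documented SMDDataset interface."""

    def __init__(self, loader: Callable[[], Tuple[np.ndarray, np.ndarray]]):
        self._loader = loader
        self.inputs: Optional[np.ndarray] = None   # filled on first access
        self.targets: Optional[np.ndarray] = None
        self.load_count = 0  # how often the loader actually ran

    def load_data(self) -> Tuple[np.ndarray, np.ndarray]:
        self.load_count += 1
        return self._loader()

    def __len__(self) -> int:
        # Assumption: one trace per dataset, hence exactly one item.
        return 1

    def __getitem__(self, item: int) -> Tuple[Tuple[np.ndarray], Tuple[np.ndarray]]:
        if not 0 <= item < len(self):
            raise IndexError(item)
        if self.inputs is None or self.targets is None:
            # Load once, then reuse the cached arrays on later calls.
            self.inputs, self.targets = self.load_data()
        # Inputs and targets are each wrapped in a 1-tuple.
        return (self.inputs,), (self.targets,)


ds = ToyTraceDataset(lambda: (np.zeros((5, 3), dtype=np.float32), np.zeros(5, dtype=np.int64)))
(x,), (y,) = ds[0]
_ = ds[0]
print(x.shape, y.shape, ds.load_count)  # (5, 3) (5,) 1
```

A PyTorch version could wrap the cached arrays with `torch.as_tensor` before returning them.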

property seq_len: int | List[int]
Return type:

Union[int, List[int]]

property num_features: int
Return type:

int
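For a single-trace dataset, `seq_len` and `num_features` should agree with the shapes of the loaded arrays. The following is a minimal consistency check. It assumes that inputs are laid out as (time steps, features), that targets are laid out as (time steps,), and that `seq_len` is a single int. `check_shapes` is a hypothetical helper.

```python
import numpy as np


def check_shapes(inputs: np.ndarray, targets: np.ndarray, seq_len: int, num_features: int) -> None:
    """Hypothetical sanity check: raise ValueError if arrays disagree with the reported sizes."""
    if inputs.shape != (seq_len, num_features):
        raise ValueError(f'inputs have shape {inputs.shape}, expected {(seq_len, num_features)}')
    if targets.shape != (seq_len,):
        raise ValueError(f'targets have shape {targets.shape}, expected {(seq_len,)}')


x = np.zeros((4, 2))
y = np.zeros(4)
check_shapes(x, y, seq_len=4, num_features=2)  # passes silently
```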

static get_default_pipeline() -> Dict[str, Dict[str, Any]]
Return type:

Dict[str, Dict[str, Any]]
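`get_default_pipeline` returns a mapping from step names to per-step configuration dicts. The exact keys are not documented here. The sketch below assumes a hypothetical layout in which each step names a `class` and its `args`, and it instantiates each step from a small registry. `Scale`, `Clip`, `REGISTRY`, and `build_pipeline` are all hypothetical and not part of `timesead`.

```python
from typing import Any, Callable, Dict, List

Step = Callable[[List[float]], List[float]]


class Scale:
    """Hypothetical transform: multiply every value by a constant."""

    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, xs: List[float]) -> List[float]:
        return [x * self.factor for x in xs]


class Clip:
    """Hypothetical transform: clamp every value into [low, high]."""

    def __init__(self, low: float, high: float):
        self.low, self.high = low, high

    def __call__(self, xs: List[float]) -> List[float]:
        return [min(max(x, self.low), self.high) for x in xs]


REGISTRY: Dict[str, Callable[..., Step]] = {'Scale': Scale, 'Clip': Clip}


def build_pipeline(spec: Dict[str, Dict[str, Any]]) -> List[Step]:
    """Instantiate one step per entry, in the dict's insertion order."""
    return [REGISTRY[cfg['class']](**cfg.get('args', {})) for cfg in spec.values()]


spec = {
    'scale': {'class': 'Scale', 'args': {'factor': 2.0}},
    'clip': {'class': 'Clip', 'args': {'low': 0.0, 'high': 3.0}},
}
data = [-1.0, 0.5, 2.0]
for step in build_pipeline(spec):
    data = step(data)
print(data)  # [0.0, 1.0, 3.0]
```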

static get_feature_names()
download()