timesead.data.smd_dataset

Attributes

`FILENAMES`
`TRAIN_LENS`
`TEST_LENS`

Classes

SMDDataset

Implementation of the Server Machine Dataset [Su2019].

Module Contents

timesead.data.smd_dataset.FILENAMES = ['machine-1-1.txt', 'machine-1-2.txt', 'machine-1-3.txt', 'machine-1-4.txt', 'machine-1-5.txt',...

timesead.data.smd_dataset.TRAIN_LENS = [28479, 23694, 23702, 23706, 23705, 23688, 23697, 23698, 23693, 23699, 23688, 23689, 23688,...

timesead.data.smd_dataset.TEST_LENS = [28479, 23694, 23703, 23707, 23706, 23689, 23697, 23699, 23694, 23700, 23689, 23689, 23689,...

class timesead.data.smd_dataset.SMDDataset(server_id: int, path: str = os.path.join(DATA_DIRECTORY, 'smd'), training: bool = True, standardize: bool | Callable = True, download: bool = True, preprocess: bool = True)

Bases: timesead.data.dataset.BaseTSDataset

Implementation of the Server Machine Dataset [Su2019]. The data consists of traces from 28 different servers recorded over several weeks. We consider each trace to be a separate dataset.

Note

Automatic download of the dataset currently requires git to be installed on your system.
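Because the automatic download relies on git, it can help to check for it before constructing the dataset with download=True. The following is a minimal sketch using only the standard library; the helper name git_available is hypothetical and not part of timesead.

```python
import shutil


def git_available() -> bool:
    """Return True if a `git` executable can be found on the PATH."""
    # shutil.which returns the executable's path, or None if it is not found.
    return shutil.which("git") is not None


if __name__ == "__main__":
    if git_available():
        print("git found; automatic download should be possible")
    else:
        print("git not found; install git or fetch the data manually")
```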

[Su2019]

Y. Su, Y. Zhao, C. Niu, R. Liu, W. Sun, D. Pei. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019 Jul 25 (pp. 2828-2837).

Parameters:

path (str) – Folder from which to load the dataset.
server_id (int) – Index of the machine whose data to load. Must be in [0, …, 27].
training (bool) – Whether to load the training set (True) or the test set (False).
standardize (Union[bool, Callable]) – Either a bool that decides whether to apply the dataset-dependent default standardization, or a function with signature (dataframe, stats) -> dataframe, where stats is a dictionary of common statistics computed on the training dataset (e.g., mean, std, and median for each feature).
download (bool) – Whether to download the dataset if it does not exist.
preprocess (bool) – Whether to set up the dataset for experiments.
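The parameters above can be combined as in the following sketch. It assumes timesead is installed and that the stats dictionary exposes per-feature statistics under the keys 'mean' and 'std'. The key names and both helper names (center_and_scale, load_smd) are illustrative assumptions, not confirmed by this reference.

```python
from typing import Any, Mapping


def center_and_scale(dataframe: Any, stats: Mapping[str, Any], eps: float = 1e-8) -> Any:
    """Custom standardizer with the documented (dataframe, stats) -> dataframe signature.

    ASSUMPTION: `stats` holds per-feature training statistics under the keys
    'mean' and 'std'. Check the actual keys before relying on this.
    """
    # eps guards against division by zero for features with zero variance.
    return (dataframe - stats["mean"]) / (stats["std"] + eps)


def load_smd(server_id: int, training: bool = True, download: bool = False):
    """Hypothetical helper: build an SMDDataset for one server trace."""
    # The reference states that server_id must be in [0, ..., 27].
    if not 0 <= server_id <= 27:
        raise ValueError(f"server_id must be in [0, 27], got {server_id}")
    # Imported lazily so this code can be loaded without timesead installed.
    from timesead.data.smd_dataset import SMDDataset

    return SMDDataset(
        server_id=server_id,
        training=training,
        standardize=center_and_scale,
        download=download,
    )
```

Under these assumptions, load_smd(0) would load the training split of the first server trace, and load_smd(0, training=False) its test split.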

GITHUB_LINK = 'https://github.com/NetManAIOps/OmniAnomaly.git'

server_id

path

processed_dir

training = True

standardize = True

inputs = None

targets = None

load_data() → Tuple[numpy.ndarray, numpy.ndarray]

Return type: Tuple[numpy.ndarray, numpy.ndarray]

__getitem__(item: int) → Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]

Parameters: item (int)
Return type: Tuple[Tuple[torch.Tensor], Tuple[torch.Tensor]]
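The nested return type can be unpacked as in the sketch below. It assumes each inner tuple holds exactly one element, as the type hint Tuple[torch.Tensor] suggests; the helper name unpack_item is hypothetical. The function relies only on tuple indexing, so it is shown without a torch dependency.

```python
from typing import Any, Tuple


def unpack_item(item: Tuple[Tuple[Any, ...], Tuple[Any, ...]]) -> Tuple[Any, Any]:
    """Split one dataset item into its first input and first target.

    ASSUMPTION: the item is shaped like ((input,), (target,)), following the
    documented return type of __getitem__.
    """
    inputs, targets = item
    return inputs[0], targets[0]


# Stand-in values illustrate the nesting; a real item would contain torch.Tensor objects.
x, y = unpack_item((("input",), ("target",)))
print(x, y)
```

With a dataset instance ds, the call would read x, y = unpack_item(ds[0]).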

__len__() → int | None

Return type: Optional[int]

property seq_len: int | List[int]

Return type: Union[int, List[int]]

property num_features: int

Return type: int

static get_default_pipeline() → Dict[str, Dict[str, Any]]

Return type: Dict[str, Dict[str, Any]]

static get_feature_names()

download()