timesead.data.smap_dataset

Attributes

DATASET_URL

LABELS_URL

BUFFER_SIZE

ZIP_CHECKSUM

FILE_CHECKSUMS

Classes

SMAPDownloader

Class that downloads and extracts the SMAP and MSL datasets [Hundman2018].

SMAPDataset

Implementation of the SMAP dataset [Hundman2018].

MSLDataset

Implementation of the MSL dataset [Hundman2018].

Module Contents

timesead.data.smap_dataset.DATASET_URL = 'https://s3-us-west-2.amazonaws.com/telemanom/data.zip'
timesead.data.smap_dataset.LABELS_URL = 'https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv'
timesead.data.smap_dataset.BUFFER_SIZE = 16777216
timesead.data.smap_dataset.ZIP_CHECKSUM = 'b4d66deb492d9b0a353b51879152687ed9313897e8e19320d2dc853d738ed8a7'
timesead.data.smap_dataset.FILE_CHECKSUMS
class timesead.data.smap_dataset.SMAPDownloader(data_path: str = os.path.join(DATA_DIRECTORY, 'smap'))

Class that downloads and extracts the SMAP and MSL datasets [Hundman2018]. Files are also checked for integrity against their SHA-256 hashes stored in data/SMAP/smap_checksums.json.

[Hundman2018]

K. Hundman, V. Constantinou, C. Laporte, I. Colwell, and T. Soderstrom. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018 Jul 19, pp. 387-395.

Parameters:

data_path (str) – The folder in which to download the dataset.

data_path
static compute_sha256(file, buffer_size: int = BUFFER_SIZE) → str

Compute the SHA-256 hash of a file object.

Parameters:
  • file – A file object returned by open(). Note that it should be opened in binary mode.

  • buffer_size (int) – Data from the file is read in chunks. This specifies the chunk size in bytes.

Returns:

The SHA-256 hash of the file as a hex string.

Return type:

str
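The chunked hashing described above can be sketched with the standard library alone. This is a minimal illustration, not the library's actual code: the function name sha256_of_file is hypothetical, and the in-memory stream stands in for a file opened with open(path, 'rb').

```python
import hashlib
import io

BUFFER_SIZE = 16 * 1024 * 1024  # 16777216 bytes, same value as the module constant


def sha256_of_file(file, buffer_size: int = BUFFER_SIZE) -> str:
    """Hash a binary file object chunk by chunk and return the hex digest.

    Hypothetical helper for illustration; assumes `file` was opened in binary mode.
    """
    digest = hashlib.sha256()
    while True:
        chunk = file.read(buffer_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        digest.update(chunk)
    return digest.hexdigest()


# An in-memory binary stream stands in for open(path, 'rb').
example_hash = sha256_of_file(io.BytesIO(b"hello"), buffer_size=2)
```

Reading in chunks keeps memory use bounded by buffer_size, which matters for large archives such as the dataset zip file.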

check_existing_files() → bool

Checks whether all files specified in the FILE_CHECKSUMS JSON file are present and whether their checksums are correct.

Returns:

True if all files are present and their checksums are correct, False otherwise.

Return type:

bool
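A check of this kind could look roughly like the sketch below. The layout of the checksum file is not documented here, so the flat mapping from relative file path to hex digest is an assumption, and files_match_checksums is a hypothetical name.

```python
import hashlib
import os
import tempfile


def files_match_checksums(root: str, checksums: dict) -> bool:
    """Return True only if every listed file exists under `root` and its SHA-256 matches.

    Assumes `checksums` maps relative paths to hex digests (an illustrative layout).
    """
    for relative_path, expected in checksums.items():
        path = os.path.join(root, relative_path)
        if not os.path.isfile(path):
            return False  # a missing file fails the whole check
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            return False  # a corrupted or modified file fails the whole check
    return True


# Small demonstration with a temporary directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.npy"), "wb") as f:
        f.write(b"abc")
    good = {"a.npy": hashlib.sha256(b"abc").hexdigest()}
    all_ok = files_match_checksums(tmp, good)
    missing = files_match_checksums(tmp, {**good, "b.npy": "0" * 64})
    corrupt = files_match_checksums(tmp, {"a.npy": "0" * 64})
```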

static download_to_file(url: str, file, buffer_size: int = BUFFER_SIZE)

Download a file from any URL supported by urllib to a file object.

Parameters:
  • url (str) – The URL of the file to download.

  • file – Open file object to which the data is saved. This should be in binary mode.

  • buffer_size (int) – This method downloads data in chunks. buffer_size specifies the chunk size.
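A chunked download of this kind can be sketched with urllib.request. The name fetch_to_file is hypothetical and the code is an illustration under that assumption rather than the library's implementation. A file:// URL is used so the example runs without network access.

```python
import io
import pathlib
import tempfile
import urllib.request


def fetch_to_file(url: str, file, buffer_size: int = 16 * 1024 * 1024) -> None:
    """Copy the response body of `url` into a binary file object, chunk by chunk.

    Hypothetical helper for illustration.
    """
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(buffer_size)
            if not chunk:  # no more data to read
                break
            file.write(chunk)


# Demonstration against a local file served through a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "source.bin"
    source.write_bytes(b"0123456789")
    target = io.BytesIO()
    fetch_to_file(source.as_uri(), target, buffer_size=4)
    downloaded = target.getvalue()
```

Writing each chunk as it arrives means the full download never has to fit in memory at once.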

download_data()

Download the SMAP and MSL datasets.

class timesead.data.smap_dataset.SMAPDataset(data_path: str = os.path.join(DATA_DIRECTORY, 'smap'), channel_id: int = 0, training: bool = True, download: bool = True)

Bases: _SMAPBaseDataset

Implementation of the SMAP dataset [Hundman2018]. It consists of several monitored values from a single satellite and commands sent to that satellite. We consider the trace for each channel a separate dataset, where the monitored value is in the first feature dimension and the remaining binary features correspond to the commands.

Parameters:
  • data_path (str) – Folder from which to load the dataset.

  • channel_id (int) – Index of the channel to load. Must be in the range 0 to 54 (inclusive).

  • training (bool) – If True, load the training set; otherwise load the test set.

  • download (bool) – Whether to download the dataset if it doesn’t exist.

property num_features: int | Tuple[int, ...]

Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

Return type:

Union[int, Tuple[int, ...]]
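As a usage sketch, the constructor arguments documented above can be combined to build a training and a test dataset for the same channel. The helper load_split_pair and the stand-in class _StubDataset are hypothetical; the stub only records its arguments so that the example runs without TimeSeAD installed and without downloading anything.

```python
def load_split_pair(dataset_cls, channel_id: int, data_path: str):
    """Build the training and test datasets for one channel.

    Hypothetical helper; `dataset_cls` is expected to accept the documented
    keyword arguments data_path, channel_id, training and download.
    """
    train = dataset_cls(data_path=data_path, channel_id=channel_id,
                        training=True, download=True)
    # The first call may already have fetched the data, so skip the download here.
    test = dataset_cls(data_path=data_path, channel_id=channel_id,
                       training=False, download=False)
    return train, test


class _StubDataset:
    """Stand-in for a dataset class that only stores its constructor arguments."""

    def __init__(self, data_path, channel_id=0, training=True, download=True):
        self.data_path = data_path
        self.channel_id = channel_id
        self.training = training
        self.download = download


train_set, test_set = load_split_pair(_StubDataset, channel_id=3, data_path="data/smap")
```

With TimeSeAD installed, passing timesead.data.smap_dataset.SMAPDataset in place of the stub should follow the same pattern.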

class timesead.data.smap_dataset.MSLDataset(data_path: str = os.path.join(DATA_DIRECTORY, 'smap'), channel_id: int = 0, training: bool = True, download: bool = True)

Bases: _SMAPBaseDataset

Implementation of the MSL dataset [Hundman2018]. It consists of several monitored values from a Mars rover and commands sent to the rover. We consider the trace for each channel a separate dataset, where the monitored value is in the first feature dimension and the remaining binary features correspond to the commands.

Parameters:
  • data_path (str) – Folder from which to load the dataset.

  • channel_id (int) – Index of the channel to load. Must be in the range 0 to 26 (inclusive).

  • training (bool) – If True, load the training set; otherwise load the test set.

  • download (bool) – Whether to download the dataset if it doesn’t exist.

property num_features: int | Tuple[int, ...]

Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

Return type:

Union[int, Tuple[int, ...]]
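Both dataset classes describe the same feature layout: the monitored value comes first and the remaining binary features encode commands. The sketch below shows how a single datapoint could be split under that layout. The function split_features is hypothetical, and the numbers are illustrative values, not taken from the dataset.

```python
def split_features(row):
    """Split one datapoint into (monitored_value, command_flags).

    Hypothetical helper; assumes the first entry is the monitored value and
    all remaining entries are binary command indicators.
    """
    value, *commands = row
    return value, commands


# Illustrative values only.
row = [0.37, 0, 1, 0, 0]
value, commands = split_features(row)
```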