timesead.data.smap_dataset
==========================

.. py:module:: timesead.data.smap_dataset


Attributes
----------

.. autoapisummary::

   timesead.data.smap_dataset.DATASET_URL
   timesead.data.smap_dataset.LABELS_URL
   timesead.data.smap_dataset.BUFFER_SIZE
   timesead.data.smap_dataset.ZIP_CHECKSUM
   timesead.data.smap_dataset.FILE_CHECKSUMS


Classes
-------

.. autoapisummary::

   timesead.data.smap_dataset.SMAPDownloader
   timesead.data.smap_dataset.SMAPDataset
   timesead.data.smap_dataset.MSLDataset


Module Contents
---------------

.. py:data:: DATASET_URL
   :value: 'https://s3-us-west-2.amazonaws.com/telemanom/data.zip'


.. py:data:: LABELS_URL
   :value: 'https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv'


.. py:data:: BUFFER_SIZE
   :value: 16777216


.. py:data:: ZIP_CHECKSUM
   :value: 'b4d66deb492d9b0a353b51879152687ed9313897e8e19320d2dc853d738ed8a7'


.. py:data:: FILE_CHECKSUMS

.. py:class:: SMAPDownloader(data_path: str = os.path.join(DATA_DIRECTORY, 'smap'))

   Class that downloads and extracts the SMAP and MSL datasets [Hundman2018]_. Files are also checked for
   integrity against their SHA-256 hashes stored in data/SMAP/smap_checksums.json.

   .. [Hundman2018] K. Hundman, V. Constantinou, C. Laporte, I. Colwell, T. Soderstrom.
       Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding.
       In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining,
       2018 Jul 19 (pp. 387-395).

   :param data_path: The folder in which to download the dataset.


   .. py:attribute:: data_path


   .. py:method:: compute_sha256(file, buffer_size: int = BUFFER_SIZE) -> str
      :staticmethod:


      Compute the SHA-256 hash of a file object.

      :param file: A file object returned by :func:`open`. It should be opened in binary mode.
      :param buffer_size: Data from the file is read in chunks. This specifies the chunk size in bytes.
      :return: The SHA-256 hash of the file as a hex string.



   .. py:method:: check_existing_files() -> bool

      Check whether all files specified in the `FILE_CHECKSUMS` JSON file are present and whether their
      checksums are correct.

      :return: `True` if all files are present and their checksums are correct, `False` otherwise.



   .. py:method:: download_to_file(url: str, file, buffer_size: int = BUFFER_SIZE)
      :staticmethod:


      Download a file from any URL supported by `urllib` to a file object.

      :param url: The URL of the file to download.
      :param file: Open file object to which the data is saved. It should be opened in binary mode.
      :param buffer_size: This method downloads data in chunks. `buffer_size` specifies the chunk size in bytes.



   .. py:method:: download_data()

      Download the SMAP and MSL datasets.



.. py:class:: SMAPDataset(data_path: str = os.path.join(DATA_DIRECTORY, 'smap'), channel_id: int = 0, training: bool = True, download: bool = True)

   Bases: :py:obj:`_SMAPBaseDataset`


   Implementation of the SMAP dataset [Hundman2018]_. It consists of several monitored values from a single
   satellite and commands sent to that satellite. We consider the trace for each channel a separate dataset,
   where the monitored value is in the first feature dimension and the remaining binary features correspond
   to the commands.

   :param data_path: Folder from which to load the dataset.
   :param channel_id: The channel from which to load data. Must be in [0-54].
   :param training: Whether to load the training or the test set.
   :param download: Whether to download the dataset if it doesn't exist.


   .. py:property:: num_features
      :type: Union[int, Tuple[int, Ellipsis]]


      Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.



.. py:class:: MSLDataset(data_path: str = os.path.join(DATA_DIRECTORY, 'smap'), channel_id: int = 0, training: bool = True, download: bool = True)

   Bases: :py:obj:`_SMAPBaseDataset`


   Implementation of the MSL dataset [Hundman2018]_. It consists of several monitored values from a Mars
   rover and commands sent to the rover. We consider the trace for each channel a separate dataset,
   where the monitored value is in the first feature dimension and the remaining binary features correspond
   to the commands.

   :param data_path: Folder from which to load the dataset.
   :param channel_id: The channel from which to load data. Must be in [0-26].
   :param training: Whether to load the training or the test set.
   :param download: Whether to download the dataset if it doesn't exist.


   .. py:property:: num_features
      :type: Union[int, Tuple[int, Ellipsis]]


      Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.
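
The chunked hashing described for ``SMAPDownloader.compute_sha256`` can be illustrated with a minimal, standalone sketch. It is not the module's actual implementation: the helper names ``sha256_of_file`` and ``matches_checksum`` are hypothetical, and only the standard library's ``hashlib`` is assumed. The default chunk size mirrors the documented ``BUFFER_SIZE`` value of 16777216 bytes.

```python
import hashlib
import io

# Mirrors the documented BUFFER_SIZE value (16 MiB); redefined here so the
# sketch runs without timesead installed.
BUFFER_SIZE = 16 * 1024 * 1024


def sha256_of_file(file, buffer_size: int = BUFFER_SIZE) -> str:
    """Hypothetical helper: hash a binary file object chunk by chunk.

    Reading in fixed-size chunks keeps memory usage bounded even for
    large archives.
    """
    digest = hashlib.sha256()
    while True:
        chunk = file.read(buffer_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(file, expected_hex: str) -> bool:
    """Hypothetical helper: compare a file's hash with an expected hex digest."""
    return sha256_of_file(file) == expected_hex.lower()


if __name__ == "__main__":
    # An in-memory buffer stands in for a file opened with open(path, "rb").
    payload = io.BytesIO(b"example telemetry bytes")
    print(sha256_of_file(payload, buffer_size=8))
```

A check in the spirit of ``check_existing_files`` would presumably loop over a mapping of file names to expected digests and call such a comparison for each entry; the exact structure of ``FILE_CHECKSUMS`` is not documented here, so that part is left out.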