timesead.data.transforms
========================

.. py:module:: timesead.data.transforms


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/timesead/data/transforms/artificial_anomalies/index
   /autoapi/timesead/data/transforms/dataset_source/index
   /autoapi/timesead/data/transforms/general_transforms/index
   /autoapi/timesead/data/transforms/pipeline_dataset/index
   /autoapi/timesead/data/transforms/target_transforms/index
   /autoapi/timesead/data/transforms/transform_base/index
   /autoapi/timesead/data/transforms/window_transform/index


Classes
-------

.. autoapisummary::

   timesead.data.transforms.Transform
   timesead.data.transforms.SubsampleTransform
   timesead.data.transforms.CacheTransform
   timesead.data.transforms.LimitTransform
   timesead.data.transforms.ReconstructionTargetTransform
   timesead.data.transforms.OneVsRestTargetTransform
   timesead.data.transforms.PredictionTargetTransform
   timesead.data.transforms.OverlapPredictionTargetTransform
   timesead.data.transforms.WindowTransform
   timesead.data.transforms.DatasetSource
   timesead.data.transforms.PipelineDataset


Functions
---------

.. autoapisummary::

   timesead.data.transforms.make_dataset_split
   timesead.data.transforms.make_pipe_from_dict


Package Contents
----------------

.. py:class:: Transform(parent: Optional[Transform])

   Bases: :py:obj:`abc.ABC`

   Base class for all transforms. A Transform processes one (or several) data points and outputs them.
   Transforms can be chained in a pull-based pipeline.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`. Can be `None` in the case of a source.

   .. py:attribute:: parent

   .. py:method:: get_datapoint(item: int) -> Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]

      Returns a datapoint (in our case this is a sequence) from this transform.

      :param item: Must be `0 <= item < len(self)`.

   .. py:method:: __len__() -> Optional[int]

      This should return the number of available sequences after the transformation.

   .. py:property:: seq_len
      :type: Union[int, List[int]]

      This should return the length of each time series.
      If the time series have different lengths, the return value should be a list that contains the length of
      each sequence. If all sequences are of equal length, this should return an `int`.

   .. py:property:: num_features
      :type: Union[int, Tuple[int, Ellipsis]]

      Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.


.. py:class:: SubsampleTransform(parent: timesead.data.transforms.transform_base.Transform, subsampling_factor: int, aggregation: str = 'first')

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   Subsample sequences by a specified factor. Every `subsampling_factor` consecutive datapoints in a sequence are
   aggregated into one point using the `aggregation` function.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`.
   :param subsampling_factor: The number of consecutive data points that will be aggregated.
   :param aggregation: The function that is applied to aggregate a window of data points.
       Can be either 'mean', 'last' or 'first'.

   .. py:attribute:: subsampling_factor

   .. py:property:: seq_len


.. py:class:: CacheTransform(parent: timesead.data.transforms.transform_base.Transform)

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   Caches the results from a previous :class:`~timesead.data.transforms.Transform` in memory so that expensive
   calculations do not have to be recomputed.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`.

   .. py:attribute:: cache


.. py:class:: LimitTransform(parent: timesead.data.transforms.transform_base.Transform, count: int)

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   Limits the number of data points returned.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`.
   :param count: The maximum number of sequences that should be returned by this
       :class:`~timesead.data.transforms.Transform`.

   .. py:attribute:: max_count

   .. py:method:: __len__()


.. py:class:: ReconstructionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, replace_labels: bool = False)

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   Adds the current inputs as targets for reconstruction objectives.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       transform.
   :param replace_labels: Whether the original labels should be replaced by the reconstruction target.
       If `False`, the reconstruction target will be added to the tuple of original labels.

   .. py:attribute:: replace_labels
      :value: False


.. py:class:: OneVsRestTargetTransform(parent: timesead.data.transforms.transform_base.Transform, normal_class: Optional[Any] = None, anomalous_class: Optional[Any] = None, replace_labels: bool = False)

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   Transforms multi-class labels into binary labels for anomaly detection. "Normal" data points will have label 0,
   others will have label 1.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`.
   :param normal_class: The input class label that should be considered normal and will have label 0 in the output.
   :param anomalous_class: Alternatively, you can specify an anomalous class that will have label 1.
       All other labels will be transformed to 0.
       Note that you cannot specify both `normal_class` and `anomalous_class`.
   :param replace_labels: Whether the original labels should be replaced by the binary labels.
       If `False`, the binary labels will be added to the tuple of original labels.

   .. py:attribute:: replace_labels
      :value: False

   .. py:attribute:: normal_class
      :value: None

   .. py:attribute:: anomalous_class
      :value: None


.. py:class:: PredictionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, window_size: int, prediction_horizon: int, replace_labels: bool = False, step_size: int = 1, reverse: bool = False)

   Bases: :py:obj:`timesead.data.transforms.window_transform.WindowTransform`

   Adds the last `prediction_horizon` points from the current inputs as targets for prediction objectives.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`.
   :param prediction_horizon: Number of datapoints that should be predicted.
   :param replace_labels: Whether the original labels should be replaced by the prediction target.
       If `False`, the prediction target will be added to the tuple of original labels.

   .. py:attribute:: input_window_size

   .. py:attribute:: prediction_horizon

   .. py:attribute:: replace_labels
      :value: False

   .. py:property:: seq_len
      :type: Union[int, List[int]]

      This should return the length of each time series.
      If the time series have different lengths, the return value should be a list that contains the length of
      each sequence. If all sequences are of equal length, this should return an `int`.


.. py:class:: OverlapPredictionTargetTransform(parent: timesead.data.transforms.transform_base.Transform, offset: int, replace_labels: bool = False)

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   Adds the sequence shifted by `offset` as the target.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`.
   :param offset: Number of steps ahead that should be predicted.
   :param replace_labels: Whether the original labels should be replaced by the prediction target.
       If `False`, the prediction target will be added to the tuple of original labels.

   .. py:attribute:: offset

   .. py:attribute:: replace_labels
      :value: False

   .. py:property:: seq_len
      :type: Union[int, List[int]]

      This should return the length of each time series.
      If the time series have different lengths, the return value should be a list that contains the length of
      each sequence. If all sequences are of equal length, this should return an `int`.


.. py:class:: WindowTransform(parent: timesead.data.transforms.transform_base.Transform, window_size: int, step_size: int = 1, reverse: bool = False)

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   This :class:`~timesead.data.transforms.Transform` produces sliding windows from input sequences.
   Incomplete windows (that can appear if ``step_size>1``) will not be returned.

   :param parent: Another :class:`~timesead.data.transforms.Transform` which is used as the data source for this
       :class:`~timesead.data.transforms.Transform`.
   :param window_size: The size of each window.
   :param step_size: The step size at which the sliding window is moved along the sequence.
   :param reverse: If this is `True`, start the sliding window at the end of a sequence instead of the start.
       Note that this does not reverse the order of sequences in the dataset; it only applies within a single
       sequence.

   .. py:attribute:: window_size

   .. py:attribute:: step_size
      :value: 1

   .. py:attribute:: reverse
      :value: False

   .. py:method:: inverse_transform_index(item) -> Tuple[int, int]

   .. py:method:: __len__()

   .. py:property:: seq_len


.. py:class:: DatasetSource(dataset: timesead.data.dataset.BaseTSDataset, start: Union[int, List[int]] = None, end: Union[int, List[int]] = None, axis: str = 'batch')

   Bases: :py:obj:`timesead.data.transforms.transform_base.Transform`

   This acts as a source :class:`~timesead.data.transforms.Transform` (meaning it has no parent) that simply
   returns sequences from a given dataset. It can be constrained to return only a specific part of the data.

   :param dataset: The dataset from which to take points.
   :param start: Start index for this dataset. Please see below for a more detailed explanation.
   :param end: End index for this dataset (exclusive). Please see below for a more detailed explanation.
   :param axis: Can be either 'batch' or 'time'. In 'batch' mode, this returns only the sequences indexed
       from `start` to `end`. 'time' mode is used for datasets that contain only one long time series.
       That time series will be cut according to `start` and `end`.

   .. py:attribute:: dataset

   .. py:attribute:: axis
      :value: 'batch'

   .. py:method:: __len__()

      This should return the number of available sequences after the transformation.

   .. py:property:: seq_len

      This should return the length of each time series.
      If the time series have different lengths, the return value should be a list that contains the length of
      each sequence. If all sequences are of equal length, this should return an `int`.

   .. py:property:: num_features

      Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.


.. py:function:: make_dataset_split(dataset: timesead.data.dataset.BaseTSDataset, *splits: float, axis: str = 'batch')

   Create :class:`DatasetSource`\s for different parts of a given dataset.

   :param dataset: The dataset for which the split should be done.
   :param splits: The percentage of the dataset assigned to each split. The values will be normalized to 100%.
   :param axis: The axis along which to split the dataset.
       Please see :class:`DatasetSource` for a more detailed explanation.
   :return: A generator that yields :class:`DatasetSource`\s according to the specified splits.


.. py:class:: PipelineDataset(sink_transform: timesead.data.transforms.transform_base.Transform)

   Bases: :py:obj:`timesead.data.dataset.BaseTSDataset`

   Dataset that can be used with a :class:`torch.utils.data.DataLoader` and executes a pipeline of transforms to
   retrieve its datapoints.

   :param sink_transform: The last :class:`~timesead.data.transforms.Transform` in the pipeline that should be
       queried for data points.

   .. py:attribute:: sink_transform

   .. py:method:: __iter__()

   .. py:method:: __getitem__(item) -> Tuple[Tuple[torch.Tensor, Ellipsis], Tuple[torch.Tensor, Ellipsis]]

      Access the time series at position `item` and its corresponding label sequence.
      A call to this function should return a single time series that was sampled independently of the other
      time series in this dataset.

      :param item: The zero-based index of the time series to retrieve.
      :return: A tuple `(inputs, targets)`, where `inputs` is again a tuple of :class:`~torch.Tensor`\s with
          shape `(T, D*)`, where `D*` can vary between the tensors. `targets` contains labels for the time
          series as tensors of shape `(T,)`.

   .. py:method:: __len__()

      This should return the number of independent time series in the dataset.

   .. py:property:: seq_len
      :type: Union[int, List[int]]

      This should return the length of each time series.
      If the time series have different lengths, the return value should be a list that contains the length of
      each sequence. If all sequences are of equal length, this should return an `int`.

   .. py:property:: num_features
      :type: Union[int, Tuple[int, Ellipsis]]

      Number of features of each datapoint. This can also be a tuple if the data has more than one feature dimension.

   .. py:method:: get_default_pipeline() -> Dict[str, Dict[str, Any]]
      :staticmethod:

      Return the default pipeline for this dataset that is used if the user does not specify a different
      pipeline. This must be a dict of the form::

         {
             '': {'class': '', 'args': {'', ...}},
             ...
         }

   .. py:method:: get_feature_names() -> List[str]
      :staticmethod:
      :abstractmethod:

      Return names for the features in the order they are present in the data tensors.

      :return: A list of strings with names for each feature.

   .. py:method:: save(path: str, chunk_size: int = 0, batch_dim: int = 0)

      Save this dataset as it would be returned after all processing by its transforms is done.

      :param path: The folder in which to save the dataset.
      :param chunk_size: The maximum number of data points that should be saved in one file.
          If there are more data points than this value, multiple files will be created.
          Set this to 0 to save the entire dataset in one file.
      :param batch_dim: All (or `chunk_size`) datapoints will be stacked along this axis in a new tensor that is
          then saved to disk.


.. py:function:: make_pipe_from_dict(pipeline: Dict[str, Dict[str, Any]], data_source: timesead.data.transforms.dataset_source.DatasetSource) -> PipelineDataset

   Instantiates a :class:`PipelineDataset` from a given :class:`~timesead.data.transforms.DatasetSource` and a
   pipeline specification.

   .. warning::
      If the specification of a :class:`~timesead.data.transforms.Transform` in the pipeline is incomplete or
      its instantiation fails for some other reason, this function prints a warning and continues with the
      next :class:`~timesead.data.transforms.Transform` instead of raising the exception.

   :param pipeline: Specification of the pipeline as a dict in the following format::

         {
             '': {'class': '', 'args': {'': , ...}},
             ...
         }

       The function respects the order of transforms specified in the dict. That is, the first transform
       specified in the dict will be the first transform added to the pipeline and so on.
   :param data_source: The :class:`~timesead.data.transforms.DatasetSource` that acts as a source transform for
       the pipeline.
   :return: A :class:`PipelineDataset` that retrieves data from the given :class:`DatasetSource` and then
       executes the specified pipeline.
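
The pull-based chaining described for :class:`Transform` and the window counting described for :class:`WindowTransform` can be illustrated with a small, self-contained sketch. It does not use TimeSeAD itself: ``ToyTransform``, ``ToySource``, ``ToyWindow`` and ``ToyLimit`` are hypothetical stand-ins that operate on plain Python lists, and the index arithmetic is an assumption about how incomplete windows could be dropped, not the library's actual implementation.

```python
from typing import List, Optional


class ToyTransform:
    """Hypothetical stand-in for a pull-based transform (not the TimeSeAD class)."""

    def __init__(self, parent: Optional["ToyTransform"]):
        self.parent = parent

    def get_datapoint(self, item: int) -> List[float]:
        raise NotImplementedError

    def __len__(self) -> int:
        # By default a transform yields as many datapoints as its parent.
        return len(self.parent)


class ToySource(ToyTransform):
    """Source transform: has no parent and serves sequences from a list."""

    def __init__(self, sequences: List[List[float]]):
        super().__init__(parent=None)
        self.sequences = sequences

    def get_datapoint(self, item: int) -> List[float]:
        return self.sequences[item]

    def __len__(self) -> int:
        return len(self.sequences)


class ToyWindow(ToyTransform):
    """Sliding windows over each parent sequence; incomplete windows are dropped."""

    def __init__(self, parent: ToyTransform, window_size: int, step_size: int = 1):
        super().__init__(parent)
        self.window_size = window_size
        self.step_size = step_size
        # Number of complete windows that fit into each parent sequence.
        self.counts = [
            max(0, (len(parent.get_datapoint(i)) - window_size) // step_size + 1)
            for i in range(len(parent))
        ]

    def __len__(self) -> int:
        return sum(self.counts)

    def get_datapoint(self, item: int) -> List[float]:
        # Map the flat window index back to (sequence index, window index).
        for seq_idx, count in enumerate(self.counts):
            if item < count:
                start = item * self.step_size
                # Pull the sequence from the parent only when it is requested.
                seq = self.parent.get_datapoint(seq_idx)
                return seq[start:start + self.window_size]
            item -= count
        raise IndexError("window index out of range")


class ToyLimit(ToyTransform):
    """Caps the number of datapoints that can be pulled from the parent."""

    def __init__(self, parent: ToyTransform, count: int):
        super().__init__(parent)
        self.max_count = count

    def __len__(self) -> int:
        return min(self.max_count, len(self.parent))

    def get_datapoint(self, item: int) -> List[float]:
        if not 0 <= item < len(self):
            raise IndexError("index out of range")
        return self.parent.get_datapoint(item)


# Chain source -> window -> limit and pull datapoints from the last transform.
source = ToySource([[0, 1, 2, 3, 4, 5], [10, 11, 12]])
windowed = ToyWindow(source, window_size=3, step_size=2)
pipeline = ToyLimit(windowed, count=2)
windows = [pipeline.get_datapoint(i) for i in range(len(pipeline))]
print(len(windowed), windows)
```

In this sketch the first sequence yields two windows and its trailing element ``5`` is discarded because it cannot start a complete window with ``step_size=2``; the last transform in the chain plays the role that ``sink_transform`` plays for :class:`PipelineDataset`.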