timesead.evaluation
===================

.. py:module:: timesead.evaluation

.. autoapi-nested-parse::

   This package contains functions for evaluating the performance of time-series anomaly detectors.
   At the moment, the :class:`Evaluator` class supports classic point-wise metrics, such as the
   :math:`F_1` score and the area under the precision-recall curve, and composite metrics derived
   from precision and recall for time series.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/timesead/evaluation/evaluator/index
   /autoapi/timesead/evaluation/ts_precision_recall/index


Classes
-------

.. autoapisummary::

   timesead.evaluation.Evaluator


Functions
---------

.. autoapisummary::

   timesead.evaluation.ts_precision_and_recall
   timesead.evaluation.constant_bias_fn
   timesead.evaluation.back_bias_fn
   timesead.evaluation.front_bias_fn
   timesead.evaluation.middle_bias_fn
   timesead.evaluation.inverse_proportional_cardinality_fn


Package Contents
----------------

.. py:class:: Evaluator

   A class that can compute several evaluation metrics for a dataset. Each method returns the score
   as a single float, together with a dict that can hold additional information.

   .. py:method:: auc(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise area under the receiver operating characteristic curve.

      This will return a value between 0 and 1, where 1 indicates a perfect classifier.

      .. seealso:: Scikit-learn's :func:`~sklearn.metrics.roc_auc_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the AUC score and an empty dict.

   .. py:method:: f1_score(labels: torch.Tensor, scores: torch.Tensor, pos_label: int = 1) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise F1 score.
      This will return a value between 0 and 1, where 1 indicates a perfect classifier.

      .. seealso:: Scikit-learn's :func:`~sklearn.metrics.f1_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing binary predictions of whether a point
          is an anomaly or not.
      :param pos_label: Class to report.
      :return: A tuple consisting of the F1 score and an empty dict.

   .. py:method:: best_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise :math:`F_{\beta}` score.

      This method will apply all possible thresholds to the values in ``scores`` and compute the
      :math:`F_{\beta}` score for the resulting binary predictions. It then returns the highest score.

      .. seealso:: Scikit-learn's :func:`~sklearn.metrics.fbeta_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param beta: Positive number that determines the trade-off between precision and recall when
          computing the F-score. :math:`\beta = 1` assigns equal weight to both while
          :math:`\beta < 1` emphasizes precision and vice versa.
      :return: A tuple consisting of the best :math:`F_{\beta}` score and a dict containing the
          threshold that produced the maximal score.

   .. py:method:: best_f1_score(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise :math:`F_{1}` score.

      This method will apply all possible thresholds to the values in ``scores`` and compute the
      :math:`F_{1}` score for the resulting binary predictions. It then returns the highest score.

      .. seealso:: Scikit-learn's :func:`~sklearn.metrics.f1_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the best :math:`F_{1}` score and a dict containing the
          threshold that produced the maximal score.

   .. py:method:: auprc(labels: torch.Tensor, scores: torch.Tensor, integration: str = 'trapezoid') -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise area under the precision-recall curve.

      This will return a value between 0 and 1, where 1 indicates a perfect classifier.

      .. seealso::
         Scikit-learn's :func:`~sklearn.metrics.average_precision_score` function.

         Scikit-learn's :func:`~sklearn.metrics.precision_recall_curve` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param integration: Method to use for computing the area under the curve. ``'riemann'``
          corresponds to a simple Riemann sum, whereas ``'trapezoid'`` uses the trapezoidal rule.
      :return: A tuple consisting of the AuPRC score and an empty dict.

   .. py:method:: average_precision(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise average precision score.

      .. note:: This is just a shorthand for :meth:`auprc` with ``integration='riemann'``.

      .. seealso:: Scikit-learn's :func:`~sklearn.metrics.average_precision_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the average precision score and an empty dict.

   .. py:method:: ts_auprc(labels: torch.Tensor, scores: torch.Tensor, integration='trapezoid', weighted_precision: bool = True) -> Tuple[float, Dict[str, Any]]

      Compute the area under the precision-recall curve using precision and recall for time series
      [Tatbul2018]_.

      .. note:: This function uses the improved cardinality function described in [Wagner2023]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param integration: Method to use for computing the area under the curve. ``'riemann'``
          corresponds to a simple Riemann sum, whereas ``'trapezoid'`` uses the trapezoidal rule.
      :param weighted_precision: If ``True``, the precision score of a predicted window will be
          weighted with the length of the window in the final score. Otherwise, each window will
          have the same weight.
      :return: A tuple consisting of the AuPRC score and an empty dict.

      .. [Tatbul2018] N. Tatbul, T.J. Lee, S. Zdonik, M. Alam, J. Gottschlich.
          Precision and recall for time series.
          Advances in neural information processing systems. 2018;31.
      .. [Wagner2023] D. Wagner, T. Michels, F.C.F. Schulz, A. Nair, M. Rudolph, and M. Kloft.
          TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection.
          Transactions on Machine Learning Research (TMLR), (to appear) 2023.

   .. py:method:: ts_average_precision(labels: torch.Tensor, scores: torch.Tensor, weighted_precision: bool = True) -> Tuple[float, Dict[str, Any]]

      Compute the average precision score using precision and recall for time series [Tatbul2018]_.

      .. note:: This is just a shorthand for :meth:`ts_auprc` with ``integration='riemann'``.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param weighted_precision: If ``True``, the precision score of a predicted window will be
          weighted with the length of the window in the final score. Otherwise, each window will
          have the same weight.
      :return: A tuple consisting of the average precision score and an empty dict.

   .. py:method:: ts_auprc_unweighted(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the area under the precision-recall curve using precision and recall for time series
      [Tatbul2018]_.

      .. note:: This is just a shorthand for :meth:`ts_auprc` with ``integration='riemann'`` and
         ``weighted_precision=False``.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the AuPRC score and an empty dict.

   .. py:method:: best_ts_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{\beta}` score using precision and recall for time series [Tatbul2018]_.

      This method will apply all possible thresholds to the values in ``scores`` and compute the
      :math:`F_{\beta}` score for the resulting binary predictions. It then returns the highest score.

      .. note:: This function uses the improved cardinality function and weighted precision as
         described in [Wagner2023]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param beta: Positive number that determines the trade-off between precision and recall when
          computing the F-score. :math:`\beta = 1` assigns equal weight to both while
          :math:`\beta < 1` emphasizes precision and vice versa.
      :return: A tuple consisting of the best :math:`F_{\beta}` score and a dict containing the
          threshold, recall and precision that produced the maximal score.

   .. py:method:: best_ts_fbeta_score_classic(labels: torch.Tensor, scores: torch.Tensor, beta: float) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{\beta}` score using precision and recall for time series [Tatbul2018]_.

      This method will apply all possible thresholds to the values in ``scores`` and compute the
      :math:`F_{\beta}` score for the resulting binary predictions. It then returns the highest score.

      .. note:: This function uses the default cardinality function (:math:`\frac{1}{x}`) and
         unweighted precision, i.e., the default parameters described in [Tatbul2018]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param beta: Positive number that determines the trade-off between precision and recall when
          computing the F-score. :math:`\beta = 1` assigns equal weight to both while
          :math:`\beta < 1` emphasizes precision and vice versa.
      :return: A tuple consisting of the best :math:`F_{\beta}` score and a dict containing the
          threshold, recall and precision that produced the maximal score.

   .. py:method:: best_ts_f1_score(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{1}` score using precision and recall for time series [Tatbul2018]_.
      This method will apply all possible thresholds to the values in ``scores`` and compute the
      :math:`F_{1}` score for the resulting binary predictions. It then returns the highest score.

      .. note:: This function uses the improved cardinality function and weighted precision as
         described in [Wagner2023]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the best :math:`F_{1}` score and a dict containing the
          threshold, recall and precision that produced the maximal score.

   .. py:method:: best_ts_f1_score_classic(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{1}` score using precision and recall for time series [Tatbul2018]_.

      This method will apply all possible thresholds to the values in ``scores`` and compute the
      :math:`F_{1}` score for the resulting binary predictions. It then returns the highest score.

      .. note:: This function uses the default cardinality function (:math:`\frac{1}{x}`) and
         unweighted precision, i.e., the default parameters described in [Tatbul2018]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels.
          1 corresponds to an anomaly, 0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the best :math:`F_{1}` score and a dict containing the
          threshold, recall and precision that produced the maximal score.

.. py:function:: ts_precision_and_recall(anomalies: torch.Tensor, predictions: torch.Tensor, alpha: float = 0, recall_bias_fn: Callable[[torch.Tensor], float] = constant_bias_fn, recall_cardinality_fn: Callable[[int], float] = inverse_proportional_cardinality_fn, precision_bias_fn: Optional[Callable] = None, precision_cardinality_fn: Optional[Callable] = None, anomaly_ranges: Optional[List[Tuple[int, int]]] = None, prediction_ranges: Optional[List[Tuple[int, int]]] = None, weighted_precision: bool = False) -> Tuple[float, float]

   Compute precision and recall for time series as defined in [Tatbul2018]_.

   .. note:: The default parameters for this function correspond to the defaults recommended in
      [Tatbul2018]_. However, those might not be desirable in most cases; please see
      [Wagner2023]_ for a detailed discussion.

   :param anomalies: Binary 1-D :class:`~torch.Tensor` of shape ``(length,)`` containing the true labels.
   :param predictions: Binary 1-D :class:`~torch.Tensor` of shape ``(length,)`` containing the
       predicted labels.
   :param alpha: Weight for the existence term in recall.
   :param recall_bias_fn: Function that computes the bias term for a given ground-truth window.
   :param recall_cardinality_fn: Function that computes the cardinality factor for a given
       ground-truth window.
   :param precision_bias_fn: Function that computes the bias term for a given predicted window.
       If ``None``, this will be the same as ``recall_bias_fn``.
   :param precision_cardinality_fn: Function that computes the cardinality factor for a given
       predicted window. If ``None``, this will be the same as ``recall_cardinality_fn``.
   :param weighted_precision: If ``True``, the precision score of a predicted window will be
       weighted with the length of the window in the final score. Otherwise, each window will
       have the same weight.
   :param anomaly_ranges: A list of tuples ``(start, end)`` for each anomaly window in
       ``anomalies``, where ``start`` is the index at which the window starts and ``end`` is the
       first index after the end of the window. This can be ``None``, in which case the list is
       computed automatically from ``anomalies``.
   :param prediction_ranges: A list of tuples ``(start, end)`` for each anomaly window in
       ``predictions``, where ``start`` is the index at which the window starts and ``end`` is the
       first index after the end of the window. This can be ``None``, in which case the list is
       computed automatically from ``predictions``.
   :return: A tuple consisting of the time-series precision and recall for the given labels.

.. py:function:: constant_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a constant bias function that assigns the same weight to all positions.

   This function computes

   .. math::
      \omega(\text{inputs}) = \frac{1}{n} \sum_{i = 1}^{n} \text{inputs}_i,

   where :math:`n = \lvert \text{inputs} \rvert`.

   .. note:: To improve the runtime of our algorithm, we calculate the overlap :math:`\omega`
      directly as part of the bias function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.

.. py:function:: back_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a bias function that assigns more weight to predictions towards
   the back of a ground-truth anomaly window.

   This function computes

   .. math::
      \omega(\text{inputs}) = \frac{2}{n * (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot i,

   where :math:`n = \lvert \text{inputs} \rvert`.

   .. note:: To improve the runtime of our algorithm, we calculate the overlap :math:`\omega`
      directly as part of the bias function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.

.. py:function:: front_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a bias function that assigns more weight to predictions towards
   the front of a ground-truth anomaly window.

   This function computes

   .. math::
      \omega(\text{inputs}) = \frac{2}{n * (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot (n + 1 - i),

   where :math:`n = \lvert \text{inputs} \rvert`.

   .. note:: To improve the runtime of our algorithm, we calculate the overlap :math:`\omega`
      directly as part of the bias function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.

.. py:function:: middle_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a bias function that assigns more weight to predictions in the
   middle of a ground-truth anomaly window.

   This function computes

   .. math::
      \omega(\text{inputs}) = \frac{2}{m * (m + 1) + (n - m) * (n - m + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot
      \begin{cases}
          i & \text{if } i \leq m\\
          (n + 1 - i) & \text{otherwise}
      \end{cases},

   where :math:`n = \lvert \text{inputs} \rvert` and :math:`m = \lceil \frac{n}{2} \rceil`.

   .. note:: To improve the runtime of our algorithm, we calculate the overlap :math:`\omega`
      directly as part of the bias function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.

.. py:function:: inverse_proportional_cardinality_fn(cardinality: int, gt_length: int) -> float

   Cardinality function that assigns an inversely proportional weight to predictions within a
   single ground-truth window.

   This is the default cardinality function recommended in [Tatbul2018]_.

   .. note:: This function leads to a metric that is not recall-consistent! Please see
      [Wagner2023]_ for more details.

   :param cardinality: Number of predicted windows that overlap the ground-truth window in question.
   :param gt_length: Length of the ground-truth window (unused).
   :return: The cardinality factor :math:`\frac{1}{\text{cardinality}}`.

   .. [Tatbul2018] N. Tatbul, T.J. Lee, S. Zdonik, M. Alam, J. Gottschlich.
       Precision and recall for time series.
       Advances in neural information processing systems. 2018;31.
   .. [Wagner2023] D. Wagner, T. Michels, F.C.F. Schulz, A. Nair, M. Rudolph, and M. Kloft.
       TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection.
       Transactions on Machine Learning Research (TMLR), (to appear) 2023.
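
The overlap formulas of the four bias functions above can be checked by hand with a small sketch. It is written in plain Python and is not the library's implementation: lists of 0/1 values stand in for the 1-D tensors, and the helper names (``constant_overlap``, ``back_overlap``, ``front_overlap``, ``middle_overlap``) are hypothetical, chosen only for this illustration.

```python
from math import ceil
from typing import Sequence


def constant_overlap(inputs: Sequence[int]) -> float:
    """Every position has the same weight, so the overlap is the mean."""
    n = len(inputs)
    return sum(inputs) / n


def back_overlap(inputs: Sequence[int]) -> float:
    """Position i (1-based) has weight i; the weights sum to n(n+1)/2."""
    n = len(inputs)
    weighted = sum(x * i for i, x in enumerate(inputs, start=1))
    return 2 / (n * (n + 1)) * weighted


def front_overlap(inputs: Sequence[int]) -> float:
    """Position i (1-based) has weight n + 1 - i, mirroring back_overlap."""
    n = len(inputs)
    weighted = sum(x * (n + 1 - i) for i, x in enumerate(inputs, start=1))
    return 2 / (n * (n + 1)) * weighted


def middle_overlap(inputs: Sequence[int]) -> float:
    """Weights rise up to m = ceil(n / 2) and fall afterwards."""
    n = len(inputs)
    m = ceil(n / 2)
    # Twice the sum of all weights, i.e. the denominator of the formula.
    norm = m * (m + 1) + (n - m) * (n - m + 1)
    weighted = sum(
        x * (i if i <= m else n + 1 - i)
        for i, x in enumerate(inputs, start=1)
    )
    return 2 * weighted / norm


# A ground-truth window of length 4 where only the last two points are detected.
window = [0, 0, 1, 1]
print(constant_overlap(window))         # 0.5
print(round(back_overlap(window), 3))   # 0.7
print(round(front_overlap(window), 3))  # 0.3
print(middle_overlap(window))           # 0.5
```

In this example the detection covers only the back half of the window, so the back-biased overlap is larger than the constant one and the front-biased overlap is smaller. A fully detected window yields an overlap of 1 under all four weightings.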