timesead.evaluation
===================

.. py:module:: timesead.evaluation

.. autoapi-nested-parse::

   This package contains functions for evaluating the performance of time-series anomaly detectors.

   At the moment, the :class:`Evaluator` class supports classic point-wise metrics such as the :math:`F_1` score and the
   area under the precision-recall curve, as well as composite metrics derived from precision and recall for time series.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/timesead/evaluation/evaluator/index
   /autoapi/timesead/evaluation/ts_precision_recall/index


Classes
-------

.. autoapisummary::

   timesead.evaluation.Evaluator


Functions
---------

.. autoapisummary::

   timesead.evaluation.ts_precision_and_recall
   timesead.evaluation.constant_bias_fn
   timesead.evaluation.back_bias_fn
   timesead.evaluation.front_bias_fn
   timesead.evaluation.middle_bias_fn
   timesead.evaluation.inverse_proportional_cardinality_fn


Package Contents
----------------

.. py:class:: Evaluator

   A class that can compute several evaluation metrics for a dataset. Each method returns a tuple consisting of the score
   as a single float and a dict that can hold additional information.


   .. py:method:: auc(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise area under the receiver operating characteristic curve.

      This will return a value between 0 and 1 where 1 indicates a perfect classifier.

      .. seealso::
          Scikit-learn's :func:`~sklearn.metrics.roc_auc_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the AUC score and an empty dict.
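
      As a rough illustration of what this metric measures (not of how :class:`Evaluator` computes it), the ROC AUC
      equals the fraction of (anomalous, normal) point pairs in which the anomalous point receives the higher score,
      with ties counted as one half. The sketch below uses plain Python lists instead of tensors, and
      ``pairwise_auc`` is a hypothetical helper name:

      .. code-block:: python

         def pairwise_auc(labels, scores):
             """Pairwise-ranking view of ROC AUC (quadratic time, for illustration only)."""
             positives = [s for y, s in zip(labels, scores) if y == 1]
             negatives = [s for y, s in zip(labels, scores) if y == 0]
             if not positives or not negatives:
                 raise ValueError("AUC is undefined unless both classes are present.")
             wins = 0.0
             for p in positives:
                 for n in negatives:
                     if p > n:
                         wins += 1.0
                     elif p == n:
                         wins += 0.5  # ties count as half a win
             return wins / (len(positives) * len(negatives))

         # Three of the four (anomalous, normal) pairs are ranked correctly.
         auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75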


   .. py:method:: f1_score(labels: torch.Tensor, scores: torch.Tensor, pos_label: int = 1) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise F1 score.

      This will return a value between 0 and 1 where 1 indicates a perfect classifier.

      .. seealso::
          Scikit-learn's :func:`~sklearn.metrics.f1_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing binary predictions of whether a point is an anomaly or
          not.
      :param pos_label: Class to report.
      :return: A tuple consisting of the F1 score and an empty dict.


   .. py:method:: best_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise :math:`F_{\beta}` score.

      This method will apply all possible thresholds to the values in ``scores`` and compute the :math:`F_{\beta}`
      score for the resulting binary predictions. It then returns the highest score.

      .. seealso::
          Scikit-learn's :func:`~sklearn.metrics.fbeta_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param beta: Positive number that determines the trade-off between precision and recall when computing the
          F-score. :math:`\beta = 1` assigns equal weight to both while :math:`\beta < 1` emphasizes precision and
          vice versa.
      :return: A tuple consisting of the best :math:`F_{\beta}` score and a dict containing the threshold that
          produced the maximal score.
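
      The threshold sweep described above can be sketched in plain Python as follows. This is an illustrative sketch,
      not the library's implementation: it uses lists instead of tensors, assumes that a point is predicted anomalous
      when its score is greater than or equal to the threshold, and the helper names and the ``'threshold'`` dict key
      are hypothetical.

      .. code-block:: python

         def fbeta(labels, predictions, beta):
             """Point-wise F-beta score for binary labels and binary predictions."""
             pairs = list(zip(labels, predictions))
             tp = sum(1 for y, p in pairs if y == 1 and p == 1)
             fp = sum(1 for y, p in pairs if y == 0 and p == 1)
             fn = sum(1 for y, p in pairs if y == 1 and p == 0)
             denominator = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
             return (1 + beta ** 2) * tp / denominator if denominator > 0 else 0.0

         def best_fbeta_sketch(labels, scores, beta):
             """Try every distinct score as a threshold and keep the best F-beta."""
             best_score, best_threshold = 0.0, None
             for threshold in sorted(set(scores)):
                 # Assumption: scores at or above the threshold count as anomalies.
                 predictions = [1 if s >= threshold else 0 for s in scores]
                 score = fbeta(labels, predictions, beta)
                 if score > best_score:
                     best_score, best_threshold = score, threshold
             return best_score, {'threshold': best_threshold}

         score, info = best_fbeta_sketch([0, 0, 1, 1, 0], [0.1, 0.4, 0.8, 0.9, 0.3], beta=1.0)

      In this toy example the threshold ``0.8`` separates both anomalies from all normal points, so the sweep ends
      with a score of ``1.0``.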


   .. py:method:: best_f1_score(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise :math:`F_{1}` score.

      This method will apply all possible thresholds to the values in ``scores`` and compute the :math:`F_{1}`
      score for the resulting binary predictions. It then returns the highest score.

      .. seealso::
          Scikit-learn's :func:`~sklearn.metrics.f1_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the best :math:`F_{1}` score and a dict containing the threshold that
          produced the maximal score.


   .. py:method:: auprc(labels: torch.Tensor, scores: torch.Tensor, integration: str = 'trapezoid') -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise area under the precision-recall curve.

      This will return a value between 0 and 1 where 1 indicates a perfect classifier.

      .. seealso::
          Scikit-learn's :func:`~sklearn.metrics.average_precision_score` function.

          Scikit-learn's :func:`~sklearn.metrics.precision_recall_curve` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param integration: Method to use for computing the area under the curve. ``'riemann'`` corresponds to a simple
          Riemann sum, whereas ``'trapezoid'`` uses the trapezoidal rule.
      :return: A tuple consisting of the AuPRC score and an empty dict.
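
      To illustrate the difference between the two ``integration`` options, the following sketch integrates a
      precision-recall curve given as two plain Python lists sorted by increasing recall. It is an assumption of this
      sketch that the Riemann sum uses the precision at the right end of each recall interval (as in the usual
      definition of average precision); ``area_under_pr_curve`` is a hypothetical helper, not part of the package.

      .. code-block:: python

         def area_under_pr_curve(recall, precision, integration='trapezoid'):
             """Integrate precision over recall with a Riemann sum or the trapezoidal rule."""
             area = 0.0
             for i in range(1, len(recall)):
                 width = recall[i] - recall[i - 1]
                 if integration == 'riemann':
                     # Assumption: rectangle height is the precision at the right endpoint.
                     height = precision[i]
                 elif integration == 'trapezoid':
                     # Average of the precision at both endpoints of the interval.
                     height = (precision[i] + precision[i - 1]) / 2
                 else:
                     raise ValueError(f"Unknown integration method: {integration!r}")
                 area += width * height
             return area

         recall = [0.0, 0.5, 1.0]
         precision = [1.0, 1.0, 0.5]
         riemann = area_under_pr_curve(recall, precision, integration='riemann')      # 0.75
         trapezoid = area_under_pr_curve(recall, precision, integration='trapezoid')  # 0.875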


   .. py:method:: average_precision(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the classic point-wise average precision score.

      .. note::
          This is just a shorthand for :meth:`auprc` with ``integration='riemann'``.

      .. seealso::
          Scikit-learn's :func:`~sklearn.metrics.average_precision_score` function.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the average precision score and an empty dict.


   .. py:method:: ts_auprc(labels: torch.Tensor, scores: torch.Tensor, integration='trapezoid', weighted_precision: bool = True) -> Tuple[float, Dict[str, Any]]

      Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018]_.

      .. note::
          This function uses the improved cardinality function described in [Wagner2023]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param integration: Method to use for computing the area under the curve. ``'riemann'`` corresponds to a simple
          Riemann sum, whereas ``'trapezoid'`` uses the trapezoidal rule.
      :param weighted_precision: If ``True``, the precision score of a predicted window will be weighted with the
          length of the window in the final score. Otherwise, each window will have the same weight.
      :return: A tuple consisting of the AuPRC score and an empty dict.
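
      The effect of ``weighted_precision`` can be illustrated with a small sketch that combines per-window precision
      values into one score. The helper ``aggregate_precision`` and its arguments are hypothetical and only mirror the
      description of the parameter above; they are not taken from the package.

      .. code-block:: python

         def aggregate_precision(window_precisions, window_lengths, weighted_precision=True):
             """Combine per-window precision values into a single precision score."""
             if weighted_precision:
                 # Longer predicted windows contribute proportionally more.
                 total_length = sum(window_lengths)
                 weighted_sum = sum(p * n for p, n in zip(window_precisions, window_lengths))
                 return weighted_sum / total_length
             # Every predicted window contributes equally.
             return sum(window_precisions) / len(window_precisions)

         # One long, fully correct window and one short, fully wrong window.
         weighted = aggregate_precision([1.0, 0.0], [9, 1], weighted_precision=True)     # 0.9
         unweighted = aggregate_precision([1.0, 0.0], [9, 1], weighted_precision=False)  # 0.5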

      .. [Tatbul2018] N. Tatbul, T.J. Lee, S. Zdonik, M. Alam, J. Gottschlich.
          Precision and recall for time series. Advances in neural information processing systems. 2018;31.
      .. [Wagner2023] D. Wagner, T. Michels, F.C.F. Schulz, A. Nair, M. Rudolph, and M. Kloft.
          TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection.
          Transactions on Machine Learning Research (TMLR), (to appear) 2023.


   .. py:method:: ts_average_precision(labels: torch.Tensor, scores: torch.Tensor, weighted_precision: bool = True) -> Tuple[float, Dict[str, Any]]

      Compute the average precision score using precision and recall for time series [Tatbul2018]_.

      .. note::
          This is just a shorthand for :meth:`ts_auprc` with ``integration='riemann'``.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param weighted_precision: If ``True``, the precision score of a predicted window will be weighted with the
          length of the window in the final score. Otherwise, each window will have the same weight.
      :return: A tuple consisting of the average precision score and an empty dict.


   .. py:method:: ts_auprc_unweighted(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018]_.

      .. note::
          This is just a shorthand for :meth:`ts_auprc` with ``integration='riemann'`` and
          ``weighted_precision=False``.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the AuPRC score and an empty dict.


   .. py:method:: best_ts_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{\beta}` score using precision and recall for time series [Tatbul2018]_.

      This method will apply all possible thresholds to the values in ``scores`` and compute the :math:`F_{\beta}`
      score for the resulting binary predictions. It then returns the highest score.

      .. note::
          This function uses the improved cardinality function and weighted precision as described in [Wagner2023]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param beta: Positive number that determines the trade-off between precision and recall when computing the
          F-score. :math:`\beta = 1` assigns equal weight to both while :math:`\beta < 1` emphasizes precision and
          vice versa.
      :return: A tuple consisting of the best :math:`F_{\beta}` score and a dict containing the threshold, recall and
          precision that produced the maximal score.


   .. py:method:: best_ts_fbeta_score_classic(labels: torch.Tensor, scores: torch.Tensor, beta: float) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{\beta}` score using precision and recall for time series [Tatbul2018]_.

      This method will apply all possible thresholds to the values in ``scores`` and compute the :math:`F_{\beta}`
      score for the resulting binary predictions. It then returns the highest score.

      .. note::
          This function uses the default cardinality function (:math:`\frac{1}{x}`) and unweighted precision, i.e.,
          the default parameters described in [Tatbul2018]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :param beta: Positive number that determines the trade-off between precision and recall when computing the
          F-score. :math:`\beta = 1` assigns equal weight to both while :math:`\beta < 1` emphasizes precision and
          vice versa.
      :return: A tuple consisting of the best :math:`F_{\beta}` score and a dict containing the threshold, recall and
          precision that produced the maximal score.


   .. py:method:: best_ts_f1_score(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{1}` score using precision and recall for time series [Tatbul2018]_.

      This method will apply all possible thresholds to the values in ``scores`` and compute the :math:`F_{1}`
      score for the resulting binary predictions. It then returns the highest score.

      .. note::
          This function uses the improved cardinality function and weighted precision as described in [Wagner2023]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the best :math:`F_{1}` score and a dict containing the threshold, recall and
          precision that produced the maximal score.


   .. py:method:: best_ts_f1_score_classic(labels: torch.Tensor, scores: torch.Tensor) -> Tuple[float, Dict[str, Any]]

      Compute the :math:`F_{1}` score using precision and recall for time series [Tatbul2018]_.

      This method will apply all possible thresholds to the values in ``scores`` and compute the :math:`F_{1}`
      score for the resulting binary predictions. It then returns the highest score.

      .. note::
          This function uses the default cardinality function (:math:`\frac{1}{x}`) and unweighted precision, i.e.,
          the default parameters described in [Tatbul2018]_.

      :param labels: A 1-D :class:`~torch.Tensor` containing the ground-truth labels. 1 corresponds to an anomaly,
          0 means that the point is normal.
      :param scores: A 1-D :class:`~torch.Tensor` containing the scores returned by an
          :class:`~timesead.models.common.AnomalyDetector`.
      :return: A tuple consisting of the best :math:`F_{1}` score and a dict containing the threshold, recall and
          precision that produced the maximal score.


.. py:function:: ts_precision_and_recall(anomalies: torch.Tensor, predictions: torch.Tensor, alpha: float = 0, recall_bias_fn: Callable[[torch.Tensor], float] = constant_bias_fn, recall_cardinality_fn: Callable[[int], float] = inverse_proportional_cardinality_fn, precision_bias_fn: Optional[Callable] = None, precision_cardinality_fn: Optional[Callable] = None, anomaly_ranges: Optional[List[Tuple[int, int]]] = None, prediction_ranges: Optional[List[Tuple[int, int]]] = None, weighted_precision: bool = False) -> Tuple[float, float]

   Computes precision and recall for time series as defined in [Tatbul2018]_.

   .. note::
      The default parameters for this function correspond to the defaults recommended in [Tatbul2018]_. However,
      those might not be desirable in most cases; please see [Wagner2023]_ for a detailed discussion.

   :param anomalies: Binary 1-D :class:`~torch.Tensor` of shape ``(length,)`` containing the true labels.
   :param predictions: Binary 1-D :class:`~torch.Tensor` of shape ``(length,)`` containing the predicted labels.
   :param alpha: Weight for existence term in recall.
   :param recall_bias_fn: Function that computes the bias term for a given ground-truth window.
   :param recall_cardinality_fn: Function that computes the cardinality factor for a given ground-truth window.
   :param precision_bias_fn: Function that computes the bias term for a given predicted window.
       If ``None``, this will be the same as ``recall_bias_fn``.
   :param precision_cardinality_fn: Function that computes the cardinality factor for a given predicted window.
       If ``None``, this will be the same as ``recall_cardinality_fn``.
   :param weighted_precision: If ``True``, the precision score of a predicted window will be weighted with the
       length of the window in the final score. Otherwise, each window will have the same weight.
   :param anomaly_ranges: A list of tuples ``(start, end)`` for each anomaly window in ``anomalies``, where ``start``
       is the index at which the window starts and ``end`` is the first index after the end of the window. This can
       be ``None``, in which case the list is computed automatically from ``anomalies``.
   :param prediction_ranges: A list of tuples ``(start, end)`` for each anomaly window in ``predictions``, where
       ``start`` is the index at which the window starts and ``end`` is the first index after the end of the window.
       This can be ``None``, in which case the list is computed automatically from ``predictions``.
   :return: A tuple consisting of the time-series precision and recall for the given labels.
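
   The ``(start, end)`` convention used for ``anomaly_ranges`` and ``prediction_ranges`` (half-open windows with an
   exclusive ``end``) can be illustrated with a small sketch. ``compute_ranges`` is a hypothetical helper that
   operates on a plain Python list; how the package derives the ranges internally is not shown here.

   .. code-block:: python

      def compute_ranges(binary):
          """Return a list of (start, end) tuples for every run of ones, with end exclusive."""
          ranges = []
          start = None
          for i, value in enumerate(binary):
              if value and start is None:
                  start = i  # a new window begins here
              elif not value and start is not None:
                  ranges.append((start, i))  # i is the first index after the window
                  start = None
          if start is not None:
              # The last window runs until the end of the sequence.
              ranges.append((start, len(binary)))
          return ranges

      ranges = compute_ranges([0, 1, 1, 0, 0, 1])  # [(1, 3), (5, 6)]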


.. py:function:: constant_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a constant bias function that assigns the same weight to all positions.

   This function computes

   .. math::
       \omega(\text{inputs}) = \frac{1}{n} \sum_{i = 1}^{n} \text{inputs}_i,

   where :math:`n = \lvert \text{inputs} \rvert`.

   .. note::
      To improve the runtime of our algorithm, we calculate the overlap :math:`\omega` directly as part of the bias
      function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.
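
   One way to read the formula above is as the share of the total positional weight that is covered by predictions,
   with every position weighted equally. The sketch below spells this out for plain Python lists; both helper names
   are hypothetical and the code is not the package's implementation.

   .. code-block:: python

      def weighted_overlap(inputs, weights):
          """Share of the total positional weight that is covered by predictions."""
          return sum(w * x for w, x in zip(weights, inputs)) / sum(weights)

      def constant_overlap(inputs):
          """Constant bias: every position has weight 1, so the result is the mean."""
          return weighted_overlap(inputs, [1] * len(inputs))

      overlap = constant_overlap([1, 0, 1, 1])  # 3 of 4 positions predicted: 0.75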


.. py:function:: back_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a bias function that assigns more weight to predictions towards the back of a
   ground-truth anomaly window.

   This function computes

   .. math::
       \omega(\text{inputs}) = \frac{2}{n * (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot i,

   where :math:`n = \lvert \text{inputs} \rvert`.

   .. note::
      To improve the runtime of our algorithm, we calculate the overlap :math:`\omega` directly as part of the bias
      function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.
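
   A direct transcription of the formula above into plain Python might look as follows. ``back_overlap`` is a
   hypothetical name, and the sketch uses a list of zeros and ones instead of a tensor.

   .. code-block:: python

      def back_overlap(inputs):
          """Weight position i (1-based) with i, then normalise by n * (n + 1) / 2."""
          n = len(inputs)
          weighted = sum(x * i for i, x in enumerate(inputs, start=1))
          return 2 * weighted / (n * (n + 1))

      late = back_overlap([0, 0, 1, 1])   # hits at the back of the window
      early = back_overlap([1, 1, 0, 0])  # hits at the front of the window

   With the same number of predicted points, the hits at the back of the window yield the larger overlap.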


.. py:function:: front_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a bias function that assigns more weight to predictions towards the front of a
   ground-truth anomaly window.

   This function computes

   .. math::
       \omega(\text{inputs}) = \frac{2}{n * (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot (n + 1 - i),

   where :math:`n = \lvert \text{inputs} \rvert`.

   .. note::
      To improve the runtime of our algorithm, we calculate the overlap :math:`\omega` directly as part of the bias
      function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.
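
   The weights :math:`n + 1 - i` decrease from the first to the last position of the window. The following sketch
   (plain Python lists, hypothetical helper name ``front_overlap``) transcribes the formula and compares hits at the
   front of a window with hits at its back.

   .. code-block:: python

      def front_overlap(inputs):
          """Weight position i (1-based) with n + 1 - i, then normalise by n * (n + 1) / 2."""
          n = len(inputs)
          weighted = sum(x * (n + 1 - i) for i, x in enumerate(inputs, start=1))
          return 2 * weighted / (n * (n + 1))

      early = front_overlap([1, 1, 0, 0])  # hits at the front of the window
      late = front_overlap([0, 0, 1, 1])   # hits at the back of the window

   Here ``early`` is larger than ``late``, although both inputs contain two predicted points.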


.. py:function:: middle_bias_fn(inputs: torch.Tensor) -> float

   Compute the overlap size for a bias function that assigns more weight to predictions in the middle of a
   ground-truth anomaly window.

   This function computes

   .. math::
       \omega(\text{inputs}) = \frac{2}{m * (m + 1) + (n - m) * (n - m + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot
       \begin{cases}
           i & \text{if } i \leq m\\
           (n + 1 - i) & \text{otherwise}
       \end{cases},

   where :math:`n = \lvert \text{inputs} \rvert` and :math:`m = \lceil \frac{n}{2} \rceil`.

   .. note::
      To improve the runtime of our algorithm, we calculate the overlap :math:`\omega` directly as part of the bias
      function.

   :param inputs: A 1-D :class:`~torch.Tensor` containing the predictions inside a ground-truth window.
   :return: The overlap :math:`\omega`.
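
   For a window of length 5, the formula above gives :math:`m = 3` and the position weights 1, 2, 3, 2, 1. A sketch
   that transcribes the formula for plain Python lists is shown below; ``middle_overlap`` is a hypothetical name and
   not part of the package.

   .. code-block:: python

      import math

      def middle_overlap(inputs):
          """Weights rise up to position m = ceil(n / 2) and fall again afterwards."""
          n = len(inputs)
          m = math.ceil(n / 2)
          weighted = sum(
              x * (i if i <= m else n + 1 - i)
              for i, x in enumerate(inputs, start=1)
          )
          return 2 * weighted / (m * (m + 1) + (n - m) * (n - m + 1))

      centre = middle_overlap([0, 0, 1, 0, 0])  # weight 3 of 9 in total
      edge = middle_overlap([1, 0, 0, 0, 0])    # weight 1 of 9 in total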


.. py:function:: inverse_proportional_cardinality_fn(cardinality: int, gt_length: int) -> float

   Cardinality function that assigns an inversely proportional weight to predictions within a single ground-truth
   window.

   This is the default cardinality function recommended in [Tatbul2018]_.

   .. note::
      This function leads to a metric that is not recall-consistent! Please see [Wagner2023]_ for more details.

   :param cardinality: Number of predicted windows that overlap the ground-truth window in question.
   :param gt_length: Length of the ground-truth window (unused).
   :return: The cardinality factor :math:`\frac{1}{\text{cardinality}}`.
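
   The following sketch shows how such a cardinality factor could enter the recall of a single ground-truth window.
   It is an illustration under stated assumptions, not the package's code: the combination of existence term,
   cardinality factor and overlap follows the general structure of the time-series recall as understood here, the
   guard against a cardinality of zero is an assumption, and all helper names are hypothetical.

   .. code-block:: python

      def inverse_proportional_cardinality(cardinality, gt_length):
          """Return 1 / cardinality; gt_length is accepted but unused."""
          # Assumption of this sketch: avoid a division by zero for windows without predictions.
          return 1 / max(1, cardinality)

      def window_recall(covered_fraction, cardinality, gt_length, alpha=0.0):
          """Recall of one ground-truth window (assumed structure, see the lead-in)."""
          existence = 1.0 if cardinality > 0 else 0.0
          gamma = inverse_proportional_cardinality(cardinality, gt_length)
          return alpha * existence + (1 - alpha) * gamma * covered_fraction

      one_window = window_recall(0.5, cardinality=1, gt_length=10)   # 0.5
      two_windows = window_recall(0.8, cardinality=2, gt_length=10)  # 0.4

   In this toy case, two predicted windows that together cover 80% of the anomaly score lower than a single window
   that covers only 50%, which illustrates why the note above advises caution with this cardinality function.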
