timesead.evaluation
This package contains functions for evaluating the performance of time-series anomaly detectors.
At the moment, the Evaluator class supports classic point-wise metrics such as the \(F_1\) score and the area under
the precision-recall curve, as well as composite metrics derived from precision and recall for time series.
Submodules
Classes
Evaluator – A class that can compute several evaluation metrics for a dataset. Each method returns the score as a single float, but it can also return additional information in a dict.
Functions
ts_precision_and_recall – Computes precision and recall for time series as defined in [Tatbul2018].
constant_bias_fn – Compute the overlap size for a constant bias function that assigns the same weight to all positions.
back_bias_fn – Compute the overlap size for a bias function that assigns more weight to predictions towards the back of a ground-truth anomaly window.
front_bias_fn – Compute the overlap size for a bias function that assigns more weight to predictions towards the front of a ground-truth anomaly window.
middle_bias_fn – Compute the overlap size for a bias function that assigns more weight to predictions in the middle of a ground-truth anomaly window.
inverse_proportional_cardinality_fn – Cardinality function that assigns an inversely proportional weight to predictions within a single ground-truth window.
Package Contents
- class timesead.evaluation.Evaluator
A class that can compute several evaluation metrics for a dataset. Each method returns the score as a single float, but it can also return additional information in a dict.
- auc(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]
Compute the classic point-wise area under the receiver operating characteristic curve.
This will return a value between 0 and 1 where 1 indicates a perfect classifier.
See also
Scikit-learn’s roc_auc_score() function.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
- Returns:
A tuple consisting of the AUC score and an empty dict.
- Return type:
Tuple[float, Dict[str, Any]]
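As a rough illustration of what the point-wise AUC measures, the sketch below uses the pairwise-ranking interpretation: the fraction of (anomaly, normal) pairs in which the anomaly receives the higher score, with ties counted as one half. It works on plain Python lists, and the helper name `roc_auc` is hypothetical; it is not the library's implementation.

```python
def roc_auc(labels, scores):
    """Point-wise ROC AUC via pairwise comparisons (quadratic time, for illustration only)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # scores of anomalous points
    neg = [s for y, s in zip(labels, scores) if y == 0]  # scores of normal points
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomalous and one normal point")
    # A pair counts 1 if the anomaly is ranked higher, 0.5 on a tie, 0 otherwise.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = roc_auc(labels, scores)  # 3 of the 4 pairs are ranked correctly
```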
- f1_score(labels: torch.Tensor, scores: torch.Tensor, pos_label: int = 1) → Tuple[float, Dict[str, Any]]
Compute the classic point-wise F1 score.
This will return a value between 0 and 1 where 1 indicates a perfect classifier.
See also
Scikit-learn’s f1_score() function.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing binary predictions of whether a point is an anomaly or not.
pos_label (int) – Class to report.
- Returns:
A tuple consisting of the F1 score and an empty dict.
- Return type:
Tuple[float, Dict[str, Any]]
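For reference, the point-wise F1 score can be computed from true positives, false positives and false negatives as in the following sketch. It uses plain Python lists instead of tensors, and `pointwise_f1` is a hypothetical helper, not part of the package.

```python
def pointwise_f1(labels, predictions, pos_label=1):
    """F1 score for two equally long sequences of binary labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == pos_label and p == pos_label)
    fp = sum(1 for y, p in pairs if y != pos_label and p == pos_label)
    fn = sum(1 for y, p in pairs if y == pos_label and p != pos_label)
    denom = 2 * tp + fp + fn
    # Returning 0.0 when nothing is positive is a convention chosen for this sketch.
    return 2 * tp / denom if denom > 0 else 0.0


labels = [0, 0, 1, 1, 1, 0]
predictions = [0, 1, 1, 1, 0, 0]
f1 = pointwise_f1(labels, predictions)  # tp = 2, fp = 1, fn = 1
```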
- best_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]
Compute the best classic point-wise \(F_{\beta}\) score over all thresholds.
This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.
See also
Scikit-learn’s fbeta_score() function.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both, \(\beta < 1\) emphasizes precision, and \(\beta > 1\) emphasizes recall.
- Returns:
A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold that produced the maximal score.
- Return type:
Tuple[float, Dict[str, Any]]
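The threshold sweep described above can be sketched in plain Python as follows. Two assumptions are made for illustration only: a point is predicted anomalous when its score is greater than or equal to the threshold, and the returned dict uses the key `"threshold"`. Neither is confirmed by this page, and the helper names are hypothetical.

```python
def fbeta(precision, recall, beta):
    """F-beta score from precision and recall; 0.0 if both are zero."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def best_fbeta(labels, scores, beta=1.0):
    """Try every distinct score as a threshold and keep the best F-beta."""
    best_score, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        # Assumption: score >= threshold means "anomaly".
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        score = fbeta(precision, recall, beta)
        if score > best_score:
            best_score, best_threshold = score, threshold
    return best_score, {"threshold": best_threshold}


labels = [0, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.8, 0.35, 0.2]
best, info = best_fbeta(labels, scores, beta=1.0)
```

With these toy values, lowering beta to 0.5 moves the best threshold from 0.35 to 0.8, because the higher threshold trades recall for perfect precision.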
- best_f1_score(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]
Compute the best classic point-wise \(F_{1}\) score over all thresholds.
This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.
See also
Scikit-learn’s f1_score() function.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
- Returns:
A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold that produced the maximal score.
- Return type:
Tuple[float, Dict[str, Any]]
- auprc(labels: torch.Tensor, scores: torch.Tensor, integration: str = 'trapezoid') → Tuple[float, Dict[str, Any]]
Compute the classic point-wise area under the precision-recall curve.
This will return a value between 0 and 1 where 1 indicates a perfect classifier.
See also
Scikit-learn’s average_precision_score() function.
Scikit-learn’s precision_recall_curve() function.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
integration (str) – Method to use for computing the area under the curve. 'riemann' corresponds to a simple Riemann sum, whereas 'trapezoid' uses the trapezoidal rule.
- Returns:
A tuple consisting of the AuPRC score and an empty dict.
- Return type:
Tuple[float, Dict[str, Any]]
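The difference between the two integration modes can be seen on a hand-made precision-recall curve. The sketch below assumes the curve is given as recall values in ascending order with matching precision values, and that the Riemann sum uses the right endpoint of each interval (the convention behind scikit-learn's average precision). Which endpoint the library uses is not stated here, and `area_under_pr_curve` is a hypothetical helper.

```python
def area_under_pr_curve(recall, precision, integration="trapezoid"):
    """Integrate precision over recall; recall must be sorted in ascending order."""
    area = 0.0
    for i in range(1, len(recall)):
        width = recall[i] - recall[i - 1]
        if integration == "riemann":
            height = precision[i]  # right-endpoint rectangle (assumption)
        elif integration == "trapezoid":
            height = 0.5 * (precision[i] + precision[i - 1])
        else:
            raise ValueError(f"unknown integration method: {integration!r}")
        area += width * height
    return area


recall = [0.0, 0.5, 1.0]
precision = [1.0, 1.0, 0.5]
riemann_area = area_under_pr_curve(recall, precision, "riemann")
trapezoid_area = area_under_pr_curve(recall, precision, "trapezoid")
```

On this curve the trapezoidal rule gives the larger value because it interpolates linearly between the two precision values on the second interval, whereas the Riemann sum only uses the lower one.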
- average_precision(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]
Compute the classic point-wise average precision score.
Note
This is just a shorthand for auprc() with integration='riemann'.
See also
Scikit-learn’s average_precision_score() function.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
- Returns:
A tuple consisting of the average precision score and an empty dict.
- Return type:
Tuple[float, Dict[str, Any]]
- ts_auprc(labels: torch.Tensor, scores: torch.Tensor, integration='trapezoid', weighted_precision: bool = True) → Tuple[float, Dict[str, Any]]
Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018].
Note
This function uses the improved cardinality function described in [Wagner2023].
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
integration (str) – Method to use for computing the area under the curve. 'riemann' corresponds to a simple Riemann sum, whereas 'trapezoid' uses the trapezoidal rule.
weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.
- Returns:
A tuple consisting of the AuPRC score and an empty dict.
- Return type:
Tuple[float, Dict[str, Any]]
[Tatbul2018] N. Tatbul, T.J. Lee, S. Zdonik, M. Alam, and J. Gottschlich. Precision and Recall for Time Series. Advances in Neural Information Processing Systems 31, 2018.
[Wagner2023] D. Wagner, T. Michels, F.C.F. Schulz, A. Nair, M. Rudolph, and M. Kloft. TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection. Transactions on Machine Learning Research (TMLR), (to appear) 2023.
- ts_average_precision(labels: torch.Tensor, scores: torch.Tensor, weighted_precision: bool = True) → Tuple[float, Dict[str, Any]]
Compute the average precision score using precision and recall for time series [Tatbul2018].
Note
This is just a shorthand for ts_auprc() with integration='riemann'.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.
- Returns:
A tuple consisting of the average precision score and an empty dict.
- Return type:
Tuple[float, Dict[str, Any]]
- ts_auprc_unweighted(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]
Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018].
Note
This is just a shorthand for ts_auprc() with integration='riemann' and weighted_precision=False.
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
- Returns:
A tuple consisting of the AuPRC score and an empty dict.
- Return type:
Tuple[float, Dict[str, Any]]
- best_ts_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]
Compute the best \(F_{\beta}\) score over all thresholds using precision and recall for time series [Tatbul2018].
This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.
Note
This function uses the improved cardinality function and weighted precision as described in [Wagner2023].
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both, \(\beta < 1\) emphasizes precision, and \(\beta > 1\) emphasizes recall.
- Returns:
A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold, recall and precision that produced the maximal score.
- Return type:
Tuple[float, Dict[str, Any]]
- best_ts_fbeta_score_classic(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]
Compute the best \(F_{\beta}\) score over all thresholds using precision and recall for time series [Tatbul2018].
This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.
Note
This function uses the default cardinality function (\(\frac{1}{x}\)) and unweighted precision, i.e., the default parameters described in [Tatbul2018].
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both, \(\beta < 1\) emphasizes precision, and \(\beta > 1\) emphasizes recall.
- Returns:
A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold, recall and precision that produced the maximal score.
- Return type:
Tuple[float, Dict[str, Any]]
- best_ts_f1_score(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]
Compute the best \(F_{1}\) score over all thresholds using precision and recall for time series [Tatbul2018].
This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.
Note
This function uses the improved cardinality function and weighted precision as described in [Wagner2023].
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
- Returns:
A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold, recall and precision that produced the maximal score.
- Return type:
Tuple[float, Dict[str, Any]]
- best_ts_f1_score_classic(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]
Compute the best \(F_{1}\) score over all thresholds using precision and recall for time series [Tatbul2018].
This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.
Note
This function uses the default cardinality function (\(\frac{1}{x}\)) and unweighted precision, i.e., the default parameters described in [Tatbul2018].
- Parameters:
labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
- Returns:
A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold, recall and precision that produced the maximal score.
- Return type:
Tuple[float, Dict[str, Any]]
- timesead.evaluation.ts_precision_and_recall(anomalies: torch.Tensor, predictions: torch.Tensor, alpha: float = 0, recall_bias_fn: Callable[[torch.Tensor], float] = constant_bias_fn, recall_cardinality_fn: Callable[[int], float] = inverse_proportional_cardinality_fn, precision_bias_fn: Callable | None = None, precision_cardinality_fn: Callable | None = None, anomaly_ranges: List[Tuple[int, int]] | None = None, prediction_ranges: List[Tuple[int, int]] | None = None, weighted_precision: bool = False) → Tuple[float, float]
Computes precision and recall for time series as defined in [Tatbul2018].
Note
The default parameters for this function correspond to the defaults recommended in [Tatbul2018]. However, those might not be desirable in most cases; please see [Wagner2023] for a detailed discussion.
- Parameters:
anomalies (torch.Tensor) – Binary 1-D Tensor of shape (length,) containing the true labels.
predictions (torch.Tensor) – Binary 1-D Tensor of shape (length,) containing the predicted labels.
alpha (float) – Weight for the existence term in recall.
recall_bias_fn (Callable[[torch.Tensor], float]) – Function that computes the bias term for a given ground-truth window.
recall_cardinality_fn (Callable[[int], float]) – Function that computes the cardinality factor for a given ground-truth window.
precision_bias_fn (Optional[Callable]) – Function that computes the bias term for a given predicted window. If None, this will be the same as recall_bias_fn.
precision_cardinality_fn (Optional[Callable]) – Function that computes the cardinality factor for a given predicted window. If None, this will be the same as recall_cardinality_fn.
weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.
anomaly_ranges (Optional[List[Tuple[int, int]]]) – A list of tuples (start, end) for each anomaly window in anomalies, where start is the index at which the window starts and end is the first index after the end of the window. This can be None, in which case the list is computed automatically from anomalies.
prediction_ranges (Optional[List[Tuple[int, int]]]) – A list of tuples (start, end) for each anomaly window in predictions, where start is the index at which the window starts and end is the first index after the end of the window. This can be None, in which case the list is computed automatically from predictions.
- Returns:
A tuple consisting of the time-series precision and recall for the given labels.
- Return type:
Tuple[float, float]
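The (start, end) convention used by anomaly_ranges and prediction_ranges, with end being the first index after the window, can be illustrated with a small plain-Python sketch. The helper `find_ranges` is hypothetical and only mirrors the documented convention; it is not the routine the library uses internally.

```python
def find_ranges(binary):
    """Return half-open (start, end) index pairs for each run of ones."""
    ranges = []
    start = None
    for i, value in enumerate(binary):
        if value and start is None:
            start = i                   # a new window begins here
        elif not value and start is not None:
            ranges.append((start, i))   # i is the first index after the window
            start = None
    if start is not None:               # window reaches the end of the sequence
        ranges.append((start, len(binary)))
    return ranges


anomalies = [0, 1, 1, 0, 0, 1, 1, 1]
anomaly_ranges = find_ranges(anomalies)  # [(1, 3), (5, 8)]
```

Precomputing such lists once and passing them in can avoid recomputing the ground-truth windows when the same labels are evaluated against many thresholds.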
- timesead.evaluation.constant_bias_fn(inputs: torch.Tensor) → float
Compute the overlap size for a constant bias function that assigns the same weight to all positions.
This function computes
\[\omega(\text{inputs}) = \frac{1}{n} \sum_{i = 1}^{n} \text{inputs}_i,\]
where \(n = \lvert \text{inputs} \rvert\).
Note
To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.
- Parameters:
inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.
- Returns:
The overlap \(\omega\).
- Return type:
float
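The formula above is simply the fraction of positions in the window that were flagged. A minimal plain-Python sketch follows, using lists instead of tensors; `constant_bias` is a hypothetical name and the window is assumed to be non-empty.

```python
def constant_bias(inputs):
    """Mean of the binary predictions inside one ground-truth window."""
    n = len(inputs)
    return sum(inputs) / n


window = [1, 1, 0, 0]            # first half of the window was detected
overlap = constant_bias(window)  # two of four positions are flagged
```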
- timesead.evaluation.back_bias_fn(inputs: torch.Tensor) → float
Compute the overlap size for a bias function that assigns more weight to predictions towards the back of a ground-truth anomaly window.
This function computes
\[\omega(\text{inputs}) = \frac{2}{n (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot i,\]
where \(n = \lvert \text{inputs} \rvert\).
Note
To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.
- Parameters:
inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.
- Returns:
The overlap \(\omega\).
- Return type:
float
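To see how the back bias rewards late detections, the formula can be evaluated on two windows with the same number of flagged points. This is a plain-Python sketch with a hypothetical helper name, not the library's tensor implementation.

```python
def back_bias(inputs):
    """Position i (1-based) gets weight i, normalised so a full window scores 1."""
    n = len(inputs)
    weighted = sum(x * i for i, x in enumerate(inputs, start=1))
    return 2.0 * weighted / (n * (n + 1))


early = back_bias([1, 1, 0, 0])  # detections at positions 1 and 2
late = back_bias([0, 0, 1, 1])   # detections at positions 3 and 4
```

Both windows contain two detections, yet the late one scores higher (7/10 versus 3/10) because positions 3 and 4 carry more weight.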
- timesead.evaluation.front_bias_fn(inputs: torch.Tensor) → float
Compute the overlap size for a bias function that assigns more weight to predictions towards the front of a ground-truth anomaly window.
This function computes
\[\omega(\text{inputs}) = \frac{2}{n (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot (n + 1 - i),\]
where \(n = \lvert \text{inputs} \rvert\).
Note
To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.
- Parameters:
inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.
- Returns:
The overlap \(\omega\).
- Return type:
float
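The front bias mirrors the back bias: position i receives weight n + 1 - i, so early detections count more. A plain-Python sketch of the formula, with a hypothetical helper name:

```python
def front_bias(inputs):
    """Position i (1-based) gets weight n + 1 - i, normalised so a full window scores 1."""
    n = len(inputs)
    weighted = sum(x * (n + 1 - i) for i, x in enumerate(inputs, start=1))
    return 2.0 * weighted / (n * (n + 1))


early = front_bias([1, 1, 0, 0])  # weights 4 and 3 are collected
late = front_bias([0, 0, 1, 1])   # weights 2 and 1 are collected
```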
- timesead.evaluation.middle_bias_fn(inputs: torch.Tensor) → float
Compute the overlap size for a bias function that assigns more weight to predictions in the middle of a ground-truth anomaly window.
This function computes
\[\begin{split}\omega(\text{inputs}) = \frac{2}{m (m + 1) + (n - m)(n - m + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot \begin{cases} i & \text{if } i \leq m\\ (n + 1 - i) & \text{otherwise} \end{cases},\end{split}\]
where \(n = \lvert \text{inputs} \rvert\) and \(m = \lceil \frac{n}{2} \rceil\).
Note
To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.
- Parameters:
inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.
- Returns:
The overlap \(\omega\).
- Return type:
float
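The piecewise weights in the formula rise from 1 up to m and then fall back to 1, so the normaliser is the sum of two triangular numbers. The following plain-Python sketch evaluates the formula directly; `middle_bias` is a hypothetical name and lists stand in for tensors.

```python
import math


def middle_bias(inputs):
    """Weights rise up to position m = ceil(n / 2) and fall afterwards."""
    n = len(inputs)
    m = math.ceil(n / 2)
    weighted = sum(
        x * (i if i <= m else n + 1 - i)
        for i, x in enumerate(inputs, start=1)
    )
    normaliser = m * (m + 1) + (n - m) * (n - m + 1)
    return 2.0 * weighted / normaliser


center = middle_bias([0, 1, 1, 0])  # the two central positions carry weight 2 each
edges = middle_bias([1, 0, 0, 1])   # the two outer positions carry weight 1 each
```

For a window of length 4 the weights are 1, 2, 2, 1, so detecting the centre yields 4/6 of the total weight while detecting only the edges yields 2/6.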
- timesead.evaluation.inverse_proportional_cardinality_fn(cardinality: int, gt_length: int) → float
Cardinality function that assigns an inversely proportional weight to predictions within a single ground-truth window.
This is the default cardinality function recommended in [Tatbul2018].
Note
This function leads to a metric that is not recall-consistent! Please see [Wagner2023] for more details.
- Parameters:
cardinality (int)
gt_length (int)
- Returns:
The cardinality factor \(\frac{1}{\text{cardinality}}\).
- Return type:
float
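As a rough illustration of how an inverse-proportional cardinality factor penalises fragmented detections, the sketch below counts how many predicted windows overlap one ground-truth window and returns the reciprocal of that count. It assumes half-open (start, end) ranges and treats zero or one overlapping window as a factor of 1, following the general idea in [Tatbul2018]. Both helper names are hypothetical, and the gt_length argument of the documented function is not modelled here.

```python
def overlapping_count(gt_range, prediction_ranges):
    """Number of predicted windows that share at least one index with gt_range."""
    gt_start, gt_end = gt_range
    return sum(1 for start, end in prediction_ranges if start < gt_end and end > gt_start)


def inverse_proportional(cardinality):
    """Factor 1 for at most one overlapping window, otherwise 1 / cardinality."""
    return 1.0 if cardinality <= 1 else 1.0 / cardinality


gt_window = (10, 20)
predicted = [(8, 12), (14, 15), (18, 25), (30, 35)]
count = overlapping_count(gt_window, predicted)  # the last window does not overlap
factor = inverse_proportional(count)
```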