timesead.evaluation.evaluator

Classes

Evaluator

A class that can compute several evaluation metrics for a dataset.

Module Contents

class timesead.evaluation.evaluator.Evaluator

A class that can compute several evaluation metrics for a dataset. Each method returns a tuple consisting of the score as a single float and a dict that may contain additional information.

auc(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise area under the receiver operating characteristic curve.

This will return a value between 0 and 1 where 1 indicates a perfect classifier.

See also

Scikit-learn’s roc_auc_score() function.

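Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns: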

A tuple consisting of the AUC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
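To illustrate what this metric measures, the sketch below computes point-wise ROC AUC in pure Python through its rank interpretation: the probability that a randomly chosen anomalous point receives a higher score than a randomly chosen normal point, with ties counted as one half. This is a minimal O(n_anomalous · n_normal) sketch, not TimeSeAD's implementation; the function name pointwise_roc_auc is hypothetical.

```python
from typing import Sequence


def pointwise_roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomalous point outscores a random normal one."""
    anomalous = [s for y, s in zip(labels, scores) if y == 1]
    normal = [s for y, s in zip(labels, scores) if y == 0]
    if not anomalous or not normal:
        raise ValueError("ROC AUC is undefined unless both classes are present")
    wins = 0.0
    for a in anomalous:
        for n in normal:
            if a > n:
                wins += 1.0  # pair ranked correctly
            elif a == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(anomalous) * len(normal))


labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(pointwise_roc_auc(labels, scores))  # 8 of the 9 anomaly/normal pairs are ranked correctly
```

A score of 1 means every anomalous point outscores every normal point, and 0.5 corresponds to random ranking.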

f1_score(labels: torch.Tensor, scores: torch.Tensor, pos_label: int = 1) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise F1 score.

This will return a value between 0 and 1 where 1 indicates a perfect classifier.

See also

Scikit-learn’s f1_score() function.

Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing binary predictions of whether a point is an anomaly or not.

  • pos_label (int) – The class whose F1 score is reported.

Returns:

A tuple consisting of the F1 score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
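A minimal pure-Python sketch of the point-wise F1 score on binary predictions, using F1 = 2TP / (2TP + FP + FN) counted over individual time steps. The function name is hypothetical, and returning 0 when the score is undefined (no positives in labels or predictions) is an assumption of this sketch.

```python
from typing import Sequence


def pointwise_f1(labels: Sequence[int], predictions: Sequence[int], pos_label: int = 1) -> float:
    """F1 = 2TP / (2TP + FP + FN), counted over individual time steps."""
    tp = fp = fn = 0
    for y, p in zip(labels, predictions):
        if p == pos_label and y == pos_label:
            tp += 1  # correctly flagged point
        elif p == pos_label:
            fp += 1  # flagged, but not of class pos_label
        elif y == pos_label:
            fn += 1  # missed point of class pos_label
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumption: report 0 when undefined


labels = [0, 1, 1, 0, 1]
predictions = [0, 1, 0, 1, 1]
print(pointwise_f1(labels, predictions))               # TP=2, FP=1, FN=1
print(pointwise_f1(labels, predictions, pos_label=0))  # TP=1, FP=1, FN=1
```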

best_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise \(F_{\beta}\) score.

This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.

See also

Scikit-learn’s fbeta_score() function.

Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

  • beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both, \(\beta < 1\) emphasizes precision, and \(\beta > 1\) emphasizes recall.

Returns:

A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]
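The threshold sweep described above can be sketched in pure Python as follows. This is a minimal illustration, not TimeSeAD's implementation: it assumes a point counts as anomalous when its score is at least the threshold, and the function name best_fbeta and the dict key "threshold" are hypothetical. It uses \(F_{\beta} = (1 + \beta^2)\,TP / ((1 + \beta^2)\,TP + \beta^2\,FN + FP)\).

```python
from typing import Dict, Sequence, Tuple


def best_fbeta(labels: Sequence[int], scores: Sequence[float],
               beta: float) -> Tuple[float, Dict[str, float]]:
    """Best point-wise F-beta over all thresholds taken from the scores."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta ** 2
    best_score, best_threshold = 0.0, float("inf")
    for threshold in sorted(set(scores)):
        # Assumption: a point counts as anomalous if its score >= threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
        denom = (1 + b2) * tp + b2 * fn + fp
        f = (1 + b2) * tp / denom if denom else 0.0
        if f > best_score:
            best_score, best_threshold = f, threshold
    return best_score, {"threshold": best_threshold}  # key name is hypothetical


labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(best_fbeta(labels, scores, beta=1.0))  # best threshold: 0.35
print(best_fbeta(labels, scores, beta=0.5))  # best threshold: 0.8
```

With beta = 0.5 the best threshold moves up to 0.8, where no normal point is flagged, which reflects the stronger emphasis on precision.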

best_f1_score(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise \(F_{1}\) score.

This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.

See also

Scikit-learn’s f1_score() function.

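Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns: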

A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]
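For reference, the standard definition of the \(F_{\beta}\) score in terms of precision \(P\) and recall \(R\) is given below; the \(F_{1}\) score is the special case \(\beta = 1\), i.e., the harmonic mean of precision and recall.

```latex
F_{\beta} = \frac{(1 + \beta^{2}) \cdot P \cdot R}{\beta^{2} \cdot P + R},
\qquad
F_{1} = \frac{2 \cdot P \cdot R}{P + R}
```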

auprc(labels: torch.Tensor, scores: torch.Tensor, integration: str = 'trapezoid') → Tuple[float, Dict[str, Any]]

Compute the classic point-wise area under the precision-recall curve.

This will return a value between 0 and 1 where 1 indicates a perfect classifier.

See also

Scikit-learn’s average_precision_score() function.

Scikit-learn’s precision_recall_curve() function.

Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

  • integration (str) – Method to use for computing the area under the curve. 'riemann' corresponds to a simple Riemann sum, whereas 'trapezoid' uses the trapezoidal rule.

Returns:

A tuple consisting of the AuPRC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
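The difference between the two integration options can be sketched in pure Python. The sketch builds the point-wise precision-recall curve by using every distinct score as a threshold (predicting an anomaly when score >= threshold, an assumption of this sketch) and prepends the point (recall 0, precision 1), as scikit-learn's precision_recall_curve() does. 'riemann' sums rectangles using the precision at each new recall level, which reproduces the average-precision sum, while 'trapezoid' averages adjacent precision values. The function names are hypothetical, and this is not TimeSeAD's implementation.

```python
from typing import List, Sequence, Tuple


def pr_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(recall, precision) pairs for decreasing thresholds, starting at (0, 1)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("at least one anomalous point is required")
    points = [(0.0, 1.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((tp / n_pos, tp / (tp + fp)))  # tp + fp >= 1 by construction
    return points


def auprc(labels: Sequence[int], scores: Sequence[float], integration: str = "trapezoid") -> float:
    points = pr_curve(labels, scores)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if integration == "riemann":
            area += (r1 - r0) * p1  # step function: precision at the new recall level
        elif integration == "trapezoid":
            area += (r1 - r0) * (p0 + p1) / 2  # linear interpolation between points
        else:
            raise ValueError(f"unknown integration method: {integration!r}")
    return area


labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auprc(labels, scores, integration="riemann"))
print(auprc(labels, scores, integration="trapezoid"))
```

On this example the Riemann sum gives 11/12 and the trapezoidal rule gives 65/72; they differ only on the segment where precision drops from 2/3 to 3/4 while recall rises.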

average_precision(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise average precision score.

Note

This is just a shorthand for auprc() with integration='riemann'.

See also

Scikit-learn’s average_precision_score() function.

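Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns: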

A tuple consisting of the average precision score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
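For reference, the step-wise sum behind average precision, as defined for scikit-learn's average_precision_score(), is shown below. \(P_{n}\) and \(R_{n}\) are the precision and recall at the \(n\)-th threshold, taken in decreasing order, so each increase in recall is weighted by the precision reached at that point.

```latex
\mathrm{AP} = \sum_{n} (R_{n} - R_{n-1}) \, P_{n}, \qquad R_{0} = 0
```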

ts_auprc(labels: torch.Tensor, scores: torch.Tensor, integration='trapezoid', weighted_precision: bool = True) → Tuple[float, Dict[str, Any]]

Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018].

Note

This function uses the improved cardinality function described in [Wagner2023].

Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

  • integration – Method to use for computing the area under the curve. 'riemann' corresponds to a simple Riemann sum, whereas 'trapezoid' uses the trapezoidal rule.

  • weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.

Returns:

A tuple consisting of the AuPRC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
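For orientation, the range-based precision and recall of [Tatbul2018] with a flat positional bias can be written as below. \(R_1, \dots, R_{N_r}\) are the ground-truth anomaly windows, \(P_1, \dots, P_{N_p}\) the predicted windows, \(\alpha \in [0, 1]\) weights the existence reward, and the cardinality factor \(\mathrm{CF}(A, B)\) is 1 if window \(A\) overlaps at most one window of \(B\) and \(\gamma(x)\) if it overlaps \(x > 1\) windows. The length-weighted precision in the last line is our reading of the weighted_precision parameter above; the improved cardinality function of [Wagner2023] is not reproduced here.

```latex
\begin{aligned}
\mathrm{Recall}_{T} &= \frac{1}{N_r} \sum_{i=1}^{N_r} \Big[ \alpha \, \mathbb{1}\Big( \sum_{j=1}^{N_p} |R_i \cap P_j| \ge 1 \Big)
  + (1 - \alpha) \, \mathrm{CF}(R_i, P) \sum_{j=1}^{N_p} \frac{|R_i \cap P_j|}{|R_i|} \Big] \\
\mathrm{Precision}_{T} &= \frac{1}{N_p} \sum_{j=1}^{N_p} \mathrm{CF}(P_j, R) \sum_{i=1}^{N_r} \frac{|P_j \cap R_i|}{|P_j|} \\
\mathrm{Precision}_{T}^{w} &= \frac{\sum_{j=1}^{N_p} |P_j| \, \mathrm{CF}(P_j, R) \sum_{i=1}^{N_r} \frac{|P_j \cap R_i|}{|P_j|}}{\sum_{j=1}^{N_p} |P_j|}
\end{aligned}
```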

[Tatbul2018]

N. Tatbul, T.J. Lee, S. Zdonik, M. Alam, and J. Gottschlich. Precision and Recall for Time Series. Advances in Neural Information Processing Systems, 31, 2018.

[Wagner2023]

D. Wagner, T. Michels, F.C.F. Schulz, A. Nair, M. Rudolph, and M. Kloft. TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection. Transactions on Machine Learning Research (TMLR), (to appear) 2023.

ts_average_precision(labels: torch.Tensor, scores: torch.Tensor, weighted_precision: bool = True) → Tuple[float, Dict[str, Any]]

Compute the average precision score using precision and recall for time series [Tatbul2018].

Note

This is just a shorthand for ts_auprc() with integration='riemann'.

Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

  • weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.

Returns:

A tuple consisting of the average precision score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]

ts_auprc_unweighted(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018].

Note

This is just a shorthand for ts_auprc() with integration='riemann' and weighted_precision=False.

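Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns: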

A tuple consisting of the AuPRC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]

best_ts_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]

Compute the \(F_{\beta}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the improved cardinality function and weighted precision as described in [Wagner2023].

Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

  • beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both, \(\beta < 1\) emphasizes precision, and \(\beta > 1\) emphasizes recall.

Returns:

A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]

best_ts_fbeta_score_classic(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]

Compute the \(F_{\beta}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the default cardinality function (\(\frac{1}{x}\)) and unweighted precision, i.e., the default parameters described in [Tatbul2018].

Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

  • beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both, \(\beta < 1\) emphasizes precision, and \(\beta > 1\) emphasizes recall.

Returns:

A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]
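To make the classic range-based metric concrete, here is a minimal pure-Python sketch of the idea behind this method. It is not TimeSeAD's implementation: it assumes a flat positional bias, no existence reward for recall (alpha = 0) unless requested, the cardinality function \(\frac{1}{x}\), and that a point is anomalous when its score is at least the threshold. All function names and the dict keys are hypothetical, and the weighted flag is our reading of the weighted_precision option (each predicted window weighted by its length). Every threshold is evaluated from scratch, so the sketch is only suitable for small inputs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Window = Tuple[int, int]  # half-open range [start, end) of time steps


def to_windows(binary: Sequence[int]) -> List[Window]:
    """Group runs of consecutive 1s into [start, end) windows."""
    windows: List[Window] = []
    start = None
    for t, v in enumerate(binary):
        if v and start is None:
            start = t
        elif not v and start is not None:
            windows.append((start, t))
            start = None
    if start is not None:
        windows.append((start, len(binary)))
    return windows


def overlap(a: Window, b: Window) -> int:
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def range_score(target: Window, others: List[Window], gamma: Callable[[int], float]) -> float:
    """Flat-bias overlap reward of one window, scaled by the cardinality factor."""
    overlaps = [overlap(target, o) for o in others]
    n_hit = sum(1 for o in overlaps if o > 0)
    cardinality = 1.0 if n_hit <= 1 else gamma(n_hit)
    return cardinality * sum(overlaps) / (target[1] - target[0])


def ts_precision_recall(labels: Sequence[int], preds: Sequence[int], alpha: float = 0.0,
                        weighted: bool = False,
                        gamma: Callable[[int], float] = lambda x: 1.0 / x) -> Tuple[float, float]:
    real, pred = to_windows(labels), to_windows(preds)
    if not real or not pred:
        return 0.0, 0.0
    recall = sum(alpha * float(any(overlap(r, p) > 0 for p in pred))
                 + (1 - alpha) * range_score(r, pred, gamma) for r in real) / len(real)
    window_precisions = [range_score(p, real, gamma) for p in pred]
    if weighted:  # weight each predicted window by its length
        lengths = [p[1] - p[0] for p in pred]
        precision = sum(s * n for s, n in zip(window_precisions, lengths)) / sum(lengths)
    else:  # every predicted window has the same weight
        precision = sum(window_precisions) / len(pred)
    return precision, recall


def best_ts_fbeta(labels: Sequence[int], scores: Sequence[float],
                  beta: float) -> Tuple[float, Dict[str, float]]:
    b2 = beta ** 2
    best: Tuple[float, Dict[str, float]] = (
        0.0, {"threshold": float("inf"), "precision": 0.0, "recall": 0.0})
    for threshold in sorted(set(scores)):
        preds = [1 if s >= threshold else 0 for s in scores]  # assumption: >= threshold
        p, r = ts_precision_recall(labels, preds)
        f = (1 + b2) * p * r / (b2 * p + r) if p + r > 0 else 0.0
        if f > best[0]:
            best = (f, {"threshold": threshold, "precision": p, "recall": r})
    return best


labels = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
preds = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]
print(ts_precision_recall(labels, preds))
print(ts_precision_recall(labels, preds, weighted=True))
```

For this example, precision and recall are both 5/6 without weighting; weighting by window length lowers precision to 0.8 because the longer predicted window overshoots the ground truth.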

best_ts_f1_score(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the \(F_{1}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the improved cardinality function and weighted precision as described in [Wagner2023].

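Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns: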

A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]

best_ts_f1_score_classic(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the \(F_{1}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the default cardinality function (\(\frac{1}{x}\)) and unweighted precision, i.e., the default parameters described in [Tatbul2018].

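Parameters:
  • labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.

  • scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns: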

A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]