timesead.evaluation

This package contains functions for evaluating the performance of time-series anomaly detectors.

At the moment, the Evaluator class supports classic point-wise metrics such as the \(F_1\) score and the area under the precision-recall curve, as well as composite metrics derived from precision and recall for time series.

Classes

Evaluator

A class that can compute several evaluation metrics for a dataset. Each method returns the score as a single float, but it can also return additional information in a dict.

Functions

`ts_precision_and_recall`(→ Tuple[float, float])	Computes precision and recall for time series as defined in [Tatbul2018].
`constant_bias_fn`(→ float)	Compute the overlap size for a constant bias function that assigns the same weight to all positions.
`back_bias_fn`(→ float)	Compute the overlap size for a bias function that assigns more weight to predictions towards the back of a ground-truth anomaly window.
`front_bias_fn`(→ float)	Compute the overlap size for a bias function that assigns more weight to predictions towards the front of a ground-truth anomaly window.
`middle_bias_fn`(→ float)	Compute the overlap size for a bias function that assigns more weight to predictions in the middle of a ground-truth anomaly window.
`inverse_proportional_cardinality_fn`(→ float)	Cardinality function that assigns an inversely proportional weight to predictions within a single ground-truth window.

Package Contents

class timesead.evaluation.Evaluator

A class that can compute several evaluation metrics for a dataset. Each method returns the score as a single float, but it can also return additional information in a dict.

auc(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise area under the receiver operating characteristic curve.

This will return a value between 0 and 1 where 1 indicates a perfect classifier.

See also

Scikit-learn’s roc_auc_score() function.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns:

A tuple consisting of the AUC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
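
As a rough illustration of what the point-wise AUC measures, the following standalone sketch computes it as the fraction of (anomaly, normal) pairs in which the anomaly receives the higher score. It uses plain Python lists instead of tensors, and the helper name `roc_auc` is hypothetical; it is not the implementation used by `Evaluator.auc()`.

```python
def roc_auc(labels, scores):
    """Point-wise ROC AUC as the fraction of correctly ranked (anomaly, normal) pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomaly and one normal point")
    # A pair counts fully if the anomaly scores higher, and half if the scores tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise formulation is quadratic in the number of points, so it is only meant to make the definition concrete.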

f1_score(labels: torch.Tensor, scores: torch.Tensor, pos_label: int = 1) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise F1 score.

This will return a value between 0 and 1 where 1 indicates a perfect classifier.

See also

Scikit-learn’s f1_score() function.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing binary predictions of whether a point is an anomaly or not.
pos_label (int) – Class to report.

Returns:

A tuple consisting of the F1 score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
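
Note that, unlike auc(), this method expects binary predictions rather than raw scores. A minimal standalone sketch of the point-wise computation is shown below; the helper name `f1_from_predictions` and the convention of returning 0 when there are no true positives are assumptions made for illustration.

```python
def f1_from_predictions(labels, predictions, pos_label=1):
    """Point-wise F1 score for binary predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == pos_label and p == pos_label)
    fp = sum(1 for y, p in pairs if y != pos_label and p == pos_label)
    fn = sum(1 for y, p in pairs if y == pos_label and p != pos_label)
    if tp == 0:
        return 0.0  # assumed convention when precision or recall is undefined
    return 2 * tp / (2 * tp + fp + fn)


labels = [0, 1, 1, 0, 1]
predictions = [0, 1, 0, 1, 1]
print(f1_from_predictions(labels, predictions))  # 2 TP, 1 FP, 1 FN -> approx. 0.667
```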

best_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise \(F_{\beta}\) score.

This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.

See also

Scikit-learn’s fbeta_score() function.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both while \(\beta < 1\) emphasizes precision and vice versa.

Returns:

A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]
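
The threshold search described above can be sketched in plain Python as follows. This is an illustration only: the helper name `best_fbeta`, the rule that a point is predicted anomalous when its score is greater than or equal to the threshold, and the dict key `"threshold"` are assumptions and may differ from what the library does.

```python
def best_fbeta(labels, scores, beta):
    """Try every distinct score as a threshold and keep the best F-beta score."""
    b2 = beta * beta
    best_score, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        # Assumed rule: a point is flagged when its score reaches the threshold.
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
        if tp == 0:
            continue
        fbeta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)
        if fbeta > best_score:
            best_score, best_threshold = fbeta, threshold
    return best_score, {"threshold": best_threshold}


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(best_fbeta(labels, scores, beta=1.0))  # (0.8, {'threshold': 0.35})
```

Because the threshold is chosen with knowledge of the labels, the result is an upper bound on what a fixed threshold could achieve on the same data.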

best_f1_score(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise \(F_{1}\) score.

This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.

See also

Scikit-learn’s f1_score() function.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns:

A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]

auprc(labels: torch.Tensor, scores: torch.Tensor, integration: str = 'trapezoid') → Tuple[float, Dict[str, Any]]

Compute the classic point-wise area under the precision-recall curve.

This will return a value between 0 and 1 where 1 indicates a perfect classifier.

See also

Scikit-learn’s average_precision_score() function.

Scikit-learn’s precision_recall_curve() function.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
integration (str) – Method to use for computing the area under the curve. 'riemann' corresponds to a simple Riemann sum, whereas 'trapezoid' uses the trapezoidal rule.

Returns:

A tuple consisting of the AuPRC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
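
To make the difference between the two integration options concrete, the sketch below builds a point-wise precision-recall curve from plain lists and integrates it both ways. The helper names `pr_curve` and `area`, the starting point (recall 0, precision 1), and the thresholding rule are assumptions for illustration, not the library's code.

```python
def pr_curve(labels, scores):
    """Return (recall, precision) points for thresholds from high to low."""
    total_pos = sum(labels)  # assumes at least one anomaly is present
    points = [(0.0, 1.0)]  # assumed conventional start of the curve
    for threshold in sorted(set(scores), reverse=True):
        flagged = [s >= threshold for s in scores]
        tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
        fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def area(points, integration="trapezoid"):
    """Integrate precision over recall with a Riemann sum or the trapezoidal rule."""
    total = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if integration == "riemann":
            total += (r1 - r0) * p1  # use the precision at the right end
        elif integration == "trapezoid":
            total += (r1 - r0) * (p0 + p1) / 2  # average both ends
        else:
            raise ValueError(f"unknown integration method: {integration}")
    return total


points = pr_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(area(points, "riemann"))    # approx. 0.833
print(area(points, "trapezoid"))  # approx. 0.792
```

The trapezoidal rule interpolates linearly between curve points, while the Riemann sum corresponds to a step function over recall.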

average_precision(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the classic point-wise average precision score.

Note

This is just a shorthand for auprc() with integration='riemann'.

See also

Scikit-learn’s average_precision_score() function.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns:

A tuple consisting of the average precision score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]

ts_auprc(labels: torch.Tensor, scores: torch.Tensor, integration='trapezoid', weighted_precision: bool = True) → Tuple[float, Dict[str, Any]]

Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018].

Note

This function uses the improved cardinality function described in [Wagner2023].

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
integration (str) – Method to use for computing the area under the curve. 'riemann' corresponds to a simple Riemann sum, whereas 'trapezoid' uses the trapezoidal rule.
weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.

Returns:

A tuple consisting of the AuPRC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]
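
The effect of weighted_precision can be illustrated with a small standalone sketch that aggregates per-window precision scores in both ways. The helper name `aggregate_precision` and the normalization by the total window length are assumptions; the sketch only mirrors the parameter description above.

```python
def aggregate_precision(window_scores, window_lengths, weighted_precision=True):
    """Combine per-window precision scores into one precision value."""
    if weighted_precision:
        # Longer predicted windows contribute proportionally more.
        total_length = sum(window_lengths)
        weighted = sum(s * n for s, n in zip(window_scores, window_lengths))
        return weighted / total_length
    # Every predicted window counts the same, regardless of its length.
    return sum(window_scores) / len(window_scores)


# One long, fully correct window and one short, fully wrong window.
window_scores = [1.0, 0.0]
window_lengths = [9, 1]
print(aggregate_precision(window_scores, window_lengths, True))   # 0.9
print(aggregate_precision(window_scores, window_lengths, False))  # 0.5
```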

[Tatbul2018]

N. Tatbul, T.J. Lee, S. Zdonik, M. Alam, J. Gottschlich. Precision and recall for time series. Advances in neural information processing systems. 2018;31.

[Wagner2023]

D. Wagner, T. Michels, F.C.F. Schulz, A. Nair, M. Rudolph, and M. Kloft. TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection. Transactions on Machine Learning Research (TMLR), (to appear) 2023.

ts_average_precision(labels: torch.Tensor, scores: torch.Tensor, weighted_precision: bool = True) → Tuple[float, Dict[str, Any]]

Compute the average precision score using precision and recall for time series [Tatbul2018].

Note

This is just a shorthand for ts_auprc() with integration='riemann'.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.

Returns:

A tuple consisting of the average precision score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]

ts_auprc_unweighted(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the area under the precision-recall curve using precision and recall for time series [Tatbul2018].

Note

This is just a shorthand for ts_auprc() with integration='riemann' and weighted_precision=False.

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns:

A tuple consisting of the AuPRC score and an empty dict.

Return type:

Tuple[float, Dict[str, Any]]

best_ts_fbeta_score(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]

Compute the \(F_{\beta}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the improved cardinality function and weighted precision as described in [Wagner2023].

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both while \(\beta < 1\) emphasizes precision and vice versa.

Returns:

A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]

best_ts_fbeta_score_classic(labels: torch.Tensor, scores: torch.Tensor, beta: float) → Tuple[float, Dict[str, Any]]

Compute the \(F_{\beta}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{\beta}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the default cardinality function (\(\frac{1}{x}\)) and unweighted precision, i.e., the default parameters described in [Tatbul2018].

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.
beta (float) – Positive number that determines the trade-off between precision and recall when computing the F-score. \(\beta = 1\) assigns equal weight to both while \(\beta < 1\) emphasizes precision and vice versa.

Returns:

A tuple consisting of the best \(F_{\beta}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]

best_ts_f1_score(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the \(F_{1}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the improved cardinality function and weighted precision as described in [Wagner2023].

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns:

A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]

best_ts_f1_score_classic(labels: torch.Tensor, scores: torch.Tensor) → Tuple[float, Dict[str, Any]]

Compute the \(F_{1}\) score using precision and recall for time series [Tatbul2018].

This method will apply all possible thresholds to the values in scores and compute the \(F_{1}\) score for the resulting binary predictions. It then returns the highest score.

Note

This function uses the default cardinality function (\(\frac{1}{x}\)) and unweighted precision, i.e., the default parameters described in [Tatbul2018].

Parameters:

labels (torch.Tensor) – A 1-D Tensor containing the ground-truth labels. 1 corresponds to an anomaly, 0 means that the point is normal.
scores (torch.Tensor) – A 1-D Tensor containing the scores returned by an AnomalyDetector.

Returns:

A tuple consisting of the best \(F_{1}\) score and a dict containing the threshold, recall and precision that produced the maximal score.

Return type:

Tuple[float, Dict[str, Any]]

timesead.evaluation.ts_precision_and_recall(anomalies: torch.Tensor, predictions: torch.Tensor, alpha: float = 0, recall_bias_fn: Callable[[torch.Tensor], float] = constant_bias_fn, recall_cardinality_fn: Callable[[int], float] = inverse_proportional_cardinality_fn, precision_bias_fn: Callable | None = None, precision_cardinality_fn: Callable | None = None, anomaly_ranges: List[Tuple[int, int]] | None = None, prediction_ranges: List[Tuple[int, int]] | None = None, weighted_precision: bool = False) → Tuple[float, float]

Computes precision and recall for time series as defined in [Tatbul2018].

Note

The default parameters for this function correspond to the defaults recommended in [Tatbul2018]. However, those defaults might not be desirable in most cases; please see [Wagner2023] for a detailed discussion.

Parameters:

anomalies (torch.Tensor) – Binary 1-D Tensor of shape (length,) containing the true labels.
predictions (torch.Tensor) – Binary 1-D Tensor of shape (length,) containing the predicted labels.
alpha (float) – Weight for existence term in recall.
recall_bias_fn (Callable[[torch.Tensor], float]) – Function that computes the bias term for a given ground-truth window.
recall_cardinality_fn (Callable[[int], float]) – Function that computes the cardinality factor for a given ground-truth window.
precision_bias_fn (Optional[Callable]) – Function that computes the bias term for a given predicted window. If None, this will be the same as recall_bias_fn.
precision_cardinality_fn (Optional[Callable]) – Function that computes the cardinality factor for a given predicted window. If None, this will be the same as recall_cardinality_fn.
weighted_precision (bool) – If True, the precision score of a predicted window will be weighted with the length of the window in the final score. Otherwise, each window will have the same weight.
anomaly_ranges (Optional[List[Tuple[int, int]]]) – A list of tuples (start, end) for each anomaly window in anomalies, where start is the index at which the window starts and end is the first index after the end of the window. This can be None, in which case the list is computed automatically from anomalies.
prediction_ranges (Optional[List[Tuple[int, int]]]) – A list of tuples (start, end) for each anomaly window in predictions, where start is the index at which the window starts and end is the first index after the end of the window. This can be None, in which case the list is computed automatically from predictions.

Returns:

A tuple consisting of the time-series precision and recall for the given labels.

Return type:

Tuple[float, float]
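
The following standalone sketch illustrates two ideas from the description above: how (start, end) ranges with an exclusive end index can be derived from a binary sequence, and how the recall of [Tatbul2018] combines an existence term and an overlap term per ground-truth window. It assumes a constant bias and a \(\frac{1}{x}\) cardinality factor, works on plain lists, and uses the hypothetical helper names `find_ranges` and `ts_recall`; it is not the library's implementation.

```python
def find_ranges(binary):
    """Return (start, end) windows of consecutive 1s; end is exclusive."""
    ranges, start = [], None
    for i, value in enumerate(binary):
        if value and start is None:
            start = i
        elif not value and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(binary)))
    return ranges


def ts_recall(anomalies, predictions, alpha=0.0):
    """Time-series recall with constant bias and 1/x cardinality (illustrative)."""
    prediction_ranges = find_ranges(predictions)
    recalls = []
    for start, end in find_ranges(anomalies):
        window = predictions[start:end]
        existence = 1.0 if any(window) else 0.0
        overlap = sum(window) / len(window)  # constant bias: covered fraction
        # Number of predicted windows that intersect this ground-truth window.
        cardinality = sum(1 for ps, pe in prediction_ranges if ps < end and pe > start)
        factor = 1.0 / max(cardinality, 1)
        recalls.append(alpha * existence + (1 - alpha) * factor * overlap)
    return sum(recalls) / len(recalls)


anomalies = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
predictions = [0, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(find_ranges(anomalies))              # [(1, 5), (7, 9)]
print(ts_recall(anomalies, predictions))   # 0.1875
```

In this example the first anomaly window is covered to 75% but by two separate predicted windows, so its overlap term is halved; the second window is missed entirely.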

timesead.evaluation.constant_bias_fn(inputs: torch.Tensor) → float

Compute the overlap size for a constant bias function that assigns the same weight to all positions.

This function computes

\[\omega(\text{inputs}) = \frac{1}{n} \sum_{i = 1}^{n} \text{inputs}_i,\]

where \(n = \lvert \text{inputs} \rvert\).

Note

To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.

Parameters:

inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.

Returns:

The overlap \(\omega\).

Return type:

float
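
A plain-Python sketch of this formula, using a list in place of a tensor and the hypothetical name `constant_bias`, could look like this:

```python
def constant_bias(inputs):
    """Fraction of positions inside the window that were predicted as anomalous."""
    n = len(inputs)
    return sum(inputs) / n


print(constant_bias([1, 0, 1, 1]))  # 0.75
```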

timesead.evaluation.back_bias_fn(inputs: torch.Tensor) → float

Compute the overlap size for a bias function that assigns more weight to predictions towards the back of a ground-truth anomaly window.

This function computes

\[\omega(\text{inputs}) = \frac{2}{n * (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot i,\]

where \(n = \lvert \text{inputs} \rvert\).

Note

To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.

Parameters:

inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.

Returns:

The overlap \(\omega\).

Return type:

float
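
As a worked example of the formula, the sketch below evaluates it on plain lists (the name `back_bias` is hypothetical). Position \(i\) receives weight \(i\), so the same number of hits scores higher when they fall late in the window.

```python
def back_bias(inputs):
    """Overlap with linearly increasing weights 1, 2, ..., n."""
    n = len(inputs)
    weighted = sum(x * i for i, x in enumerate(inputs, start=1))
    return 2 * weighted / (n * (n + 1))


print(back_bias([0, 0, 1, 1]))  # 0.7, hits at the back
print(back_bias([1, 1, 0, 0]))  # 0.3, hits at the front
```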

timesead.evaluation.front_bias_fn(inputs: torch.Tensor) → float

Compute the overlap size for a bias function that assigns more weight to predictions towards the front of a ground-truth anomaly window.

This function computes

\[\omega(\text{inputs}) = \frac{2}{n * (n + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot (n + 1 - i),\]

where \(n = \lvert \text{inputs} \rvert\).

Note

To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.

Parameters:

inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.

Returns:

The overlap \(\omega\).

Return type:

float
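
A small list-based sketch (hypothetical name `front_bias`) shows how the weights \(n + 1 - i\) reward early detections within the window:

```python
def front_bias(inputs):
    """Overlap with linearly decreasing weights n, n - 1, ..., 1."""
    n = len(inputs)
    weighted = sum(x * (n + 1 - i) for i, x in enumerate(inputs, start=1))
    return 2 * weighted / (n * (n + 1))


print(front_bias([1, 1, 0, 0]))  # 0.7, hits at the front
print(front_bias([0, 0, 1, 1]))  # 0.3, hits at the back
```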

timesead.evaluation.middle_bias_fn(inputs: torch.Tensor) → float

Compute the overlap size for a bias function that assigns more weight to predictions in the middle of a ground-truth anomaly window.

This function computes

\[\begin{split}\omega(\text{inputs}) = \frac{2}{m * (m + 1) + (n - m) * (n - m + 1)} \sum_{i = 1}^{n} \text{inputs}_i \cdot \begin{cases} i & \text{if } i \leq m\\ (n + 1 - i) & \text{otherwise} \end{cases},\end{split}\]

where \(n = \lvert \text{inputs} \rvert\) and \(m = \lceil \frac{n}{2} \rceil\).

Note

To improve the runtime of our algorithm, we calculate the overlap \(\omega\) directly as part of the bias function.

Parameters:

inputs (torch.Tensor) – A 1-D Tensor containing the predictions inside a ground-truth window.

Returns:

The overlap \(\omega\).

Return type:

float
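
The piecewise weights can be checked with a short list-based sketch (hypothetical name `middle_bias`). For \(n = 5\) the weights are 1, 2, 3, 2, 1, and the normalization makes a fully covered window evaluate to 1.

```python
import math


def middle_bias(inputs):
    """Overlap with weights that rise to the middle of the window and fall again."""
    n = len(inputs)
    m = math.ceil(n / 2)
    weighted = sum(
        x * (i if i <= m else n + 1 - i)
        for i, x in enumerate(inputs, start=1)
    )
    return 2 * weighted / (m * (m + 1) + (n - m) * (n - m + 1))


print(middle_bias([0, 1, 1, 1, 0]))  # 7/9, hits in the middle
print(middle_bias([1, 0, 0, 0, 1]))  # 2/9, hits at the edges
```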

timesead.evaluation.inverse_proportional_cardinality_fn(cardinality: int, gt_length: int) → float

Cardinality function that assigns an inversely proportional weight to predictions within a single ground-truth window.

This is the default cardinality function recommended in [Tatbul2018].

Note

This function leads to a metric that is not recall-consistent! Please see [Wagner2023] for more details.

Parameters:

cardinality (int) – Number of predicted windows that overlap the ground-truth window in question.
gt_length (int) – Length of the ground-truth window (unused).

Returns:

The cardinality factor \(\frac{1}{\text{cardinality}}\).

Return type:

float
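
A minimal sketch of this factor is given below. The name `inverse_proportional_cardinality` and the guard against a cardinality of zero are assumptions made for illustration.

```python
def inverse_proportional_cardinality(cardinality, gt_length):
    """Return 1 / cardinality; gt_length is accepted but not used."""
    return 1.0 / max(cardinality, 1)  # assumed guard for windows without overlap


# The more fragments a detection is split into, the smaller the factor.
for fragments in (1, 2, 4):
    print(fragments, inverse_proportional_cardinality(fragments, gt_length=10))
```

With this factor, a ground-truth window covered by several short predicted windows is rewarded less than one covered by a single predicted window of the same total size.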
