An ML team wants to compare models before a decision threshold has been set. What is the name of the metric that represents the area under the ROC curve and expresses the discrimination performance of a binary classifier in a single value that is independent of the threshold?

Correct answer: A

Explanation

Identify the threshold-independent binary classification performance metric.

  • 1. "independent of the threshold": not affected by the classification boundary
  • 2. "area under the ROC curve": AUC (area under the ROC curve)
A (Correct)

AUC

This is correct. The ROC curve plots the true positive rate (TPR, y-axis) against the false positive rate (FPR, x-axis) as the classification threshold is varied. TPR is the proportion of actual positives correctly classified as positive, and FPR is the proportion of actual negatives incorrectly classified as positive. AUC (Area Under the Curve) is the area under that ROC curve; it summarizes a model's discrimination performance in a single value that does not depend on any particular threshold. A value close to 1.0 indicates strong discrimination, while 0.5 is equivalent to random guessing, which makes AUC useful for comparing models before a threshold is set.
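To make the definition concrete, here is a minimal pure-Python sketch, assuming binary labels (1 = positive) and one score per example. `roc_points` sweeps the threshold over every distinct score and records (FPR, TPR) pairs; `auc_trapezoid` integrates that curve with the trapezoidal rule; `auc_rank` computes the same quantity as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as one half). The function names and the four-example data set are illustrative, not part of any library, and the O(n²) pairwise loop is chosen for readability rather than speed.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points from sweeping the threshold over every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # A threshold above every score predicts nothing as positive: the point (0, 0).
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Predict positive when the score is at or above the threshold t.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule over consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_rank(y_true: List[int], scores: List[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)                       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(pts))        # 0.75
print(auc_rank(y_true, scores))  # 0.75
```

Both formulations agree, and on this data they should also match `sklearn.metrics.roc_auc_score(y_true, scores)` if scikit-learn is available. No threshold appears in either AUC result: the whole sweep is folded into one number.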

B (Incorrect)

F1 score

This is incorrect. The F1 score is the harmonic mean of precision and recall and is calculated from predictions made at a specific threshold. Because it depends on the threshold, it does not match the condition in this question (see the Key Takeaway for how the threshold relates to AUC).

C (Incorrect)

Accuracy

This is incorrect. Accuracy is also calculated from predictions made after a threshold has been set. Because it depends on the threshold, it does not match the condition in this question (see the Key Takeaway for how the threshold relates to AUC).

D (Incorrect)

Recall

This is incorrect. Recall is also calculated from predictions determined by the threshold. Because it depends on the threshold, it does not match the condition in this question (see the Key Takeaway for how the threshold relates to AUC).
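The threshold dependence of options B to D can be seen by recomputing them on fixed scores at a few thresholds. This is a minimal sketch using a hypothetical helper `metrics_at` and a made-up four-example data set; the scores never change (their AUC is 0.75), yet F1, accuracy, and recall all move with the threshold.

```python
from typing import Dict, List


def metrics_at(y_true: List[int], scores: List[float], threshold: float) -> Dict[str, float]:
    """F1, accuracy, and recall after converting scores to labels at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(preds, y_true))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    # Guard against empty denominators (e.g. no positive predictions at a high threshold).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"f1": f1, "accuracy": accuracy, "recall": recall}


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # fixed scores: their AUC stays 0.75 whatever threshold is chosen
for t in (0.3, 0.5, 0.9):
    print(t, metrics_at(y_true, scores, t))
```

At thresholds 0.3, 0.5, and 0.9, recall falls from 1.0 to 0.5 to 0.0, accuracy goes 0.75, 0.75, 0.5, and F1 drops from about 0.8 to about 0.67 to 0. Treating an undefined precision as 0 is a convention assumed here; libraries handle that case in different ways.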

Key Takeaway

A threshold is the classification boundary that converts a model's probability scores into positive or negative predictions (for example, classifying a score of 0.5 or above as positive).
For example, in a disease screening test, the model outputs a probability score of having the disease for each patient. Lowering the threshold to 0.3 casts a wider net for positive classifications: it reduces misses (patients with the disease classified as negative) but increases false positives (healthy people classified as positive), so TPR rises but FPR rises too. Conversely, raising the threshold to 0.8 reduces false positives but increases misses. As the threshold varies, TPR and FPR trade off against each other.
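The screening trade-off can be checked on a small made-up data set. The ten patient scores and labels below are hypothetical (1 = has the disease), and `screen` is an illustrative helper that reports TPR, FPR, misses, and false positives at a given threshold.

```python
from typing import List, Tuple

# Hypothetical screening data: 4 patients with the disease, 6 healthy.
has_disease = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
disease_score = [0.95, 0.7, 0.45, 0.25, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]


def screen(labels: List[int], scores: List[float], threshold: float) -> Tuple[float, float, int, int]:
    """Return (TPR, FPR, misses, false positives) when score >= threshold counts as positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    n_sick = sum(labels)
    n_healthy = len(labels) - n_sick
    misses = n_sick - tp  # sick patients classified as negative
    return tp / n_sick, fp / n_healthy, misses, fp


for t in (0.3, 0.8):
    tpr, fpr, misses, false_pos = screen(has_disease, disease_score, t)
    print(f"threshold={t}: TPR={tpr:.2f} FPR={fpr:.2f} misses={misses} false positives={false_pos}")
```

On this data, threshold 0.3 gives TPR 0.75 and FPR 0.50 (1 miss, 3 false positives), while 0.8 gives TPR 0.25 and FPR 0.00 (3 misses, no false positives): lowering the threshold raises both rates together.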
- ROC curve: The trajectory of TPR (y-axis) and FPR (x-axis) as the threshold is moved from 0 to 1.
- AUC (area under the ROC curve): The area under that curve. Expresses discrimination performance in a single threshold-independent value; closer to 1 = higher performance; 0.5 = equivalent to random.
- Analysis and selection: First use AUC to compare models' threshold-independent ability to rank positives above negatives, then, for production, select the threshold on the ROC curve that suits the goal (e.g., keeping FPR low). F1, accuracy, and recall are threshold-dependent metrics and are not suited for comparing models before the threshold is set.
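The selection step in the last bullet can be sketched as a search along the ROC curve. Here the goal is expressed, as an assumption, as "maximize TPR while keeping FPR at or below `max_fpr`"; `pick_threshold`, the cap of 0.2, and the data are hypothetical. Candidate thresholds are the distinct scores, scanned from high to low.

```python
from typing import List, Optional, Tuple


def pick_threshold(y_true: List[int], scores: List[float],
                   max_fpr: float) -> Optional[Tuple[float, float, float]]:
    """Return (TPR, FPR, threshold) with the highest TPR whose FPR <= max_fpr, or None."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr, fpr = tp / n_pos, fp / n_neg
        # Strict ">" keeps the highest threshold that reaches a given TPR,
        # which also has the lowest FPR among thresholds with that TPR.
        if fpr <= max_fpr and (best is None or tpr > best[0]):
            best = (tpr, fpr, t)
    return best


labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.7, 0.45, 0.25, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
print(pick_threshold(labels, scores, max_fpr=0.2))
```

With an FPR cap of 0.2 this picks threshold 0.45 (TPR 0.75, FPR 1/6). For larger data, `sklearn.metrics.roc_curve` returns `fpr`, `tpr`, and `thresholds` arrays that could be searched the same way.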