An ML team is choosing the evaluation metrics for a defect-detection (classification) model, distinguishing them from metrics for regression. Which are appropriate evaluation metrics for measuring the performance of a classification model? (Choose TWO.)

Correct: A, B

Explanation

A question about choosing TWO evaluation metrics for a classification task.

  • 1. "measuring the performance of a classification": measure the correctness of classification
  • 2. "appropriate evaluation metrics": precision and recall apply
A. Correct

Precision

Correct. Precision is a classification metric: the proportion of items the model judged as "positive" that are actually positive, that is, TP / (TP + FP). For example, if 90 of 100 items judged as "defective" in defect detection are actually defective, precision is 0.9. High precision means a low rate of false detection (mistaking good items for defective ones).
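The arithmetic in the 90-of-100 example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `precision` and its parameters are hypothetical names chosen for this sketch.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of items predicted positive that are truly positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # No item was predicted positive, so the ratio is undefined.
        raise ValueError("precision is undefined when nothing is predicted positive")
    return true_positives / predicted_positives


# 100 items were judged "defective"; 90 of them are truly defective,
# so the remaining 10 are false detections.
example_precision = precision(true_positives=90, false_positives=10)
print(example_precision)  # 0.9
```

Raising an error for the empty case is one possible design choice; some libraries return 0 with a warning instead.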

B. Correct

Recall

Correct. Recall is a classification metric: the proportion of items that are actually "positive" that the model correctly captured, that is, TP / (TP + FN). For example, if 40 of 50 actual defective items are captured as "defective", recall is 0.8. High recall means a low rate of misses (mistaking defective items for good ones).
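The 40-of-50 example works the same way in code. The sketch below assumes a hypothetical helper named `recall`; only the counts come from the example above.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly positive items that the model captured."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        # There are no positive items at all, so the ratio is undefined.
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positives


# 50 items are truly defective; the model captured 40 and missed 10.
example_recall = recall(true_positives=40, false_negatives=10)
print(example_recall)  # 0.8
```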

C. Incorrect

RMSE

RMSE (root mean square error) measures the size of prediction error in a regression task: it is the square root of the mean of the squared gaps between predicted and actual values. For example, when predicting continuous values such as housing prices or temperature, it summarizes how far predictions fall from the actual values, weighting large errors more heavily.

It cannot measure the correctness of a category such as defective or not, so it is incorrect as a classification metric.
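To make the contrast with classification metrics concrete, here is a rough sketch of RMSE on continuous values. The function name `rmse` and the numbers are made up for illustration and do not come from the question.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical continuous targets (illustrative values only).
actual_values = [3.0, 5.0, 2.0, 7.0]
predicted_values = [2.0, 5.0, 4.0, 8.0]

# Squared errors are 1, 0, 4, 1, so the mean is 1.5 and RMSE is sqrt(1.5).
example_rmse = rmse(actual_values, predicted_values)
print(example_rmse)
```

The inputs are numeric gaps, not category labels, which is why the result says nothing about whether "defective" or "good" was predicted correctly.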

D. Incorrect

MAE

MAE (mean absolute error) is also an error metric for regression tasks. For example, in sales forecasting, it represents the average of the absolute gaps between actual and predicted values.

It measures the error of continuous values and is not a classification performance metric, so this is incorrect.
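A comparable sketch for MAE, using invented sales figures. The name `mae` and the data are assumptions for illustration only.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical weekly sales (illustrative values only).
actual_sales = [100.0, 120.0, 90.0, 110.0]
predicted_sales = [110.0, 115.0, 95.0, 110.0]

# Absolute errors are 10, 5, 5, 0, so their average is 5.0.
example_mae = mae(actual_sales, predicted_sales)
print(example_mae)  # 5.0
```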

E. Incorrect

Perplexity

Perplexity is a metric for evaluating a language model's next-word prediction. It measures how uncertain the model is when predicting the next word: the lower the perplexity, the higher the probability the model assigned to the words that actually appeared.

It does not measure the correctness of classification, so it is incorrect as a classification model performance metric.
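For reference, perplexity is commonly computed as the exponential of the average negative log-probability the model assigned to each actual next word. The sketch below assumes those per-word probabilities are already available; the function name `perplexity` and the probabilities are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_probabilities: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the observed tokens."""
    if len(token_probabilities) == 0:
        raise ValueError("at least one token probability is required")
    if any(p <= 0.0 or p > 1.0 for p in token_probabilities):
        raise ValueError("probabilities must be in the interval (0, 1]")
    mean_negative_log_prob = -sum(math.log(p) for p in token_probabilities) / len(
        token_probabilities
    )
    return math.exp(mean_negative_log_prob)


# Hypothetical probabilities a model assigned to four actual next words.
example_perplexity = perplexity([0.5, 0.25, 0.5, 0.25])
print(example_perplexity)
```

The input is a sequence of probabilities over words, not predicted class labels, so the value cannot be read as classification correctness.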

Key Takeaway

Evaluation metrics for a classification task include precision (the proportion of items judged positive that were actually positive) and recall (the proportion of actual positives that were captured). The F1 score, the harmonic mean of the two, is also commonly used.
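As a closing illustration, all three classification metrics can be derived from the same set of true and predicted labels. This is a minimal sketch with made-up labels (1 = defective, 0 = good); the function name `classification_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and F1 for binary labels where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # This sketch reports 0.0 when a ratio's denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 truly defective items, 5 items flagged as defective.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# TP = 3, FP = 2, FN = 1, so precision = 3/5 and recall = 3/4.
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice a library routine would usually be preferred; the hand-rolled version is shown only to make the definitions visible.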