A question about choosing the dataset for evalua…

An ML team is trying to ensure the reliability of the evaluation figure used in the model's final report. Which dataset is held back until the end, used neither for training nor for hyperparameter tuning, and used to evaluate the model's final generalization performance?

Correct answer: B

Explanation

Question Overview

A question about choosing the dataset for evaluating final generalization performance.

Requirements to satisfy

1. "used neither for training nor for hyperparameter tuning": the data must be kept out of both fitting the weights and tuning.
2. "to evaluate the model's final generalization performance": the data is reserved for the final evaluation, which is the role of the test set.

Per-option explanation

A: Incorrect

The training set

The training set is the data used to fit the model's weights.

It is not the data held back for final evaluation, so this is incorrect.

B: Correct

The test set

Correct. The test set is the data held back to the very end, used neither for training nor for hyperparameter tuning, and used once to evaluate the final generalization performance.
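
As a minimal sketch of how such a hold-out might be set aside, assuming the data fits in a Python list and a plain random split is appropriate. The function name `three_way_split` and the 15% fractions are illustrative choices, not something the question prescribes:

```python
import random

def three_way_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    if not 0 < val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be between 0 and 1")
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]                 # set aside first, untouched until the final report
    val = shuffled[n_test:n_test + n_val]    # used for tuning and model selection
    train = shuffled[n_test + n_val:]        # used to fit the weights
    return train, val, test

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

Cutting the test rows off before anything else is a design choice that makes it harder to leak them into training or tuning by accident.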

C: Incorrect

The validation set

The validation set is data used during training for hyperparameter tuning and model selection.

It is not the data held back for final evaluation, so this is incorrect.
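
The division of labor between the validation set and the test set can be sketched as follows. This is a toy illustration under stated assumptions: the data is synthetic, the model is a hand-written 1-D k-nearest-neighbors classifier, and the names `knn_predict`, `accuracy`, and the candidate values of `k` are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training rows whose feature is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return votes * 2 > k

def accuracy(train, rows, k):
    """Share of rows whose label matches the k-NN prediction."""
    return sum(knn_predict(train, x, k) == label for x, label in rows) / len(rows)

rng = random.Random(0)
# Hypothetical toy data: label is True when x plus Gaussian noise reaches 0.5.
rows = []
for _ in range(300):
    x = rng.random()
    rows.append((x, x + rng.gauss(0, 0.2) >= 0.5))
train, val, test = rows[:200], rows[200:250], rows[250:]

# Hyperparameter tuning: compare candidate k values on the validation set only.
best_k = max([1, 5, 15], key=lambda k: accuracy(train, val, k))

# Final report: a single evaluation on the untouched test set.
final_score = accuracy(train, test, best_k)
print(best_k, final_score)
```

Because `best_k` was chosen to maximize the validation score, that score is no longer a neutral estimate, which is why the reported figure comes from the test rows instead.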

D: Incorrect

A sampled subset of the training data

Even when it is sampled, a subset of the training data consists of examples the model already saw during training.

Evaluating on data the model has already seen gives an optimistically biased score, so it cannot serve as the final evaluation, and this option is incorrect.
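
To make the optimism concrete, here is a deliberately extreme sketch, assuming a hypothetical "model" that simply memorizes its training rows and labels that are coin flips (so there is nothing real to learn). All names and data are made up for illustration.

```python
import random

rng = random.Random(0)
# Hypothetical toy data: unique integer ids as features, coin-flip labels.
rows = [(i, rng.random() < 0.5) for i in range(400)]
rng.shuffle(rows)
train, test = rows[:300], rows[300:]

# A "model" that memorizes its training rows and falls back to the majority label.
memory = dict(train)
majority = sum(label for _, label in train) * 2 >= len(train)

def predict(x):
    return memory.get(x, majority)

def accuracy(rows):
    return sum(predict(x) == label for x, label in rows) / len(rows)

seen_subset = rng.sample(train, 100)   # option D: a sampled subset of the training data
print(accuracy(seen_subset))           # 1.0 by construction: every row was memorized
print(accuracy(test))                  # roughly chance level on truly unseen rows
```

Real models rarely memorize this completely, but the direction of the bias is the same: a score on seen data overstates what to expect on new data.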

Key Takeaway

The correct answer is the test set.
- Data held back to the very end, used neither for training nor for hyperparameter tuning, and used once to evaluate the final generalization performance.
- It gives an approximately unbiased estimate of performance on unseen data, close to what can be expected in production.
The training set (fitting the weights), the validation set (comparing settings during training), and a sampled subset of the training data (already seen by the model) are not datasets for the final evaluation.
