An ML team is trying to ensure the reliability of the evaluation figure used in the model's final report. Which dataset is held back until the end, used neither for training nor for hyperparameter tuning, and used to evaluate the model's final generalization performance?
A question about choosing the dataset for evaluating final generalization performance.
The training set
The training set is data for training the model's weights.
It is not the data held back for final evaluation, so this is incorrect.
The test set
Correct. The test set is the data held back to the very end, used neither for training nor for hyperparameter tuning, and used once to evaluate the final generalization performance.
The validation set
The validation set is data used during training for hyperparameter tuning and model selection.
It is not the data held back for final evaluation, so this is incorrect.
A sampled subset of the training data
Even when sampled, a subset of the training data is still data the model saw during training.
Scores measured on already-seen data are optimistically biased and cannot serve as the final evaluation, so this is incorrect.
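The optimism of evaluating on already-seen data can be illustrated with a deliberately extreme sketch. Everything in it is hypothetical: the data are coin-flip labels attached to unique ids, and the "model" is just a lookup table that memorizes the training rows, so it has nothing real to learn.

```python
import random

rng = random.Random(0)

# Hypothetical data: unique ids with coin-flip labels, so no real pattern exists.
data = [(i, rng.randint(0, 1)) for i in range(400)]
train_rows, heldout_rows = data[:200], data[200:]

# A "model" that only memorizes the training labels.
memory = dict(train_rows)
# Majority training label, used as the guess for ids never seen in training.
fallback = int(sum(memory.values()) * 2 >= len(memory))

def predict(x):
    return memory.get(x, fallback)

def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

# Score on a sample of the training data versus on rows that were held back.
seen_acc = accuracy(rng.sample(train_rows, 50))
heldout_acc = accuracy(heldout_rows)
print(f"sampled training subset: {seen_acc:.2f}, held-out rows: {heldout_acc:.2f}")
```

The sampled training subset scores perfectly because every row was memorized, while the held-out rows expose that the model generalizes no better than a constant guess.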
Key point: the correct answer is the test set.
- Data held back to the very end, used neither for training nor for hyperparameter tuning, and used once to evaluate the final generalization performance.
- Because the model never saw it during training or tuning, it gives an approximately unbiased estimate of performance on unseen, production-like data.
The training set (training the weights), the validation set (comparing settings during training), and a sampled subset of the training data (data the model has already seen) are not the dataset for final evaluation.
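As a minimal sketch of how the three roles can be kept apart, the snippet below shuffles a dataset once and cuts it into disjoint training, validation, and test lists. The function name `three_way_split`, the 70/15/15 proportions, and the fixed seed are illustrative assumptions, not something the question prescribes.

```python
import random

def three_way_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]                  # held back until the very end
    val = rows[n_test:n_test + n_val]     # hyperparameter tuning, model selection
    train = rows[n_test + n_val:]         # fitting the weights
    return train, val, test

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

In this arrangement the test list would be touched only once, after all tuning on the validation list is finished.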