To estimate generalization performance on unseen data fairly and without bias.
This is correct. The data is split into three sets so that each set plays a distinct role, which makes it possible to measure generalization performance fairly.
- Training data (train): data the model sees repeatedly during training to learn its parameters (weights).
- Validation data (validation): data used to evaluate the model during development, for hyperparameter tuning, model selection, and detecting overfitting. It is never used to fit the weights.
- Test data (test): data used only once, after all modeling decisions are final, to measure generalization performance on unseen data without bias.
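The three roles above start with a single split made before any training. Below is a minimal sketch in plain Python, assuming a 70/15/15 ratio and a fixed seed. `three_way_split` and its parameters are hypothetical names chosen for illustration, and suitable ratios depend on how much data you have. Libraries such as scikit-learn reach the same result by calling `train_test_split` twice.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def three_way_split(
    items: Sequence[T],
    val_frac: float = 0.15,   # assumed validation share
    test_frac: float = 0.15,  # assumed test share
    seed: int = 42,           # fixed seed so the split is reproducible
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into disjoint train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]                   # held out until the very end
    val = shuffled[n_test:n_test + n_val]      # used for tuning and model selection
    train = shuffled[n_test + n_val:]          # used to learn the weights
    return train, val, test

data = list(range(100))
train, val, test = three_way_split(data)
print(len(train), len(val), len(test))  # 70 15 15
```

Because the split happens once, up front, no example can land in more than one set. For classification with imbalanced classes, a stratified split that keeps class proportions similar in each set is usually preferable.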
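Evaluating on data that was not used in training prevents an inflated score from a model that has merely memorized the training examples. Likewise, because hyperparameters are chosen to do well on the validation set, its score also becomes somewhat optimistic, which is why a separate, untouched test set is kept for the final estimate.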
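To see how rote memorization overstates performance, here is a toy sketch assuming a hypothetical "model" that stores every training pair verbatim, trained on data whose labels are random coin flips, so there is no real pattern to learn. `Memorizer`, `make_noisy_data`, and `accuracy` are illustrative names, not part of any library.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[int, int]  # (feature, label)

def make_noisy_data(n: int, seed: int) -> List[Example]:
    """Hypothetical toy data: unique integer features, coin-flip labels."""
    rng = random.Random(seed)
    return [(i, rng.randint(0, 1)) for i in range(n)]

class Memorizer:
    """Pure rote memorization: recall seen inputs, else guess the majority label."""

    def fit(self, examples: List[Example]) -> "Memorizer":
        self.table: Dict[int, int] = dict(examples)
        self.default = Counter(y for _, y in examples).most_common(1)[0][0]
        return self

    def predict(self, x: int) -> int:
        return self.table.get(x, self.default)

def accuracy(model: Memorizer, examples: List[Example]) -> float:
    return sum(model.predict(x) == y for x, y in examples) / len(examples)

data = make_noisy_data(400, seed=0)
train, test = data[:300], data[300:]  # test features never appear in train
model = Memorizer().fit(train)

train_acc = accuracy(model, train)  # every training pair is recalled exactly
test_acc = accuracy(model, test)    # only the majority-label guess is left
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The training score claims a perfect model, while the held-out score (typically around 0.5 here) shows that nothing learned carries over to new inputs.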