In a company review, a model trained on biased historical hiring data was found to reproduce the same bias. What is the name of the problem in which bias contained in the training data appears directly as bias in the model's predictions?

Correct answer: D

Explanation

The question asks for the name of the bias that originates in the training data.

  • 1. "bias contained in the training data": bias in the data itself
  • 2. "appears directly as bias in the model's predictions": bias reflected in training = data bias
A. Incorrect

Hallucination

Hallucination is the problem in which generative AI outputs plausible-sounding content that is not grounded in fact.

It is a problem of model output quality, not the reproduction of training-data bias in predictions, so it is incorrect.

B. Incorrect

Overfitting

Overfitting is the problem in which a model fits the training data excessively and loses accuracy on unseen data.

It is a problem of generalization in training, not bias in the data appearing as bias in predictions, so it is incorrect.

C. Incorrect

Data drift

Data drift is the problem in which, during operation, the distribution of input data gradually shifts away from what it was at training time.

It is a problem of change over time, not the reproduction of bias that was in the training data from the start, so it is incorrect.

D. Correct

Data bias

Correct. Data bias is the problem in which bias contained in the training data appears directly as bias in the model's predictions (overrepresentation or underrepresentation of certain attributes, historical prejudice, and so on).

Key Takeaway

Remember the correct answer, 'data bias,' with concrete examples.
・Data bias is the problem in which bias contained in the training data appears directly as bias in the model's predictions.
・Examples: if past hiring was skewed toward men, a model trained on that data rates men more highly; if data for a certain region or age group is scarce, prediction accuracy drops for that group alone.
・It is a major cause of unfairness, and it is addressed by collecting representative data and running bias detection (with tools such as SageMaker Clarify).
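
As a rough illustration of what bias detection on training data can look like, the sketch below computes two simple measures on a toy hiring dataset: how unevenly two groups are represented, and how far apart their "hired" label rates are. The records, the function name `group_stats`, and the variable names are all hypothetical and made up for this example; this is plain Python, not the SageMaker Clarify API.

```python
from collections import Counter

# Hypothetical toy hiring records as (gender, hired) pairs.
# The numbers are invented purely for illustration.
records = (
    [("male", 1)] * 60 + [("male", 0)] * 20
    + [("female", 1)] * 5 + [("female", 0)] * 15
)

def group_stats(rows):
    """Return {group: (count, positive_label_rate)} for (group, label) rows."""
    totals = Counter(group for group, _ in rows)
    positives = Counter(group for group, label in rows if label == 1)
    return {group: (n, positives[group] / n) for group, n in totals.items()}

stats = group_stats(records)
n_m, rate_m = stats["male"]
n_f, rate_f = stats["female"]

# Representation gap in [-1, 1]: 0 means both groups are equally represented.
representation_gap = (n_m - n_f) / (n_m + n_f)

# Label-rate gap: difference in the share of positive ("hired") labels.
label_rate_gap = rate_m - rate_f

print(f"male:   n={n_m}, hired rate={rate_m:.2f}")
print(f"female: n={n_f}, hired rate={rate_f:.2f}")
print(f"representation gap: {representation_gap:.2f}")
print(f"label-rate gap:     {label_rate_gap:.2f}")
```

With this toy data, both gaps are far from zero, which signals that a model trained on it could learn to favor the overrepresented, more often hired group. In practice, such checks would be run before training and followed by rebalancing or collecting more representative data.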