An AI team is comparing techniques to improve a chat model's responses. What is the term for the technique that uses feedback in which humans rate the quality of outputs to adjust the model's responses to align with human preferences and values?

Correct answer: D

Explanation

This question asks which technique aligns a model with human preferences by using human feedback.

  • 1. "humans rate the quality of outputs": use human ratings as a reward
  • 2. "align with human preferences and values": aligning responses = RLHF
A. Incorrect

Fine-tuning

Fine-tuning is a general term for additional training that adapts a model to a specific task using labeled data.

It is not the term for the mechanism that uses human ratings as a reward to align with preferences, so RLHF is the better answer to this question.

B. Incorrect

Continued pre-training

Continued pre-training is additional training that broadens a model's knowledge using unlabeled data.

It is not a technique that uses human ratings as a reward to align with preferences, so this is incorrect.

C. Incorrect

Transfer learning

Transfer learning is the idea of reusing knowledge from one task for another task.

It is not an adjustment technique that uses human feedback as a reward, so this is incorrect.

D. Correct

RLHF

Correct. RLHF uses human ratings of model outputs as a reward signal to adjust the model's responses so that they align with human preferences and values. It steers responses toward being safe and helpful.
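
To make "human ratings as a reward" concrete, here is a minimal toy sketch in Python. It is not how production RLHF is implemented: real pipelines typically train a separate reward model on human feedback and then optimize the chat model with a reinforcement learning algorithm, and both steps are skipped here. The candidate responses, the rating values, and the function names (`softmax`, `rlhf_step`) are all hypothetical and chosen only for illustration. The "policy" is just a probability distribution over three fixed responses, nudged toward the ones humans rated higher.

```python
import math

def softmax(logits):
    """Turn raw preference scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rlhf_step(logits, ratings, lr=1.0):
    """One exact policy-gradient step on the expected human rating.

    For a softmax policy, the gradient of the expected reward with
    respect to logit k is p_k * (r_k - expected_reward).
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, ratings))
    return [l + lr * p * (r - expected)
            for l, p, r in zip(logits, probs, ratings)]

# Hypothetical candidate responses and hypothetical human ratings in [0, 1].
candidates = ["helpful, safe answer", "vague answer", "harmful answer"]
human_ratings = [1.0, 0.4, 0.0]

# Start with no preference among the candidates.
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = rlhf_step(logits, human_ratings)

probs = softmax(logits)
best = candidates[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

The point of the sketch is the direction of the update: responses rated above the current average gain probability, and responses rated below it lose probability. That is what separates RLHF from fine-tuning on labeled data or continued pre-training on unlabeled data, neither of which uses a reward derived from human ratings.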

Key Takeaway

Remember the correct answer: RLHF (reinforcement learning from human feedback).
・It uses human ratings of model outputs as a reward signal to adjust responses so that they align with human preferences and values.
・It is an important responsible-AI technique that steers responses toward being safer and more helpful.