An AI team is comparing techniques to improve a chat model's responses. What is the term for the technique that uses feedback in which humans rate the quality of outputs to adjust the model's responses to align with human preferences and values?

Correct answer: D

Explanation

Question Overview

A question that asks which technique aligns a model using human feedback.

Requirements to satisfy

1. "humans rate the quality of outputs": human ratings are used as a reward
2. "align with human preferences and values": aligning responses with preferences points to RLHF

Per-option explanation

A. Incorrect

Fine-tuning

Fine-tuning is a general term for additional training that adapts a model to a specific task, typically using labeled data.

RLHF is itself a form of fine-tuning, but "fine-tuning" does not name the specific mechanism of using human ratings as a reward to align with preferences, so RLHF is the better answer to this question.

B. Incorrect

Continued pre-training

Continued pre-training is additional training that broadens a model's knowledge using unlabeled data.

It is not a technique that uses human ratings as a reward to align with preferences, so this is incorrect.

C. Incorrect

Transfer learning

Transfer learning is the idea of reusing knowledge from one task for another task.

It is not an adjustment technique that uses human feedback as a reward, so this is incorrect.

D. Correct

RLHF

Correct. RLHF (reinforcement learning from human feedback) uses human ratings of model outputs as a reward signal and adjusts the model so that its responses align with human preferences and values. This moves responses toward being safe and helpful.
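
To make the "ratings as a reward" idea concrete, here is a minimal toy sketch in Python. It is an illustration under stated assumptions, not how production RLHF is implemented: the candidate responses, the rating values, and the function names are all hypothetical, and the "model" is just a softmax over three canned replies updated with a simple policy-gradient (REINFORCE) rule. Real systems usually train a separate reward model on human feedback and optimize a large language model with an algorithm such as PPO.

```python
import math
import random

# Hypothetical candidate responses and hypothetical human ratings (0 to 1).
RESPONSES = ["rude reply", "vague reply", "helpful, safe reply"]
HUMAN_RATING = {"rude reply": 0.0, "vague reply": 0.4, "helpful, safe reply": 1.0}


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """Toy policy-gradient loop: the human rating acts as the reward."""
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)  # start with no preference
    for _ in range(steps):
        probs = softmax(logits)
        # The "model" samples a response from its current policy.
        choice = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        reward = HUMAN_RATING[RESPONSES[choice]]
        # Baseline: the expected rating under the current policy.
        baseline = sum(p * HUMAN_RATING[r] for p, r in zip(probs, RESPONSES))
        advantage = reward - baseline
        # d log pi(choice) / d logit_i = 1[i == choice] - probs[i]
        for i in range(len(logits)):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


final_probs = train()
best = RESPONSES[final_probs.index(max(final_probs))]
print(best, [round(p, 3) for p in final_probs])
```

Responses that humans rate above the current average become more likely and poorly rated ones become less likely, which is the sense in which the ratings "adjust the model's responses."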

Key Takeaway

Remember the correct answer: RLHF (reinforcement learning from human feedback).
- It uses human ratings of outputs as a reward to adjust responses so that they align with human preferences and values.
- It is an important technique in responsible AI because it moves responses toward being safer and more helpful.
