A question about choosing the method that trains…

A company wants to reduce inconsistency in how well its model's responses follow instructions. Which method trains the model on data made of "instruction (prompt) and desired response" pairs to raise its ability to follow instructions?

Correct answer: C

Explanation

Question Overview

A question about choosing the method that trains on instruction-response pairs.

Requirements to satisfy

1. "instruction (prompt) and desired response" pairs: train on paired instruction-response data
2. "raise its ability to follow instructions": teach instruction-following, that is, instruction tuning

Per-option explanation

A. Incorrect

Continued pretraining

Continued pretraining further trains a model on large amounts of unlabeled data so that it acquires domain knowledge.

It is not a method that teaches instruction-following with instruction-response pairs, so this is incorrect.
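As a rough illustration of why continued pretraining needs no labels, the sketch below turns raw text into next-token prediction examples. The helper name `next_token_examples`, the whitespace tokenizer, and the sample sentence are assumptions made for illustration; a real pipeline would use a subword tokenizer and a training framework.

```python
def next_token_examples(text, context_size=3):
    """Turn unlabeled text into (context, next_token) pairs.

    Hypothetical helper: whitespace splitting stands in for a real tokenizer.
    """
    tokens = text.split()
    return [
        (tokens[i:i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]


# Any domain text works; nobody has to write a "desired response".
raw = "the patient was given 5 mg of the drug"
for context, target in next_token_examples(raw):
    print(context, "->", target)
```

The training signal comes from the text itself, which is why this approach adds knowledge but does not by itself teach the model to follow instructions.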

B. Incorrect

RAG (retrieval-augmented generation)

RAG retrieves external knowledge at inference time to reinforce the model's answers; it does not update the model's weights.

It is not a method that teaches instruction-following with instruction-response pairs, so this is incorrect.
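A toy sketch of the RAG idea may make the "no weight update" point concrete. Everything here is an assumption for illustration: the `DOCS` list, the word-overlap scoring in `retrieve`, and the prompt layout in `build_prompt`. Production systems typically use embeddings and a vector index instead.

```python
import re

# Hypothetical knowledge base; a real system would use a vector index.
DOCS = [
    "The refund window is 30 days.",
    "Shipping takes 5 business days.",
]


def words(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, docs, k=1):
    """Return the k documents sharing the most words with the query."""
    q = words(query)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs=DOCS):
    """Prepend retrieved text to the question; the model itself is untouched."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("How long is the refund window?"))
```

Only the prompt changes between calls. No training step appears anywhere, so the model's instruction-following behavior stays as it was.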

C. Correct

Instruction tuning

Correct. Instruction tuning is a type of fine-tuning that trains on pairs of instructions and desired responses to raise the ability to follow instructions. For example, the training data might pair "summarize this text in three lines" with a reference summary, "translate to English" with a correct translation, and "make it a bulleted list" with a bulleted example. Training on many such diverse pairs makes the model more likely to follow even unseen instructions.
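The pair data described above could be prepared roughly as follows. This is a minimal sketch, not a full training loop: the `PAIRS` contents, the "### Instruction / ### Response" layout, and the character-level loss mask are all assumptions for illustration. Real setups mask at the token level, and computing the loss only on the response is a common choice rather than a requirement.

```python
# Hypothetical instruction-response pairs, mirroring the examples above.
PAIRS = [
    {"instruction": "Summarize this text in three lines.",
     "response": "<reference summary>"},
    {"instruction": "Translate to English.",
     "response": "<correct translation>"},
    {"instruction": "Make it a bulleted list.",
     "response": "- item one\n- item two"},
]


def format_example(pair):
    """Split one pair into a prompt part and a target part."""
    prompt = f"### Instruction:\n{pair['instruction']}\n\n### Response:\n"
    return prompt, pair["response"]


def build_training_text(pair):
    """Return the full training string and a per-character loss mask.

    Mask value 0 covers the prompt, 1 covers the desired response, so
    a trainer using this mask would only learn to produce the response.
    """
    prompt, target = format_example(pair)
    mask = [0] * len(prompt) + [1] * len(target)
    return prompt + target, mask


text, mask = build_training_text(PAIRS[2])
print(text)
print("characters contributing to the loss:", sum(mask))
```

Unlike continued pretraining, each example carries an explicit desired response, and that is what lets fine-tuning shape instruction-following behavior.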

D. Incorrect

Prompt template

A prompt template standardizes the input format to improve consistency; it does not involve any training.

It is not a method that trains on instruction-response pairs to raise instruction-following, so this is incorrect.
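For contrast, a prompt template can be as simple as the sketch below, which uses Python's standard `string.Template`. The template wording and the `render` helper are hypothetical; the point is only that the input is reshaped while the model is left unchanged.

```python
from string import Template

# Hypothetical template text; only the input format is standardized.
PROMPT = Template(
    "You are a support assistant.\n"
    "Task: $task\n"
    "Input: $text\n"
    "Answer format: $fmt"
)


def render(task, text, fmt="bulleted list"):
    """Fill the fixed template with the per-request values."""
    return PROMPT.substitute(task=task, text=text, fmt=fmt)


print(render("Summarize", "Quarterly sales rose in every region."))
```

Every request now reaches the model in the same shape, which can improve consistency, but nothing is learned from instruction-response pairs.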

Key Takeaway

Remember what defines the correct answer, instruction tuning:
- A type of fine-tuning that trains on data made of pairs of instructions and desired responses to raise the ability to follow instructions.
- It makes the model more likely to return responses aligned with the user's intent.
It differs from continued pretraining (domain knowledge from unlabeled data), RAG (external references), and prompt templates (format standardization, no training).
