A company wants to reduce inconsistency in how its model's responses follow instructions. Which method trains the model on data made of "instruction (prompt) and desired response" pairs to raise its ability to follow instructions?

Correct answer: C

Explanation

This question asks you to identify the method that trains on instruction-response pairs.

  • Key phrase 1: "instruction (prompt) and desired response" pairs → train on instruction-to-response pair data
  • Key phrase 2: "raise its ability to follow instructions" → teach instruction-following = instruction tuning
A: Incorrect

Continued pretraining

Continued pretraining performs additional training on large amounts of unlabeled data (for example, domain-specific text) so that the model acquires domain knowledge.

It is not a method that teaches instruction-following with instruction-response pairs, so this is incorrect.
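To see why continued pretraining needs no labels, here is a minimal sketch of how raw text yields its own training targets under next-token prediction. Whitespace tokenization and the short `context_size` window are simplifying assumptions (real models use subword tokenizers and much longer contexts), and `next_token_examples` is a hypothetical helper.

```python
def next_token_examples(text: str, context_size: int = 3) -> list[tuple[tuple[str, ...], str]]:
    """Return (context, next_token) pairs derived from raw text alone.

    Assumption: whitespace tokenization stands in for a real subword tokenizer.
    """
    tokens = text.split()
    examples = []
    for i in range(1, len(tokens)):
        # The "label" is simply the next word in the document, so no
        # human-written response is required: unlabeled text is enough.
        context = tuple(tokens[max(0, i - context_size):i])
        examples.append((context, tokens[i]))
    return examples


# Hypothetical domain sentence standing in for a large unlabeled corpus.
for context, target in next_token_examples("claims must be filed within thirty days"):
    print(context, "->", target)
```

Contrast this with instruction tuning: here every target comes from the text itself, whereas instruction tuning supplies an explicit desired response for each instruction.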

B: Incorrect

RAG (retrieval-augmented generation)

RAG retrieves relevant external knowledge at inference time and adds it to the prompt to ground the model's answers; it does not update the model's weights.

It is not a method that teaches instruction-following with instruction-response pairs, so this is incorrect.
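The toy sketch below illustrates the RAG pattern: retrieve a relevant document, then place it in the prompt. The word-overlap scorer, the in-memory `docs` list, and the prompt wording are illustrative assumptions; production systems typically use embeddings and a vector store. Note that nothing in it touches model weights.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(query_words & set(doc.lower().split()))

    # Assumption: simple word overlap stands in for embedding similarity.
    return sorted(documents, key=overlap, reverse=True)[:k]


def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved context to the user's question; no training involved."""
    context = "\n".join(retrieve(query, documents))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical knowledge base.
docs = [
    "The refund window is 30 days from delivery.",
    "Support is available on weekdays from 9 to 5.",
]
print(build_rag_prompt("How many days is the refund window?", docs))
```

Changing the answers here means changing `docs`, not retraining the model, which is exactly why RAG does not teach instruction-following.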

C: Correct

Instruction tuning

Correct. Instruction tuning is a type of fine-tuning that trains the model on pairs of instructions and desired responses to raise its ability to follow instructions. For example, training on many diverse pairs, such as "summarize this text in three lines" paired with a model summary, "translate to English" paired with a correct translation, and "make it a bulleted list" paired with a bulleted example, helps the model follow instructions it did not see during training.
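As a concrete illustration, an instruction-tuning dataset is often stored as JSON Lines, one instruction-response pair per line. This is a minimal sketch: the `prompt`/`completion` field names and the sample pairs are assumptions, and the exact schema depends on the training service or library you use.

```python
import json


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (instruction, desired response) pairs as one JSON object per line.

    Assumption: the "prompt"/"completion" keys; adjust to your trainer's schema.
    """
    return "\n".join(
        json.dumps({"prompt": instruction, "completion": response})
        for instruction, response in pairs
    )


# Hypothetical pairs covering different instruction types.
pairs = [
    ("Translate to English: Bonjour", "Hello"),
    ("Make it a bulleted list: apples, pears", "- apples\n- pears"),
    (
        "Summarize in one sentence: The meeting moved from Monday to Tuesday "
        "because the room was booked.",
        "The meeting moved to Tuesday because the room was booked.",
    ),
]
print(to_jsonl(pairs))
```

Each record carries an explicit desired response, which is the key difference from the unlabeled text used in continued pretraining.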

D: Incorrect

Prompt template

A prompt template standardizes the input format to improve consistency; it does not involve any training.

It is not a method that trains on instruction-response pairs to raise instruction-following, so this is incorrect.
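For contrast, a prompt template is just string formatting applied before each request. This minimal sketch uses Python's standard `string.Template`; the template wording and the `$task`, `$text`, and `$format` placeholders are hypothetical.

```python
from string import Template

# Hypothetical template: every request is rendered into the same structure.
PROMPT_TEMPLATE = Template(
    "Task: $task\n"
    "Input:\n$text\n"
    "Respond as: $format"
)


def render(task: str, text: str, fmt: str) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled,
    # which catches malformed requests before they reach the model.
    return PROMPT_TEMPLATE.substitute(task=task, text=text, format=fmt)


print(render("Summarize", "Quarterly sales rose 5%.", "three bullet points"))
```

The template only shapes the input; the model's weights and its underlying instruction-following ability stay unchanged.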

Key Takeaway

Remember the core idea behind the correct answer, instruction tuning:
- A type of fine-tuning that trains on data made of pairs of instructions and desired responses to raise the ability to follow instructions.
- It makes the model more likely to return responses aligned with the user's intent.
It differs from continued pretraining (domain knowledge from unlabeled data), RAG (external references), and prompt templates (format standardization, no training).