Choosing the inference parameter that narrows ou…

A developer is comparing inference parameters that give fine-grained control over output diversity and wants to understand how each differs from top-k. Which setting controls diversity by narrowing the output candidates to the highest-probability words until their cumulative probability reaches a set value?

Correct answer: B

Explanation

Question Overview

Choosing the inference parameter that narrows output candidates by cumulative probability.

Requirements to satisfy

1. "the top words until a cumulative probability reaches a set value": candidates are narrowed from the most probable word downward, using cumulative probability as the cutoff.
2. "controls diversity by narrowing": the inference parameter that works this way is top-p (nucleus sampling).

Per-option explanation

A. Incorrect

top-k

top-k is a parameter that narrows the candidates to the top k items by probability (a count).

The method that narrows by cumulative probability (a proportion) is top-p. Because top-k narrows on a different basis, a fixed count, it is incorrect.
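As a minimal sketch of the count-based narrowing described above, the following pure-Python function keeps the k most probable candidates and renormalizes them. The function name `top_k_filter` and the two toy distributions are illustrative assumptions, not part of any particular library.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize (illustrative helper)."""
    # Sort candidates from most to least probable, then cut at a fixed count.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}


# Hypothetical next-word distributions: one peaked, one nearly flat.
peaked = {"cat": 0.85, "dog": 0.08, "bird": 0.04, "fish": 0.03}
flat = {"cat": 0.30, "dog": 0.28, "bird": 0.22, "fish": 0.20}

print(list(top_k_filter(peaked, 2)))
print(list(top_k_filter(flat, 2)))
```

With k=2, both distributions are cut to exactly two candidates, regardless of how the probability mass is spread.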

B. Correct

top-p

Correct. top-p (nucleus sampling) is an inference parameter that sorts the candidates for the next word in descending order of probability and keeps only the top words until their cumulative probability reaches a set value p. For example, with p=0.9, the candidates are the words whose probabilities, added from the top, reach a total of 90%; the lower-probability words below that point are excluded. The smaller p is, the more the candidates are narrowed to high-probability words and the more stable the output; the larger p is, the wider the candidate set and the more diverse the expression. Unlike top-k, which fixes a count, the number of candidates changes dynamically according to the shape of the probability distribution.
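The cumulative cutoff can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than any library's implementation: the name `top_p_filter` and the toy distributions are hypothetical, and the kept candidates are renormalized so they sum to 1.

```python
def top_p_filter(probs, p):
    """Keep the smallest top-ranked set whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # stop once the threshold is reached
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-word distributions: one peaked, one nearly flat.
peaked = {"cat": 0.85, "dog": 0.08, "bird": 0.04, "fish": 0.03}
flat = {"cat": 0.30, "dog": 0.28, "bird": 0.22, "fish": 0.20}

print(len(top_p_filter(peaked, 0.9)))
print(len(top_p_filter(flat, 0.9)))
```

With p=0.9, the peaked distribution keeps two candidates (0.85 + 0.08 already exceeds 0.9), while the flat one keeps all four. The same p yields different candidate counts depending on the shape of the distribution.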

C. Incorrect

temperature

temperature is a parameter that adjusts the overall sharpness of the probability distribution.

It reshapes the probabilities but does not remove any candidates, so it is incorrect.
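To illustrate how temperature changes sharpness without dropping candidates, here is a small sketch that divides raw scores (logits) by the temperature before applying softmax. The function name and the example logits are assumptions chosen for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate words
sharp = softmax_with_temperature(logits, 0.5)  # lower temperature
soft = softmax_with_temperature(logits, 2.0)   # higher temperature

print(sharp)
print(soft)
```

The lower temperature concentrates probability on the top candidate and the higher temperature spreads it out, but in both cases every candidate keeps a non-zero probability.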

D. Incorrect

Maximum tokens

Maximum tokens is a parameter that determines the upper limit on output length.

It does not narrow the candidate words, so it is incorrect.

Key Takeaway

An LLM is a model that 'predicts the word most likely to come next as a continuation of the text so far'. It computes a probability for each candidate next word, selects one word from them to output, and repeats this process to generate text. The three inference parameters that control this 'way of selecting the next word' are top-p, top-k, and temperature.
・top-p (nucleus sampling): narrows the candidates to the top words until the cumulative probability reaches a set value (because it narrows by the sum of probabilities, the number of candidates changes dynamically).
・top-k: narrows the candidates to the top k items by probability (a fixed count).
・temperature: adjusts the overall sharpness of the probability distribution (lower is more certain, higher is more diverse).
All of these operate at inference time. Number of epochs, learning rate, and batch size are settings for 'training time' and are not parameters that narrow output candidates at inference time.
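The "predict, select, repeat" loop described above can be sketched with a toy stand-in for a model. Everything here is hypothetical: `TOY_MODEL` is a hand-written table of next-word probabilities, not a real LLM, and `generate` only shows where a maximum-token limit would cap the loop.

```python
import random

# Hypothetical toy "model": next-word probabilities keyed by the previous word.
TOY_MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.7, "<end>": 0.3},
    "dog": {"sleeps": 0.6, "<end>": 0.4},
    "sleeps": {"<end>": 1.0},
}


def generate(model, max_tokens, seed=0):
    """Sample one word at a time until <end> or the max_tokens limit."""
    rng = random.Random(seed)
    words = []
    previous = "<start>"
    for _ in range(max_tokens):  # the upper limit on output length
        candidates = model[previous]
        tokens = list(candidates)
        weights = list(candidates.values())
        # Select one next word in proportion to its probability.
        previous = rng.choices(tokens, weights=weights, k=1)[0]
        if previous == "<end>":
            break
        words.append(previous)
    return words


print(generate(TOY_MODEL, max_tokens=2))
print(generate(TOY_MODEL, max_tokens=10))
```

In a real system, a filter such as top-p or top-k, or a temperature adjustment, would be applied to `candidates` before the selection step. The order in which they are combined varies between implementations.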
