Choosing the unit for counting an LLM's billing …

A development team is estimating the cost of on-demand usage of Amazon Bedrock. The pricing lists a unit price based on the amount of input and output, and the upper limit on the input a model can accept at one time (the context window) is also expressed in the same unit. Which term corresponds to this unit?

1 / 1

Select an answer

CorrectB

Explanation

Question Overview

Choosing the unit for counting an LLM's billing and input limit.

Requirements to satisfy

1「a unit price based on the amount of input and output」On-demand billing is based on the number of tokens
2「expressed in the same unit」The context window limit is also counted in tokens

Per-option explanation

AIncorrect

Embedding

An embedding is a representation that converts the meaning of a token or sentence into a numeric vector.

It is used for semantic search and similarity comparison, not as a unit for counting the amount of input and output or pricing, so it is incorrect.

BCorrect

Token

Correct. An LLM splits input text into the smallest units called tokens (words, substrings, and so on) for processing. Bedrock on-demand pricing is charged by the number of input and output tokens, and the context window limit is also expressed in number of tokens.

CIncorrect

Parameter

Parameters are the number of internal weights a model acquires through training, a metric for model scale. They are used as the strength of connections (weights) between neurons when computing output from input, and as they are gradually adjusted through training, they determine the accuracy of text generation and prediction. In general, the more parameters there are, the higher the expressiveness, but memory and compute cost also increase (for example, 'a model with 7 billion parameters' indicates scale).

Because it is a 'term that expresses a number', it is easy to confuse with a billing unit, but what is being counted is the weights on the model side, not a unit for the amount of input and output or pricing, so it is incorrect.

DIncorrect

Epoch

An epoch is the number of training passes that represents how many times the entire dataset is repeated during training.

It is a training-time metric, not a unit for the amount of input and output or pricing at inference time, so it is incorrect.

Key Takeaway

Distinguish the basic generative AI terms by 'what is being counted'.
・Token: the smallest processing unit of text. The basis for input/output amount, pricing, and the context window.
・Parameter: the number of internal weights of a model (model scale).
・Embedding: a vector representation of meaning.
・Epoch: the number of training iterations.
When the question is about the 'unit for billing or input limit', it is the token.

Explanation

💡Key Takeaway

Related Links

Key Takeaway