A company plans to improve its FAQ search. Each text will be converted into, and stored as, a representation from which semantic similarity can be computed numerically, so that a search still returns a match when a user's question and an FAQ are worded differently but mean nearly the same thing. Which term describes this representation?

Correct answer: D

Explanation

Identify the term for the numeric representation that supports meaning-based search.

  • Clue 1: the user's question and the FAQ use different wording, so they must be compared by meaning, not by surface form.
  • Clue 2: semantic similarity must be computable numerically, which calls for a representation whose similarity is measured by distance: an embedding (vector).
A: Incorrect

Token

This is incorrect. A token is the unit of processing into which text is divided for a model: a word, a sub-word piece (such as a morpheme), or a symbol. For example, the phrase 'customer service' might be split into several tokens. LLMs process sequences of tokens, and token counts are also the basis for measuring input/output volume and pricing. However, splitting text into tokens only allows surface-level comparison, so it cannot compute the semantic similarity of texts that are worded differently.
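To illustrate why splitting alone is not enough, here is a minimal sketch that compares texts purely by token overlap (Jaccard similarity over token sets). The simplified word-level tokenizer and the example sentences are assumptions for illustration; real LLM tokenizers split text into sub-word units, but any comparison of raw tokens stays surface-level in the same way.

```python
import re

def tokenize(text: str) -> list[str]:
    # Simplified word-level tokenizer (assumption): lowercase, keep runs of letters/digits.
    # Real LLM tokenizers use sub-word schemes, but the comparison below is still surface-level.
    return re.findall(r"[a-z0-9]+", text.lower())

def token_overlap(a: str, b: str) -> float:
    # Jaccard similarity of the two token sets: shared tokens / all distinct tokens.
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

question = "How can I change my password?"
paraphrase = "Steps to update account credentials"  # same intent, different wording
unrelated = "How can I change my address?"          # different intent, same wording

print(token_overlap(question, paraphrase))           # 0.0
print(round(token_overlap(question, unrelated), 3))  # 0.714
```

The paraphrase shares no tokens with the question and scores 0, while a question about something else scores high simply because it reuses the same words.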

B: Incorrect

Chunk

This is incorrect. A chunk is one of the manageable pieces that a longer text is split into. Splitting text alone does not make it possible to compute semantic similarity numerically; the chunks must first be converted into embeddings before distance-based comparison is possible.

C: Incorrect

Prompt

This is incorrect. A prompt is the instruction or input text given to a model: for example, an instruction such as 'summarize the following text in 3 lines,' reference material, output-format examples, or a role assignment ('You are an accountant'). The practice of carefully crafting prompts is called prompt engineering. This question, however, asks about a semantic representation that is stored and used for comparison, not about how instructions are given to a model.

D: Correct

Embedding

This is correct. An embedding represents the meaning of a word or sentence as a numerical vector. Texts with similar meanings map to vectors that lie close together, so computing the distance (similarity) between vectors enables meaning-based search even when the wording differs.
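As a minimal sketch of the distance computation, the snippet below measures cosine similarity between embedding vectors. The three-dimensional vectors are hypothetical, hand-picked values chosen for illustration; in practice they would come from an embedding model and have far more dimensions.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    # Cosine of the angle between two vectors: values near 1.0 mean the vectors
    # point the same way (similar meaning); values near 0 mean unrelated.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (not produced by any real model).
question = [0.9, 0.1, 0.0]    # "How can I change my password?"
paraphrase = [0.8, 0.2, 0.1]  # "Steps to update account credentials"
unrelated = [0.0, 0.1, 0.9]   # "What are your shipping rates?"

print(round(cosine_similarity(question, paraphrase), 3))  # 0.984
print(round(cosine_similarity(question, unrelated), 3))   # 0.012
```

Unlike the token comparison, the paraphrase now scores close to 1 even though it shares no words with the question, because closeness is measured between vectors rather than between strings.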

Key Takeaway

'Computing semantic similarity numerically (by distance)' points to embeddings. Each text is converted into a vector (a sequence of numbers), typically with hundreds to thousands of dimensions, and the distance or similarity between vectors (cosine similarity, Euclidean distance, etc.) is computed. A smaller distance means more similar meanings. Converting a user's question into a vector and retrieving the few stored vectors closest to it lets semantically similar FAQs surface even when the wording differs; this is the mechanism behind semantic search and the retrieval step in RAG. Tokens (processing units) and chunks (split pieces) are still text in its original form, so neither can be used to compute semantic similarity on its own.
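The retrieval step described above can be sketched as a brute-force nearest-neighbour search. This is a minimal illustration assuming the FAQ embeddings have already been computed; the `faq_store` contents and `query_vec` are hypothetical stand-ins for the output of an embedding model, and `top_k` is an illustrative helper, not a library API.

```python
import math

def cosine_distance(u: list[float], v: list[float]) -> float:
    # 1 - cosine similarity: smaller means closer in meaning.
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def euclidean_distance(u: list[float], v: list[float]) -> float:
    # Straight-line distance between the two vectors.
    return math.dist(u, v)

def top_k(query_vec, store, k=2, distance=cosine_distance):
    # Hypothetical helper: rank every stored (text, vector) pair by its distance
    # to the query vector and return the texts of the k closest entries.
    ranked = sorted(store, key=lambda item: distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

# Hypothetical precomputed FAQ embeddings: (faq_text, vector).
faq_store = [
    ("Steps to update account credentials", [0.8, 0.2, 0.1]),
    ("What are your shipping rates?", [0.0, 0.1, 0.9]),
    ("How do I delete my account?", [0.5, 0.6, 0.1]),
]
# Hypothetical embedding of the user's question "How can I change my password?".
query_vec = [0.9, 0.1, 0.0]

print(top_k(query_vec, faq_store, k=2))
# ['Steps to update account credentials', 'How do I delete my account?']
print(top_k(query_vec, faq_store, k=1, distance=euclidean_distance))
# ['Steps to update account credentials']
```

With these toy vectors, both distance measures rank the credentials FAQ first. Scanning every stored vector is only meant to show the ranking logic; it is not a claim about how any particular search system is implemented.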