A company is integrating generative AI into a customer support chat interface. It is important that the response starts returning immediately after the user submits a message, without making users wait. Which criterion should be given the MOST weight when selecting a model?

1 / 1
Select an answer
CorrectD

Explanation

Select the model selection criterion that should be given the most weight for chat that requires immediacy.

  • 1without making users waitImmediacy (real-time) is the key factor for experience
  • 2response starts returning immediatelySpeed at which the response is returned = prioritize low latency
AIncorrect

Model size

This is incorrect. Model size (number of parameters) is an indicator of accuracy and required resources. In general, larger models tend to be more accurate but slower to respond — this can actually work against the immediate-response requirement.

BIncorrect

Multilingual support

This is incorrect. Multilingual support indicates the breadth of languages the model can handle, and is given priority when serving users in multiple languages with a single model. The criterion in this question is response speed, not number of languages.

CIncorrect

Customizability

This is incorrect. Customizability indicates how easily the model can be adapted for a specific use case through techniques such as fine-tuning, and is given priority when building in proprietary behavior or style. The issue in this question is response speed, not ease of adjustment.

DCorrect

Latency

This is correct. Latency is the criterion that indicates how quickly a response is returned from the time of a request. In situations requiring immediacy such as chat, selecting a model with low latency directly affects the user experience.

Key Takeaway

Understand the mechanism by which 'prioritizing latency' matters:
- The perceived speed of chat is determined by the time until the first token is returned (latency).
- Larger models tend to be heavier to generate, delaying responses and degrading experience.
- Therefore, for immediacy requirements, a low-latency model is prioritized (accuracy vs. speed is a trade-off).
Whenever 'immediate response,' 'real-time,' or 'no waiting' appears, the answer is latency.