A company wants to run inference on large inputs of several hundred MB each (such as high-resolution video). Processing may take several minutes, and an immediate synchronous response is not required. The plan is to place requests in a queue and process them in order. Which inference method is MOST suitable?

Correct answer: B

Explanation

This question asks which inference method fits large inputs that do not need an immediate response.

  • 1. "large inputs of several hundred MB each": a method that can handle a large payload is needed.
  • 2. "an immediate synchronous response is not required": process in a queue and retrieve the result later = asynchronous inference.
A. Incorrect

Real-time inference

Real-time inference is a method that responds immediately and synchronously with low latency, assuming a short processing time (up to about tens of seconds). The immediate response of a chatbot is a typical example.

It is not suited to large inputs that take several minutes to process, so it is incorrect.

B. Correct

Asynchronous inference

Correct. Asynchronous inference is a method that places requests with large payloads in a queue, processes them in order, and lets the caller retrieve the result later. It suits inference on large inputs where processing may take time and an immediate synchronous response is not required. Transcribing long audio is a typical example.
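The queue-then-retrieve pattern described here can be sketched with the Python standard library alone. This is a toy, in-process illustration and not the API of any real inference service: `AsyncInferenceQueue`, `submit`, `get_result`, `wait_until_idle`, and `slow_model` are all hypothetical names chosen for this example, and the "model" simply returns the payload size after a short delay.

```python
import queue
import threading
import time
import uuid


class AsyncInferenceQueue:
    """Hypothetical in-process stand-in for an asynchronous inference endpoint."""

    def __init__(self, model_fn):
        self._model_fn = model_fn
        self._requests = queue.Queue()  # FIFO, so requests are processed in order
        self._results = {}              # request ID -> finished result
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, payload):
        """Enqueue the payload and return a request ID without waiting."""
        request_id = str(uuid.uuid4())
        self._requests.put((request_id, payload))
        return request_id

    def get_result(self, request_id):
        """Return the result if it is ready, otherwise None (poll again later)."""
        with self._lock:
            return self._results.get(request_id)

    def wait_until_idle(self):
        """Block until every queued request has been processed."""
        self._requests.join()

    def _run(self):
        # Single background worker: takes one request at a time from the queue.
        while True:
            request_id, payload = self._requests.get()
            result = self._model_fn(payload)
            with self._lock:
                self._results[request_id] = result
            self._requests.task_done()


def slow_model(payload):
    """Assumed placeholder for a long-running model; returns the payload size."""
    time.sleep(0.01)
    return len(payload)


endpoint = AsyncInferenceQueue(slow_model)
ids = [endpoint.submit(b"x" * n) for n in (10, 20, 30)]  # returns immediately
endpoint.wait_until_idle()
print([endpoint.get_result(i) for i in ids])  # prints [10, 20, 30]
```

The caller gets a request ID back at once and collects the result later, which is the property that distinguishes this method from a synchronous call that holds the connection open for the whole processing time.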

C. Incorrect

Serverless inference

Serverless inference is a method that handles intermittent traffic cost-effectively, but it is not designed around large payloads or long processing times. Processing of occasional, irregular requests is a typical example.

For large inputs that take several minutes, asynchronous inference is more suitable, so this is incorrect.

D. Incorrect

Edge inference

Edge inference is a method that runs the model directly on an end device or edge device, used for offline or low-latency on-site processing. Camera inspection on a factory line is a typical example.

It does not match this scenario's requirement of processing large inputs in a queue, so it is incorrect.

Key Takeaway

Grasp the characteristics of the correct answer, 'asynchronous inference'.
・It queues large payloads for processing and lets the caller retrieve the result later.
・It suits large inputs where processing may take time and an immediate synchronous response is not required.
It handles sizes and processing times that are hard to manage with real-time inference (short, synchronous). Its purpose differs from serverless inference (intermittent traffic) and edge inference (on the device).
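The comparison above can be written as a small decision helper. This is only a sketch of the reasoning in this explanation: the function name, the boolean parameters, the order of the checks, and the error raised for a contradictory workload are all assumptions made for illustration, not rules from any service's documentation.

```python
def choose_inference_method(
    *,
    large_payload,
    long_processing,
    needs_immediate_response,
    intermittent_traffic=False,
    runs_on_device=False,
):
    """Hypothetical helper mapping workload traits to one of the four methods."""
    if runs_on_device:
        # Offline or on-site processing on the device itself.
        return "edge"
    if large_payload or long_processing:
        if needs_immediate_response:
            # Assumption: the explanation gives no method for this combination.
            raise ValueError("large or slow requests cannot be answered immediately")
        # Queue the request and retrieve the result later.
        return "asynchronous"
    if intermittent_traffic:
        # Occasional, irregular requests handled cost-effectively.
        return "serverless"
    # Small, short requests that need a low-latency synchronous reply.
    return "real-time"


# The scenario in the question: several hundred MB, minutes of processing,
# no immediate response required.
print(choose_inference_method(
    large_payload=True,
    long_processing=True,
    needs_immediate_response=False,
))  # prints asynchronous
```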