A company wants to deploy a trained model to production for low-latency, real-time inference while minimizing the overhead of building and operating its own servers. Which deployment approach BEST meets this requirement?

Explanation

Identify the deployment method that provides real-time inference with low operational overhead.

  • 1. "Minimizing the overhead of building and operating servers themselves": delegate to a managed service.
  • 2. "Low-latency real-time inference": online inference, i.e., a SageMaker real-time endpoint.

A (Incorrect)

Build and operate its own inference server on Amazon EC2.

This is incorrect. An EC2-based server can deliver real-time inference, but the company must build, patch, and scale the server itself, which does not meet the requirement to minimize operational overhead.

B (Incorrect)

Precompute inference results and store them in Amazon S3.

This is incorrect. Serving precomputed results works only when the set of possible inputs is small and known in advance; it cannot return predictions on the fly for new, unseen inputs. It therefore does not meet the real-time inference requirement.
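
To see why precomputation breaks down for unknown inputs, here is a minimal Python sketch that models the S3 store as an in-memory dictionary. The keys and scores are hypothetical; the point is only that a lookup can return what was computed in advance and nothing else.

```python
# Hypothetical precomputed results, standing in for objects stored in S3.
PRECOMPUTED = {
    "customer-001": 0.12,
    "customer-002": 0.87,
}


def lookup_prediction(key):
    """Return a precomputed score, or None if this input was never precomputed."""
    return PRECOMPUTED.get(key)


print(lookup_prediction("customer-002"))  # 0.87
print(lookup_prediction("customer-999"))  # None: no model is running to score it
```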

C (Correct)

Deploy the model to an Amazon SageMaker real-time inference endpoint.

This is correct. A SageMaker real-time inference endpoint hands infrastructure provisioning and operations off to a managed service while exposing low-latency online inference through an API. This minimizes the company's operational overhead.
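
The sketch below shows roughly what deploying and calling a real-time endpoint looks like with the AWS SDK for Python (boto3) SageMaker and SageMaker Runtime clients. It is a minimal sketch under several assumptions: a SageMaker model named `model_name` already exists (created earlier with `create_model`), the instance type and count are placeholders, and the model container accepts and returns JSON. The clients are passed in rather than created inside the functions, so the structure can be exercised without AWS credentials. Note that the company only describes the endpoint; it never provisions or patches a server.

```python
import json


def endpoint_requests(model_name, endpoint_name,
                      instance_type="ml.m5.large", instance_count=1):
    """Build keyword arguments for the SageMaker CreateEndpointConfig and
    CreateEndpoint calls. All names and sizes here are placeholders."""
    config_name = f"{endpoint_name}-config"
    config_kwargs = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }
    endpoint_kwargs = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": config_name,
    }
    return config_kwargs, endpoint_kwargs


def deploy(sm_client, model_name, endpoint_name):
    """Ask SageMaker to provision the endpoint; AWS manages the underlying instances."""
    config_kwargs, endpoint_kwargs = endpoint_requests(model_name, endpoint_name)
    sm_client.create_endpoint_config(**config_kwargs)
    sm_client.create_endpoint(**endpoint_kwargs)


def predict(runtime_client, endpoint_name, features):
    """Send one request and receive the prediction synchronously.
    Assumes the container speaks JSON in both directions."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(features).encode("utf-8"),
    )
    return json.loads(response["Body"].read())
```

With real clients, you would pass `boto3.client("sagemaker")` to `deploy` and `boto3.client("sagemaker-runtime")` to `predict`, calling `predict` only after the endpoint status reaches `InService`.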

D (Incorrect)

Run a SageMaker batch transform job on a scheduled basis.

This is incorrect. Batch transform is managed and carries a low operational burden, but it processes data offline in bulk. It cannot return an immediate response to each request, so it does not meet the real-time inference requirement.
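
The offline nature of batch transform shows up in the shape of the request itself. The minimal sketch below builds keyword arguments for the boto3 SageMaker `create_transform_job` call; the job name, model name, S3 URIs, content type, and instance settings are hypothetical placeholders, and the scheduling mechanism is not shown. There is no endpoint and no per-request payload: the job reads everything under an S3 prefix, writes results to another prefix, and then stops.

```python
def transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                          instance_type="ml.m5.large", instance_count=1):
    """Build keyword arguments for the SageMaker CreateTransformJob call.
    Input and output are S3 locations, not individual client requests."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",  # score every object under this prefix
                    "S3Uri": input_s3_uri,
                }
            },
            "ContentType": "text/csv",  # assumption: CSV input
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }
```

With a real client, the result would be passed as `boto3.client("sagemaker").create_transform_job(**request)`; predictions become available only when the whole job finishes.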

Key Takeaway

The only option that satisfies BOTH 'minimizing operational overhead' AND 'low-latency real-time inference' is the SageMaker real-time inference endpoint. A self-built EC2 server (real-time is possible, but it adds operational burden) and batch transform (managed, but offline) each satisfy only one requirement. The key is to check every option against both requirements and eliminate any that fails either one.
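
The elimination rule in the takeaway can be written as a simple filter. This is an illustrative sketch only: the `Option` type and its two boolean fields are hypothetical labels for the two requirements, and only the three options the takeaway compares are modeled. Option B would drop out on `real_time=False` alone, whatever its operational burden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    description: str
    managed: bool    # requirement 1: AWS operates the servers
    real_time: bool  # requirement 2: returns a prediction per request, on demand


OPTIONS = [
    Option("A", "Self-built inference server on EC2", managed=False, real_time=True),
    Option("C", "SageMaker real-time endpoint", managed=True, real_time=True),
    Option("D", "Scheduled SageMaker batch transform", managed=True, real_time=False),
]


def satisfies_all(options):
    """Keep only the options that meet BOTH requirements."""
    return [o.label for o in options if o.managed and o.real_time]


print(satisfies_all(OPTIONS))  # ['C']
```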