Which service can run a serverless ETL process that extracts data from various data sources, transforms it into an analysis-friendly form, and loads it into another destination?

Correct answer: B

Explanation

The task is to select the service that performs ETL serverlessly.

  • 1. "extracts data": the Extract of ETL
  • 2. "transforms it into an analysis-friendly form, and loads it into another destination": the Transform and Load of ETL, which points to Glue
  • 3. "ETL process": the main use of Glue

A: Incorrect

Amazon EMR

Amazon EMR is a big data processing platform that runs Spark and Hadoop on a cluster.

It can handle large-scale processing, but it typically involves configuring and managing a cluster. The requirement here is serverless, managed ETL, which Glue meets, so this option is incorrect.

B: Correct

AWS Glue

Correct. AWS Glue is a managed service that runs extract, transform, and load (ETL) workloads serverlessly. Its crawlers automatically discover data sources and build a data catalog, and its jobs transform data into an analysis-friendly form. No server management is required.
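
As a rough illustration of "no server management", the sketch below starts an existing Glue job and polls until it finishes, using the boto3 Glue client calls `start_job_run` and `get_job_run`. The function name `run_glue_job` and the job name `demo-etl-job` are hypothetical, and the sketch assumes the job has already been defined in Glue and that AWS credentials are configured.

```python
import time

# Job run states after which Glue will not change the run's status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(glue, job_name, arguments=None, poll_seconds=30):
    """Start a Glue job run and block until it reaches a terminal state.

    `glue` is a boto3 Glue client (or any object with the same two methods).
    `job_name` must refer to a job that already exists; this helper is a
    hypothetical convenience wrapper, not part of boto3.
    """
    response = glue.start_job_run(JobName=job_name, Arguments=arguments or {})
    run_id = response["JobRunId"]

    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)


# Example usage (assumes boto3 is installed and the job exists):
#   import boto3
#   final_state = run_glue_job(boto3.client("glue"), "demo-etl-job")
```

Note that the caller only names the job; there is no cluster to size, start, or shut down, which is the property the question is testing.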

C: Incorrect

Amazon Data Firehose

Amazon Data Firehose is a service that continuously delivers streaming data to destinations such as S3, optionally transforming it along the way.

It targets the loading of continuously flowing data. The requirement here is batch-style extract, transform, and load (ETL) from various sources, which Glue meets, so this option is incorrect.
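
To make the contrast concrete, here is a minimal sketch of the kind of per-record transformation Firehose can apply in flight through a Lambda function. It follows the Firehose data-transformation contract (each record carries a `recordId` and base64-encoded `data`, and the function returns a `result` of `Ok` or `ProcessingFailed`). The `level` field and the uppercase rule are assumptions made purely for illustration.

```python
import base64
import json


def handler(event, context):
    """Transform each streaming record; mark unparseable ones as failed."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: normalize a "level" field to uppercase.
            payload["level"] = str(payload.get("level", "")).upper()
            line = json.dumps(payload) + "\n"  # newline-delimited JSON at the destination
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError, TypeError):
            # Return the original data unchanged and flag the record as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

The function only ever sees records as they arrive on the stream. It does not extract from multiple stores, consult a catalog, or schedule jobs, which is why it does not satisfy the ETL requirement in the question.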

D: Incorrect

AWS Lambda

AWS Lambda is a general-purpose serverless code execution service and can be used for small data transformations.

However, it has no ETL-specific mechanisms such as a data catalog, crawlers, or job management, and each invocation is limited to 15 minutes of execution time. The requirement here is managed ETL, which Glue meets, so this option is incorrect.

Key Takeaway

'ETL', 'extract, transform, load', and 'serverless' point to AWS Glue. Within the analytics pipeline the roles are divided: Athena handles SQL queries, QuickSight handles visualization, and Kinesis handles streaming.