An ML team has finished collecting data and is planning the work for the data preparation phase before moving into model training. While checking the mapping between each lifecycle phase and its work, which tasks are appropriate for this phase? (Choose TWO.)

Correct: C, D

Explanation

The question asks for the TWO tasks that belong to the data preparation phase.

  • 1. "the data preparation phase": the stage of getting data into a usable state before training begins
  • 2. "which tasks are appropriate": cleansing and feature engineering apply
A. Incorrect

Monitoring the model in the production environment

Production monitoring belongs to the operations phase, which tracks model performance after deployment.

It is not data preparation work, so it is incorrect.

B. Incorrect

Serving inference results to users via an API

Serving inference results via an API is production operations work after the model has been deployed.

It is not data preparation work, so it is incorrect.

C. Correct

Data cleansing, such as imputing missing values and handling outliers

Correct. Data cleansing, such as imputing missing values and handling outliers, is data preparation work that gets the data ready to be used for training.
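As a rough illustration of what this cleansing work can look like, the sketch below imputes missing values with the median and clips outliers to an interquartile-range band. The function names, the sample `raw_ages` data, and the choice of median imputation and IQR clipping are all assumptions made for this example; real projects pick strategies that suit their data.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip each value into the band [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical raw column: one missing value and one implausible age.
raw_ages = [23, 25, None, 27, 24, 26, 140]

imputed = impute_missing(raw_ages)   # the None is filled with the median
cleaned = clip_outliers(imputed)     # the extreme value is pulled into the band
print(cleaned)
```

Both steps run before any model sees the data, which is why they count as data preparation rather than training.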

D. Correct

Feature engineering to create features that are useful for prediction

Correct. Feature engineering is work included in data preparation that creates input variables (features) useful for prediction from raw data.
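A minimal sketch of feature engineering, assuming a hypothetical customer record: the field names (`signup_date`, `as_of_date`, `total_spend`, `order_count`) and the derived features are invented for illustration and are not part of the question.

```python
from datetime import date


def engineer_features(record):
    """Derive model-ready input variables from one raw record (a dict)."""
    signup = date.fromisoformat(record["signup_date"])
    as_of = date.fromisoformat(record["as_of_date"])
    orders = record["order_count"]
    return {
        # How long the customer has been registered, in days.
        "tenure_days": (as_of - signup).days,
        # Average spend per order; 0.0 when there are no orders.
        "avg_order_value": record["total_spend"] / orders if orders else 0.0,
        # 1 if the signup fell on Saturday or Sunday, else 0.
        "signup_is_weekend": int(signup.weekday() >= 5),
    }


# Hypothetical raw record as it might arrive from data collection.
raw = {
    "signup_date": "2024-01-06",
    "as_of_date": "2024-03-06",
    "total_spend": 120.0,
    "order_count": 4,
}

features = engineer_features(raw)
print(features)
```

The raw fields are rarely useful to a model as they stand; the derived values are the input variables (features) that training consumes.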

E. Incorrect

Tuning the model hyperparameters

Tuning hyperparameters is work performed in the model training and tuning phase.

It is part of the ML lifecycle, but it is not work in the data preparation phase before training begins, so it is incorrect.

Key Takeaway

Remember the machine learning lifecycle phases and the work done in each:

  • Data collection: gather the data to be used for training.
  • Data preparation: get the data into shape through cleansing (imputing missing values, handling outliers, standardizing formats) and feature engineering (creating useful input variables).
  • Model training: train the model with an algorithm and tune the hyperparameters.
  • Evaluation: check model performance with metrics such as accuracy.
  • Deployment: deploy the model to the production environment.
  • Monitoring (operations): monitor the model's behavior and accuracy in production and serve inference results via an API.

In this question, 'data cleansing and feature engineering' belong to the data preparation phase. Production monitoring and inference API serving belong to the post-deployment operations phase, and hyperparameter tuning belongs to the model training phase.
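The phase-to-task mapping can also be written down as a small lookup table, which makes the reasoning behind the answer mechanical. This is only a study aid: the `PHASE_TASKS` table, the short task labels, and the `phase_of` helper are names chosen for this sketch.

```python
# Lifecycle phases and representative tasks, following the summary above.
PHASE_TASKS = {
    "data collection": {"gather training data"},
    "data preparation": {"data cleansing", "feature engineering"},
    "model training": {"hyperparameter tuning"},
    "evaluation": {"metric evaluation"},
    "deployment": {"deploy model to production"},
    "monitoring (operations)": {"production monitoring", "inference API serving"},
}


def phase_of(task):
    """Return the lifecycle phase that a task label belongs to."""
    for phase, tasks in PHASE_TASKS.items():
        if task in tasks:
            return phase
    raise KeyError(f"unknown task: {task}")


# The five answer options, reduced to short task labels.
options = {
    "A": "production monitoring",
    "B": "inference API serving",
    "C": "data cleansing",
    "D": "feature engineering",
    "E": "hyperparameter tuning",
}

answers = sorted(
    key for key, task in options.items()
    if phase_of(task) == "data preparation"
)
print(answers)
```

Looking up each option this way selects C and D and places A and B in operations and E in model training, matching the explanations for each option.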