In an ML project, the team is processing raw data that has already been cleansed in order to design new input variables that help with prediction (for example, deriving a "day of week" or a "start-of-month flag" from a date). Which step does this work correspond to?

Explanation

This question asks you to identify the step in which useful input variables for prediction are created.

  • Hint 1: "new input variables that help prediction" means creating features that are easier for the model to learn from.
  • Hint 2: "design" new input variables = feature engineering.
A: Correct

Feature engineering

Correct. Feature engineering is the work of processing raw data to design and create input variables (features) that help prediction. For example, deriving a day of week or start-of-month flag from a date, extracting a prefecture from an address, or aggregating the number of purchases in the last 30 days from purchase history creates new variables that are easier for the model to learn from. Good features greatly influence model accuracy.
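The examples above can be sketched in plain Python. This is a minimal illustration, not a prescribed implementation: the function name `make_features`, its arguments, the rule that the start-of-month flag means "the 1st of the month", and the simplified prefecture pattern are all assumptions made for this example.

```python
import re
from datetime import date, timedelta

# Simplified pattern (an assumption): a Japanese address starts with the
# prefecture name, which is 3 or 4 characters ending in 都, 道, 府 or 県.
PREFECTURE_RE = re.compile(r"^(.{2,3}[都道府県])")


def make_features(order_date: date, address: str, purchase_dates: list[date]) -> dict:
    """Derive example features for one order (hypothetical helper and schema)."""
    match = PREFECTURE_RE.match(address)
    # Count purchases in the 30 days up to and including order_date.
    window_start = order_date - timedelta(days=30)
    recent = sum(1 for d in purchase_dates if window_start < d <= order_date)
    return {
        "day_of_week": order_date.weekday(),    # 0 = Monday ... 6 = Sunday
        "is_month_start": order_date.day == 1,  # assumption: "start of month" = the 1st
        "prefecture": match.group(1) if match else None,
        "purchases_last_30d": recent,
    }
```

For example, `make_features(date(2024, 3, 1), "東京都港区芝公園", [date(2024, 1, 15), date(2024, 2, 10), date(2024, 2, 28)])` gives `day_of_week` 4 (Friday), `is_month_start` True, `prefecture` "東京都" and `purchases_last_30d` 2, since the 15 January purchase falls outside the 30-day window. None of these values is in the raw columns as such; each is a new variable derived from them.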

B: Incorrect

Data preprocessing (cleansing)

Data preprocessing (cleansing) is the work of tidying data by imputing missing values and unifying formats. For example, filling missing values with the mean, unifying inconsistent notations, or removing duplicate rows and obvious outliers cleans up dirty data.

Its purpose differs from the work of designing and creating new input variables, so this is incorrect.
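For contrast, the cleansing steps described for option B can be sketched in plain Python. This is a minimal illustration assuming records are dicts with hypothetical `city` and `age` fields; using title case as the notation rule is also an assumption, and outlier removal is omitted. Note that it only tidies existing columns and adds no new variables.

```python
from statistics import mean


def clean(rows: list[dict]) -> list[dict]:
    """Tidy records shaped like {"city": str, "age": number or None} (hypothetical schema).

    Assumes at least one record has a non-missing age.
    """
    observed = [r["age"] for r in rows if r["age"] is not None]
    fill_value = mean(observed)  # mean of the ages that are present
    seen = set()
    cleaned = []
    for r in rows:
        city = r["city"].strip().title()  # unify notation: " tokyo" -> "Tokyo"
        age = r["age"] if r["age"] is not None else fill_value  # impute missing values
        key = (city, age)
        if key in seen:  # drop rows that are duplicates after tidying
            continue
        seen.add(key)
        cleaned.append({"city": city, "age": age})
    return cleaned
```

Normalization and imputation run before the duplicate check here, so rows that differ only in spelling (such as "Tokyo" and " tokyo") are treated as duplicates.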

C: Incorrect

Model evaluation

Model evaluation is the work of measuring the performance of a trained model.

It is a different stage from the work of designing and creating input variables, so this is incorrect.
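As a minimal illustration of option C, accuracy (the fraction of correct predictions) is one common way to measure a classifier's performance. The sketch assumes hypothetical true labels from held-out test data and predictions from an already trained model; other metrics may fit a given task better.

```python
def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With `y_true = [1, 0, 1, 1]` and `y_pred = [1, 0, 0, 1]`, three of four predictions match, so the accuracy is 0.75. The model and its inputs are already fixed at this point; evaluation only measures them.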

D: Incorrect

Monitoring

Monitoring is the work of observing the behavior of a model after deployment.

It is a different stage from the work of designing and creating input variables, so this is incorrect.

Key Takeaway

Remember where the correct answer, feature engineering, fits in the workflow:
- It processes raw data to design and create input variables (features) that help prediction.
- It differs from data preprocessing (tidying the data) and is usually performed after it.
- Good features greatly influence model accuracy.
Model evaluation (measuring performance) and monitoring (observing the model in production) belong to different stages.