A financial institution is planning to automatically route inquiry emails arriving at its support desk. Three years of past emails all retain a 'spam / not spam' decision assigned by staff, and the company will use this data to build a model that automatically classifies new emails. Which learning approach is the MOST suitable for this requirement?

1 / 1
Select an answer
CorrectB

Explanation

Select the learning approach suited to prediction with fully labeled data.

  • 1all retain aCorrect labels on all records = semi-supervised learning is unnecessary
  • 2automatically classifies new emailsA classification task that predicts a label from input = supervised learning
AIncorrect

Unsupervised learning

Unsupervised learning is an approach (such as clustering) that finds the structure or groups in data without using labels.

In this question, the staff decisions (correct labels) are available for all records, so unsupervised learning, which discards labels to discover structure, does not fit the requirement.

BCorrect

Supervised learning

This is correct. Supervised learning is an approach that learns the correspondence from pairs of inputs and correct labels and predicts the label of a new input. In this question, all past emails retain a 'spam / not spam' decision (label), so building a classification model using this as training data is optimal.

Concrete examples include predicting the price of a new property from past housing data (size and age with actual sale price), or judging 0-9 from images of handwritten digits with correct labels.

CIncorrect

Semi-supervised learning

Semi-supervised learning is an approach that combines a small amount of labeled data with a large amount of unlabeled data, and is effective when labeling is costly and not all records can be labeled.

In this question, decisions are retained for all records (all labeled), so there is no reason to choose semi-supervised learning to make up for missing labels.

DIncorrect

Reinforcement learning

Reinforcement learning is an approach that optimizes a policy of actions through trial and error and reward in an environment.

This question is about learning predictions from accumulated labeled data; there is no feedback loop of actions on an environment and rewards, so it is incorrect.

Key Takeaway

Choose the learning approach by the presence and amount of labels.
- All records labeled → supervised learning (classification/regression).
- Only some labeled → semi-supervised learning.
- No labels → unsupervised learning (clustering).
- Optimize actions with rewards → reinforcement learning.
The condition that 'decisions are retained for all records' is the point that rules out semi-supervised learning.