A question about choosing the evaluation method …

A company performs a final safety check before releasing a generative AI chatbot. What is the name of the evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment?

1 / 1

Select an answer

CorrectA

Explanation

Question Overview

A question about choosing the evaluation method that surfaces vulnerabilities from an attacker's perspective.

Requirements to satisfy

1「take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities」Intentionally attempt an attack
2「surface risks before deployment」Grasping risks in advance = red teaming

Per-option explanation

ACorrect

Red teaming

Correct. Red teaming is an evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment. It is used to improve safety.

BIncorrect

Penetration testing

Penetration testing is a security test that verifies whether a network or system can be breached.

It is an infrastructure intrusion test, and its target differs from an evaluation that elicits a model's harmful output (red teaming), so this is incorrect.

CIncorrect

A/B testing

A/B testing is a method that splits production users across two variants and compares the results.

It is a method for validating effectiveness, not a method that surfaces risks from an attacker's perspective, so this is incorrect.

DIncorrect

Benchmark evaluation

Benchmark evaluation is a side-by-side comparison of performance on a standard dataset.

It is a performance measurement, not a safety inspection from an attacker's perspective, so this is incorrect.

Key Takeaway

Note the correct answer, red teaming.
- An evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment.
- Issues found can be addressed in advance, which improves safety.

Explanation

💡Key Takeaway

Related Links

Key Takeaway