Red teaming
Correct. Red teaming is an evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment. It is used to improve safety.
A company performs a final safety check before releasing a generative AI chatbot. What is the name of the evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment?
A question about choosing the evaluation method that surfaces vulnerabilities from an attacker's perspective.
Red teaming
Correct. Red teaming is an evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment. It is used to improve safety.
Penetration testing
Penetration testing is a security test that verifies whether a network or system can be breached.
It is an infrastructure intrusion test, and its target differs from an evaluation that elicits a model's harmful output (red teaming), so this is incorrect.
A/B testing
A/B testing is a method that splits production users across two variants and compares the results.
It is a method for validating effectiveness, not a method that surfaces risks from an attacker's perspective, so this is incorrect.
Benchmark evaluation
Benchmark evaluation is a side-by-side comparison of performance on a standard dataset.
It is a performance measurement, not a safety inspection from an attacker's perspective, so this is incorrect.
Note the correct answer, red teaming.
- An evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment.
- Issues found can be addressed in advance, which improves safety.