A company performs a final safety check before releasing a generative AI chatbot. What is the name of the evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment?

1 / 1
Select an answer
CorrectA

Explanation

A question about choosing the evaluation method that surfaces vulnerabilities from an attacker's perspective.

  • 1take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilitiesIntentionally attempt an attack
  • 2surface risks before deploymentGrasping risks in advance = red teaming
ACorrect

Red teaming

Correct. Red teaming is an evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment. It is used to improve safety.

BIncorrect

Penetration testing

Penetration testing is a security test that verifies whether a network or system can be breached.

It is an infrastructure intrusion test, and its target differs from an evaluation that elicits a model's harmful output (red teaming), so this is incorrect.

CIncorrect

A/B testing

A/B testing is a method that splits production users across two variants and compares the results.

It is a method for validating effectiveness, not a method that surfaces risks from an attacker's perspective, so this is incorrect.

DIncorrect

Benchmark evaluation

Benchmark evaluation is a side-by-side comparison of performance on a standard dataset.

It is a performance measurement, not a safety inspection from an attacker's perspective, so this is incorrect.

Key Takeaway

Note the correct answer, red teaming.
- An evaluation method in which experts take an attacker's perspective and deliberately try to elicit harmful output or vulnerabilities to surface risks before deployment.
- Issues found can be addressed in advance, which improves safety.