Anthropic Red Team
Tested safety systems on unreleased Claude models. Autumn 2024 and Spring 2025 cohorts.
Got invited to Anthropic's red teaming program, where I tested safety systems on unreleased Claude models. I was part of both the Autumn 2024 and Spring 2025 cohorts.
The job was straightforward: try to break Constitutional Classifiers and the other safety mitigations Anthropic built for ASL-3 deployment. I focused mostly on finding universal bypass vectors across CBRN domains (chemical, biological, radiological, nuclear).
Had access to internal research models before they went public, including prototype safety classifiers. Tested multiple model iterations and defense layers across program rounds. The work fed into Anthropic's broader effort to stress-test frontier AI safety before release.




