Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Can AI agents conduct cyber attacks autonomously? If AI agents can reliably execute multi-step attack chains with minimal human oversight, this could lower the skill barrier for unsophisticated threat actors, increase the sophistication of at