The Rise of AI Pentesting: What It Means for CISOs and the People Who Hack for a Living

The automated penetration testing market has grown up. Fast.

A few years ago, "automated pentesting" was mostly a rebranding exercise — scanners with a prettier dashboard and a higher price tag. That era is over. What's happening now is genuinely different, and if you're a CISO or a professional pentester, you should be paying attention.

The Tools That Actually Matter Right Now

The market has split into three distinct product categories, each solving a different problem.

Autonomous pentesting platforms simulate real attacker behavior end-to-end. The standout names here are Horizon3.ai NodeZero and Pentera. NodeZero has run over 170,000 production tests and is particularly strong at credential attacks, lateral movement, and Active Directory exploitation. Pentera targets large enterprises and focuses on validating exploitable attack paths in live environments. These aren't scanners. They actually exploit things.

AI-native web and API testers like Escape and XBOW have made serious progress on business logic vulnerabilities — the class of bugs that used to be exclusively the domain of skilled human testers. XBOW became the first AI agent to top the global HackerOne leaderboard, which is a meaningful real-world signal.

Breach and attack simulation (BAS) platforms like Picus and Cymulate do something slightly different: they test whether your defenses actually work, rather than finding new holes. Think of them as a sparring partner for your SOC rather than a red team.

Then there's the open-source layer: OWASP ZAP, Metasploit, Nuclei, PentestGPT. Flexible, transparent, and free, but they require real expertise to use well. They augment a skilled tester; they don't replace one.

Then Anthropic Dropped Mythos

On April 7, 2026, Anthropic announced Claude Mythos Preview. It's a general-purpose AI model — not a security product — that turned out to be so capable at finding and exploiting vulnerabilities that Anthropic decided not to release it publicly at all.

The numbers are hard to ignore.

Anthropic's previous flagship model, Opus 4.6, developed working Firefox JavaScript engine exploits twice out of several hundred attempts. Mythos Preview did it 181 times. On their internal OSS-Fuzz benchmark across roughly a thousand open-source repositories, Opus 4.6 achieved full control-flow hijack exactly zero times. Mythos achieved it on ten separate, fully patched targets.

It found a 27-year-old bug in OpenBSD. A 17-year-old RCE in FreeBSD's NFS server. Vulnerabilities in TLS, AES-GCM, and SSH implementations. Critical bugs in every major operating system and every major browser. Bugs that had survived decades of human review and millions of automated scans.

Where a skilled penetration tester might take weeks to develop a working exploit, Mythos did it in hours.

Instead of releasing it, Anthropic launched Project Glasswing, a restricted defensive coalition now expanded to roughly 200 organisations including AWS, Apple, Microsoft, Google, Cisco, and CrowdStrike. Partners use Mythos to scan critical codebases for vulnerabilities before attackers find them. So far, they've surfaced more than 10,000 high- or critical-severity findings.

Not everyone accepts Anthropic's claims at face value. The flagship demonstration, a vulnerability in MIT Kerberos, was already in the model's training data, and independent researchers pointed out that several open-weight models caught it too. Daniel Stenberg, the lead developer of curl, tested Mythos on his own codebase and called it an incremental improvement rather than a categorical leap. The internal benchmarks are dramatic, but fully independent third-party validation of novel zero-day discovery is still limited.

That said, even the sceptics agree the direction of travel is clear.

What Automation Still Can't Do

Here's the thing that gets lost in the hype cycle: the Verizon DBIR 2025 found that 82% of exploited vulnerabilities involved human reasoning, exploit chaining, and contextual analysis. Automated tools — even Mythos-class ones — have real gaps.

Business logic. Understanding how this specific organisation's payment flow can be abused, or why this particular API violates a trust assumption that isn't documented anywhere — that requires understanding the gap between what a system does and what it's supposed to do. Models can't interview the product manager or read the implicit context baked into a decade of organisational decisions.

Creative chaining. The most damaging real-world breaches usually involve chaining together things that look boring individually. A misconfigured S3 bucket plus a low-severity SSRF plus a forgotten IAM policy equals a full account takeover. Humans are still better at developing the intuition for what's interesting.

Social engineering and physical security. No automated tool is walking into your office.

Strategic adversary simulation. Modelling how APT28 or Lazarus Group would actually target your organisation, given your industry, your vendors, and the current geopolitical context, is not something you get from any automated platform.

Regulatory requirements. TIBER-EU, DORA TLPT, and similar frameworks explicitly require human-led engagements. "An AI did it" is not a compliant answer for critical financial infrastructure, and regulators aren't rushing to change that.

Industry estimates suggest AI can currently handle roughly 30-40% of routine pentesting tasks and just 5-10% of complex scenarios. A 2025 Stanford study reported that nearly 80% of human testers found a critical RCE that every AI agent tested missed entirely.
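To make the creative chaining point above concrete, here is a minimal sketch of the S3 bucket, SSRF, and IAM policy example as a path search over findings. Everything in it is an assumption for illustration: the position names, the edges, and the idea that findings arrive already modelled as "holding A lets you reach B". The search itself is the easy part. Knowing that those edges exist in a specific environment is where the human intuition goes.

```python
from collections import deque

# Hypothetical findings, each modelled as an edge: holding the first
# position lets an attacker reach the second. All names are illustrative.
FINDINGS = [
    ("internet", "internal-metadata-access", "low-severity SSRF"),
    ("internal-metadata-access", "read-deploy-bucket", "misconfigured S3 bucket"),
    ("read-deploy-bucket", "account-admin", "forgotten IAM policy"),
    ("internet", "marketing-site", "verbose error page"),
]


def find_chain(findings, start, goal):
    """Breadth-first search for the shortest chain of findings from start to goal."""
    graph = {}
    for src, dst, label in findings:
        graph.setdefault(src, []).append((dst, label))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, label in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label]))
    return None  # no chain of known findings reaches the goal


chain = find_chain(FINDINGS, "internet", "account-admin")
print(" -> ".join(chain))
# low-severity SSRF -> misconfigured S3 bucket -> forgotten IAM policy
```

None of the three findings would rate highly on its own, and the fourth leads nowhere. The chain only appears once someone has recognised what each finding actually grants.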

What We Do at Protea Security

We've been watching this space closely, and we've made a deliberate choice about how to position ourselves.

We practice what we call agentic pentesting — and we want to be specific about what that means, because the term gets thrown around loosely.

We use AI agents to handle the work that doesn't require a human: automated reconnaissance, repetitive scanning across large attack surfaces, correlating and deduplicating findings, mapping output to compliance frameworks, and generating the first pass of remediation guidance. This frees our consultants from the parts of the job that a machine does faster and more consistently anyway.
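As a rough illustration of what "correlating and deduplicating findings" involves, here is a minimal sketch that merges reports of the same issue on the same service from different tools. The Finding fields, tool names, and rule identifiers are hypothetical. This is not the output format of any real scanner, and it is a simplification rather than a description of our internal tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    host: str
    port: int
    check_id: str  # hypothetical rule or advisory identifier
    tool: str      # which tool reported it
    severity: int  # higher is worse


def deduplicate(findings):
    """Merge findings that describe the same issue on the same service."""
    merged = {}
    for f in findings:
        # Normalise case so "App01" and "app01" count as the same host.
        key = (f.host.lower(), f.port, f.check_id.upper())
        entry = merged.setdefault(key, {"severity": f.severity, "tools": set()})
        entry["severity"] = max(entry["severity"], f.severity)  # keep the worst rating
        entry["tools"].add(f.tool)
    return merged


raw = [
    Finding("App01.example.com", 443, "rule-weak-tls", "scanner-a", 2),
    Finding("app01.example.com", 443, "RULE-WEAK-TLS", "scanner-b", 3),
    Finding("app01.example.com", 22, "RULE-SSH-PASSWORD-AUTH", "scanner-a", 2),
]
unique = deduplicate(raw)
print(len(raw), "raw ->", len(unique), "unique")
# 3 raw -> 2 unique
```

Keeping the set of reporting tools alongside each merged entry is a deliberate choice in this sketch: agreement between independent tools is a useful signal when a consultant later triages the list.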

But we apply strict scopes to every agent we deploy. Defined targets. Defined techniques. Human review before anything consequential happens. We don't point an autonomous agent at a client environment and see what happens.
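The three constraints above (defined targets, defined techniques, human review) can be sketched as a gate that every proposed agent action has to pass. This is an illustrative sketch under stated assumptions, not our production code: the Scope structure, the technique names, and the authorise function are all hypothetical, and the address ranges are reserved documentation ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    networks: tuple            # CIDR ranges the client has authorised
    techniques: frozenset      # techniques the agent may run unattended
    needs_approval: frozenset  # techniques that require human sign-off


def authorise(scope, target_ip, technique, human_approved=False):
    """Return True only if the action falls inside the agreed scope."""
    ip = ipaddress.ip_address(target_ip)
    if not any(ip in ipaddress.ip_network(n) for n in scope.networks):
        return False  # target is outside the defined ranges
    if technique in scope.techniques:
        return True
    if technique in scope.needs_approval:
        return human_approved  # consequential actions wait for a person
    return False  # anything not listed is denied by default


scope = Scope(
    networks=("192.0.2.0/24",),
    techniques=frozenset({"port-scan", "service-fingerprint"}),
    needs_approval=frozenset({"credential-spray"}),
)

print(authorise(scope, "192.0.2.10", "port-scan"))               # True
print(authorise(scope, "198.51.100.5", "port-scan"))             # False
print(authorise(scope, "192.0.2.10", "credential-spray"))        # False
print(authorise(scope, "192.0.2.10", "credential-spray", True))  # True
```

The design choice worth noting is the default: a technique that appears in neither list is refused, so an agent that improvises something new is stopped rather than trusted.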

The reason is simple: automated tools generate findings. Human testers understand what those findings mean for a specific business. Our job is to do both — leverage automation where it adds speed and coverage, and apply human judgment where it adds insight.

Where This Is Heading (Honestly)

Anthropic estimates that Mythos-class capability will be broadly available from multiple AI labs within 6 to 18 months. When it lands in commercial tooling — and it will — the market will look different.

Discovery of known vulnerability classes will be almost fully automated. Exploit development for standard vulnerability patterns will be fast and cheap. Report generation, compliance mapping, remediation drafting — commoditised.

This creates two risks running in parallel. On the defender side: a flood of findings that existing remediation workflows can't absorb. On the attacker side: the same capabilities, without the access controls. The median time from vulnerability disclosure to working exploit has already collapsed from 125 days in early 2025 to under half a day by April 2026. Mythos accelerates that further.

The pentesters who will feel this most are those whose work currently consists of running tools, writing up the output, and calling it a pentest. That model is under real pressure, and the timeline is shorter than most want to admit.

The ones who will be fine — and busy — are those who move up the stack: into adversary simulation, AI system security (prompt injection, agentic exploit chains, RAG poisoning), security architecture, and the strategic layer that models simply don't operate at.

Conclusion

Manual pentesting isn't going away. But the best version of it looks different than it did three years ago.

The right model is manual pentesters who leverage AI to spend less time on the repetitive work and more time on the parts that actually require a human: understanding your business, your threat model, your specific risk context, and the creative adversarial thinking that no benchmark has yet managed to automate.

For CISOs, the message is this: annual point-in-time pentests are increasingly inadequate on their own. Continuous automated validation plus periodic expert-led engagements is the baseline, not a premium option.

For pentesters: the tools are not your competition. The pentesters using the tools are. Get ahead of it now.

At Protea Security, that's how we work. If you want to understand what that looks like for your environment, we're easy to find at proteasecurity.com.

Sources

Petronella Cybersecurity News, Automated Penetration Testing Tools: 2026 Comparison (March 2026) — https://petronellatech.com/blog/automated-penetration-testing-tools-comparison-2026/
Escape, Top Automated Penetration Testing Tools (2026) — https://escape.tech/blog/top-automated-pentesting-tools/
Penligent, Best AI Pentesting Tools in 2026 (April 2026) — https://www.penligent.ai/hackinglabs/best-ai-pentesting-tools-in-2026/
General Analysis, Best Automated Penetration Testing Platforms in 2026 (May 2026) — https://generalanalysis.com/guides/best-automated-penetration-testing-tools
Aikido, Top 18 Automated Pentesting Tools Every DevSecOps Team Should Know — https://www.aikido.dev/blog/top-automated-penetration-testing-tools
StackHawk, Best AI Pentesting Tools in 2026: Top Picks Compared — https://www.stackhawk.com/blog/ai-pentesting-tools/
Stingrai, Top 10 Automated Pentesting Tools 2026 (June 2026) — https://www.stingrai.io/blog/top-10-automated-penetration-testing-tools-2026
Redfox Cybersecurity, Best AI Pentesting Tools in 2026: Hands-On Comparison (May 2026) — https://www.redfoxsec.com/blog/best-ai-pentesting-tools-in-2026-a-hands-on-comparison
Security Online, 5 Best Autonomous Penetration Testing Tools in 2026 (April 2026) — https://securityonline.info/5-best-autonomous-penetration-testing-tools-in-2026/
Gartner Peer Insights, Best Adversarial Exposure Validation Reviews 2026 — https://www.gartner.com/reviews/market/adversarial-exposure-validation
Astra Security, Automated vs Manual Penetration Testing: Which One Do You Need? (January 2026) — https://www.getastra.com/blog/security-audit/automated-vs-manual-penetration-testing/
FreeCodeCamp, Penetration Testing: Services vs Automated Platforms: What's Better in 2026? (March 2026) — https://www.freecodecamp.org/news/penetration-testing-services-vs-automated-platforms-what-is-better
Matproof, Automated vs Manual Penetration Testing: A Complete Comparison for 2026 (March 2026) — https://matproof.com/blog/automated-vs-manual-penetration-testing
InfosecOne, AI vs Penetration Testers 2026: Will Automated Testing Replace Your Penetration Testing Job (November 2025) — https://infosecone.com/blog/ai-impact-penetration-testing-careers-and-job-market/
ioSENTRIX, AI-Driven Penetration Testing in 2026: Benefits, Limits, and the Hybrid Future (November 2025) — https://iosentrix.com/blog/ai-driven-penetration-testing
Simbian, AI Penetration Testing vs. Manual Pentesting: Which is Right for You in 2026? (March 2026) — https://simbian.ai/blog/ai-penetration-testing-vs-manual-pentesting-which-is-right-for-you-in-2026
DeepStrike, Manual vs Automated Penetration Testing: The 2025 Guide — https://deepstrike.io/blog/manual-vs-automated-penetration-testing
Anthropic, Assessing Claude Mythos Preview's Cybersecurity Capabilities (April 2026) — https://red.anthropic.com/2026/mythos-preview/
AISI, Our Evaluation of Claude Mythos Preview's Cyber Capabilities (April 2026) — https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
Bishop Fox, Anthropic's Claude Mythos Preview: The AI Cybersecurity Inflection Point (April 2026) — https://bishopfox.com/blog/anthropics-claude-mythos-preview-the-ai-cybersecurity-inflection-point
Built In, The Conversation About Claude Mythos Misses a Bigger Risk (June 2026) — https://builtin.com/articles/claude-mythos-analysis
ArmorCode, Anthropic's Claude Mythos and What it Means for Security (April 2026) — https://www.armorcode.com/blog/anthropics-claude-mythos-and-what-it-means-for-security
Tahir via Medium, Assessing Anthropic Claude Mythos Preview's Cybersecurity Capabilities (April 2026) — https://medium.com/@tahirbalarabe2/assessing-anthropic-claude-mythos-previews-cybersecurity-capabilities-251a4e0a2137
Petronella Technology Group, Claude Mythos: Anthropic's April 2026 AI Preview (May 2026) — https://petronellatech.com/blog/claude-mythos-guide-2026/
Anthropic, Expanding Project Glasswing — https://www.anthropic.com/news/expanding-project-glasswing
Post Quantum, Anthropic's Mythos Preview and the End of a Twenty-Year Cybersecurity Equilibrium (April 2026) — https://postquantum.com/security-pqc/anthropic-mythos-preview-ai-offensive-security/
Anthropic, Project Glasswing: Securing Critical Software for the AI Era (April 2026) — https://www.anthropic.com/glasswing
