Can You Trust an AI to Catch Critical Vulnerabilities? Here’s What the Data Says

Can AI really detect critical vulnerabilities better than traditional security tools? This article explores real-world data, accuracy benchmarks, and limitations of AI in cybersecurity, helping you understand where AI strengthens vulnerability management and where human expertise remains essential.

The honest answer is a qualified yes. Somewhere between “AI will replace security teams” and “AI is just a fancy scanner” lies the real picture. This article draws on the data, and our experience, to find it.

The Scale Problem That Made AI Necessary

In 2024, a record 40,009 Common Vulnerabilities and Exposures were published, 108 per day, up 38% on the year before. By mid-2025 that rate had risen to approximately 131 per day, with roughly 50,000 CVEs published across 2025 in total. FIRST’s 2026 Vulnerability Forecast projects a median of 59,427 CVEs this year alone with realistic scenarios reaching 70,000 to 100,000.

No security team, however skilled, can keep pace manually. With 70% of vulnerabilities rooted in software development flaws and cybercrime costs heading toward $10.5 trillion annually, AI moved from “nice to have” to operational necessity.
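A back-of-envelope calculation makes the scale concrete. The sketch below uses the FIRST 2026 median forecast cited above; the 10-minute triage time per CVE is a hypothetical assumption for illustration, not a measured figure.

```python
# Back-of-envelope triage load for the FIRST 2026 median forecast cited above.
forecast_2026 = 59_427
per_day = forecast_2026 / 365            # ≈ 163 CVEs per day

# Hypothetical assumption: 10 minutes for an initial relevance/triage pass per CVE.
minutes_per_cve = 10
analyst_hours_per_day = per_day * minutes_per_cve / 60
print(f"{per_day:.0f} CVEs/day ≈ {analyst_hours_per_day:.0f} analyst-hours of triage daily")
# → 163 CVEs/day ≈ 27 analyst-hours of triage daily
```

Even with these conservative assumptions, initial triage alone would consume more than three full-time analysts per day before any remediation work begins.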


Where AI Delivers: The Data

A systematic review of 29 studies (2019–2024) found AI consistently outperforms traditional rule-based tools on accuracy, speed, and scale. Headline findings:

  • AI security tools achieve 95% detection accuracy versus ~85% for traditional approaches, cutting incident response times by 30–50%.
  • 41% of zero-day vulnerabilities in 2025 were discovered through AI-assisted analysis, a class of threat largely out of reach for automated tools three years ago.
  • At its ICML 2025 introduction, the best autonomous agents on CVE-Bench could exploit up to 13% of real-world critical CVEs in production environments. In controlled conditions where CVE descriptions are provided, rates reach 87% (Fang et al., 2024). The gap between controlled performance and real-world deployment remains significant and is itself an important signal for any organisation evaluating AI security tools.
  • AI agents can now identify up to 77% of vulnerabilities in real-world software systems across multiple evaluation frameworks.

AI vs. Traditional Tools at a Glance

| Capability | Traditional / Rule-Based | AI-Powered Detection |
| --- | --- | --- |
| Detection accuracy | ~80–85% | Up to 95%+ |
| Zero-day discovery | Very limited | 41% of 2025 zero-days |
| False positive rate | High | Medium (improving) |
| CVE volume handling | Breaks at scale | Scales with data |
| Response time | Hours to days | Seconds to minutes |
| Novel codebase accuracy | Good on known patterns | Inconsistent on new code |
| Human oversight needed | Always | Always, but less time |

Where AI Still Fails

  • False positives

AI tools flag code that is not vulnerable, overwhelming analyst queues and burying real threats. Training data quality drives much of this: models often memorise benchmark patterns rather than learning generalisable detection logic. At scale, this creates a secondary crisis: an abundance of alerts that obscures genuine exposure.
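The base-rate arithmetic behind this problem is worth spelling out: because genuinely vulnerable code is rare, even a tool with strong headline accuracy can produce far more false alerts than true ones. The numbers below are illustrative assumptions, not vendor figures.

```python
# Illustrative base-rate arithmetic (hypothetical numbers, not vendor figures).
findings = 10_000          # code locations a scanner evaluates
prevalence = 0.01          # assume only 1% are genuinely vulnerable
tpr = 0.95                 # true positive (detection) rate, per the figures above
fpr = 0.05                 # assumed false positive rate

true_vulns = findings * prevalence                  # 100 real vulnerabilities
true_alerts = true_vulns * tpr                      # 95 correctly flagged
false_alerts = (findings - true_vulns) * fpr        # 495 false alarms
precision = true_alerts / (true_alerts + false_alerts)
print(f"{false_alerts:.0f} false alerts vs {true_alerts:.0f} real ones; "
      f"precision ≈ {precision:.0%}")
# → 495 false alerts vs 95 real ones; precision ≈ 16%
```

Under these assumptions, roughly five out of six alerts in the queue are noise, which is exactly how real threats get buried.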

  • Hallucinations

LLM-generated vulnerability reports are frequently polished but technically shallow. Triage teams at major bug bounty platforms identify them quickly. Confident-sounding wrong answers are a genuine risk for less experienced developers who lack the background to challenge the output.

  • Generalisation gaps

A model scoring well on a published benchmark can drop sharply on proprietary enterprise code with different naming conventions, architectural patterns, or vulnerability types not well-represented in its training set. Benchmark performance is not a proxy for real-world deployment performance.

  • AI-generated code is more vulnerable

Veracode’s 2025 State of Software Security analysis found AI-generated code has 2.7× higher vulnerability density than human-written code, with CVSS 7.0+ critical flaws appearing 2.5× more often. By June 2025, AI-assisted development was adding 10,000+ new security findings per month across studied repositories, a 10× jump from December 2024. The same AI revolution expanding your defences is also expanding your attack surface.

The Third-Party Blind Spot

86% of organisations use third-party packages with critical vulnerabilities in AI-driven environments. AI scanning tools are optimised for first-party code. They were not designed to assess the 50–150 vendors in your ecosystem, each with their own patch cycles, compliance posture, and exposure surfaces. As AI libraries and LLM API dependencies multiply, supply chain risk is growing faster than most programmes can track.
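The core of a third-party check is simple in principle: compare pinned dependency versions against an advisory feed. The sketch below uses entirely hypothetical package pins and advisory data to show the shape of the check; a real programme would query a live source such as OSV or the GitHub Advisory Database rather than a hard-coded dictionary.

```python
# Minimal sketch of a third-party dependency check: compare pinned
# dependency versions against an advisory feed. All data here is
# hypothetical; a real check would query OSV, GitHub Advisories, or
# a commercial SBOM scanner.
pinned = {"requests": "2.19.0", "numpy": "1.26.4", "flask": "0.12.0"}

# Hypothetical advisory data: package -> versions known to be affected.
advisories = {
    "requests": {"2.19.0", "2.19.1"},
    "flask": {"0.12.0"},
}

flagged = [(pkg, ver) for pkg, ver in pinned.items()
           if ver in advisories.get(pkg, set())]
for pkg, ver in flagged:
    print(f"ALERT: {pkg}=={ver} matches a known advisory")
```

The hard part is not this lookup but maintaining it continuously across 50–150 vendors, each with their own release cadence, which is why supply chain coverage needs its own programme rather than a bolt-on to first-party scanning.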

A Practical Trust Framework

AI security deployments succeed when human oversight is built in from the start. Use this as a quick checklist:

Deploy with more confidence when:

  • The tool is specialised for specific vulnerability classes, not claiming to catch everything.
  • Human analysts validate results before any action is taken.
  • False positive rates are disclosed and actively tracked.
  • AI is used to increase coverage, not to replace depth of review.
  • Third-party and supply chain risk is assessed separately from first-party code scanning.

Apply more scepticism when:

  • Accuracy claims come only from self-reported benchmarks on standard datasets.
  • The tool has no explainability layer, just “vulnerability found” with no reasoning.
  • AI-generated code is reviewed only by another AI tool with no human checkpoint.
  • Compliance obligations are treated as a one-time audit rather than continuous monitoring.

Ampcus Cyber, Compliance Compass & GRC Advisory

Most AI tools identify what is broken. We map those findings to your HIPAA, PCI DSS, SOC 2, ISO 27001, FedRAMP, or GDPR obligations, turning vulnerability data into compliant remediation roadmaps with 365-day advisory support.

The Verdict

Can you trust AI to catch critical vulnerabilities? Yes, at scale, at speed, and on classes of threats that traditional tools miss. But AI hallucinates, generalises poorly to novel codebases, and produces more vulnerable code than human developers.

AI-assisted attacks rose 72% in 2024. SentinelOne reports a 1,265% surge in phishing linked to generative AI tools. The nearly 60,000 CVEs forecast for 2026 will not wait.

The organisations best positioned are those who treat AI as a force multiplier for human expertise, not a replacement for it, and who apply that logic across their full risk surface: first-party code, third-party vendors, and compliance obligations alike.

Ready to assess where your programme stands?

Ampcus Cyber offers a no-obligation security posture assessment covering vulnerability management, third-party risk, and compliance readiness, mapped to your industry’s specific threat landscape and regulatory obligations.

Request a free assessment.

Sources

  • FIRST 2026 Vulnerability Forecast: first.org
  • CVE-Bench / ICML 2025 (ArXiv): arxiv.org/abs/2503.17332
  • LLM One-Day CVE Exploitation, Fang et al. 2024: arxiv.org/pdf/2404.08144
  • Veracode State of Software Security 2025: veracode.com
  • IBM X-Force 2025 Threat Intelligence Index: ibm.com/reports/threat-intelligence
  • IBM Cost of a Data Breach Report 2025: ibm.com/reports/data-breach
  • HackerOne Hacker-Powered Security Report 2025: hackerone.com
  • CISA AI Vulnerability Detection Pilot 2024: cisa.gov
  • CERT-EU Threat Landscape Report 2025: enisa.europa.eu
  • SentinelOne AI Cybersecurity Trends 2025: sentinelone.com
  • Zscaler ThreatLabz Phishing Report 2024: zscaler.com
  • 2025 CVE Year-End Review, Maze: mazehq.com
  • AI Cyber Threat Statistics, Network Installers: thenetworkinstallers.com
