Can You Trust an AI to Catch Critical Vulnerabilities? Here’s What the Data Says

Can AI really detect critical vulnerabilities better than traditional security tools? This article explores real-world data, accuracy benchmarks, and limitations of AI in cybersecurity, helping you understand where AI strengthens vulnerability management and where human expertise remains essential.

The honest answer is a qualified yes. Somewhere between “AI will replace security teams” and “AI is just a fancy scanner” lies the real picture. This article draws on the data, and our experience, to find it.

The Scale Problem That Made AI Necessary

In 2024, a record 40,009 Common Vulnerabilities and Exposures were published, 108 per day, up 38% on the year before. By mid-2025 that rate had risen to approximately 131 per day, with roughly 50,000 CVEs published across 2025 in total. FIRST’s 2026 Vulnerability Forecast projects a median of 59,427 CVEs this year alone with realistic scenarios reaching 70,000 to 100,000.

No security team, however skilled, can keep pace manually. With 70% of vulnerabilities rooted in software development flaws and cybercrime costs heading toward $10.5 trillion annually, AI moved from “nice to have” to operational necessity.
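A back-of-envelope calculation makes the scale concrete. The sketch below uses the FIRST 2026 median forecast cited above; the 10-minute triage time per CVE is a hypothetical assumption for illustration, not a measured figure.

```python
# Back-of-envelope triage load for the FIRST 2026 median forecast cited above.
forecast_2026 = 59_427
per_day = forecast_2026 / 365            # ≈ 163 CVEs per day

# Hypothetical assumption: 10 minutes for an initial relevance/triage pass per CVE.
minutes_per_cve = 10
analyst_hours_per_day = per_day * minutes_per_cve / 60
print(f"{per_day:.0f} CVEs/day ≈ {analyst_hours_per_day:.0f} analyst-hours of triage daily")
# → 163 CVEs/day ≈ 27 analyst-hours of triage daily
```

Even with these conservative assumptions, initial triage alone would consume more than three full-time analysts per day before any remediation work begins.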


Where AI Delivers: The Data

A systematic review of 29 studies (2019–2024) found AI consistently outperforms traditional rule-based tools on accuracy, speed, and scale. Headline findings:

  • AI security tools achieve 95% detection accuracy versus ~85% for traditional approaches, cutting incident response times by 30–50%.
  • 41% of zero-day vulnerabilities in 2025 were discovered through AI-assisted analysis, a class of threat largely out of reach for automated tools three years ago.
  • At its ICML 2025 introduction, the best autonomous agents on CVE-Bench could exploit up to 13% of real-world critical CVEs in production environments. In controlled conditions where CVE descriptions are provided, rates reach 87% (Fang et al., 2024). The gap between controlled performance and real-world deployment remains significant and is itself an important signal for any organisation evaluating AI security tools.
  • AI agents can now identify up to 77% of vulnerabilities in real-world software systems across multiple evaluation frameworks.

AI vs. Traditional Tools at a Glance

| Capability | Traditional / Rule-Based | AI-Powered Detection |
| --- | --- | --- |
| Detection accuracy | ~80–85% | Up to 95%+ |
| Zero-day discovery | Very limited | 41% of 2025 zero-days |
| False positive rate | High | Medium (improving) |
| CVE volume handling | Breaks at scale | Scales with data |
| Response time | Hours to days | Seconds to minutes |
| Novel codebase accuracy | Good on known patterns | Inconsistent on new code |
| Human oversight needed | Always | Always, but less time |

Where AI Still Fails

  • False positives

AI tools flag code that is not vulnerable, overwhelming analyst queues and burying real threats. Training data quality drives much of this: models often memorise benchmark patterns rather than learning generalisable detection logic. At scale, this creates a secondary crisis: an abundance of alerts that obscures genuine exposure.
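The base-rate arithmetic behind this problem is worth spelling out: because genuinely vulnerable code is rare, even a tool with strong headline accuracy can produce far more false alerts than true ones. The numbers below are illustrative assumptions, not vendor figures.

```python
# Illustrative base-rate arithmetic (hypothetical numbers, not vendor figures).
findings = 10_000          # code locations a scanner evaluates
prevalence = 0.01          # assume only 1% are genuinely vulnerable
tpr = 0.95                 # true positive (detection) rate, per the figures above
fpr = 0.05                 # assumed false positive rate

true_vulns = findings * prevalence                  # 100 real vulnerabilities
true_alerts = true_vulns * tpr                      # 95 correctly flagged
false_alerts = (findings - true_vulns) * fpr        # 495 false alarms
precision = true_alerts / (true_alerts + false_alerts)
print(f"{false_alerts:.0f} false alerts vs {true_alerts:.0f} real ones; "
      f"precision ≈ {precision:.0%}")
# → 495 false alerts vs 95 real ones; precision ≈ 16%
```

Under these assumptions, roughly five out of six alerts in the queue are noise, which is exactly how real threats get buried.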

  • Hallucinations

LLM-generated vulnerability reports are frequently polished but technically shallow. Triage teams at major bug bounty platforms identify them quickly. Confident-sounding wrong answers are a genuine risk for less experienced developers who lack the background to challenge the output.

  • Generalisation gaps

A model scoring well on a published benchmark can drop sharply on proprietary enterprise code with different naming conventions, architectural patterns, or vulnerability types not well-represented in its training set. Benchmark performance is not a proxy for real-world deployment performance.

  • AI-generated code is more vulnerable

Veracode’s 2025 State of Software Security analysis found AI-generated code has 2.7× higher vulnerability density than human-written code, with CVSS 7.0+ critical flaws appearing 2.5× more often. By June 2025, AI-assisted development was adding 10,000+ new security findings per month across studied repositories, a 10× jump from December 2024. The same AI revolution expanding your defences is also expanding your attack surface.

The Third-Party Blind Spot

86% of organisations use third-party packages with critical vulnerabilities in AI-driven environments. AI scanning tools are optimised for first-party code. They were not designed to assess the 50–150 vendors in your ecosystem, each with their own patch cycles, compliance posture, and exposure surfaces. As AI libraries and LLM API dependencies multiply, supply chain risk is growing faster than most programmes can track.
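The core of a third-party check is simple in principle: compare pinned dependency versions against an advisory feed. The sketch below uses entirely hypothetical package pins and advisory data to show the shape of the check; a real programme would query a live source such as OSV or the GitHub Advisory Database rather than a hard-coded dictionary.

```python
# Minimal sketch of a third-party dependency check: compare pinned
# dependency versions against an advisory feed. All data here is
# hypothetical; a real check would query OSV, GitHub Advisories, or
# a commercial SBOM scanner.
pinned = {"requests": "2.19.0", "numpy": "1.26.4", "flask": "0.12.0"}

# Hypothetical advisory data: package -> versions known to be affected.
advisories = {
    "requests": {"2.19.0", "2.19.1"},
    "flask": {"0.12.0"},
}

flagged = [(pkg, ver) for pkg, ver in pinned.items()
           if ver in advisories.get(pkg, set())]
for pkg, ver in flagged:
    print(f"ALERT: {pkg}=={ver} matches a known advisory")
```

The hard part is not this lookup but maintaining it continuously across 50–150 vendors, each with their own release cadence, which is why supply chain coverage needs its own programme rather than a bolt-on to first-party scanning.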

A Practical Trust Framework

AI security deployments succeed when human oversight is built in from the start. Use this as a quick checklist:

Deploy with more confidence when:

  • The tool is specialised for specific vulnerability classes, not claiming to catch everything.
  • Human analysts validate results before any action is taken.
  • False positive rates are disclosed and actively tracked.
  • AI is used to increase coverage, not to replace depth of review.
  • Third-party and supply chain risk is assessed separately from first-party code scanning.

Apply more scepticism when:

  • Accuracy claims come only from self-reported benchmarks on standard datasets.
  • The tool has no explainability layer, just “vulnerability found” with no reasoning.
  • AI-generated code is reviewed only by another AI tool with no human checkpoint.
  • Compliance obligations are treated as a one-time audit rather than continuous monitoring.

Ampcus Cyber, Compliance Compass & GRC Advisory

Most AI tools identify what is broken. We map those findings to your HIPAA, PCI DSS, SOC 2, ISO 27001, FedRAMP, or GDPR obligations, turning vulnerability data into compliant remediation roadmaps with 365-day advisory support.

The Verdict

Can you trust AI to catch critical vulnerabilities? Yes, at scale, at speed, and on classes of threats that traditional tools miss. But AI hallucinates, generalises poorly to novel codebases, and produces more vulnerable code than human developers.

AI-assisted attacks rose 72% in 2024. SentinelOne reports a 1,265% surge in phishing linked to generative AI tools. The nearly 60,000 CVEs forecast for 2026 will not wait.

The organisations best positioned are those who treat AI as a force multiplier for human expertise, not a replacement for it, and who apply that logic across their full risk surface: first-party code, third-party vendors, and compliance obligations alike.

Ready to assess where your programme stands?

Ampcus Cyber offers a no-obligation security posture assessment covering vulnerability management, third-party risk, and compliance readiness, mapped to your industry’s specific threat landscape and regulatory obligations.

Request a free assessment.

Sources

  • FIRST 2026 Vulnerability Forecast: first.org
  • CVE-Bench / ICML 2025 (ArXiv): arxiv.org/abs/2503.17332
  • LLM One-Day CVE Exploitation, Fang et al. 2024: arxiv.org/pdf/2404.08144
  • Veracode State of Software Security 2025: veracode.com
  • IBM X-Force 2025 Threat Intelligence Index: ibm.com/reports/threat-intelligence
  • IBM Cost of a Data Breach Report 2025: ibm.com/reports/data-breach
  • HackerOne Hacker-Powered Security Report 2025: hackerone.com
  • CISA AI Vulnerability Detection Pilot 2024: cisa.gov
  • CERT-EU Threat Landscape Report 2025: enisa.europa.eu
  • SentinelOne AI Cybersecurity Trends 2025: sentinelone.com
  • Zscaler ThreatLabz Phishing Report 2024: zscaler.com
  • 2025 CVE Year-End Review, Maze: mazehq.com
  • AI Cyber Threat Statistics, Network Installers: thenetworkinstallers.com
