# Agentic Code Verification — Close the Review Gap Now
AI agents write 41% of your code. 96% of engineers don't trust it. This stack closes the verification gap by running checks inside the agent loop.
## The stack (3 tools)
- Sub-second vulnerability feedback during generation, not after PR creation
- Closes the agent's test loop against a real DOM — self-healing selectors included
- Pairs with Agentic Analysis to make verification → remediation a closed loop
## TL;DR
- AI-generated PRs contain 1.7× more issues than human-written ones (10.83 vs 6.45) — and 48% of teams still verify manually, post-generation
- This stack runs verification during agent generation: SonarQube CLI as pre-commit hook, SonarQube MCP Server in the agent loop, Playwright MCP for live browser checks
- All three tools operate below the PR boundary — the only place where verification can match agent output velocity
- What it does not do: replace CI/CD, cover all languages (Java, JS/TS, Python only for now), or catch incorrect business logic assertions
Generation is now automated. Verification is still manual. That gap is the entire problem — and the fix is not adding another post-PR review step. It is moving verification inside the agent’s own tool loop, where it can run at the same speed the agent generates.
On March 3, 2026, Sonar formalized this model at their Summit with the AC/DC framework — Agent Centric Development Cycle — structured as four continuous stages: Guide, Generate, Verify, Solve. The three tools in this stack address stages three and four directly. Everything here operates below the PR boundary. That is not a design choice, it is the constraint: anything that catches bugs post-PR is already too late in an agentic workflow.
## Stack Overview
This stack closes the verification gap by treating quality checks, security scans, and browser tests as tools the agent itself can invoke — not as gates a human operates after the agent finishes.
- SonarQube CLI — Pre-commit hook that scans every file the agent touches before it leaves the local environment; includes secrets detection
- SonarQube Agentic Analysis (MCP Server) — Wires SonarQube’s static analysis directly into the agent’s tool loop for sub-second vulnerability feedback during generation
- Playwright MCP — Connects the agent to a live browser session so it can run its own E2E assertions and self-heal broken selectors without a human in the chair
- SonarQube Remediation Agent — Generates fix proposals for issues flagged during verification, closing the Verify → Solve loop automatically
The combination solves something that no individual tool addresses: the agent generates code, verifies it against real analysis and a live DOM, gets concrete fix proposals, and iterates — all in one loop. The human reviews a PR that has already been through multiple automated verification passes.
```mermaid
flowchart LR
    A[Agent generates code] --> B[SonarQube CLI\npre-commit hook]
    B -->|Issues detected| E[Remediation Agent\nfix proposals]
    B -->|Clean| C[SonarQube MCP Server\nin-loop analysis]
    C -->|Vulnerabilities| E
    C -->|Clean| D[Playwright MCP\nlive browser verification]
    D -->|Selector drift / regressions| E
    E --> A
    D -->|All assertions pass| F[PR created]
```
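The gates in the diagram compose into one control loop: verify, remediate, repeat until clean, then open the PR. Here is a minimal runnable sketch of that flow; every function name (`precommit_scan`, `inloop_analysis`, `browser_check`, `propose_fix`) is a hypothetical stand-in for the real tool, with toy implementations so the loop executes end to end.

```python
MAX_PASSES = 3  # bound the verify -> solve loop so a stuck agent fails loudly

def verify(code: str) -> list[str]:
    """Run the three gates in order; return the first gate's issues, if any."""
    for gate in (precommit_scan, inloop_analysis, browser_check):
        issues = gate(code)
        if issues:
            return issues
    return []

def agent_loop(code: str) -> str:
    for _ in range(MAX_PASSES):
        issues = verify(code)
        if not issues:
            return code                       # all gates clean: PR can be created
        code = propose_fix(code, issues)      # Remediation Agent step
    raise RuntimeError("verification did not converge; escalate to a human")

# Toy stand-ins so the sketch runs end to end (not the real tools' behavior):
def precommit_scan(code): return ["hardcoded secret"] if "AKIA" in code else []
def inloop_analysis(code): return []
def browser_check(code): return []
def propose_fix(code, issues): return code.replace("AKIA123", "${AWS_KEY}")

print(agent_loop('key = "AKIA123"'))  # -> key = "${AWS_KEY}"
```

The bounded pass count matters: an unbounded loop lets a confused agent burn tokens forever, while a cap converts non-convergence into an explicit escalation to a human.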
The verification numbers that justify this architecture are damning. Sonar surveyed 1,100 developers and found AI tools now contribute to 41% of all code written — with teams expecting that figure to hit 65% within two years. AI-generated pull requests carry 10.83 issues on average versus 6.45 for human-written code. Technical debt is up 30–41% across teams that adopted AI coding tools. And 96% of developers say they do not trust AI-generated code. Yet only 48% verify before committing. You have handed agents the keys to the codebase and kept the review burden on humans who cannot match the output volume.
## Components
### SonarQube CLI (Current Stable: 10.x)
SonarQube’s CLI scanner runs static analysis locally against any codebase without requiring a full CI pipeline invocation. In this stack, it runs as a pre-commit hook — scanning every file the agent writes or modifies before that file leaves the local environment.
The use case that matters most here is secrets detection. Agents like Claude Code and Cursor build context by reading the local environment, including files that may contain API keys, session tokens, or database credentials. That context can surface in prompt history and, in the wrong configuration, in generated code. The CLI catches this at source with latency under 100ms per file — fast enough that it does not interrupt the agent loop.
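For intuition, this is the shape of check a secrets detector applies per file. The two patterns below are illustrative only; a real scanner ships hundreds of curated rules, and none of this is SonarQube's actual rule set.

```python
import re

# Illustrative patterns only: two common, well-known token shapes.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)\b(api|secret)[_-]?key\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}

def scan_file(text: str) -> list[str]:
    """Return the names of every pattern that matches; a pre-commit gate
    would block the commit if this list is non-empty."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

print(scan_file('aws_key = "AKIAABCDEFGHIJKLMNOP"'))   # flags aws_access_key
print(scan_file('port = 8080'))                        # clean -> []
```

Pure regex matching like this is why per-file latency can stay under the 100ms budget the hook needs.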
Why this tool in this stack:
- Runs without SonarQube Cloud access — available at every tier, including free self-hosted
- Sub-100ms per-file latency is low enough to wire into a git pre-commit hook without degrading agent UX
- Secrets detection is language-agnostic — catches leaked credentials regardless of which language the agent is generating
| Tool | Difference | Switch if |
|---|---|---|
| Gitleaks | Secrets-only scanning, simpler setup | You only need secrets detection and do not need code quality analysis |
| Semgrep CLI | More flexible rule writing, open-source rules | You need custom analysis rules and static analysis is your primary concern |
| Trivy | Container and dependency scanning, not code | Your threat model is supply chain, not generated code quality |
### SonarQube Agentic Analysis via MCP Server (Beta)
The MCP Server integration wires SonarQube’s full static analysis engine into the agent’s tool loop. When an agent invokes the SonarQube MCP tool, it gets back the same vulnerability and logic error analysis that would appear in a full CI scan — but in seconds, during generation, not minutes after a pipeline completes.
This is the critical architectural difference from traditional CI-gated quality checks. In a standard pipeline, the agent generates code, creates a PR, the pipeline runs, issues surface in review comments, and a human decides whether to fix or ship. In this stack, the agent queries SonarQube mid-generation, gets concrete feedback on what it just wrote, and iterates before the PR exists. The human sees a PR that has already been through multiple analysis passes.
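Concretely, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. The envelope below follows the MCP specification; the tool name `analyze_file` and its arguments are hypothetical placeholders, since the real schema is whatever the SonarQube server advertises via `tools/list`.

```python
import json

# JSON-RPC 2.0 envelope defined by the MCP spec. The tool name and
# argument names are invented placeholders for whatever the server
# actually advertises via tools/list.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "analyze_file",                # hypothetical tool name
        "arguments": {"path": "src/auth.ts"},  # hypothetical argument
    },
}
print(json.dumps(request, indent=2))
```

The point of the shared envelope is tool discovery: any compliant agent can find and call the analysis tool without framework-specific wiring.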
Current language coverage: Java, JavaScript/TypeScript, and Python. .NET and C/C++ are on the roadmap. If your agent is primarily generating code in those unsupported languages, the CLI path still gives you secrets detection, but the in-loop static analysis is not yet available.
Why this tool in this stack:
- Sub-second feedback during generation matches agent velocity — a minutes-long CI feedback loop does not
- MCP architecture means compatible agents (Claude Code, Cursor, GitHub Copilot) discover and invoke the tool without framework-specific wiring
- Currently in beta for SonarQube Cloud Enterprise; CLI path covers all tiers
| Tool | Difference | Switch if |
|---|---|---|
| CodeRabbit | PR-level review, not in-loop; broader language support | You are comfortable with post-generation review and need wider language coverage |
| DeepCode / Snyk Code | Strong on security, weaker on general code quality | Security vulnerabilities are your primary concern, code quality is secondary |
| GitHub Advanced Security | Deeply integrated with GitHub Actions, not agent tool loops | Your primary surface is CI/CD, not real-time agent feedback |
### Playwright MCP (v0.8.1)
Playwright MCP connects the agent to a real browser session via the Model Context Protocol. The agent can navigate pages, run selector assertions against a live DOM, capture screenshots, and get back structured DOM snapshots it can reason about.
The self-healing part is what makes this actually useful in agentic workflows rather than just technically interesting. When the agent generates UI code and the test selectors it wrote fail against the live DOM — because the generated markup differed from what the agent expected — Playwright MCP returns a DOM snapshot the agent can inspect to rewrite the selectors. The agent patches its own tests without human intervention. GitHub Copilot’s coding agent ships with Playwright MCP wired in by default; Claude Code and Cursor users configure it as an MCP server.
Be clear-eyed about what this does and does not cover. Playwright MCP stops selector drift and catches regressions against real browser state. It does not validate business logic. If the agent writes tests that assert incorrect behavior — tests that pass but test the wrong thing — Playwright MCP will not catch that. You still need a human reviewing test coverage for correctness, not just test execution for passage.
Why this tool in this stack:
- Apache 2.0 licensed — no cost, no vendor dependency
- DOM snapshot feedback loop enables agent self-correction without a human approving each fix
- GitHub Copilot integration confirms this is shipping as production infrastructure, not prototype
| Tool | Difference | Switch if |
|---|---|---|
| Cypress | Better developer UX for manual test writing, no MCP integration | Your team writes tests manually and agent loop integration is not a requirement |
| Puppeteer | Lower-level control, no structured MCP interface | You need browser automation beyond test assertions |
| Selenium | Broader browser compatibility, significantly more setup overhead | You need IE/Edge legacy compatibility or are constrained to a Selenium grid |
### SonarQube Remediation Agent (Public Beta)
The Remediation Agent entered public beta on March 3, 2026, after a closed beta that began February 11, 2026. It pairs with SonarQube Agentic Analysis to close the Verify → Solve loop: when analysis flags an issue, the Remediation Agent generates a concrete fix proposal rather than leaving the coding agent to reason about a raw error message.
This matters because the quality of a fix proposal depends on how well the fixer understands SonarQube’s issue taxonomy. A general-purpose coding agent will attempt a fix, but its fix may address the symptom rather than the root cause. The Remediation Agent is trained on SonarQube’s issue categories and generates proposals that SonarQube’s own analysis then validates — which means the agent coding loop does not escape without a confirmed-clean scan.
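The confirmed-clean contract is the important part: a fix proposal only exits the loop when the same analyzer that raised the finding reports nothing. A minimal sketch, with `analyze` and `remediate` as toy stand-ins for Agentic Analysis and the Remediation Agent:

```python
def solve_until_clean(code, analyze, remediate, max_rounds=3):
    """Accept a fix only once re-analysis confirms a clean scan."""
    for _ in range(max_rounds):
        findings = analyze(code)
        if not findings:
            return code  # confirmed-clean scan: the loop may exit
        code = remediate(code, findings)
    raise RuntimeError("fix proposals did not converge; needs human review")

# Toy rule: flag eval() on raw input, fix by parsing JSON instead.
analyze = lambda c: ["dangerous-eval"] if "eval(" in c else []
remediate = lambda c, findings: c.replace("eval(", "json.loads(")

print(solve_until_clean("data = eval(raw)", analyze, remediate))
# -> data = json.loads(raw)
```

A symptom-level patch that does not survive re-analysis simply loops again, which is exactly the failure mode the taxonomy-aware fixer is meant to avoid.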
The Remediation Agent is currently included in SonarQube Cloud Teams and Enterprise annual plans. It is not available on the free tier.
Why this tool in this stack:
- Closes the Verify → Solve loop without returning control to a human for every detected issue
- Fix proposals are scoped to SonarQube’s issue taxonomy — higher signal-to-noise than a general-purpose LLM fix
| Tool | Difference | Switch if |
|---|---|---|
| GitHub Copilot Autofix | Integrated into GitHub PR comments, not in-loop; broader language support | You want post-PR automated fixes and are not using SonarQube for analysis |
| Manual remediation | No cost, full control | Issues require architectural judgment that automated fixes will get wrong |
## Setup Walkthrough
### Step 1: Install SonarQube CLI and confirm it runs

Install the SonarQube CLI scanner. This is the foundation of the pre-commit hook — nothing else in the setup matters if the scanner does not execute cleanly.

```shell
# Install sonar-scanner (macOS via Homebrew; adjust for your OS)
brew install sonar-scanner

# Verify installation
sonar-scanner --version
```
### Step 2: Create sonar-project.properties in your repo root

Configure the scanner to point at your SonarQube instance and define which sources to scan. Leave the token out of this file entirely: `.properties` files do not interpolate environment variables, and the scanner reads the `SONAR_TOKEN` environment variable directly, which keeps the credential out of version control. (The legacy `sonar.language` property is deprecated; the scanner detects languages automatically.)

```properties
# sonar-project.properties
sonar.projectKey=your-project-key
sonar.sources=src
sonar.sourceEncoding=UTF-8
sonar.host.url=https://sonarcloud.io
# No sonar.token here; the scanner picks up the SONAR_TOKEN env var
```
### Step 3: Create the pre-commit hook script

Wire the CLI scan as a git pre-commit hook. Every commit the agent produces — not just human commits — triggers this scan before changes enter version history. Make the hook executable (`chmod +x .git/hooks/pre-commit`), or git will silently skip it.

```sh
#!/bin/sh
# .git/hooks/pre-commit (the shebang must be the first line of the file)
set -e

echo "Running SonarQube pre-commit scan..."

# A non-zero scanner exit aborts the commit; sonar.qualitygate.wait
# makes the scanner fail when the quality gate fails.
sonar-scanner -Dsonar.qualitygate.wait=true

echo "SonarQube scan passed."
```
### Step 4: Set required environment variables

Commit a `.env.example` as the template, copy it to an untracked `.env` holding the real values, and load it into your shell (for example, `set -a && . ./.env && set +a`) before running any agent session — the pre-commit hook and both MCP servers depend on these values.

```ini
# .env.example (placeholders only; real values live in an untracked .env)
SONAR_TOKEN=your_sonarqube_token_here
SONAR_HOST_URL=https://sonarcloud.io
PLAYWRIGHT_MCP_PORT=8080
```
### Step 5: Add SonarQube MCP Server to your agent configuration

For Claude Code, add the SonarQube MCP Server to the project's `.mcp.json`. For Cursor, add it to `.cursor/mcp.json`. This makes the analysis tool available to the agent during generation — the agent can invoke it mid-task without any prompt engineering.

```json
{
  "mcpServers": {
    "sonarqube": {
      "command": "npx",
      "args": ["-y", "@sonarqube/mcp-server"],
      "env": {
        "SONAR_TOKEN": "${SONAR_TOKEN}",
        "SONAR_HOST_URL": "${SONAR_HOST_URL}"
      }
    }
  }
}
```
### Step 6: Add Playwright MCP Server to the same configuration

Extend the MCP config to include Playwright alongside SonarQube. The agent now has both static analysis and live browser verification available as tools in the same session.

```json
{
  "mcpServers": {
    "sonarqube": {
      "command": "npx",
      "args": ["-y", "@sonarqube/mcp-server"],
      "env": {
        "SONAR_TOKEN": "${SONAR_TOKEN}",
        "SONAR_HOST_URL": "${SONAR_HOST_URL}"
      }
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"],
      "env": {
        "PLAYWRIGHT_HEADLESS": "true"
      }
    }
  }
}
```
### Step 7: Verify the full stack with a test generation run

Before running a real agent session, confirm all three verification layers are reachable. Then run a test prompt that generates a component and a test — you want to see the pre-commit hook fire, the MCP analysis return, and Playwright run selectors before considering the setup complete.

```shell
# Confirm MCP servers are reachable
npx @sonarqube/mcp-server --health-check
npx @playwright/mcp@latest --version

# Then run your agent with a test prompt that generates a component and a test
# Verify: pre-commit hook fires, MCP analysis returns, Playwright runs selectors
```
## Pricing
| Component | License | Free Tier | Paid from | Note |
|---|---|---|---|---|
| SonarQube CLI | Commercial (free tier) | Yes — unlimited local scans | SonarQube Cloud Teams ~$10/mo | Free tier supports local scanning; MCP Server requires Enterprise |
| SonarQube MCP Server | Beta (Enterprise) | No | Enterprise plan required | Beta; pricing not publicly fixed as of April 2026 |
| SonarQube Remediation Agent | Beta (Teams+) | No | Teams annual plan | Public beta; included in Teams and Enterprise annual plans |
| Playwright MCP | Apache 2.0 | Yes — fully free | N/A | No paid tier; open source |
Hosting costs are not covered in the table. Self-hosted SonarQube requires a machine capable of running the SonarQube instance — plan for a minimum 2 vCPU / 4GB RAM server. SonarQube Cloud eliminates this but introduces the Enterprise tier requirement for MCP Server access.
The SonarQube MCP Server is currently in beta for SonarQube Cloud Enterprise only. If you are on a free or Teams plan, fall back to the CLI path: you keep secrets detection and code quality scanning at commit time, but you lose the in-loop analysis and native MCP tool discovery for agents that support it.
The most cost-efficient path for a small team: SonarQube CLI (free) as the pre-commit hook, Playwright MCP (free) for browser verification, and SonarQube Cloud Teams (~$10/mo range) for the Remediation Agent. The MCP Server upgrade makes sense when your agents are generating enough code that manual MCP configuration per-project becomes overhead you cannot absorb.
## When This Stack Fits
- Your agent is generating PRs faster than your team can review them. This is the exact problem the stack is designed for. If your review queue is growing, verification-as-you-generate is the only mechanism that scales with agent output velocity.
- You are working primarily in Java, JavaScript/TypeScript, or Python. SonarQube Agentic Analysis covers these three languages. If your codebase is mostly TypeScript with some Python — which describes a significant share of modern web teams — you have full coverage.
- Secrets leakage is a real concern in your agent workflow. If your agents are reading local config files, `.env` files, or credentials stored in dotfiles for context, the CLI pre-commit hook is non-negotiable. It catches these before they enter version history, where they become a remediation problem instead of a prevention problem.
- You are already running Playwright for E2E testing. Playwright MCP is additive — it gives your existing Playwright setup an MCP interface the agent can call. There is no migration cost if Playwright is already in your stack.
- Your team needs an audit trail of verification passes. SonarQube Cloud provides per-PR quality gate history, which becomes compliance-relevant when AI-assisted development needs to be defensible to a security team or auditor.
## When This Stack Does Not Fit
- Your primary languages are .NET or C/C++. SonarQube Agentic Analysis does not yet cover these. The CLI still handles secrets detection, but you lose the in-loop static analysis that is the stack’s core value. Wait for the language expansion, or evaluate Semgrep as an interim alternative for those languages.
- You want to eliminate PR review entirely. This stack does not do that. It reduces the review burden by ensuring every PR has been through multiple automated verification passes — but a human still needs to confirm that tests are asserting the right behavior, not just that they pass.
- Your CI/CD pipeline is already catching issues fast enough. If your pipeline runs in under two minutes and your agents are not generating enough volume to create a review backlog, this stack adds configuration overhead without meaningful benefit. The pain point it solves is specifically high-velocity agentic generation, not slow pipelines.
- You need multi-language secrets scanning across a polyglot codebase. Gitleaks or Trivy will give you broader coverage than SonarQube CLI if secrets detection is your only concern and static analysis is out of scope.
## The Take
The verification gap is not a tooling problem — it is a process assumption that has not caught up with agentic reality. Most teams designed their review process for human-authored commits, where generation was slow enough that review could follow it. Agents broke that assumption in 2025, and teams are still running the old process.
What I care about in this stack is the latency model. The SonarQube CLI runs in under 100ms per file. The MCP Server returns analysis in seconds. Playwright MCP completes a selector assertion in the time it takes a browser to load a page. None of these are blocking the agent in any meaningful sense — they are verification layers that run at agent speed, not human speed. That distinction matters more than any individual feature in any individual tool.
The AC/DC framework’s four stages — Guide, Generate, Verify, Solve — are the right mental model. The mistake most teams make is treating Verify and Solve as post-generation steps. This stack puts both inside the generation loop. The agent verifies its own output. The agent gets concrete fix proposals. The agent iterates. The PR that surfaces at the end of that loop is not the first draft — it is the output after multiple automated passes.
That does not mean humans are out of the loop. It means humans are reviewing something that has already been substantially filtered. The 96% of developers who say they do not trust AI-generated code are right to be skeptical — 10.83 issues per PR is a real number. Skepticism without tooling is just anxiety. This stack converts that skepticism into an automated check that runs faster than any human reviewer could.