AI tools are rapidly becoming part of the smart contract security workflow. Some teams experiment with them internally, while others are beginning to rely on them as a first line of defense before manual audits.
But a fundamental question remains: how well do these tools actually perform when analyzing real codebases?
To explore that question, we ran a comparison between two systems:
- AuditAgent — a static AI auditing tool focused on vulnerability detection
- Azimuth — a behavioral analysis engine designed to simulate exploit paths and protocol interactions
Rather than evaluating a single repository, we tested both systems across four different codebases, each representing a different category of smart contract architecture.
Repositories Analyzed
- PrimeVault — DeFi protocol (view report)
- LendMachine — Lending protocol (view report)
- Murky — Utility library (view report)
- BaseTap — Payment protocol (view report)
Each repository was analyzed independently by both systems and compared across several dimensions:
- vulnerability detection
- exploit modeling
- protocol reasoning
- workflow analysis
- code-quality observations
The results were instructive.
Methodology
For each repository we:
- Ran AuditAgent to generate a full audit report
- Ran Azimuth to generate behavioral exploit hypotheses
- Compared the outputs across several dimensions:
  - number of findings
  - exploit depth
  - cross-contract reasoning
  - economic attack modeling
  - operational failure modes
Importantly, both systems analyzed each repository in its original, unmodified state.
Repository 1 — PrimeVault
Full analysis: view Azimuth report
PrimeVault is a DeFi protocol with vault mechanics and capital flows between multiple contracts.
This kind of architecture introduces several attack surfaces:
- asset accounting
- permission controls
- economic manipulation
- cross-contract interactions
AuditAgent correctly identified several isolated contract risks and best-practice violations.
Azimuth expanded those issues into realistic exploit paths, including scenarios where attackers could manipulate protocol flows across multiple contracts.
This is a recurring theme: static scanners can identify risky code patterns, but often stop short of modeling how those risks translate into real attacks.
Repository 2 — LendMachine
Full analysis: view Azimuth report
LendMachine is a simplified lending protocol with collateral, borrowing, liquidation, and reward mechanics.
These systems are especially sensitive to economic exploits, where small logic flaws can create large financial consequences.
Both tools detected a configuration risk around interest rate control.
AuditAgent noted that the interest rate setter lacked access control. Azimuth went further and modeled several exploit scenarios:
- artificially inflating interest rates to force liquidations
- temporarily setting rates to zero to disable accrual
- manipulating borrower health factors during liquidation windows
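The core of the first scenario can be sketched as a toy model. The names here (LendingPool, set_rate, health_factor) are illustrative stand-ins, not LendMachine's actual interface; the point is only how an unguarded rate setter lets anyone push a healthy position below the liquidation threshold:

```python
# Toy model of an unprotected interest-rate setter forcing a liquidation.
# All names are illustrative, not the protocol's real API.

class LendingPool:
    LIQ_THRESHOLD = 1.0  # positions below this health factor are liquidatable

    def __init__(self, annual_rate: float):
        self.annual_rate = annual_rate
        self.positions = {}  # user -> {"collateral": .., "debt": ..}

    def set_rate(self, caller: str, new_rate: float) -> None:
        # BUG being modeled: no check that caller is an authorized admin.
        self.annual_rate = new_rate

    def open(self, user: str, collateral: float, debt: float) -> None:
        self.positions[user] = {"collateral": collateral, "debt": debt}

    def accrue(self, periods: int = 1) -> None:
        # Compound the debt of every position by the current rate.
        for pos in self.positions.values():
            pos["debt"] *= (1 + self.annual_rate) ** periods

    def health_factor(self, user: str) -> float:
        pos = self.positions[user]
        return pos["collateral"] / pos["debt"]

pool = LendingPool(annual_rate=0.05)
pool.open("alice", collateral=150.0, debt=100.0)   # healthy: HF = 1.5

pool.set_rate("attacker", new_rate=10.0)            # anyone can call the setter
pool.accrue()                                       # one accrual period later

print(pool.health_factor("alice") < pool.LIQ_THRESHOLD)  # alice is now liquidatable
```

The second scenario is the same bug run in reverse: set_rate("attacker", 0.0) silently disables accrual for every lender.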
Additionally, Azimuth identified issues in reward accounting synchronization, which could lead to phantom reward accumulation under certain conditions.
These types of vulnerabilities are difficult for static scanners to detect because they require reasoning about state transitions across multiple transactions.
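The reward-desync pattern is worth making concrete. In this sketch (names invented for illustration), a global reward index advances, but a buggy deposit path forgets to snapshot the user's index before changing their balance, so the new balance retroactively "earns" rewards for time it was never staked:

```python
# Sketch of phantom reward accumulation via a missed snapshot sync.
# All names are illustrative.

class RewardPool:
    def __init__(self):
        self.index = 0.0     # cumulative reward per staked token
        self.stake = {}      # user -> staked amount
        self.snapshot = {}   # user -> index value at last sync

    def accrue(self, reward_per_token: float) -> None:
        self.index += reward_per_token

    def _sync(self, user: str) -> float:
        owed = self.stake.get(user, 0.0) * (self.index - self.snapshot.get(user, 0.0))
        self.snapshot[user] = self.index
        return owed

    def deposit_correct(self, user: str, amount: float) -> float:
        owed = self._sync(user)  # settle the old balance before changing it
        self.stake[user] = self.stake.get(user, 0.0) + amount
        return owed

    def deposit_buggy(self, user: str, amount: float) -> None:
        # BUG being modeled: balance changes without syncing the snapshot.
        self.stake[user] = self.stake.get(user, 0.0) + amount

    def claim(self, user: str) -> float:
        return self._sync(user)

pool = RewardPool()
pool.accrue(1.0)                   # index advances before the user ever stakes
pool.deposit_buggy("mallory", 100.0)
phantom = pool.claim("mallory")
print(phantom)                     # 100.0 reward for zero staked time
```

A static scanner sees each function as locally reasonable; the bug only exists in the ordering of accrue, deposit, and claim across transactions.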
Repository 3 — Murky
Full analysis: view Azimuth report
Murky is not a protocol at all.
It is a Merkle tree utility library used primarily for testing and proof generation.
That makes it an interesting control case.
Because Murky has:
- no capital flows
- no incentives
- no multi-contract architecture
...the number of meaningful attack surfaces is naturally limited.
In this case, both tools performed similarly.
AuditAgent produced a larger set of code hygiene observations, including style issues and gas optimizations.
Azimuth focused more on edge cases in Merkle proof verification, such as malformed trees and integration misuse.
But the differences were much smaller than in protocol repositories. This is expected. When the codebase is a simple utility library, there are simply fewer opportunities for exploit modeling to add value.
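One representative edge case of this kind can be shown with a minimal sorted-pair Merkle verifier (SHA-256 stands in for keccak256, and the code is a sketch, not Murky's implementation): with an empty proof, any "leaf" equal to the root verifies, so integrations must never let users influence the root or pass unvalidated proofs.

```python
# Minimal Merkle proof verifier in the sorted-pair style, illustrating a
# degenerate-proof edge case. SHA-256 stands in for keccak256.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify(root: bytes, leaf: bytes, proof: list) -> bool:
    node = leaf
    for sibling in proof:
        # Sorted-pair hashing: order the pair before hashing.
        lo, hi = sorted([node, sibling])
        node = h(lo + hi)
    return node == root

leaves = [h(b"a"), h(b"b")]
lo, hi = sorted(leaves)
root = h(lo + hi)

print(verify(root, leaves[0], [leaves[1]]))  # True: valid proof
# Integration misuse: with an empty proof, the "leaf" only has to equal
# the root, so a caller that accepts user-supplied roots can be tricked.
print(verify(root, root, []))                # True: degenerate proof
```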
Repository 4 — BaseTap
Full analysis: view Azimuth report
BaseTap is a modular payment protocol designed around taps, which allow controlled token flows between accounts.
The system includes:
- tap registries
- execution contracts
- payment sessions
- batching logic
- split payments
This architecture introduces several workflow risks.
AuditAgent identified several important issues, including:
- missing authorization checks
- inconsistencies between canExecute() and executeTap()
- architectural design weaknesses
Azimuth expanded these into attack scenarios affecting real users:
- a payment session could be griefed by malicious actors calling markPaid() before legitimate settlement
- ETH transfers could become permanently locked when interacting with ERC20 tap paths
- tap owners could inflate payment amounts after users grant approvals
These are not simply coding errors. They are product trust failures, where legitimate users could be harmed even though the contract technically behaves as written.
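The griefing path can be sketched in a few lines. The PaymentSession model below is invented for illustration (it is not BaseTap's actual interface); what it shows is how one unrestricted state-changing call lets an attacker block legitimate settlement without ever moving funds:

```python
# Toy reconstruction of a markPaid()-style griefing path. Names are
# illustrative, not the real contract interface.

class PaymentSession:
    def __init__(self, payer: str, amount: float):
        self.payer = payer
        self.amount = amount
        self.paid = False
        self.settled = False

    def mark_paid(self, caller: str) -> None:
        # BUG being modeled: no check that caller is the payer or the tap.
        self.paid = True

    def settle(self, caller: str) -> bool:
        if caller != self.payer or self.paid:
            return False             # settlement rejected
        self.paid = True
        self.settled = True
        return True

session = PaymentSession(payer="alice", amount=10.0)
session.mark_paid("griefer")        # attacker front-runs settlement
print(session.settle("alice"))      # False: alice can no longer settle
print(session.settled)              # False: funds never actually moved
```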
Cross-Repository Comparison
Looking across the four repositories reveals a consistent pattern.
Each tool excels in different areas.
AuditAgent strengths
- strong static analysis
- best-practice detection
- architectural hygiene
Azimuth strengths
- exploit path modeling
- economic attack analysis
- multi-contract reasoning
- workflow failure detection
What This Means for AI Auditing
The results suggest an important distinction between two categories of AI security tools.
Static AI auditors
These tools behave similarly to traditional vulnerability scanners.
They excel at identifying:
- reentrancy risks
- missing access control
- unsafe patterns
- implementation issues
But they often struggle to reason about:
- multi-step attacks
- economic incentives
- protocol workflows
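To make the distinction concrete, here is a deliberately crude sketch of the pattern-matching layer such tools build on: flag external setters that carry no recognizable access-control modifier. Real static auditors are far more sophisticated; the point is that this style of analysis inspects source text, not multi-step behavior.

```python
# Deliberately simple sketch of static pattern matching over Solidity
# source: flag setters with no access-control modifier. Guard names and
# the sample contract are invented for illustration.
import re

GUARDS = ("onlyOwner", "onlyRole", "onlyAdmin", "auth")

def find_unguarded_setters(solidity_source: str) -> list:
    findings = []
    # Match function signatures up to the opening brace.
    pattern = r"function\s+(set\w+)\s*\([^)]*\)\s*([^{]*)\{"
    for m in re.finditer(pattern, solidity_source):
        name, modifiers = m.group(1), m.group(2)
        if not any(g in modifiers for g in GUARDS):
            findings.append(f"{name}: externally callable setter without access control")
    return findings

sample = """
contract RateModel {
    function setRate(uint256 r) external { rate = r; }
    function setOwner(address o) external onlyOwner { owner = o; }
}
"""
findings = find_unguarded_setters(sample)
for finding in findings:
    print(finding)   # flags setRate, not setOwner
```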
Behavioral security engines
Systems like Azimuth focus less on pattern matching and more on simulating how contracts behave under adversarial conditions.
This enables them to surface vulnerabilities that appear only when:
- multiple transactions interact
- cross-contract calls occur
- incentives are manipulated
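The behavioral approach has a different loop shape: drive a model of the contract through multi-step transaction sequences and check an invariant after each one. The Vault below and its solvency invariant are invented purely to show that shape; a real engine would randomize and prioritize sequences rather than enumerate them.

```python
# Sketch of invariant checking over multi-transaction sequences.
# The Vault model and its bug are invented for illustration.
import itertools

class Vault:
    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, user, amt):
        self.balances[user] = self.balances.get(user, 0) + amt
        self.total += amt

    def withdraw(self, user, amt):
        bal = self.balances.get(user, 0)
        if amt <= bal:
            self.balances[user] = bal - amt
            # BUG being modeled: total is not decremented on withdraw.

def invariant_holds(v: Vault) -> bool:
    # Solvency: the tracked total must equal the sum of user balances.
    return v.total == sum(v.balances.values())

# Exhaustively try short two-step sequences; a fuzzer would randomize this.
actions = [("deposit", 5), ("withdraw", 5)]
violations = []
for seq in itertools.product(actions, repeat=2):
    v = Vault()
    for op, amt in seq:
        getattr(v, op)("alice", amt)
    if not invariant_holds(v):
        violations.append(seq)

print(violations)  # only deposit-then-withdraw breaks the invariant
```

No single function here looks wrong in isolation; the violation only appears when two calls are composed, which is exactly the class of finding static pattern matching misses.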
The Bigger Picture
Smart contract security is evolving.
Early tools focused on code correctness.
Modern protocols require analysis of economic behavior and system interactions.
Both layers matter.
Static scanners are valuable for quickly catching implementation mistakes.
But as protocols grow more complex, security tools must also understand:
- how users interact with systems
- how attackers manipulate incentives
- how state transitions create unexpected behavior
Conclusion
Across the four repositories we analyzed — PrimeVault, LendMachine, Murky, and BaseTap — a consistent pattern emerged.
Static AI auditors were effective at identifying common vulnerability patterns and implementation risks.
But the most meaningful issues surfaced when the analysis moved beyond individual functions and began modeling how contracts behave as a system.
Many of the highest-impact findings depended on:
- multi-step interactions
- cross-contract workflows
- economic incentives
- real user behavior
Static analysis is an important first layer of defense, but modern smart contract exploits rarely arise from a single unsafe line of code.
They emerge from how components interact over time.
Static analysis tells you where the code looks risky.
Dynamic analysis tells you how the system actually breaks.
As smart contracts continue to grow in complexity, behavioral analysis will increasingly become a necessary complement to static scanning in serious security workflows.