AI tools are rapidly becoming part of the smart contract security workflow. Some teams experiment with them internally, while others are beginning to rely on them as a first line of defense before manual audits.

But a fundamental question remains: how well do these tools actually perform when analyzing real codebases?

To explore that question, we ran a comparison between two systems:

  • AuditAgent — a static AI auditing tool focused on vulnerability detection
  • Azimuth — a behavioral analysis engine designed to simulate exploit paths and protocol interactions

Rather than evaluating a single repository, we tested both systems across four different codebases, each representing a different category of smart contract architecture.

Repositories Analyzed

  • PrimeVault, a DeFi protocol with vault mechanics and multi-contract capital flows
  • LendMachine, a simplified lending protocol with collateral, borrowing, and liquidations
  • Murky, a Merkle tree utility library used for testing and proof generation
  • BaseTap, a modular payment protocol built around controlled token flows ("taps")

Each repository was analyzed independently by both systems and compared across several dimensions:

  • vulnerability detection
  • exploit modeling
  • protocol reasoning
  • workflow analysis
  • code-quality observations

The results were instructive.

Methodology

For each repository we:

  • Ran AuditAgent to generate a full audit report
  • Ran Azimuth to generate behavioral exploit hypotheses
  • Compared the outputs across several dimensions:
    • number of findings
    • exploit depth
    • cross-contract reasoning
    • economic attack modeling
    • operational failure modes

Importantly, the repositories were analyzed as-is: no code was modified before either tool ran.

Repository 1 — PrimeVault

Full analysis: view Azimuth report

PrimeVault is a DeFi protocol with vault mechanics and capital flows between multiple contracts.

This kind of architecture introduces several attack surfaces:

  • asset accounting
  • permission controls
  • economic manipulation
  • cross-contract interactions

| Capability | AuditAgent | Azimuth |
| --- | --- | --- |
| Static vulnerability detection | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Cross-function exploit discovery |  | ⭐⭐⭐⭐⭐ |
| Protocol economic modeling |  | ⭐⭐⭐⭐⭐ |
| Multi-contract reasoning | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Implementation & best-practice feedback | ⭐⭐⭐ | ⭐⭐⭐ |

AuditAgent correctly identified several isolated contract risks and best-practice violations.

Azimuth expanded those issues into realistic exploit paths, including scenarios where attackers could manipulate protocol flows across multiple contracts.

This is a recurring theme: static scanners can identify risky code patterns, but often stop short of modeling how those risks translate into real attacks.
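
One such economic-manipulation path, the classic first-depositor share-inflation attack, can be sketched in a few lines of Python. This is an illustrative model of vault share accounting in general, not PrimeVault's actual code:

```python
# Toy vault share accounting (illustrative, not PrimeVault's code).
class Vault:
    def __init__(self):
        self.total_assets = 0   # tokens held by the vault
        self.total_shares = 0   # shares minted against those tokens
        self.shares = {}        # user -> shares

    def deposit(self, user, amount):
        if self.total_shares == 0:
            minted = amount     # first depositor sets the ratio
        else:
            minted = amount * self.total_shares // self.total_assets
        self.total_assets += amount
        self.total_shares += minted
        self.shares[user] = self.shares.get(user, 0) + minted
        return minted

    def donate(self, amount):
        # Tokens sent directly to the vault, bypassing deposit()
        self.total_assets += amount

v = Vault()
v.deposit("attacker", 1)        # attacker mints 1 share for 1 token
v.donate(10_000)                # then inflates the assets-per-share ratio
minted = v.deposit("victim", 10_000)
assert minted == 0              # victim's 10,000 tokens round down to 0 shares
```

The victim's deposit now backs the attacker's single share. An exploit path like this only becomes visible when asset accounting and raw token transfers are modeled together, rather than flagged as isolated code patterns.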

Repository 2 — LendMachine

Full analysis: view Azimuth report

LendMachine is a simplified lending protocol with collateral, borrowing, liquidation, and reward mechanics.

These systems are especially sensitive to economic exploits, where small logic flaws can create large financial consequences.

| Capability | AuditAgent | Azimuth |
| --- | --- | --- |
| Static vulnerability detection | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Reentrancy detection | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Cross-function exploit discovery |  | ⭐⭐⭐⭐⭐ |
| Economic attack modeling |  | ⭐⭐⭐⭐⭐ |
| Protocol reasoning |  | ⭐⭐⭐⭐⭐ |

Both tools detected a configuration risk around interest rate control.

AuditAgent noted that the interest rate setter lacked access control. Azimuth went further and modeled several exploit scenarios:

  • artificially inflating interest rates to force liquidations
  • temporarily setting rates to zero to disable accrual
  • manipulating borrower health factors during liquidation windows
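
The first scenario can be made concrete with a toy health-factor model. The numbers and formula here are hypothetical, not LendMachine's actual math:

```python
def health_factor(collateral_value: float, debt: float,
                  rate: float, periods: int = 1) -> float:
    """Collateral value divided by debt after interest accrual."""
    accrued_debt = debt * (1 + rate) ** periods
    return collateral_value / accrued_debt

# At a normal 2% rate the position is safely collateralized...
assert health_factor(150.0, 100.0, rate=0.02) > 1.4

# ...but an attacker who controls the unprotected setter spikes the rate
# for a single accrual period, and the same position drops below the
# liquidation threshold of 1.0.
assert health_factor(150.0, 100.0, rate=0.60) < 1.0
```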

Additionally, Azimuth identified issues in reward accounting synchronization, which could lead to phantom reward accumulation under certain conditions.
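
The bug class behind phantom rewards can be sketched as follows. This is an illustrative index-based accounting model, not the audited contract: if a balance changes before the user's reward index is synchronized, the user is credited for rewards distributed before they ever staked.

```python
class RewardPool:
    """Toy index-based reward accounting (illustrative only)."""
    def __init__(self):
        self.index = 0.0        # cumulative rewards per staked token
        self.balance = {}       # user -> staked tokens
        self.user_index = {}    # user -> pool index at last sync
        self.accrued = {}       # user -> settled rewards

    def _sync(self, user):
        owed = (self.index - self.user_index.get(user, 0.0)) \
            * self.balance.get(user, 0)
        self.accrued[user] = self.accrued.get(user, 0.0) + owed
        self.user_index[user] = self.index

    def deposit(self, user, amount):
        self._sync(user)        # correct: sync BEFORE the balance changes
        self.balance[user] = self.balance.get(user, 0) + amount

    def deposit_buggy(self, user, amount):
        # BUG: balance changes without syncing the user's index first.
        self.balance[user] = self.balance.get(user, 0) + amount

    def distribute(self, reward):
        staked = sum(self.balance.values())
        if staked:
            self.index += reward / staked

pool = RewardPool()
pool.deposit("honest", 100)
pool.distribute(50.0)                 # index rises to 0.5
pool.deposit_buggy("attacker", 100)   # attacker stakes AFTER the payout
pool._sync("attacker")
assert pool.accrued["attacker"] == 50.0   # phantom rewards from before staking
```

Nothing on any single line is "unsafe" here, which is why this class of bug tends to evade pattern-based scanning.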

These types of vulnerabilities are difficult for static scanners to detect because they require reasoning about state transitions across multiple transactions.

Repository 3 — Murky

Full analysis: view Azimuth report

Murky is not a protocol at all.

It is a Merkle tree utility library used primarily for testing and proof generation.

That makes it an interesting control case.

Because Murky has:

  • no capital flows
  • no incentives
  • no multi-contract architecture

...the number of meaningful attack surfaces is naturally limited.

| Capability | AuditAgent | Azimuth |
| --- | --- | --- |
| Static vulnerability detection | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Merkle logic analysis | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Edge-case reasoning | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Protocol exploit modeling |  |  |

In this case, both tools performed similarly.

AuditAgent produced a larger set of code hygiene observations, including style issues and gas optimizations.

Azimuth focused more on edge cases in Merkle proof verification, such as malformed trees and integration misuse.
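
One representative edge case can be sketched in Python. SHA-256 stands in for keccak, and this mirrors the sorted-pair proof convention common to Merkle libraries of this kind rather than Murky's exact code:

```python
import hashlib

def hash_pair(a: bytes, b: bytes) -> bytes:
    # Sorted-pair hashing: proofs need no left/right position flags.
    lo, hi = sorted((a, b))
    return hashlib.sha256(lo + hi).digest()

def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    node = leaf
    for sibling in proof:
        node = hash_pair(node, sibling)
    return node == root

leaf_a = hashlib.sha256(b"a").digest()
leaf_b = hashlib.sha256(b"b").digest()
root = hash_pair(leaf_a, leaf_b)

assert verify(leaf_a, [leaf_b], root)   # normal membership proof
# Edge case: an empty proof verifies whenever leaf == root, so a caller
# that accepts arbitrary 32-byte leaves can be handed the root itself as
# a "proven" member. This is integration misuse, not a library bug.
assert verify(root, [], root)
```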

But the differences were much smaller than in protocol repositories. This is expected. When the codebase is a simple utility library, there are simply fewer opportunities for exploit modeling to add value.

Repository 4 — BaseTap

Full analysis: view Azimuth report

BaseTap is a modular payment protocol designed around taps, which allow controlled token flows between accounts.

The system includes:

  • tap registries
  • execution contracts
  • payment sessions
  • batching logic
  • split payments

This architecture introduces several workflow risks.

| Capability | AuditAgent | Azimuth |
| --- | --- | --- |
| Static vulnerability detection | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Access-control analysis | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Workflow reasoning | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Payment-flow exploit modeling | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Cross-contract reasoning | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

AuditAgent identified several important issues, including:

  • missing authorization checks
  • inconsistencies between canExecute() and executeTap()
  • architectural design weaknesses

Azimuth expanded these into attack scenarios affecting real users:

  • a payment session could be griefed by malicious actors calling markPaid() before legitimate settlement
  • ETH transfers could become permanently locked when interacting with ERC20 tap paths
  • tap owners could inflate payment amounts after users grant approvals
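
The first scenario reduces to a missing guard on a state transition. Here is a hypothetical sketch; the method names echo the article, but the real BaseTap code differs:

```python
class PaymentSession:
    """Toy payment session with an unguarded settlement flag."""
    def __init__(self, payee: str, amount: int):
        self.payee = payee
        self.amount = amount      # amount owed
        self.received = 0         # amount actually transferred so far
        self.paid = False

    def pay(self, amount: int):
        if self.paid:
            raise RuntimeError("session already settled")
        self.received += amount
        if self.received >= self.amount:
            self.paid = True

    def mark_paid(self, caller: str):
        # BUG: any caller can settle the session before funds arrive.
        self.paid = True

    def mark_paid_guarded(self, caller: str):
        if caller != self.payee or self.received < self.amount:
            raise PermissionError("only the payee of a funded session")
        self.paid = True

session = PaymentSession(payee="merchant", amount=100)
session.mark_paid("attacker")   # griefing: session closes with no payment
try:
    session.pay(100)            # the legitimate payer is now locked out
except RuntimeError:
    pass
assert session.paid and session.received == 0
```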

These are not simply coding errors. They are product trust failures, where legitimate users could be harmed even though the contract technically behaves as written.

Cross-Repository Comparison

Looking across the four repositories reveals a consistent pattern.

| Capability | AuditAgent | Azimuth |
| --- | --- | --- |
| Static vulnerability detection | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Cross-function exploit discovery |  | ⭐⭐⭐⭐⭐ |
| Protocol economic modeling |  | ⭐⭐⭐⭐⭐ |
| Workflow / state-machine reasoning | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Implementation & best-practice feedback | ⭐⭐⭐ | ⭐⭐⭐ |

Each tool excels in different areas.

AuditAgent strengths

  • strong static analysis
  • best-practice detection
  • architectural hygiene

Azimuth strengths

  • exploit path modeling
  • economic attack analysis
  • multi-contract reasoning
  • workflow failure detection

What This Means for AI Auditing

The results suggest an important distinction between two categories of AI security tools.

Static AI auditors

These tools behave similarly to traditional vulnerability scanners.

They excel at identifying:

  • reentrancy risks
  • missing access control
  • unsafe patterns
  • implementation issues

But they often struggle to reason about:

  • multi-step attacks
  • economic incentives
  • protocol workflows

Behavioral security engines

Systems like Azimuth focus less on pattern matching and more on simulating how contracts behave under adversarial conditions.

This enables them to surface vulnerabilities that appear only when:

  • multiple transactions interact
  • cross-contract calls occur
  • incentives are manipulated

The Bigger Picture

Smart contract security is evolving.

Early tools focused on code correctness.

Modern protocols require analysis of economic behavior and system interactions.

Both layers matter.

Static scanners are valuable for quickly catching implementation mistakes.

But as protocols grow more complex, security tools must also understand:

  • how users interact with systems
  • how attackers manipulate incentives
  • how state transitions create unexpected behavior

Conclusion

Across the four repositories we analyzed — PrimeVault, LendMachine, Murky, and BaseTap — a consistent pattern emerged.

Static AI auditors were effective at identifying common vulnerability patterns and implementation risks.

But the most meaningful issues surfaced when the analysis moved beyond individual functions and began modeling how contracts behave as a system.

Many of the highest-impact findings depended on:

  • multi-step interactions
  • cross-contract workflows
  • economic incentives
  • real user behavior

Static analysis is an important first layer of defense, but modern smart contract exploits rarely arise from a single unsafe line of code.

They emerge from how components interact over time.

Static analysis tells you where the code looks risky.

Behavioral analysis tells you how the system actually breaks.

As smart contracts continue to grow in complexity, behavioral analysis will increasingly become a necessary complement to static scanning in serious security workflows.