results

The benchmark changed the recommendations.

Every named recommendation below now has three fresh baseline/optimized pairs on the same noisy owner-breakdown task. The plugin keeps the practices that reduced the token category they target, and it downgrades tools that added cost or context.

replication status

All tool verdicts on this page are 3x repeated results.

Each suite ran three fresh baseline/optimized pairs from commit b96b8a7 with the same prompt and go test ./... quality gate. All published medium-context aggregates passed quality in all three repeats. Single-run comparison JSONs remain available as historical artifacts, but the verdicts below use aggregate-*.json and the fuller sanitized recordings are committed under docs/benchmarks/primary-data/.

cost scale

The useful number is 24.0%, not four cents.

The Agent Analyzer suite went from $0.2468368 to $0.1876295 at published Claude Sonnet 4.6 API rates, a 23.986% reduction. On comparable recurring usage, that is about $479.73/month on $2,000/month, $1,199.32/month on $5,000/month, or $2,398.64/month on $10,000/month.

token category rule

Output, reasoning, tool-output, and cost are separate claims.

Tool-output and retrieval tools target input/context tokens. Terse-response tools target visible output tokens. Codex exposes reasoning tokens separately. Published API-rate cost reprices exposed token categories, while Claude Code native cost can differ because the product may use internal routing and model work.

Repeated Recommendation Matrix

Recommendation Target category Quality Estimated tokens Tool output Output / reasoning Published API delta API-rate percent Verdict
Agent Analyzer guidance Measured workflow and tool-output hygiene 3/3 -12,370 -12,698 Claude output -504 -$0.059207 -24.0% Positive
claude-context Input/context retrieval through MCP 3/3 +7,327 +4,170 Claude output +1,169 +$0.058038 +26.0% Negative here
claude-rlm Recursive sub-agent decomposition 3/3 +19,477 +6,020 Claude root output -1,197 root-session only; sub-agent cost not exposed n/a Negative here
context-mode Tool-output/input-context batching 3/3 -12,359 -13,257 Claude output +170 -$0.052175 -20.4% Conditional
grepai Path-constrained compact retrieval 3/3 -14,567 -15,571 Claude output +443 -$0.037657 -14.5% Conditional
claude-token-efficient Visible-output verbosity guidance 3/3 -391 -754 Claude output -79 -$0.004208 -1.8% Too small
RTK Explicit shell-output compression 3/3 -12,446 -12,716 Claude output +114 -$0.044316 -18.2% Conditional
Probe Bounded code search 3/3 +874 -745 Claude output +548 +$0.038340 +16.6% Negative here
Semble Path-limited semantic retrieval 3/3 -16,301 -16,060 Claude output -480 -$0.114194 -41.5% Positive here
Squeez Explicit shell-output compression 3/3 -8,471 -8,917 Claude output +73 -$0.028224 -12.1% Removed: Spec Kitty conflict
ccusage / ccstatusline Telemetry only n/a n/a n/a n/a n/a n/a Visibility

Codex And Caveman Controls

Harness Intervention Quality Analyzer estimated Tool output Native token signal Published API delta Verdict
Claude Code -p Agent Analyzer plugin 3/3 -12,370 -12,698 native cost -$0.044219 -$0.059207 Positive
Codex exec --json Agent Analyzer text guidance 3/3 -14,520 -14,527 uncached+output -24,369; reasoning -45 -$0.062392 Positive here
Claude Code -p Caveman terse-output pressure 3/3 +4,355 +4,868 native cost +$0.009919; output -370 +$0.009211 Negative here
Codex exec --json Caveman terse-output pressure 3/3 -9,210 -9,109 uncached+output -4,739; reasoning -2 -$0.033986 Harness-specific

Product Conclusions

Artifacts

run locally

Generate your own report and plugin.

npx --yes agent-analyzer@latest run

Runs on your machine first, asks before upload, and turns the sanitized report into a targeted plugin for your detected waste.