benchmark-backed plugin proof
We measured the advice before claiming savings.
The plugin is crafted from actual Claude Code logs, then tested against fresh runs from the same commit and prompt. The result is not a generic “install more tools” bundle: it keeps the practices that reduced the token category they target and downgrades recommendations that did not prove out.
current finding
Agent Analyzer guidance reduced measured token waste and API-rate cost in repeated controlled runs.
On the larger noisy repo, three fresh Agent Analyzer-guided runs averaged 12,370 fewer estimated tokens, 12,698 fewer tool-output tokens, 504 fewer Claude output tokens, and $0.044219 lower native Claude Code cost, all while still passing the quality gate. Repriced at published Claude Sonnet 4.6 API rates, the repeated mean delta was -$0.059207, or 23.986% lower cost.
That percentage, not the per-task dollar figure, is the honest unit for scaling up. One task saved about six cents; a team spending $5,000/month on comparable Claude Sonnet coding work would save about $1,199/month. The basis is simple: baseline $0.2468368, optimized $0.1876295, delta -$0.0592073.
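The scale-up arithmetic can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above; the variable names are illustrative, and the $5,000/month spend is the hypothetical team budget from the text, not a measured value.

```python
# Mean per-run cost repriced at API rates, as quoted in the text (USD).
baseline_usd = 0.2468368
optimized_usd = 0.1876295

# Negative delta means the optimized run was cheaper.
delta_usd = optimized_usd - baseline_usd
savings_fraction = -delta_usd / baseline_usd

# Hypothetical monthly spend on comparable work (assumption from the text).
monthly_spend_usd = 5000.0
monthly_savings_usd = monthly_spend_usd * savings_fraction

print(f"delta per run: {delta_usd:.7f} USD")
print(f"relative savings: {savings_fraction:.3%}")
print(f"monthly savings at $5,000/month: ${monthly_savings_usd:,.0f}")
```

The projection holds only to the extent that a team's workload resembles the benchmarked task; the percentage carries over, the per-run dollar amount does not.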
Every named tool recommendation now has a three-repeat aggregate. The plugin no longer says “install everything”; it ships the core Agent Analyzer workflow, conditionally recommends only tools that reduced cost in the repeated suite, and removes telemetry-only or negative tools from the reducer path.
What The Plugin Actually Does
Measures the baseline log locally
Agent Analyzer identifies avoidable waste such as large shell output, broad discovery, repeated reads, retry loops, and context growth spikes without uploading raw transcripts.
Generates scoped Claude guidance
The plugin turns the measured findings into commands, skills, and a reviewer agent that steer the next session toward narrower reads, quieter verification, and lower-output workflows.
Keeps only evidence-backed claims
Tools that performed well are recommended narrowly by token category. Tool-output reducers and retrieval tools are conditional, telemetry-only tools are kept out of the reducer pack, and tools that did not reduce full-session cost are removed from default advice instead of hidden.
Recommendation Verdicts
| Recommendation | Benchmark verdict | How the plugin uses that evidence |
|---|---|---|
| Agent Analyzer guidance | Positive | Keep as the core plugin behavior and make the workflow more direct. |
| Quiet package-scoped testing | Built in | Keep as part of Agent Analyzer guidance because the repeated plugin run used focused reads and quiet verification to reduce tool-output tokens. |
| ccusage | Telemetry | Keep out of the default reducer pack. Use only as independent accounting if the user asks for visibility. |
| claude-context | Removed | Do not recommend for this workflow. Indexing and MCP-search overhead did not amortize in three fresh runs. |
| context-mode | Conditional | Recommend only for tool-output/context bloat. Repeated runs reduced cost 20.4%, but visible output rose on average. |
| grepai | Conditional | Recommend only as input/context retrieval with compact output, small limits, and path filters; repeated cost savings were 14.5%. |
| ccstatusline | Telemetry | Keep out of the default reducer pack. It can improve awareness but does not directly reduce input, output, tool-output, or reasoning tokens. |
| claude-token-efficient | Too small | Do not ship as a default reducer. The repeated savings were 1.8%, useful only as manual verbosity hygiene. |
| Caveman | Removed | Keep out of default Claude plugin guidance. It reduced Codex native tokens in this fixture but made Claude Code estimated/tool-output tokens and cost worse. |
| RTK | Conditional | Recommend explicit commands such as `rtk go test ./...` before any global shell hooks; repeated cost savings were 18.2%. |
| Probe, Semble, Squeez | Split result | Probe was removed. Semble is a positive repeated retrieval recommendation at 41.5% cost savings. Squeez had an earlier positive shell-output result but was removed because it conflicts with Spec Kitty workflows. |
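The table above can be read as a simple filter: only tools with a conditional or positive verdict stay eligible for recommendation. The sketch below illustrates that reading using the savings figures stated in the table. The `Verdict` record, its field names, and the `recommendable` helper are hypothetical and are not the plugin's actual data format; Semble's "Positive" label is taken from the wording of its row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    tool: str
    verdict: str        # label from the table's "Benchmark verdict" column
    savings_pct: float  # repeated cost savings stated in the table


# Only tools whose row states a repeated savings percentage are listed.
VERDICTS = [
    Verdict("context-mode", "Conditional", 20.4),
    Verdict("grepai", "Conditional", 14.5),
    Verdict("claude-token-efficient", "Too small", 1.8),
    Verdict("RTK", "Conditional", 18.2),
    Verdict("Semble", "Positive", 41.5),
]

# Assumption: these two labels are the ones that keep a tool recommendable.
RECOMMENDABLE = {"Conditional", "Positive"}


def recommendable(verdicts):
    """Return eligible tools, largest repeated savings first."""
    kept = [v for v in verdicts if v.verdict in RECOMMENDABLE]
    return sorted(kept, key=lambda v: v.savings_pct, reverse=True)


for v in recommendable(VERDICTS):
    print(f"{v.tool}: {v.verdict}, {v.savings_pct}% repeated cost savings")
```

Sorting by savings is only a display choice here; the table itself ties each conditional tool to a specific token category rather than ranking them.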
run locally
Measure your own agent logs.
npx --yes agent-analyzer@latest run
Analyzes recent sessions locally, asks before upload, and uses the sanitized report to build a custom plugin for the waste it detects.