Benchmarks

Does it actually work? We ran the numbers.

3 real production codebases · 78 prompts · TypeScript, Go, and Python · same model, same judge — with and without GrapeRoot.

  • 57% cheaper on TypeScript + Go · GR Pro v6 · 33 prompts · Medusa + Gitea
  • 53% cheaper on Python · GR v3 · 24 prompts · Sentry · 7,762 files
  • 78 prompts across 3 codebases · TypeScript, Go, Python · same model, same judge

GrapeRoot Pro v6 vs Normal — Benchmark

33 prompts · Medusa (TypeScript, 1,571 files) + Gitea (Go) · Claude Sonnet 4.6 · LLM judge

The question

GrapeRoot Pro v6 adds exhaustive enumeration tools — pre-computed dead exports, cycle detection via Tarjan's SCC algorithm, and uncapped grep sweeps. Does this make it strictly better than running Claude with no context engine? We tested 33 prompts across two real codebases to find out.
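Two of those tools map onto classic static-analysis passes. Circular dependencies fall out of Tarjan's strongly connected components algorithm in a single linear-time pass over the import graph. The sketch below is a minimal illustration of that idea, not GrapeRoot's actual implementation; the `Graph` type and `findCycles` helper are hypothetical names.

```ts
// Hypothetical sketch: cycle detection via Tarjan's SCC algorithm.
// An SCC containing more than one module is a circular dependency.
type Graph = Map<string, string[]>; // module -> modules it imports

function findCycles(graph: Graph): string[][] {
  let index = 0;
  const indices = new Map<string, number>();
  const lowlink = new Map<string, number>();
  const onStack = new Set<string>();
  const stack: string[] = [];
  const cycles: string[][] = [];

  // Recursive for clarity; a production version would use an explicit
  // stack to survive very deep import chains.
  function strongConnect(v: string): void {
    indices.set(v, index);
    lowlink.set(v, index);
    index++;
    stack.push(v);
    onStack.add(v);

    for (const w of graph.get(v) ?? []) {
      if (!indices.has(w)) {
        strongConnect(w);
        lowlink.set(v, Math.min(lowlink.get(v)!, lowlink.get(w)!));
      } else if (onStack.has(w)) {
        lowlink.set(v, Math.min(lowlink.get(v)!, indices.get(w)!));
      }
    }

    // v is the root of an SCC: pop the whole component off the stack.
    if (lowlink.get(v) === indices.get(v)) {
      const component: string[] = [];
      let w: string;
      do {
        w = stack.pop()!;
        onStack.delete(w);
        component.push(w);
      } while (w !== v);
      if (component.length > 1) cycles.push(component); // self-loops ignored for brevity
    }
  }

  for (const v of graph.keys()) {
    if (!indices.has(v)) strongConnect(v);
  }
  return cycles;
}

// Example: app.ts -> auth.ts -> db.ts -> app.ts is reported as one cycle.
const imports: Graph = new Map([
  ["app.ts", ["auth.ts"]],
  ["auth.ts", ["db.ts"]],
  ["db.ts", ["app.ts"]],
  ["util.ts", []],
]);
console.log(findCycles(imports)); // [["db.ts", "auth.ts", "app.ts"]]
```

Dead-export detection is the complementary pass: record every name that is imported anywhere, then report exports nothing consumes. Again a hypothetical sketch (`ModuleInfo` and `deadExports` are illustrative names; re-exports and dynamic imports are ignored):

```ts
type ModuleInfo = {
  exports: string[];                          // names this module exports
  imports: { from: string; name: string }[];  // names it imports, and from where
};

function deadExports(index: Map<string, ModuleInfo>): Map<string, string[]> {
  // Every (module, name) pair that is imported somewhere in the index.
  const used = new Set<string>();
  for (const info of index.values()) {
    for (const imp of info.imports) used.add(`${imp.from}#${imp.name}`);
  }
  // Exports that never appear in that set are dead.
  const dead = new Map<string, string[]>();
  for (const [mod, info] of index) {
    const unused = info.exports.filter((name) => !used.has(`${mod}#${name}`));
    if (unused.length > 0) dead.set(mod, unused);
  }
  return dead;
}
```

Pre-computing both at index time plausibly explains why the exhaustive prompts (P6 Dead Exports and P9 Circular Deps, below) show the biggest wins.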

GrapeRoot Pro v6
  • Avg quality: 81.5 / 100
  • Cost/prompt: $0.69
  • Total (33 prompts): $22.86
  • Value (Q/$): 118

Normal (no tools)
  • Avg quality: 77.6 / 100
  • Cost/prompt: $1.60
  • Total (33 prompts): $52.77
  • Value (Q/$): 49
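The value row reads as quality per dollar. Assuming value = avg quality / cost per prompt (the page doesn't define it explicitly), both figures reproduce exactly:

```ts
// Assumed definition: Value (Q/$) = average quality score / cost per prompt.
const value = (avgQuality: number, costPerPrompt: number): number =>
  Math.round(avgQuality / costPerPrompt);

console.log(value(81.5, 0.69)); // 118 (GrapeRoot Pro v6)
console.log(value(77.6, 1.6));  // 49  (Normal)
```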

Key Findings

  • 75% quality wins (25/33 prompts)
  • 57% cheaper ($0.69 vs $1.60 per prompt)
  • +3.9 higher average quality (81.5 vs 77.6)

Medusa (TypeScript) — 17 prompts

  • Quality: 79.4 vs 74.2 (+5.2)
  • Cost: $0.45 vs $1.04/prompt
  • Quality wins: 15/17 (88%)
  • Biggest wins: P6 Dead Exports, P9 Circular Deps

Gitea (Go) — 16 prompts

  • Quality: 83.8 vs 81.1 (+2.7)
  • Cost: $0.95 vs $2.20/prompt
  • Quality wins: 10/16 (63%)
  • Biggest wins: P8 CSRF ($1.53 vs $5.27), P30 Input Validation ($2.15 vs $6.49)

Analytics — 16 Charts

Summary: Quality, Cost & Value
Side-by-side across Medusa, Gitea, and Combined

Quality per Prompt — Medusa
GR v6 vs Normal across 17 TypeScript audit prompts

Quality per Prompt — Gitea
GR v6 vs Normal across 16 Go audit prompts

Head-to-Head Win Rates
GR wins 75% of prompts (25/33) — 88% on Medusa, 63% on Gitea

Quality Delta — Medusa
Per-prompt improvement: green = GR wins, red = Normal wins. Avg +3.9

Quality Delta — Gitea
Per-prompt improvement on the Go codebase

Cost Savings Summary
57% cheaper overall — $29.91 saved across 33 prompts

Cost per Prompt — Medusa
GR $0.45/prompt vs Normal $1.04/prompt (57% cheaper)

Cost per Prompt — Gitea
GR $0.93/prompt vs Normal $2.18/prompt (57% cheaper)

Cost Savings Waterfall — Medusa
Per-prompt cost savings across 17 TypeScript prompts

Cost Savings Waterfall — Gitea
Per-prompt savings on the Go codebase

Quality vs Cost Scatter
Every prompt plotted — GR clusters top-left (high quality, low cost)

Performance by Task Category
Quality and cost breakdown across security, performance, reliability, and maintainability

Targeted vs Exhaustive Tasks
GR excels on both task types, with bigger wins on exhaustive

Quality Score Distribution
Box plots showing GR's higher median and tighter spread