Benchmarks
Does it actually work? We ran the numbers.
5 benchmark runs · 80+ prompts · real 92-file production codebase · same model (Claude Sonnet 4.6), same questions — with and without GrapeRoot.
45% cheaper on complex tasks · v3.8.35 challenge benchmark · 10/10 prompts
10/10 cost + quality wins · clean sweep across challenge benchmark
34% avg savings, E2E benchmark · 16/20 cost wins · real-world multi-step prompts
E2E Per-Prompt Cost
Cost per prompt, E2E benchmark

E2E Quality (Dual)
Regex + LLM judge dual scoring

3-Way Cost (CGC)
GrapeRoot vs Normal vs CGC cost

Quality by Level
Quality breakdown by difficulty level

Win Matrix
Win/loss matrix across configurations

E2E Savings Waterfall
Per-prompt savings, E2E run

Win Rate Evolution
Win rate improvement over time

E2E Turns
Turn count comparison

GR vs CGC Savings
GrapeRoot vs CGC savings delta

E2E Wall Time
Total wall time per prompt

Total Spend
Aggregate spend across all runs
