Your SDD Framework Is Eating Your Context Window

I Analyzed BMAD's Token Usage. The Numbers Are Bad.

Spec-driven development frameworks promise better AI-assisted coding through structured specs and PRDs. But there is a cost nobody talks about: context window consumption.

I analyzed BMAD (v6.0.0-Beta.7), one of the more popular SDD frameworks, to understand what it actually injects into your context window when you run a command. The results were worse than I expected.

The Analysis

I parsed every BMAD command and built a dependency graph of the files each one loads, then estimated token counts with the standard approximation of roughly four characters per token.
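As a concrete sanity check, here is the kind of estimate the analysis relies on: count characters, divide by four. The file below is a stand-in of a known size; in practice you would point the loop at whatever files your command actually pulls in.

```shell
# Make a stand-in file of exactly 4,000 characters; in practice,
# list the files your SDD command actually loads.
head -c 4000 /dev/zero | tr '\0' 'x' > /tmp/example-spec.md

# Rough token estimate per file: characters / 4.
for f in /tmp/example-spec.md; do
  chars=$(wc -c < "$f")
  printf '%s: ~%d tokens\n' "$f" $((chars / 4))
done
# /tmp/example-spec.md: ~1000 tokens
```

It is crude (real tokenizers vary by content), but it reliably tells you which order of magnitude you are dealing with.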

The quick-flow-solo-dev command loads 125 files and injects ~105,000 tokens into your context window. For a single command!

Here is the full picture for the heaviest commands:

Command                  Files Loaded   Est. Tokens
quick-flow-solo-dev      125            ~105,000
bmm-quick-spec           56             ~42,000
bmm-quick-dev            56             ~41,000
agent-bmm-pm             30             ~34,000
agent-bmm-analyst        30             ~33,000
agent-bmm-tech-writer    19             ~28,000
agent-bmm-sm             21             ~24,000
agent-bmm-dev            17             ~22,000
agent-bmm-qa             15             ~22,000

Even the mid-tier commands eat 20,000-30,000 tokens each.

Why This Matters

Here are the standard context windows for the models most developers are using today:

  • Claude Sonnet 4.5: 200k tokens
  • Claude Opus 4.6: 200k tokens
  • GPT-5.3 Codex: 400k tokens (272k effective, after reserving 128k for output)

One quick-flow-solo-dev command takes over half the usable context on Claude models, and nearly 40% of the effective context on GPT-5.3 Codex. Before you have written a single line of code.

BMAD commands are also composable. You run one, then another, then another. Run the PM agent and quick-spec back to back and you are at 76k tokens. Add quick-flow-solo-dev at 105k and you have exceeded 180k tokens. On a 200k model, that leaves almost nothing for actual work.

And that 200k is not really 200k. You have not counted your AGENTS.md or CLAUDE.md files, MCP server metadata and tool definitions, system prompts, any files the AI has already read in the conversation, or your actual code.

Models degrade in quality as context fills up. The last 10-20% of a context window is where you see the most missed instructions and confused outputs. You do not want to be there before you have even started working.

But What About Larger Context Windows?

Both Claude Sonnet 4.5 and Claude Opus 4.6 offer a 1M token context window through the API. GPT-5.3 Codex ships with 400k natively. So maybe this does not matter?

The 1M context windows on Claude models are currently in beta. They also come with premium pricing: 2x on input tokens and 1.5x on output tokens once you exceed 200k input tokens. For coding tasks where you are sending and receiving thousands of tokens per interaction, that adds up fast. They are not available through your subscriptions either. API token access only.

And even if cost is not a concern, more context does not mean better results. Models perform best when the context is focused and relevant. Dumping 105k tokens of framework boilerplate into the window means the model has to wade through templates, configuration files, and examples to find the parts that actually matter for your task. That is wasted attention.

Why SDD Tools Are So Greedy

SDD tools want to give the AI every possible bit of context so it can generate perfect specs. So they load everything: documentation, previous specs, all templates, every example file, configuration, environment details.

It is the kitchen sink approach to prompting. And it wrecks your context window.

The tool developers seem to assume you are running a 200k+ model and that losing half your context to tooling metadata is an acceptable tradeoff. Even on those models, it is a real problem. And plenty of developers are using smaller or local models with 32k-128k context limits, where a single BMAD command would not even fit.

Not Every Framework Has This Problem

Agent OS recently released v3, which took a very different approach. The developers retired the implementation and orchestration phases entirely, recognizing that frontier models handle spec implementation well on their own. Features like plan mode in Claude Code already do what those phases were trying to do, and they do it better.

Agent OS v3 focuses on injecting development standards and enhancing plan mode with targeted questions. It is a fraction of the context footprint because it defers the heavy lifting to the AI tool itself rather than trying to replicate it with prompt files.

That is the right direction. SDD frameworks should be thin layers that enhance what the model already does well, not 105k-token payloads that crowd out your actual work.

What You Should Do

If you are using an SDD framework, measure its context window impact before committing to it.

Look at what gets loaded when you run a command. If it is pulling in dozens of files, that is a red flag. Estimate the tokens with a rough character count divided by 4. It is not exact, but it will tell you if you are in the thousands or the hundred-thousands.

```shell
# Get approx token count for the given files
cat file1 file2 file3 | wc -c | awk '{printf "%.0f tokens\n", $1/4}'
```

Then check composability. Run the commands you would actually use in a session and add up the totals. That is your real context cost. If a single command eats more than 25% of your context window, that tool is too expensive for your workflow.
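The session arithmetic is simple enough to script. The per-command numbers below are the estimates from the table above; swap in your own measurements and your model's window size.

```shell
# Per-command token estimates (from the table above).
pm=34000
quick_spec=42000
quick_flow=105000

# Sum them to get the real per-session context cost
# against a 200k-token window.
total=$((pm + quick_spec + quick_flow))
budget=200000

echo "session total: ~${total} tokens"
echo "left for actual work: ~$((budget - total)) tokens"
# session total: ~181000 tokens
# left for actual work: ~19000 tokens
```

And remember that the "left for actual work" number still has to cover system prompts, tool definitions, and your code.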

For most specs, a plain conversation with Claude or ChatGPT works better than an SDD CLI tool. You control the context. You see exactly what the AI sees. You manage the token budget.

Be skeptical of tools that do not disclose their context window impact. A tool that burns half your context before you start working will cost you more than it saves.

Analyze before you adopt.

Have you measured the context window impact of your SDD tools? I am curious if others have done similar analysis or if the token costs came as a surprise.