The context window race is accelerating. Claude offers 200K tokens. Gemini pushes past 1M tokens. GPT-5 lands somewhere in between. The assumption behind the long-context arms race is clear: more context means better results.
That assumption is wrong, and it costs people money to learn why.
The Misconception
Teams throw entire repositories into context windows and wonder why the output gets worse. It is counterintuitive. More information should help. It does not, and the reason takes a minute to explain.
When context windows were small, they were the obvious bottleneck. There was not enough room to fit the code the model needed to understand the task. The fix seemed straightforward: make the window bigger.
So model providers did. Developers responded by including more content. Entire files. Whole directories. README after README. The logic seemed sound: if the model can see everything, it should understand everything.
It is a bit like assuming that giving someone access to an entire library helps them answer a specific question faster. Sometimes it does. Usually it just gives them more places to get lost.
The Real Problem
Large language models do not treat context like a database. There is no indexing, no targeted retrieval. They process everything—all of it—every single time.
This is the part that does not make it into the marketing materials: as context grows, performance degrades.
Researchers call it "lost in the middle", a pattern that also shows up in needle-in-a-haystack benchmarks. Models recall what is at the beginning of a prompt. They recall what is at the end. The middle gets fuzzy. That function definition buried on line 847 of the context? Maybe the model references it. Maybe it does not. Larger context, less predictable behavior.
Not a bug. Architecture. Attention is spread across every token in the window, and the model's ability to use what it sees does not grow in step with token count. Doubling the context does not double the understanding. It can reduce it.
What Actually Matters
The developers getting consistent results are not maximizing context. They are curating it.
There is a difference between "what can I fit?" and "what does the model need for this task?" One approach fills the window. The other fills it with intent.
Consider the difference: a 10K token context with relevant code, clear instructions, and minimal noise versus a 150K token dump of tangentially related files. The leaner context wins far more often than the size gap suggests.
The mechanics explain why. The model is not searching through a haystack—it is attending to everything at once. Noise dilutes attention. Diluted attention means worse output. The math is not complicated, just ignored.
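For readers who want to see the dilution argument in numbers, here is a toy sketch. It assumes a single softmax over made-up relevance scores, which is a large simplification of a real transformer with many heads and layers. The scores (5.0 for the relevant token, 0.0 for each distractor) and the function names are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_on_relevant(relevant_score, noise_score, n_noise):
    """Share of attention landing on one relevant token among n_noise distractors."""
    scores = [relevant_score] + [noise_score] * n_noise
    return softmax(scores)[0]

# Hypothetical scores: the relevant token stays the same, only the noise grows.
for n_noise in (10, 1_000, 100_000):
    w = weight_on_relevant(5.0, 0.0, n_noise)
    print(f"{n_noise:>7} noise tokens -> weight on relevant token: {w:.4f}")
```

The relevant token never changes, yet its share of the total weight shrinks as distractors are added, because softmax weights must sum to one. Real models are more robust than this toy, but the direction of the effect is the point.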
The Cost Nobody Discusses
Context is not free. Every token gets processed.
The token cost shows up directly in API bills or indirectly through slower responses and degraded output. A developer burning 50K tokens on a single prompt is not getting 50K tokens of value. They are paying for noise. Without a token budget or any sense of token efficiency, the spend compounds. When the output comes back wrong—and it often does—they burn another 50K trying to fix it. Then another. The meter keeps running.
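A back-of-the-envelope sketch makes the meter visible. The price below is a hypothetical placeholder, not any provider's actual rate, and the assumption that a curated 10K prompt succeeds on the first attempt is exactly that: an assumption. Output tokens are ignored to keep the arithmetic simple.

```python
def prompt_cost(tokens, price_per_million):
    """Dollar cost of sending `tokens` input tokens at a price per million tokens."""
    return tokens / 1_000_000 * price_per_million

def total_cost(tokens_per_attempt, attempts, price_per_million):
    """Cost of repeating the same-sized prompt `attempts` times."""
    return attempts * prompt_cost(tokens_per_attempt, price_per_million)

PRICE_PER_MILLION = 3.00  # hypothetical input price in dollars, not a real rate card

# Three 50K attempts (the original plus two retries) versus one curated 10K prompt.
bloated = total_cost(50_000, attempts=3, price_per_million=PRICE_PER_MILLION)
curated = total_cost(10_000, attempts=1, price_per_million=PRICE_PER_MILLION)
print(f"bloated: ${bloated:.2f}  curated: ${curated:.2f}  ratio: {bloated / curated:.0f}x")
```

The absolute numbers are small for a single task. The ratio is what matters once the same habit repeats across a team and a working week.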
The cost is not just dollars. It is also time spent reviewing, correcting, and re-prompting. One study found 66% of developers spend extra time fixing "almost-right" suggestions. That number is striking because it reveals where the friction actually lives. It is often not a capability problem. It is a context problem. In many of those cases the model could have gotten it right with better input.
The Practical Implication
Bigger context windows are a feature, not a strategy.
They provide headroom when needed, but they do not replace the work of deciding what belongs in a prompt and what does not. Token optimization is not glamorous, but it is the difference between consistent results and expensive guessing. The developers who understand this treat context like real estate. Every token earns its place. Selective about what goes in. Ruthless about what stays out.
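One way to picture "every token earns its place" is a greedy selector that fills a fixed token budget with the most relevant snippets first. This is a minimal sketch, not a recommended implementation. The `Snippet` type, the file names, the relevance scores, and the four-characters-per-token estimate are all assumptions made for illustration. In practice the scores would come from a person or a retrieval step, and the counts from a real tokenizer.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    name: str
    text: str
    relevance: float  # hypothetical score in [0, 1]

def estimate_tokens(text):
    """Rough heuristic: about 4 characters per token. Not a real tokenizer."""
    return max(1, len(text) // 4)

def select_context(snippets, budget_tokens, min_relevance=0.5):
    """Greedily keep the most relevant snippets that fit within the budget."""
    chosen, used = [], 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if s.relevance < min_relevance:
            break  # everything after this point is less relevant still
        cost = estimate_tokens(s.text)
        if used + cost <= budget_tokens:
            chosen.append(s)
            used += cost
    return chosen, used

# Hypothetical candidates for a billing bug fix.
candidates = [
    Snippet("billing/invoice.py", "def total(items): ..." * 40, 0.9),
    Snippet("README.md", "Project overview. " * 400, 0.2),
    Snippet("billing/tax.py", "def tax(amount): ..." * 30, 0.8),
    Snippet("legacy/old_billing.py", "def total_v1(items): ..." * 200, 0.6),
]
chosen, used = select_context(candidates, budget_tokens=500)
print([s.name for s in chosen], used)
```

The README is dropped for low relevance and the legacy file is dropped for size, even though it clears the relevance bar. Both decisions are deliberate, which is the difference between a budget and a dump.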
This is less convenient than including everything and hoping the model figures it out. It also works. The inconvenience is the point: curation takes effort because it produces value.
First in a series on context management for AI-assisted development.