When Everything Is Flat, Everything Gets Lost

A new retrieval-augmented generation (RAG) framework hit arXiv this week: KohakuRAG. The pitch is hierarchical document indexing. The concept is not new. The timing is.

RAG systems have spent the past two years getting bigger. Larger embedding models. Wider context windows. More chunks retrieved per query. The assumption: scale solves relevance.

KohakuRAG argues the opposite. The structure of how you index matters more than how much you index.

The Flat Index Problem

Most RAG implementations treat documents as flat collections of chunks. Split the text. Embed the chunks. Store them in a vector database. At query time, find the nearest neighbors and send them to the model.
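Here is a minimal Python sketch of that flat pipeline. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and a plain list stands in for the vector database. Both are assumptions made so the example runs without dependencies, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split(text: str, size: int = 8) -> list[str]:
    # Fixed-size chunking by word count; document structure is ignored.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_flat_index(docs: list[str], size: int = 8) -> list[tuple[str, Counter]]:
    # Split, embed, store: every chunk from every document lands in one flat list.
    return [(chunk, embed(chunk)) for doc in docs for chunk in split(doc, size)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    # Nearest neighbours by cosine similarity; a chunk's origin plays no part.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


docs = [
    "Section 2 defines the retry policy. Clients retry three times with backoff.",
    "Section 5 overrides the retry policy for batch jobs, which never retry.",
]
print(retrieve(build_flat_index(docs), "what is the retry policy"))
```

Both hits mention the retry policy. One defines it and the other overrides it for batch jobs, yet the flat list records nothing about which section each chunk came from.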

This works. Until it does not.

The problem shows up when documents have internal structure—and most documents do. A technical specification has sections that reference other sections. A codebase has files that depend on other files. A legal contract has clauses that modify other clauses.

Flat retrieval ignores all of this. The chunking strategy treats every piece as an island. The chunk that embeds closest to the query wins, regardless of whether it makes sense without the context around it. Research on the "lost in the middle" problem shows that models use relevant information less reliably when it sits in the middle of a long context than when it appears near the beginning or end. The deeper issue is that retrieved sequences often lack structural coherence in the first place.

The result: retrieved chunks that are individually relevant but collectively incoherent. The model receives pieces from different sections, different contexts, different logical threads. It synthesizes an answer from fragments that were never meant to be read together.
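Fixed-size chunking reproduces the island effect easily. The sketch below uses hypothetical data and helper names. It compares two strategies: cutting a concatenated document every N words, and cutting inside each section while keeping the section title next to each chunk.

```python
# A document modelled as (section title, section text) pairs; the content is hypothetical.
SECTIONS = [
    ("3.1 Timeouts", "Requests time out after 30 seconds."),
    ("3.2 Exceptions", "Section 3.1 does not apply to uploads."),
]


def fixed_chunks(sections: list[tuple[str, str]], size: int) -> list[str]:
    # Flat chunking: concatenate every section, then cut every `size` words.
    words = " ".join(text for _, text in sections).split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def section_chunks(sections: list[tuple[str, str]], size: int) -> list[tuple[str, str]]:
    # Structure-aware chunking: never cross a section boundary, and keep the
    # section title attached to every chunk it produces.
    chunks = []
    for title, text in sections:
        words = text.split()
        for i in range(0, len(words), size):
            chunks.append((title, " ".join(words[i:i + size])))
    return chunks


print(fixed_chunks(SECTIONS, 8))
print(section_chunks(SECTIONS, 8))
```

The flat version produces a chunk that reads only "does not apply to uploads." and has no subject. The section-bounded version keeps "Section 3.1 does not apply to uploads." intact and labels it with its section. This shows the general problem, not KohakuRAG's specific chunking scheme.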

Hierarchy as Structure

Hierarchical indexing addresses this by preserving document structure in the index itself.

The approach varies by implementation, but the concept is consistent. Documents get indexed at multiple levels—document summaries, section representations, chunk embeddings—with explicit relationships between levels. Retrieval becomes a traversal rather than a similarity race.

A query might first match a document-level summary, then drill into the relevant section, then retrieve specific chunks within that section. The context window receives chunks that belong together structurally, not just chunks that happen to embed similarly.
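The sketch below shows top-down traversal over a three-level index. It assumes hand-written summaries for documents and sections, though a real system might generate them with a model. Scoring uses the same toy bag-of-words measure as a stand-in for embeddings. The `Node` type and the `beam` and `k` parameters are illustrative, not KohakuRAG's API.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for an embedding model (assumption).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Node:
    text: str  # summary for document/section nodes, raw text for chunk leaves
    children: list["Node"] = field(default_factory=list)


def retrieve_hierarchical(roots: list[Node], query: str, beam: int = 1, k: int = 2) -> list[str]:
    q = embed(query)

    def score(node: Node) -> float:
        return cosine(q, embed(node.text))

    frontier = roots
    # Descend level by level, keeping only the `beam` best summaries each time.
    # Assumes every branch has the same depth (document -> section -> chunk).
    while any(node.children for node in frontier):
        best = sorted(frontier, key=score, reverse=True)[:beam]
        frontier = [child for node in best for child in node.children]
    # The frontier now holds only chunks under the selected section(s).
    return [node.text for node in sorted(frontier, key=score, reverse=True)[:k]]


index = [
    Node("billing api invoices payments refunds", [
        Node("refunds section: refund rules and limits", [
            Node("Refunds are issued within 14 days."),
            Node("Refund limits apply per invoice."),
        ]),
        Node("invoices section: invoice format", [
            Node("Invoices are generated monthly."),
        ]),
    ]),
    Node("auth api login tokens sessions", [
        Node("tokens section: token expiry and refresh", [
            Node("Tokens expire after one hour."),
            Node("Refresh tokens are issued at login."),
        ]),
    ]),
]
print(retrieve_hierarchical(index, "how long until refunds are issued"))
```

Under the same toy scoring, a flat top-2 over all five chunks would pair the refund chunk with "Refresh tokens are issued at login.", because that chunk shares "are issued" with the query. Restricting retrieval to the chosen branch keeps both returned chunks in one section. The `beam` parameter trades that precision against the risk of committing early to the wrong branch.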

This is not a free lunch. Hierarchical indexing adds complexity to both indexing and retrieval. The index structure needs to reflect actual document organization, which means understanding that organization in the first place. Dynamic documents also need more index maintenance than flat approaches do, because editing one section can invalidate the summaries built above it.

But the trade-off is worth examining. The alternative—throwing more compute at flat retrieval—has diminishing returns.
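One way to contain the maintenance cost for dynamic documents is to hash each section, re-embed only the sections whose content changed, and mark the summaries above them as stale. This sketch assumes sections are keyed by title, so a renamed section shows up as one removal plus one addition. The function names are hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(stored: dict[str, str], sections: dict[str, str]) -> tuple[set[str], set[str], bool]:
    """Compare stored section hashes with the current document.

    Returns (sections to re-embed, sections to drop, whether the document
    summary built from those sections must be regenerated).
    """
    current = {title: digest(text) for title, text in sections.items()}
    changed = {title for title, h in current.items() if stored.get(title) != h}
    removed = set(stored) - set(current)
    # A flat index would stop at re-embedding chunks; a hierarchy also has to
    # refresh the summary nodes above any changed or removed section.
    summary_stale = bool(changed or removed)
    return changed, removed, summary_stale


stored = {"Intro": digest("Overview."), "Limits": digest("Max 10 requests.")}
edited = {"Intro": "Overview.", "Limits": "Max 20 requests.", "Appendix": "Changelog."}
print(plan_reindex(stored, edited))
```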

Why Now

The timing matters.

RAG systems have scaled. Context windows have expanded. The tokens spent per query have grown. And the quality complaints persist. "It retrieves the wrong context." "The model ignores the relevant parts." "Results are inconsistent across similar queries."

These are structural problems being addressed with brute force. Retrieve more chunks. Apply reranking more aggressively. Throw context compression at the retrieved text. Each fix adds tokens, latency, or another model call without addressing the underlying issue: flat indexing loses structure, and structure contains meaning.

Hierarchical approaches like KohakuRAG represent a shift in where intelligence lives in the pipeline. Instead of relying entirely on the language model to sort through retrieved noise, the retrieval system itself becomes structurally aware.

The Implementation Question

Academic papers describe architectures. Production systems require implementation choices.

The questions are practical: How do you extract document structure automatically? What happens when documents lack clear hierarchy? How do you handle mixed-structure corpora where some documents are hierarchical and others are not?
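When documents do carry explicit headings, structure extraction can start as small as this sketch. It assumes Markdown-style `#` headings. If it finds none, it falls back to a single section with an empty heading path, so an unstructured document degrades to flat indexing instead of failing. It does not handle `#` lines inside code blocks, numbered headings, or PDFs, which are exactly the messy cases the questions above point at. `extract_sections` is an illustrative name.

```python
import re

# Markdown-style heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def extract_sections(text: str) -> list[tuple[tuple[str, ...], str]]:
    """Split text into (heading path, body) pairs.

    With no headings, the whole text becomes one section with an empty
    heading path, i.e. the document falls back to flat indexing.
    """
    sections: list[tuple[tuple[str, ...], str]] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            sections.append((tuple(path), content))
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            depth = len(match.group(1))
            del path[depth - 1:]  # close headings at this depth or deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return sections


spec = "# Spec\nIntro text.\n## Timeouts\nThirty seconds.\n## Uploads\nNo timeout.\n"
print(extract_sections(spec))
print(extract_sections("Notes with\nno headings at all."))
```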

KohakuRAG positions itself as "simple," which in academic terms means feasible to implement without a research team. Whether that simplicity survives contact with production data is the usual question.

The benchmarks will show improvement on standard retrieval metrics. They always do. The harder question is whether the approach handles the messy document collections that actual teams work with—inconsistent formatting, missing section headers, documents that evolved over time without structural coherence.

Codebases Are the Test Case

For developers specifically, codebases present an interesting test case for hierarchical retrieval.

Code has explicit structure. Files belong to modules. Functions call other functions. Classes inherit from other classes. The relationships are not implicit—they are defined in the code itself.
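Because the code itself defines the structure, Python's standard `ast` module can read much of it directly. This sketch extracts imports, classes with their base classes, and the bare-name calls inside each top-level function. It requires Python 3.9+ for `ast.unparse`. The `SOURCE` string is a hypothetical stand-in for a file like auth/login.py, and `code_structure` is an illustrative name. Call extraction is deliberately shallow, so method and attribute calls are skipped.

```python
import ast


def code_structure(source: str) -> dict:
    """Extract imports, classes with their bases, and per-function calls from Python source."""
    tree = ast.parse(source)
    imports: list[str] = []
    classes: list[tuple[str, list[str]]] = []
    calls: dict[str, list[str]] = {}
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep relative-import dots, e.g. "from .models import User" -> ".models".
            imports.append("." * node.level + (node.module or ""))
        elif isinstance(node, ast.ClassDef):
            classes.append((node.name, [ast.unparse(base) for base in node.bases]))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Bare-name calls only; method calls such as obj.save() are skipped here.
            calls[node.name] = sorted({
                n.func.id for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            })
    return {"imports": imports, "classes": classes, "calls": calls}


SOURCE = '''
import hashlib
from .models import User


class LoginError(Exception):
    pass


def hash_password(password):
    return hashlib.sha256(password.encode()).hexdigest()


def login(user: User, password):
    if hash_password(password) != user.password_hash:
        raise LoginError(user.name)
    return True
'''
print(code_structure(SOURCE))
```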

Flat retrieval over codebases surfaces the same problems at scale. A query about authentication retrieves chunks from authentication files, test files, migration scripts, and commented-out code that happens to mention authentication. The model synthesizes from all of them.

Hierarchical indexing could respect the actual structure. Module boundaries. Import relationships. Call graphs. The retrieval system could understand that a chunk from auth/login.py relates differently to a chunk from tests/test_auth.py than to a chunk from database/migrations/0047_auth_refactor.py.

This is harder than document retrieval because the structure is more complex. But the potential payoff is proportionally higher. Code understanding depends heavily on context. The context is structural.

What This Means

KohakuRAG is one paper in a trend. Hierarchical RAG. Structured retrieval. Graph-based approaches. The common thread: acknowledging that flat chunk retrieval loses information that matters.

The practical implication is straightforward. Teams building RAG systems—or using off-the-shelf RAG tools—should be asking what structure gets preserved in their indexing approach. If the answer is "none," the retrieval ceiling is lower than it could be.

More sophisticated retrieval does not necessarily mean more expensive retrieval. In some cases, structure-aware approaches retrieve fewer chunks because they retrieve the right ones. The context window receives signal instead of noise. The model spends attention where it matters.

The flat index era is not ending. It is being supplemented. And for teams hitting quality ceilings with current approaches, that supplementation might be worth investigating.