The Agent Loop: How Router Searches Until It Finds

You ask your AI coding assistant a question about how two parts of your codebase interact. It searches once, pulls some vaguely related code, and gives you an answer that sounds confident but misses the mark. Sound familiar?

This is the single-retrieval problem, and it plagues most RAG-based coding tools. They treat search as a one-shot operation: query goes in, results come out, answer gets generated. For simple questions like "where is the User class defined?" this works fine. But real development questions are rarely that simple.

The Single-Retrieval Trap

Most AI coding assistants follow a straightforward pattern: take the user's question, convert it to an embedding, find the top-k most similar code chunks, stuff them into context, and generate an answer. This approach has a fundamental limitation: it assumes you can find all relevant context in one search.
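To make the pattern concrete, here is a minimal sketch of single-shot retrieval. It is illustrative only: the bag-of-words `embed` function stands in for a real embedding model, and the file names and chunk texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_shot_retrieve(question: str, chunks: Dict[str, str], k: int = 1) -> List[str]:
    """One query in, top-k chunk names out. There is no follow-up search."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])), reverse=True)
    return ranked[:k]


# Hypothetical chunks: the session store uses none of the question's vocabulary.
CHUNKS = {
    "auth/middleware.py": "authentication middleware checks user login token",
    "store/redis_store.py": "redis backed store for cookie ttl and expiry",
    "utils/format.py": "format dates and currency strings",
}

QUESTION = "how does authentication check the user login"
top = single_shot_retrieve(QUESTION, CHUNKS, k=1)
store_score = cosine(embed(QUESTION), embed(CHUNKS["store/redis_store.py"]))
```

In this toy setup the session store shares no terms with the question, so it scores zero and one query cannot surface it. Real embeddings capture more than word overlap, but the same failure appears whenever related files use different vocabulary.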

Consider a question like "How does authentication interact with session management in this codebase?" The answer lives across multiple files: the auth middleware, the session store, the user model, possibly some configuration. A single search query, no matter how well crafted, cannot surface all of these pieces. The retrieval might find the auth middleware but miss the session store entirely because those files use different terminology.

The result? Incomplete context leads to incomplete answers. The model hallucinates connections that do not exist, or worse, gives you confident-sounding advice based on partial information.

What a Tool Loop Actually Is

A tool loop is exactly what it sounds like: instead of calling search once and hoping for the best, the agent calls search, examines the results, decides if it needs more context, and searches again with a refined query. This continues until the agent is confident it has enough information to answer correctly.

The key insight is that the model decides when to stop. It is not blindly iterating; it is reasoning about what it knows, what it does not know, and what it needs to find out. Each iteration refines the search based on what was learned in previous iterations.

Think of it like how an experienced developer explores an unfamiliar codebase. You do not read one file and assume you understand the system. You follow imports, check call sites, read related tests, and gradually build a mental model. The tool loop gives the AI assistant the same capability.
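The loop described in this section can be sketched in a few lines. This is a simplified illustration, not Router's implementation: `decide` stands in for the model's reasoning step, `search` for a tool call, and the toy index and its contents are made up for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# decide(question, context) returns the next query, or None once it has enough context.
Decide = Callable[[str, List[str]], Optional[str]]
Search = Callable[[str], List[str]]


def tool_loop(
    question: str, decide: Decide, search: Search, max_iterations: int = 10
) -> Tuple[List[str], List[str]]:
    """Search, inspect, and search again until decide() says stop or the cap is hit."""
    context: List[str] = []
    queries: List[str] = []
    for _ in range(max_iterations):
        query = decide(question, context)
        if query is None:  # the "model" judges its context sufficient
            break
        queries.append(query)
        context.extend(search(query))
    return context, queries


# Hypothetical index and scripted decision logic, for illustration only.
TOY_INDEX: Dict[str, List[str]] = {
    "payment flow": ["processor.py calls validate_session() before charging"],
    "validate_session": ["session.py: validate_session() raises SessionExpiredError"],
}


def toy_search(query: str) -> List[str]:
    return TOY_INDEX.get(query, [])


def toy_decide(question: str, context: List[str]) -> Optional[str]:
    if not context:
        return "payment flow"
    if "SessionExpiredError" not in " ".join(context):
        return "validate_session"  # follow the lead found in the previous result
    return None


context, queries = tool_loop("why does payment fail?", toy_decide, toy_search)
```

The second query exists only because of what the first result revealed, which is the property a one-shot search lacks. The `max_iterations` argument guards against a `decide` function that never returns `None`.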

Why This Matters for Complex Questions

Complex questions require iterative refinement. When you ask "Why does the payment flow fail when the user has an expired session?", the answer requires understanding multiple systems:

How the payment flow is implemented
How session expiration is handled
How these two systems interact (or fail to)
What error handling exists at the boundary

No single search query can surface all of this. But a sequence of targeted searches, each informed by the previous results, can systematically build the complete picture.

How Router Implements the Loop

Router supports up to 10 tool iterations per request. On each iteration, the model can call any combination of available tools: search_code for semantic search, graph_neighbors to explore import relationships, or graph_impact to understand blast radius.

The model is not forced to use all 10 iterations. It evaluates its current context after each tool call and decides: do I have enough information to answer confidently? If yes, it stops and generates the response. If no, it formulates another query based on what it learned.

The loop is also cost-aware. You can set a per-request budget that limits the number of iterations or the total token spend. For quick lookups, set a low limit. For complex architectural questions, raise the limit and let the agent iterate, up to the 10-iteration cap.
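One way to picture a per-request budget is as a small tracker that the loop consults before each tool call. The sketch below is an assumption about how such a check could work, not Router's actual code: the `Budget` class, its field names, and the token counts are hypothetical. Only the 10-iteration cap comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

HARD_CAP = 10  # the per-request iteration cap described above


@dataclass
class Budget:
    """Hypothetical per-request budget: an iteration limit plus an optional token limit."""

    max_iterations: int = HARD_CAP
    max_tokens: Optional[int] = None  # None means no token limit
    iterations_used: int = 0
    tokens_used: int = 0

    def record(self, tokens: int) -> None:
        """Account for one completed tool call and the tokens it consumed."""
        self.iterations_used += 1
        self.tokens_used += tokens

    def exhausted(self) -> bool:
        """True once either limit is reached; the hard cap always applies."""
        if self.iterations_used >= min(self.max_iterations, HARD_CAP):
            return True
        return self.max_tokens is not None and self.tokens_used >= self.max_tokens


# A quick lookup: stop after two tool calls regardless of token spend.
quick = Budget(max_iterations=2)
quick.record(tokens=400)
still_running = not quick.exhausted()
quick.record(tokens=400)
stopped = quick.exhausted()
```

Clamping `max_iterations` with `min(..., HARD_CAP)` means a caller can lower the limit but never raise it past the cap.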

A Worked Example

Let us trace through a real example. A developer asks: "Why does the payment flow fail when the user has an expired session?"

Iteration 1: Find the payment flow

The agent calls search_code("payment flow") and finds src/payments/processor.py. It learns that the payment processor calls validate_session() before processing, but it does not see that function's implementation.

Iteration 2: Explore dependencies

The agent calls graph_neighbors("src/payments/processor.py") to see what the payment module imports. It discovers the import from src/auth/session.py and notes that validate_session is defined there.

Iteration 3: Find session expiry handling

The agent calls search_code("session expiry handler") and finds that validate_session() raises SessionExpiredError when the session TTL has passed. Critically, comparing this against the processor code from iteration 1, it sees that the payment processor catches generic Exception instead of specifically handling SessionExpiredError, causing it to return a vague "payment failed" message instead of prompting re-authentication.

Now the agent has the complete picture: the bug is in error handling. The payment processor does not distinguish between expired sessions and actual payment failures because it catches exceptions too broadly. Three tool calls, each building on the last, to arrive at an answer no single query could have found.
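The bug pattern in this example is easy to reproduce. The sketch below is a reconstruction for illustration, not the code from the codebase in question: `validate_session` and `SessionExpiredError` are the names from the trace above, while `charge`, the session dictionary shape, and the return strings are hypothetical.

```python
class SessionExpiredError(Exception):
    """Raised when the session TTL has passed (name taken from the trace above)."""


def validate_session(session: dict, now: float) -> None:
    if now >= session["expires_at"]:
        raise SessionExpiredError("session expired")


def charge(amount: int) -> None:
    """Hypothetical placeholder for the real payment call."""
    if amount <= 0:
        raise ValueError("amount must be positive")


def process_payment_broad(session: dict, amount: int, now: float) -> str:
    # The buggy pattern: every failure collapses into one vague message.
    try:
        validate_session(session, now)
        charge(amount)
        return "ok"
    except Exception:
        return "payment failed"


def process_payment_specific(session: dict, amount: int, now: float) -> str:
    # One possible fix: handle the expired session before the generic fallback.
    try:
        validate_session(session, now)
        charge(amount)
        return "ok"
    except SessionExpiredError:
        return "reauthenticate"
    except Exception:
        return "payment failed"


expired = {"expires_at": 100.0}
broad_result = process_payment_broad(expired, 10, now=200.0)
specific_result = process_payment_specific(expired, 10, now=200.0)
```

With an expired session, the broad handler reports a payment failure while the specific handler asks for re-authentication. Genuine payment errors still fall through to the generic branch.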

Try It

You can test the agent loop directly via the API:

curl -X POST https://pyckle-agent.fly.dev/agent/run \
  -H "Content-Type: application/json" \
  -d '{"message": "why does payment fail with expired sessions?", "tools": true}'

The response includes not just the answer, but the sequence of tool calls the agent made to arrive at it. You can see exactly what it searched for, what it found, and how it reasoned about the results.
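If you prefer Python to curl, the same request can be built with the standard library. The URL and request body mirror the curl command above. The response schema is not documented here, so this sketch simply returns the decoded JSON without assuming any field names. The helper names `build_request` and `run_agent` are made up for the example.

```python
import json
import urllib.request

AGENT_URL = "https://pyckle-agent.fly.dev/agent/run"  # endpoint from the curl example


def build_request(message: str) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    body = json.dumps({"message": message, "tools": True}).encode("utf-8")
    return urllib.request.Request(
        AGENT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_agent(message: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(build_request(message), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_request("why does payment fail with expired sessions?")
```

Calling `run_agent("why does payment fail with expired sessions?")` performs the network call. Printing the result with `json.dumps(result, indent=2)` is a simple way to inspect the tool-call sequence it contains.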

The Tradeoff: Latency and Cost

More iterations mean more latency and more cost. Each tool call adds a round trip to the embedding service, plus additional tokens for the model to reason about results. The 10-iteration cap exists specifically to prevent runaway loops where the agent never decides it has enough information.

For most questions, 2-4 iterations are sufficient. Simple lookups often complete in 1. Complex cross-cutting questions might hit 6-8. If you are regularly hitting the cap, the question might be too broad and worth breaking into smaller pieces.

You can tune this for your use case. Set max_iterations: 3 for faster, cheaper queries. Increase it, up to the cap of 10, for deep architectural exploration. The agent respects your budget while still using its ability to iterate.

Single-Shot Is Not Enough

The single-retrieval model made sense when context windows were small and every token was precious. But modern development questions span multiple files, multiple systems, and multiple concerns. Answering them correctly requires the ability to explore, not just recall.

The agent loop is how Router bridges that gap. It searches until it finds, reasons about what it learns, and stops when it has enough context to answer with confidence. The result is answers that are grounded in your actual codebase, not hallucinated from partial information.

Ready to try it? Get started with Router and see how iterative search changes what is possible with AI code assistance.