Why Naive RAG Fails for Coding Agents (and Building AST-Aware Retrieval Instead)

Standard text splitting destroys code context. Notes on building Satori, moving past naive RAG, and enforcing deterministic tool contracts for MCP agents.

Tags: AI, MCP, RAG, Architecture, Systems

Date: May 12, 2026

Read time: 3 min

If you are building an AI coding agent, you hit the exact same wall everyone else hits: it works perfectly on a 200-line script, but point it at a massive enterprise monorepo, and it starts hallucinating file paths and inventing variables that don’t exist.

The bottleneck isn’t the LLM’s reasoning. The bottleneck is your retrieval pipeline.

Standard Retrieval-Augmented Generation (RAG) is built for human language. Code is not human language; it is a strict dependency graph. When I was architecting Satori (an agent-safe semantic retrieval system), it became obvious that applying standard text-splitting to a codebase fills the agent’s context window with fragments stripped of their scope before it even starts thinking.

Here are my notes on why naive RAG fails for code, the mechanics of AST-aware chunking, and the tradeoffs of building it.

The Flaw in Fixed-Size Chunking

Most RAG tutorials tell you to split documents into 1,000-token chunks with a 200-token overlap, embed them, and push them to a vector store like Milvus.

For a PDF, this is fine. For AuthService.ts, it is catastrophic. Imagine a standard chunker slicing a file exactly in the middle of a critical class method. The vector database returns “Chunk 2” to the agent. The agent sees throw new Error("Invalid User"), but it has absolutely no idea what class it belongs to, what imports are required, or what the function signature is.

It lost the lexical scope. So, it guesses.
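To make the failure concrete, here is a minimal sketch of a fixed-size splitter run over a small file. The file contents, the `splitFixed` helper, and the sizes are all invented for illustration, and sizes are counted in characters as a stand-in for tokens; the failure mode is the same either way.

```typescript
// Fixed-size splitter with overlap. Sizes are in characters here as a
// stand-in for tokens (an assumption made to keep the sketch dependency-free).
function splitFixed(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const pieces: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return pieces;
}

// Hypothetical file contents, invented for this example.
const sampleSource = [
  "import { db } from '../db';",
  "import { User } from '../types';",
  "",
  "export class AuthService {",
  "  async validateToken(token: string): Promise<User> {",
  "    const user = await db.findByToken(token);",
  "    if (!user) {",
  '      throw new Error("Invalid User");',
  "    }",
  "    return user;",
  "  }",
  "}",
].join("\n");

const fixedChunks = splitFixed(sampleSource, 120, 20);

// The chunk a query about "Invalid User" would most plausibly retrieve.
const retrieved = fixedChunks.find((c) =>
  c.includes('throw new Error("Invalid User")'),
);
```

With these sizes, the chunk holding the `throw` starts partway through the `if` line. The imports, the class name, and the method signature all sit in earlier chunks, so the agent never sees them.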

The Fix: AST-Aware Enrichment

To fix this, we have to stop treating code like a string of text and start treating it like an Abstract Syntax Tree (AST).

By parsing the code with a tool like Tree-sitter before embedding, we enforce strict syntactic boundaries. Instead of splitting by token count, we split by logical nodes:

We chunk entire Class blocks.
We chunk entire Function definitions.
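The splitting rule above can be sketched independently of any parser. The `SyntaxNode` shape below is an assumption, loosely modeled on the fields Tree-sitter bindings expose (`type`, `startIndex`, `endIndex`, `children`), and the tree is built by hand so the example runs without a parser installed. In a real pipeline the tree would come from the parser. This is not Satori’s implementation, just one way the idea could look.

```typescript
// Minimal node shape assumed for this sketch.
interface SyntaxNode {
  type: string;
  startIndex: number;
  endIndex: number;
  children: SyntaxNode[];
}

interface CodeChunk {
  nodeType: string;
  text: string;
}

// Node types treated as chunk boundaries (an illustrative choice).
const CHUNK_NODE_TYPES = new Set(["class_declaration", "function_declaration"]);

// Walk the tree and emit one chunk per boundary node. A matched node is not
// descended into, so a class stays in one piece together with its methods.
function chunkByNode(root: SyntaxNode, sourceText: string): CodeChunk[] {
  const out: CodeChunk[] = [];
  const visit = (node: SyntaxNode): void => {
    if (CHUNK_NODE_TYPES.has(node.type)) {
      out.push({
        nodeType: node.type,
        text: sourceText.slice(node.startIndex, node.endIndex),
      });
      return;
    }
    for (const child of node.children) visit(child);
  };
  visit(root);
  return out;
}

// Hand-built tree standing in for parser output.
const classText = "class A { m() {} }";
const fnText = "function f() { return 1; }";
const demoSource = classText + "\n" + fnText + "\n";
const fnStart = classText.length + 1;

const demoTree: SyntaxNode = {
  type: "program",
  startIndex: 0,
  endIndex: demoSource.length,
  children: [
    { type: "class_declaration", startIndex: 0, endIndex: classText.length, children: [] },
    { type: "function_declaration", startIndex: fnStart, endIndex: fnStart + fnText.length, children: [] },
  ],
};

const nodeChunks = chunkByNode(demoTree, demoSource);
```

Each chunk now begins and ends on a node boundary, so no class or function is ever cut in half. Very large classes would still need a secondary rule, such as descending into their methods, which this sketch leaves out.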

More importantly, we enrich the chunk before embedding it. In Satori, the pipeline prepends the parent scope directly into the text payload.

The agent doesn’t just get a raw function; it gets this:

// [FILE]: src/services/AuthService.ts
// [IMPORTS]: import { db } from '../db'; import { User } from '../types';
// [SCOPE]: class AuthService
async validateToken(token: string): Promise<User> { ... }

Now, when the agent retrieves this chunk via cosine similarity, it has the exact file path and dependency context it needs to write a patch grounded in the real code.
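The enrichment step itself is small. Here is a minimal sketch, assuming the file path, import lines, and enclosing scope have already been collected from the AST. The `ChunkContext` type and `enrichChunk` function are hypothetical names, not Satori’s API.

```typescript
// Context gathered from the AST for one chunk (hypothetical shape).
interface ChunkContext {
  filePath: string;
  imports: string[];
  scope: string | null; // enclosing class or namespace, null at top level
}

// Prepend the context as comment lines so it is embedded with the code.
function enrichChunk(ctx: ChunkContext, code: string): string {
  const header: string[] = [`// [FILE]: ${ctx.filePath}`];
  if (ctx.imports.length > 0) {
    header.push(`// [IMPORTS]: ${ctx.imports.join(" ")}`);
  }
  if (ctx.scope !== null) {
    header.push(`// [SCOPE]: ${ctx.scope}`);
  }
  return header.concat(code).join("\n");
}

const enrichedChunk = enrichChunk(
  {
    filePath: "src/services/AuthService.ts",
    imports: ["import { db } from '../db';", "import { User } from '../types';"],
    scope: "class AuthService",
  },
  "async validateToken(token: string): Promise<User> { ... }",
);
```

Because the header lives inside the embedded text, the file path and scope influence the vector and also reach the agent verbatim when the chunk is retrieved.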

The Tradeoff: Why doesn’t everyone do this?

If AST chunking is so much better, why is the industry flooded with naive RAG wrappers?

Because AST parsing is brittle across languages. A Python AST looks completely different from a TypeScript AST, and every Tree-sitter grammar names its nodes differently. Building a pipeline that accurately traverses Tree-sitter nodes for 5 different languages requires writing and maintaining custom queries for each one. Naive fixed-size splitting takes 5 minutes; AST chunking takes weeks of edge-case hardening.

But if you want a reliable agent, it is a non-negotiable architectural cost.
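A small sketch shows where the per-language cost comes from. The node type names below are recalled from the published Tree-sitter grammars and should be checked against each grammar’s node-types.json before relying on them. The routing table and `boundaryNodesFor` helper are hypothetical.

```typescript
type Language = "typescript" | "python" | "go";

// Boundary node types per language. Names are assumptions to verify against
// each grammar's node-types.json; note that no two lists match.
const BOUNDARY_NODES: Record<Language, readonly string[]> = {
  typescript: ["class_declaration", "function_declaration", "method_definition"],
  python: ["class_definition", "function_definition"],
  go: ["type_declaration", "function_declaration", "method_declaration"],
};

// Hypothetical extension-to-language routing.
const LANGUAGE_BY_EXTENSION = new Map<string, Language>([
  [".ts", "typescript"],
  [".py", "python"],
  [".go", "go"],
]);

// Returns the boundary node types for a file, or null when the language is
// unsupported and the pipeline would have to fall back to naive splitting.
function boundaryNodesFor(filePath: string): readonly string[] | null {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return null;
  const lang = LANGUAGE_BY_EXTENSION.get(filePath.slice(dot));
  return lang === undefined ? null : BOUNDARY_NODES[lang];
}

const tsBoundaries = boundaryNodesFor("src/services/AuthService.ts");
const unsupported = boundaryNodesFor("lib/legacy.rb");
```

Every language added means another row in this table, plus its own edge cases such as decorators, nested functions, and anonymous classes. That is where the weeks go.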

Enforcing Deterministic Tool Contracts

Fixing the chunking is only step one. Next, you have to fix how the agent interacts with the index. AI models are probabilistic; system tools must be deterministic.

If you expose a raw search_codebase tool to an LLM, it will spam the vector database with terrible queries. In Satori, we enforce Lifecycle Checks via the Model Context Protocol (MCP).

Before the agent is allowed to execute a code edit, the MCP server forces it to use an exact_file_read tool. It cannot rely purely on the semantic search snippet; it must prove it verified the actual file state.
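One way to enforce that contract is a guard that sits in front of the edit handler and tracks what the agent has actually read. The sketch below is not the MCP SDK and not Satori’s code. The `EditGuard` class, its method names, and the in-memory workspace are assumptions, and the staleness check goes one step beyond what is described above.

```typescript
// Read-before-edit guard over an in-memory workspace (a stand-in for the
// real filesystem, used here so the sketch is self-contained).
class EditGuard {
  private readonly files: Map<string, string>;
  private readonly lastRead = new Map<string, string>();

  constructor(files: Map<string, string>) {
    this.files = files;
  }

  // Hypothetical handler behind the exact_file_read tool.
  exactFileRead(path: string): string {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`no such file: ${path}`);
    this.lastRead.set(path, content);
    return content;
  }

  // Hypothetical edit handler: rejects edits not preceded by a fresh read.
  applyEdit(path: string, newContent: string): void {
    const seen = this.lastRead.get(path);
    if (seen === undefined) {
      throw new Error(`edit rejected: call exact_file_read on ${path} first`);
    }
    if (seen !== this.files.get(path)) {
      throw new Error(`edit rejected: ${path} changed since it was last read`);
    }
    this.files.set(path, newContent);
    this.lastRead.delete(path); // force a re-read before the next edit
  }
}

const workspace = new Map<string, string>([["src/a.ts", "export const a = 1;\n"]]);
const guard = new EditGuard(workspace);

// An edit based only on a search snippet is refused.
let blindEditError = "";
try {
  guard.applyEdit("src/a.ts", "export const a = 2;\n");
} catch (err) {
  blindEditError = err instanceof Error ? err.message : String(err);
}

// After an exact read, the same edit goes through.
guard.exactFileRead("src/a.ts");
guard.applyEdit("src/a.ts", "export const a = 2;\n");
```

The check runs in ordinary server code, so it holds no matter what the model decides to do. The rejection message also tells the agent exactly which tool to call next.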

The Takeaway

If you want to build coding agents that actually work in production, you have to stop relying on generic LangChain wrappers. You have to build deterministic tool contracts, and you must respect the AST.

When you align your retrieval pipeline with the structure the parser actually sees, the hallucinations drop, and the agent turns into a reliable engineering system.