Ahmed Hamza

Flagship independent systems project

Satori

An agent-safe semantic code retrieval system for MCP coding agents, with indexing, hybrid search, deterministic navigation, and hardened lifecycle semantics.

TypeScript · Node.js · MCP · Vector Search · Milvus/Zilliz · Tree-sitter


Summary: Satori is an MCP server and retrieval engine that helps coding agents search real repositories, open exact code spans, inspect symbol outlines, and reason from fresher context before editing.

Why I built it

I kept running into the same failure mode with coding agents: they could edit quickly, but they often started from weak context. File-name guesses, stale indexes, noisy chunks, duplicated results, and broad context dumps made the agent spend extra turns just finding the right code.

Satori started as a direct response to that problem. The goal was not to build a general agent framework. The goal was narrower: make repository investigation more deterministic before an agent touches code.

What it does

Satori indexes a codebase, chunks source with AST-aware boundaries where possible, embeds code into Milvus or Zilliz, and exposes a small MCP surface for coding agents.
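As a rough illustration of "AST-aware boundaries where possible", the sketch below cuts chunks at top-level symbol spans and falls back to fixed-size line windows for everything else. It is a simplified stand-in, not Satori's chunker: `AstNode`, `Chunk`, `chunkFile`, the node type names, and the 40-line window are all assumptions made for the example, and the input is a plain list of spans rather than a real tree-sitter tree.

```typescript
// Simplified stand-in for a parsed top-level node (NOT tree-sitter's node API).
interface AstNode {
  type: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

interface Chunk {
  startLine: number;
  endLine: number;
  kind: "symbol" | "window";
}

// Illustrative set of node types treated as chunk boundaries (assumption).
const SYMBOL_TYPES = new Set([
  "function_declaration",
  "class_declaration",
  "method_definition",
]);

function chunkFile(totalLines: number, topLevel: AstNode[], windowSize = 40): Chunk[] {
  const symbols = topLevel
    .filter((n) => SYMBOL_TYPES.has(n.type))
    .sort((a, b) => a.startLine - b.startLine);

  const chunks: Chunk[] = [];

  // Cover a line range that has no symbol with fixed-size windows.
  const pushWindows = (from: number, to: number): void => {
    for (let s = from; s <= to; s += windowSize) {
      chunks.push({ startLine: s, endLine: Math.min(s + windowSize - 1, to), kind: "window" });
    }
  };

  let cursor = 1;
  for (const node of symbols) {
    if (node.startLine > cursor) pushWindows(cursor, node.startLine - 1);
    // A symbol is kept whole so its span is never split across chunks.
    chunks.push({ startLine: node.startLine, endLine: node.endLine, kind: "symbol" });
    cursor = node.endLine + 1;
  }
  if (cursor <= totalLines) pushWindows(cursor, totalLines);
  return chunks;
}
```

When no symbols are recognized (for example, an unsupported language), the whole file is covered by windows, which is one way to read the "where possible" qualifier.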

The public tool surface is intentionally constrained:

That shape is deliberate. Agents need fewer ambiguous tools, not more knobs. Search finds candidate areas, outline locks symbol spans, call graph adds local relationship context, and read returns bounded file evidence.
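To make "bounded file evidence" concrete, here is a minimal sketch of a read operation that clamps the requested span and reports when it had to truncate. The names `ReadRequest`, `ReadResult`, `boundedRead`, and the 200-line cap are hypothetical and chosen for illustration; Satori's real read tool may use different parameters and limits.

```typescript
// Hypothetical request/response shapes; not Satori's actual tool schema.
interface ReadRequest {
  path: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

interface ReadResult {
  path: string;
  startLine: number;
  endLine: number;
  lines: string[];
  truncated: boolean; // true when the cap shortened the requested span
}

const DEFAULT_MAX_LINES = 200; // assumed cap, for illustration only

function boundedRead(
  fileText: string,
  req: ReadRequest,
  maxLines: number = DEFAULT_MAX_LINES,
): ReadResult {
  const all = fileText.split("\n");
  // Clamp the requested span to the file, then to the line budget.
  const start = Math.max(1, Math.min(req.startLine, all.length));
  const requestedEnd = Math.max(start, Math.min(req.endLine, all.length));
  const end = Math.min(requestedEnd, start + maxLines - 1);
  return {
    path: req.path,
    startLine: start,
    endLine: end,
    lines: all.slice(start - 1, end),
    truncated: end < requestedEnd,
  };
}
```

Returning the effective span and a `truncated` flag lets the caller tell exactly which lines it received, instead of assuming the whole request was honored.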

Workflow Schematic

[Host IDE / Agent] ─(Stdio JSON-RPC)─➔ [Satori MCP Server]

                                ┌─────────────┴─────────────┐
                                ▼                           ▼
                        [Symbol Graph]               [Vector Engine]
                         (Tree-sitter)                (Milvus/Zilliz)
                                │                           │
                                └─────────────┬─────────────┘

                                    [Bounded Context Chunk]
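The Stdio JSON-RPC edge in the schematic can be illustrated with the general shape of an MCP tool call: a JSON-RPC 2.0 request with method `tools/call`, written to the server's stdin as one newline-terminated message. The sketch below only builds that envelope. The tool name `search` and its `query` argument are placeholders; the page does not list Satori's exact tool names or argument schemas.

```typescript
// General MCP tool-call envelope (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Serialize one request as a single newline-terminated line for a stdio transport.
function encodeToolCall(id: number, name: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(request) + "\n";
}

// Placeholder tool name and argument; the real schema may differ.
const wireMessage = encodeToolCall(1, "search", { query: "getStaticPaths" });
```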

Terminal Session Output

$ satori search "symbol: getStaticPaths"
[INDEX] Loaded ~/.satori/index.db (142 files, 4,210 chunks)
[SEARCH] Running hybrid search (BM25 + Dense) for "symbol: getStaticPaths"...
[RESULTS] Found 2 matches in 45ms:

  1. src/pages/projects/[slug].astro:L8-14 (Score: 0.94)
     export async function getStaticPaths() {
       const projects = await getCollection('projects');
       return projects.map((project) => ({ ... }));
     }
     
  2. src/pages/posts/[slug].astro:L5-11 (Score: 0.82)

Tech stack

The project is a TypeScript monorepo with three runtime packages:

Under the hood it uses tree-sitter for AST-aware chunking, dense plus BM25 hybrid retrieval, optional VoyageAI reranking, Milvus/Zilliz vector storage, and local snapshot state under ~/.satori.
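The page does not say how the dense and BM25 result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as Satori's actual method. The function name `reciprocalRankFusion` and the constant `k = 60` (a conventional default for RRF) are illustrative.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// where rank is 1-based. Documents ranked well in several lists rise to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Sort by score descending; break ties by id so output is deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Hypothetical chunk ids from the two retrievers.
const denseRanking = ["a", "b", "c"];
const bm25Ranking = ["b", "d", "a"];
const fused = reciprocalRankFusion([denseRanking, bm25Ranking]);
// fused ids in order: b, a, d, c
```

RRF uses only ranks, so it sidesteps the problem that BM25 scores and vector similarities live on different scales. A reranker, where enabled, would then reorder the top of this fused list.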

Key engineering decisions

I treated determinism as the product boundary. A coding agent should know when an index is stale, when a graph is unsupported, when a symbol is ambiguous, and when a reindex is required.

That led to several important constraints:

Problems I ran into

The hardest work was not the first search result. It was keeping lifecycle behavior honest after the happy path broke.

A few examples:

Those bugs changed the architecture. They pushed Satori toward explicit state, retryable failures, deterministic hints, and tests around the edge cases that agents are likely to hit.
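One way to express "explicit state, retryable failures, deterministic hints" in TypeScript is a discriminated union that every tool response carries. The sketch below is an assumption about the general shape, not Satori's actual types: the `LifecycleStatus` variants mirror the conditions named earlier (stale index, unsupported graph, ambiguous symbol, reindex required), while the field names and messages are invented for illustration.

```typescript
// Hypothetical lifecycle states; names and fields are illustrative only.
type LifecycleStatus =
  | { kind: "ready" }
  | { kind: "stale"; hint: string; retryable: true }
  | { kind: "graph_unsupported"; language: string; retryable: false }
  | { kind: "ambiguous_symbol"; candidates: string[]; retryable: false }
  | { kind: "reindex_required"; hint: string; retryable: true };

// Turn a status into a fixed, predictable message an agent can act on.
function explain(status: LifecycleStatus): string {
  switch (status.kind) {
    case "ready":
      return "Index is ready.";
    case "stale":
      return `Index is stale. ${status.hint}`;
    case "graph_unsupported":
      return `Call graph is not supported for ${status.language}.`;
    case "ambiguous_symbol":
      return `Symbol is ambiguous. Candidates: ${status.candidates.join(", ")}`;
    case "reindex_required":
      return `Reindex required. ${status.hint}`;
    default: {
      // Compile-time exhaustiveness check: adding a variant without a case fails to type-check.
      const unreachable: never = status;
      throw new Error(`Unhandled status: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` check in the default branch means a new lifecycle state cannot be added without also deciding what the agent is told about it.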

Engineering Notes & Lessons Learned

Validation Notes

What I would improve next

I want to keep improving retrieval evaluation, language coverage for call graphs, local-first setup paths, and the public documentation around real workflows. The product boundary should stay small: index, search, navigate, read, and explain lifecycle state clearly.