Building a Claude Code MCP Server in Python: Lessons from codebase-research-agent

The MCP (Model Context Protocol) specification describes a standard way to expose tools to AI agents. The official documentation shows you how to register a tool. What it does not show you is what breaks when you run that server under real load, call it from Claude Code with 40kb of context, or try to stream partial results back to a developer mid-query.

This post covers what I learned building the MCP server that backs codebase-research-agent — a tool-using ReAct agent that researches codebases through semantic search, AST navigation, symbol lookup, and git blame. The agent itself is interesting, but this post is specifically about the server layer: how to wire FastAPI and the MCP SDK together, how Streamable HTTP works in practice, and the four failure modes I hit before the server ran reliably.

What MCP actually is at the protocol level

MCP is JSON-RPC 2.0 over HTTP. The client sends a tools/list request to discover available tools, then sends tools/call requests with arguments. The server responds with structured content — text, images, or embedded resources.

The part that matters for Python servers: MCP supports two transport modes. SSE (Server-Sent Events) is the original transport, where the server pushes events over a long-lived HTTP connection. Streamable HTTP is the newer transport that Claude Code uses — it is closer to regular HTTP with optional streaming, and it handles reconnects and resumption more cleanly than raw SSE.

If you are building for Claude Code specifically, use Streamable HTTP. The SDK makes this straightforward.

Server setup with FastAPI

The MCP Python SDK provides a FastMCP class that handles protocol boilerplate. You define tools as typed Python functions with docstrings, and the SDK generates the JSON schema from the type annotations.

from fastmcp import FastMCP
from fastapi import FastAPI
from pydantic import BaseModel
from typing import Annotated

app = FastAPI()
mcp = FastMCP(
    name="codebase-research-agent",
    instructions="""
    You have access to a codebase research agent. Use semantic_search to find
    relevant code by meaning, then use symbol_lookup or ast_navigate to drill
    into specific functions or classes. Use git_blame for authorship context.
    Always cite file paths and line numbers in your final answer.
    """,
)

mcp.mount(app, path="/mcp")

The instructions field is important. This text is injected into the system prompt when Claude Code uses your server. It should explain what the tools do collectively and how to use them together — not just list them. Claude Code reads this before deciding whether to use your server at all.

Tool registration

Each tool is a typed async function. The docstring becomes the tool description that Claude reads when deciding whether to call it.

@mcp.tool()
async def semantic_search(
    query: Annotated[str, "Natural language description of what you are looking for"],
    top_k: Annotated[int, "Number of results to return (default 10, max 50)"] = 10,
    file_filter: Annotated[str | None, "Optional glob pattern to restrict search, e.g. '**/*.py'"] = None,
) -> str:
    """
    Search the codebase by meaning. Returns ranked code chunks with file paths,
    line ranges, and similarity scores. Best for: finding where a concept is
    implemented, locating related functionality, or exploring unfamiliar code.
    """
    results = await retriever.semantic_search(
        query=query,
        top_k=top_k,
        file_filter=file_filter,
    )
    return format_search_results(results)


@mcp.tool()
async def symbol_lookup(
    symbol: Annotated[str, "Function, class, or method name to look up"],
    include_callers: Annotated[bool, "Whether to include call sites"] = False,
) -> str:
    """
    Look up a specific function, class, or method by name. Returns its
    definition, docstring, signature, and optionally its call sites.
    Use this when you know the exact name of what you are looking for.
    """
    result = await symbol_graph.lookup(
        symbol=symbol,
        include_callers=include_callers,
    )
    if result is None:
        return f"Symbol '{symbol}' not found in the indexed codebase."
    return format_symbol_result(result)

Two things matter here that the documentation undersells. First, the distinction between tools should be clear from their docstrings — Claude decides which tool to call based on that text, not on some internal reasoning about parameter names. If two tools overlap in their descriptions, you will see Claude calling the wrong one. Second, return strings, not dicts. The MCP protocol supports structured content types, but Claude Code currently renders plain text most clearly for developer-facing tools.

Streaming partial results

The default @mcp.tool() decorator returns a single response after the function completes. For long-running operations — indexing a large file, running a complex retrieval pipeline — you want to stream intermediate progress back to the client.

FastMCP supports this via the Context object:

@mcp.tool()
async def index_repository(
    path: Annotated[str, "Absolute path to the repository root"],
    ctx: Context,
) -> str:
    """
    Index a repository for semantic search. This may take 30–120 seconds
    depending on repository size. Progress is streamed as indexing proceeds.
    """
    files = collect_indexable_files(path)
    total = len(files)

    await ctx.report_progress(progress=0, total=total)

    indexed = 0
    errors = []

    for batch in chunk(files, size=20):
        results = await indexer.index_batch(batch)
        indexed += len(batch)
        errors.extend(results.errors)

        await ctx.report_progress(progress=indexed, total=total)

    if errors:
        error_summary = "\n".join(f"  - {e}" for e in errors[:5])
        suffix = f"\n  ... and {len(errors) - 5} more" if len(errors) > 5 else ""
        return f"Indexed {indexed}/{total} files. {len(errors)} errors:\n{error_summary}{suffix}"

    return f"Indexed {indexed} files successfully. Repository is ready for search."

ctx.report_progress sends a progress notification to the client. Claude Code surfaces this as a progress indicator in the UI. The client does not block on it — it continues receiving other events while progress updates stream in.

The four failure modes

1. Context size on large repositories. semantic_search returning 10 results with 200-line chunks each produces ~12,000 tokens of tool output. Multiply that by 5 tool calls in a ReAct loop and you are at 60,000 tokens before Claude has written a word of analysis. The fix is aggressive result truncation: return at most 50 lines per chunk, and include a truncated: true marker so Claude knows to call symbol_lookup for the full definition if needed.

2. Cold start latency on the vector index. The first semantic_search call after server startup loads the pgvector connection pool and warms the embedding model. On a cold Hetzner CPX21 this takes 4–6 seconds. Claude Code has a 10-second tool call timeout. On a slow box this races. The fix: add a health check endpoint that warms the pool on startup, and configure Claude Code's MCP timeout to 30 seconds for this server specifically.

3. Symbol lookup returning nothing on aliased imports. Python's import x as y means the symbol graph stores x but the caller uses y. If Claude asks symbol_lookup("y") it gets nothing. The fix: build an alias map during indexing and resolve aliases before the lookup. The map also handles from module import function — stored as module.function, called as function.

4. Concurrent tool calls corrupting retrieval state. Claude Code sometimes issues two tool calls in parallel when the agent decides two pieces of information are independent. The retriever used a module-level connection that was not safe for concurrent access. The fix: use a connection pool with asyncpg and acquire a connection per-call, not per-server.

Local mode

The server supports a --local flag that swaps the cloud providers for Ollama:

import os
from enum import Enum

class ProviderMode(str, Enum):
    CLOUD = "cloud"
    LOCAL = "local"

def build_retriever(mode: ProviderMode) -> HybridRetriever:
    if mode == ProviderMode.LOCAL:
        return HybridRetriever(
            embedding_model=OllamaEmbeddings(model="nomic-embed-text"),
            reranker=OllamaReranker(model="deepseek-coder"),
            vector_store=PgVectorStore(connection_string=settings.database_url),
        )
    return HybridRetriever(
        embedding_model=OpenAIEmbeddings(model="text-embedding-3-large"),
        reranker=AnthropicReranker(model="claude-sonnet-4-20250514"),
        vector_store=PgVectorStore(connection_string=settings.database_url),
    )

Local mode is slower — nomic-embed-text produces lower-quality embeddings than text-embedding-3-large and Ollama on CPU is not fast — but it means the server can run against a proprietary codebase without sending any code to an external API. For an enterprise client with data-residency requirements this is not optional.

One thing I got wrong

I spent two days trying to make the MCP server also work as a standalone CLI tool, so a developer could run codebase-research ask "where is authentication handled?" without Claude Code. It works, but the maintenance burden is not worth it. The CLI reimplements half the ReAct loop that Claude already handles, and it diverges every time the tool signatures change.

The right split: MCP server for Claude Code integration, separate REST API for programmatic access. The MCP server is for interactive developer workflows. For anything automated — CI, batch analysis, integration tests — a plain FastAPI endpoint with explicit parameters is simpler and more predictable than an agent loop.

Share 𝕏 in

Al Amin Ahamed

Senior software engineer & AI practitioner. 5+ years shipping Laravel platforms, WordPress plugins, WooCommerce extensions, and AI-driven products.

About me →