Marcin Baniowski Connecting people with machines

BM25-powered MCP for team documentation

The problem

Writing documentation is one of the least loved chores in IT, and one of the best use cases for LLMs. Once the actual work is done, a quick chat with the model turns it into a Markdown writeup that follows a shared template and lands in the repo. The catch is that this kind of assembly-line output quickly becomes hard to navigate, especially for the rest of the team, and opening the repo every time is impractical when most of your work already happens in the chat.

That's where Model Context Protocol (MCP) comes in — an open standard that lets you plug any data source into the LLM as a "tool". Without leaving your flow, you can ask the documentation a question (e.g. how do I safely deploy this app in our infrastructure?) and let the model figure out the right query parameters on its own.

docs-mcp is an MCP server written in Go. It clones a GitHub repo of your choice, indexes the documents with BM25 and exposes them as MCP tools (search, get document, list). The whole setup is one go build and three environment variables.

Why a remote MCP (HTTP) instead of stdio?

MCP defines two standard transports: local stdio and an HTTP variant. docs-mcp goes with the latter: plain JSON-RPC over HTTP POST. With stdio, every developer needs the repo cloned locally, a token and a running process. With HTTP, the server runs centrally and the client only needs a URL:

{
  "mcpServers": {
    "infra-docs": {
      "url": "https://docs-mcp.internal:8000/mcp"
    }
  }
}

Why BM25 instead of embeddings?

A typical Python RAG pipeline requires:

  • An embedding model (e.g. sentence-transformers) — ~500 MB–2 GB RAM for the model alone
  • A vector database (FAISS, Chroma, Qdrant)
  • Python with its dependencies (torch, transformers, numpy)

docs-mcp uses BM25 instead — a classic information-retrieval algorithm based on word statistics (TF-IDF with document-length normalization). Here's how the two approaches compare:

|               | Python + embeddings                    | docs-mcp (Go + BM25)                 |
|---------------|----------------------------------------|--------------------------------------|
| RAM per index | 500 MB – 2 GB+ (model + vectors)       | ~10–50 MB (in-memory inverted index) |
| Dependencies  | torch, transformers, numpy, vector DB  | Only Go stdlib + go-git              |
| Startup time  | Seconds to minutes (loading the model) | < 1 s (tokenization + index build)   |
| Docker image  | 2–5 GB                                 | ~30 MB (static binary)               |
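The algorithm itself fits in a screen of Go. A sketch of classic BM25 scoring over a tiny pre-tokenized corpus, with the conventional defaults k1 = 1.2 and b = 0.75 (this is the textbook formula, not docs-mcp's exact implementation):

```go
package main

import (
	"math"
	"strings"
)

// bm25Score ranks pre-tokenized docs against a query: term frequency,
// inverse document frequency, and document-length normalization.
func bm25Score(docs [][]string, query []string) []float64 {
	const k1, b = 1.2, 0.75
	n := float64(len(docs))

	// Document frequency and average length, computed in one pass.
	df := map[string]float64{}
	avgLen := 0.0
	for _, d := range docs {
		avgLen += float64(len(d))
		seen := map[string]bool{}
		for _, t := range d {
			if !seen[t] {
				df[t]++
				seen[t] = true
			}
		}
	}
	avgLen /= n

	scores := make([]float64, len(docs))
	for i, d := range docs {
		tf := map[string]float64{}
		for _, t := range d {
			tf[t]++
		}
		for _, q := range query {
			f := tf[q]
			if f == 0 {
				continue
			}
			idf := math.Log(1 + (n-df[q]+0.5)/(df[q]+0.5))
			norm := f * (k1 + 1) / (f + k1*(1-b+b*float64(len(d))/avgLen))
			scores[i] += idf * norm
		}
	}
	return scores
}

// tokenize is deliberately naive: lowercase + whitespace split.
func tokenize(s string) []string { return strings.Fields(strings.ToLower(s)) }
```

Nothing here allocates beyond a few maps per query, which is why the whole index for a few hundred files fits in tens of megabytes.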

BM25 doesn't understand synonyms or semantics — if the docs say "bucket" and you ask about "storage", it won't find a match. docs-mcp works around this with two boosting mechanisms: matches in the file name/path and in tags from the YAML frontmatter:

---
tags: [vpn, staging, wireguard, network]
---
# VPN configuration for the staging environment

It's a simple but effective "human override" for BM25's limitations — the author can add synonyms and domain terms that aren't in the body but tend to show up in real questions. In practice, people searching internal docs use concrete terms ("S3", "terraform", "deploy"), and this small tweak is enough to get relevance to a level where you don't need embeddings at all.
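The boosting step can be sketched as a flat bonus added on top of the BM25 score whenever a query term hits the file path or a frontmatter tag. The weights below are illustrative guesses, not docs-mcp's actual values:

```go
package main

import "strings"

// boostedScore adds flat bonuses on top of a chunk's BM25 score when a
// query term appears in the file path or in the frontmatter tags.
// pathBoost and tagBoost are assumed weights for illustration.
func boostedScore(bm25 float64, path string, tags []string, query []string) float64 {
	const pathBoost, tagBoost = 2.0, 3.0
	score := bm25
	lowerPath := strings.ToLower(path)
	for _, q := range query {
		if strings.Contains(lowerPath, q) {
			score += pathBoost
		}
		for _, t := range tags {
			if strings.EqualFold(t, q) {
				score += tagBoost
			}
		}
	}
	return score
}
```

With the frontmatter example above, a query for "wireguard" scores the VPN document even though the word never appears in the body, because the tag match supplies the bonus.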

For a team with a few hundred Markdown files, it's a perfectly good solution at a fraction of the operational cost.

How it works in practice

  1. The server clones the repo (shallow clone, single branch)
  2. It parses the Markdown, splits it into chunks and builds the BM25 index
  3. Every SYNC_INTERVAL seconds (30 min by default) it runs git pull and rebuilds the index
  4. Optionally, a GitHub webhook triggers an immediate sync after a push

A developer in Cursor asks: "how do I configure the VPN for staging?" → the AI calls the search_docs tool → BM25 returns the top matching chunks → the AI answers in the context of the team's current documentation.