LlmsTxtKit

LlmsTxtKit is a C#/.NET library that provides a complete pipeline for working with the llms.txt standard: parsing, fetching, validation, caching, and context generation. It also ships as an MCP server, so AI agents can use its capabilities as tools directly.

The Problem It Solves

The llms.txt standard was proposed in late 2024 and has gained real traction among developer-documentation sites--notable adopters include Anthropic, Cloudflare, Stripe, and Vercel. Implementations exist for Python, JavaScript, VitePress, PHP, and Drupal. When LlmsTxtKit was conceived, the entire .NET ecosystem was absent from that list.

Beyond filling the gap, LlmsTxtKit handles a problem most existing implementations ignore: the WAF blocking paradox. AI tools that try to fetch llms.txt files are routinely blocked by the same security infrastructure protecting the sites that publish them. LlmsTxtKit handles this gracefully--configurable retry strategies, user-agent management, degradation paths--rather than throwing an exception and calling it done.

What It Does

The library covers five capabilities, each designed to work standalone or as part of the full pipeline:

LlmsTxtKit's five-stage pipeline: raw input through parsing, validation, and context generation.

Parsing takes raw llms.txt content (a Markdown file with a specific structure) and produces a strongly-typed C# object model. It handles well-formed files, malformed files, and the edge cases you encounter in the wild--which are more creative than you'd expect.
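For orientation, the structure the parser targets is the one the llms.txt proposal defines: an H1 title, an optional blockquote summary, and H2 sections containing Markdown link lists, with an "Optional" section for content that can be dropped when context is tight. A minimal example (project name and URLs are placeholders):

```markdown
# Example Project

> Example Project is a widget toolkit. This blockquote gives an LLM the short version.

## Docs

- [Quickstart](https://example.com/quickstart.md): Installation and first steps
- [API Reference](https://example.com/api.md): The full API surface

## Optional

- [Changelog](https://example.com/changelog.md): Release history
```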

Fetching retrieves llms.txt files from the web, including HTTP redirects, WAF challenges, timeouts, and rate limiting. The implementation is designed around the reality that a significant percentage of fetches will be blocked or degraded by security infrastructure. That's not an edge case. It's the default.
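As a rough illustration of the retry-and-degrade approach described above, here is a minimal sketch using only System.Net.Http. The class name, user-agent string, and backoff policy are assumptions for the example; this is not LlmsTxtKit's actual API.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch: fetch /llms.txt with exponential backoff on WAF-style responses.
public static class LlmsTxtFetcher
{
    public static async Task<string?> FetchAsync(Uri siteRoot, int maxAttempts = 3)
    {
        using var client = new HttpClient();
        // A descriptive User-Agent: some WAFs reject empty or default agents outright.
        client.DefaultRequestHeaders.UserAgent.ParseAdd("MyDocsAgent/1.0");

        var target = new Uri(siteRoot, "/llms.txt");
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                using var response = await client.GetAsync(target);
                if (response.IsSuccessStatusCode)
                    return await response.Content.ReadAsStringAsync();

                // 403 and 429 are the typical WAF / rate-limit responses:
                // back off and retry instead of failing on first contact.
                if (response.StatusCode is HttpStatusCode.Forbidden
                    or HttpStatusCode.TooManyRequests)
                {
                    await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
                    continue;
                }

                return null; // other failures: degrade, let the caller fall back
            }
            catch (HttpRequestException)
            {
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
        return null;
    }
}
```

Returning null rather than throwing mirrors the "degradation path" idea: the caller can fall back to raw HTML or cached content when the fetch is blocked.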

Validation checks a parsed file against the specification and reports compliance issues. This overlaps with DocStratum's functionality but is integrated for use in automated pipelines--validation as a gate, not as a standalone analysis.

Caching stores fetched and parsed results with configurable TTL. Particularly important for MCP server usage, where an agent might reference the same site's llms.txt file multiple times during a single task.
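The TTL behavior can be pictured with a small generic sketch; this is an illustration of the concept, not LlmsTxtKit's actual cache implementation.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch: thread-safe cache where entries expire after a fixed time-to-live.
public sealed class TtlCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTimeOffset Expires)> _entries = new();
    private readonly TimeSpan _ttl;

    public TtlCache(TimeSpan ttl) => _ttl = ttl;

    public void Set(TKey key, TValue value) =>
        _entries[key] = (value, DateTimeOffset.UtcNow + _ttl);

    public bool TryGet(TKey key, out TValue? value)
    {
        if (_entries.TryGetValue(key, out var entry)
            && entry.Expires > DateTimeOffset.UtcNow)
        {
            value = entry.Value;
            return true;
        }
        _entries.TryRemove(key, out _); // evict the stale entry lazily
        value = default;
        return false;
    }
}
```

An agent task that touches the same site repeatedly then pays the fetch-and-parse cost once per TTL window instead of once per tool call.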

Context generation transforms a parsed llms.txt file into structured content optimized for an LLM's context window. The last mile of the pipeline: turning a data structure into something an AI agent can actually use.

MCP Server

LlmsTxtKit ships as an MCP server, exposing its capabilities as tools any MCP-compatible agent can discover and invoke. Fetch, validate, generate context, cache results--all available as tool calls.

Blueprint of an llms.txt context window — the structured output LlmsTxtKit generates for AI agents.

The MCP server is the primary way AI agents interact with LlmsTxtKit. Human developers use the library directly via NuGet.
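Registering the server with an MCP-capable client follows the usual mcpServers configuration convention. The command and project path below are placeholders for illustration, not LlmsTxtKit's documented invocation:

```json
{
  "mcpServers": {
    "llmstxt": {
      "command": "dotnet",
      "args": ["run", "--project", "path/to/LlmsTxtKit.McpServer"]
    }
  }
}
```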

The Research Connection

LlmsTxtKit is one of three projects in the llms.txt research initiative:

An AI agent and MCP server exchanging structured context across the protocol bridge.

  • LlmsTxtKit provides the tooling
  • DocStratum provides standalone validation with deeper analysis
  • The Context Collapse Mitigation Benchmark uses LlmsTxtKit to test whether curated llms.txt content actually produces better AI responses than raw HTML

The blog documents the research findings as they emerge.

Where to Find It