
feat(generate): add generate llms command for per-project llms.txt #7

Merged
dacharyc merged 2 commits into main from feat/generate-llms-txt
Jul 2, 2026

@dacharyc dacharyc (Collaborator) commented Jul 2, 2026

Summary

Adds a new generate llms command that produces an llms.txt file for each documentation project, supporting a progressive-disclosure setup: a master llms.txt acts as a sitemap linking to each project's own llms.txt.

For the current version of each versioned project, and for each non-versioned project, the command:

  • enumerates pages and extracts each page's title (H1) and meta :description:
  • resolves the production URL via the existing resolver, appending .md
  • writes <output-dir>/<project>/llms.txt in the format:
    - [Page Title](https://www.mongodb.com/docs/manual/core/document.md): Description.
  • prints a per-project character-count summary (with and without descriptions), flagging files over the 50,000-character llms.txt guideline
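
The entry format above can be illustrated with a minimal Go sketch. This is not the code added in this PR: the `page` type and `renderEntry` function are hypothetical names chosen for the example, and the sketch assumes a whitespace-only description counts as missing.

```go
package main

import (
	"fmt"
	"strings"
)

// page is a hypothetical record of what the command extracts for one page.
type page struct {
	Title       string // page title (H1)
	URL         string // production URL with .md appended
	Description string // meta :description:, possibly empty
}

// renderEntry formats one llms.txt line as "- [Title](URL): Description".
// The ": Description" suffix is dropped when the description is empty or
// when withDescriptions is false.
func renderEntry(p page, withDescriptions bool) string {
	entry := fmt.Sprintf("- [%s](%s)", p.Title, p.URL)
	desc := strings.TrimSpace(p.Description)
	if withDescriptions && desc != "" {
		entry += ": " + desc
	}
	return entry
}

func main() {
	p := page{
		Title:       "Page Title",
		URL:         "https://www.mongodb.com/docs/manual/core/document.md",
		Description: "Description.",
	}
	fmt.Println(renderEntry(p, true))
	fmt.Println(renderEntry(p, false))
}
```

Passing `false` for `withDescriptions` mirrors what the `--no-descriptions` flag is described as doing for written files.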

Behavior details

  • Root landing pages use the <root>/index.md form (the <root>.md form 404s in production; <root>/index.md is the real markdown).
  • Substitutions: snooty {+name+} references in titles and descriptions are resolved from the project's snooty.toml [constants].
  • Missing descriptions omit the trailing : description.
  • Excluded: includes/ and code-examples/ directories; deprecated app-services and realm projects; non-project dirs (404, docs-platform, meta, table-of-contents).
  • --no-descriptions flag omits descriptions from written files (useful for oversized projects / iterating on docs).
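
Two of the behaviors above, the root-index URL form and `{+name+}` substitution, can be sketched in Go as follows. The function names `markdownURL` and `resolveSubstitutions` are hypothetical stand-ins, not the `ResolveSubstitutions` implementation in internal/snooty. The sketch assumes the resolver returns a URL that may end in a slash, and that a reference with no matching constant is left unchanged.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// substitutionRef matches snooty-style {+name+} references.
var substitutionRef = regexp.MustCompile(`\{\+([^+{}]+)\+\}`)

// resolveSubstitutions replaces each {+name+} with constants[name].
// Assumption: unknown names are left as-is rather than removed.
func resolveSubstitutions(text string, constants map[string]string) string {
	return substitutionRef.ReplaceAllStringFunc(text, func(ref string) string {
		// Strip the leading "{+" and trailing "+}" to get the name.
		name := strings.TrimSpace(ref[2 : len(ref)-2])
		if value, ok := constants[name]; ok {
			return value
		}
		return ref
	})
}

// markdownURL turns a resolved page URL into its markdown URL.
// Root landing pages use <root>/index.md; every other page appends .md.
func markdownURL(pageURL string, isRoot bool) string {
	trimmed := strings.TrimRight(pageURL, "/")
	if isRoot {
		return trimmed + "/index.md"
	}
	return trimmed + ".md"
}

func main() {
	// "tool-name" is an illustrative constant, not one from a real snooty.toml.
	constants := map[string]string{"tool-name": "Example Tool"}
	fmt.Println(resolveSubstitutions("Install {+tool-name+}", constants))
	fmt.Println(markdownURL("https://www.mongodb.com/docs/manual/", true))
	fmt.Println(markdownURL("https://www.mongodb.com/docs/manual/core/document/", false))
}
```

The sketch does not handle constants whose values reference other constants; whether the real resolver does is not stated here.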

Flags

--output-dir, --for-project, --no-descriptions, --base-url

Findings from a full run (46 projects)

Six projects exceed the 50,000-character guideline with descriptions (atlas, atlas-cli, cloud-manager, manual, mongocli, ops-manager). None of them drops under the guideline when descriptions are removed, so all six will need per-section splitting. The other 40 fit within the guideline with descriptions included.
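
The with/without comparison behind these findings could be sketched roughly as below. The `sizeSummary` type and its methods are hypothetical, and counting Unicode code points (rather than bytes) is an assumption; the PR does not say which the command uses. The inputs in `main` are synthetic strings, not real project output.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sizeGuideline is the 50,000-character llms.txt guideline cited in this PR.
const sizeGuideline = 50000

// sizeSummary is a hypothetical per-project record of both character counts.
type sizeSummary struct {
	Project             string
	WithDescriptions    int
	WithoutDescriptions int
}

// summarize counts characters in both renderings of a project's llms.txt.
func summarize(project, withDesc, withoutDesc string) sizeSummary {
	return sizeSummary{
		Project:             project,
		WithDescriptions:    utf8.RuneCountInString(withDesc),
		WithoutDescriptions: utf8.RuneCountInString(withoutDesc),
	}
}

// overGuideline reports whether the file with descriptions is too large.
func (s sizeSummary) overGuideline() bool {
	return s.WithDescriptions > sizeGuideline
}

// needsSplitting reports whether dropping descriptions is still not enough.
func (s sizeSummary) needsSplitting() bool {
	return s.WithoutDescriptions > sizeGuideline
}

func main() {
	// Synthetic content sized just around the guideline.
	s := summarize("demo", strings.Repeat("a", 50001), strings.Repeat("a", 50000))
	fmt.Printf("%s: with=%d without=%d over=%t split=%t\n",
		s.Project, s.WithDescriptions, s.WithoutDescriptions,
		s.overGuideline(), s.needsSplitting())
}
```

Under this sketch, a project where `needsSplitting` is true corresponds to the six projects listed above.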

Changes

  • New: commands/generate/{generate.go,llms/*}, internal/rst/{meta_parser.go,page_title.go}
  • Modified: main.go (register command), internal/snooty/snooty.go (constants + ResolveSubstitutions), README.md
  • Tests for meta/title parsers, substitution resolution, URL/render helpers

Test plan

  • go build ./...
  • go test ./... ✅ (all packages green)
  • Verified root-index .md URLs return 200 live; confirmed 0 remaining {+...+} refs across all generated files

dacharyc added 2 commits July 2, 2026 10:22
Add a `generate llms` command that produces an llms.txt file for each
documentation project, supporting a progressive-disclosure setup where a
master llms.txt links to each project's own llms.txt.

For each project's current version (and non-versioned projects), it
enumerates pages, extracts the page title and meta :description:, resolves
the production URL (with .md appended), and writes content/<project>/llms.txt.
After writing, it prints a per-project character-count summary (with and
without descriptions) flagging files over the 50k llms.txt guideline.

Details:
- Root landing pages use the <root>/index.md markdown form (no <root>.md).
- Snooty {+name+} substitutions in titles and descriptions are resolved
  from the project's snooty.toml [constants].
- Pages without a description omit the trailing ": description".
- includes/ and code-examples/ dirs and the deprecated app-services and
  realm projects are excluded.
- --no-descriptions flag omits descriptions from written files.

Add internal/rst meta-description and page-title parsers, snooty constants
parsing + ResolveSubstitutions, tests, and README documentation.

Document the generate llms command under a new 0.4.0 release (which also
folds in the previously-unreleased resolve url command) and bump the
version constant in main.go.

Also fix the 0.3.0 release date (2026-01-07, was incorrectly 2025-01-07).

@cbullinger cbullinger (Collaborator) left a comment

@dacharyc dacharyc merged commit ce07116 into main Jul 2, 2026
2 checks passed
@dacharyc dacharyc deleted the feat/generate-llms-txt branch July 2, 2026 15:15

2 participants