feat: add AI discoverability layer (llms.txt, skill.md, MCP server) #17

Open
avikalpg wants to merge 2 commits into main from nia/ai-discoverability

Conversation

@avikalpg avikalpg commented Jun 11, 2026

Contributor

What

Adds the full AI discoverability infrastructure for WildestAI, without any third-party SaaS dependency.

Files added

| File | Purpose |
| --- | --- |
| llms.txt | Standard AI discovery file at repo root: compact context for LLMs crawling/cloning the repo |
| skill.md | Agent skill file: tells AI agents exactly how to install and use wild CLI |
| mcp_server.py | MCP server exposing run_wild_diff, list_docs, get_docs, search_docs tools |

MCP server tools

  • run_wild_diff(repo_path, args) — runs wild diff on any repo and returns structured results
  • list_docs() — indexes all 9 documentation pages
  • get_docs(name) — fetches any doc by slug
  • search_docs(query) — full-text search across all docs

Also exposes a wildestai://llms.txt MCP resource.
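For orientation, here is a minimal standard-library sketch of the kind of full-text search that search_docs is described as providing (the review below calls it a case-insensitive substring search with line context). The function name `search_markdown`, its parameters, and the shape of the returned dictionaries are assumptions made for illustration; this is not the code in mcp_server.py.

```python
from pathlib import Path
from typing import Dict, List


def search_markdown(query: str, docs_dir: Path, context: int = 1) -> List[Dict[str, object]]:
    """Case-insensitive substring search over *.md files under docs_dir (hypothetical helper)."""
    needle = query.lower()
    matches: List[Dict[str, object]] = []
    if not needle:
        return matches
    for path in sorted(docs_dir.rglob("*.md")):
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable files instead of failing the whole search
        for i, line in enumerate(lines):
            if needle in line.lower():
                lo, hi = max(0, i - context), min(len(lines), i + context + 1)
                matches.append({
                    "doc": path.relative_to(docs_dir).as_posix(),
                    "line": i + 1,  # 1-indexed line number of the hit
                    "context": "\n".join(lines[lo:hi]),
                })
    return matches
```

Returning the surrounding lines with each hit lets an agent judge relevance without a second get_docs call.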

Why

Replaces the need for DocsALot ($100/month) — same AI discoverability features built in-house.

Testing

python3 -c "import mcp_server; print('imports OK')"
python3 -c "from mcp_server import list_docs, search_docs; print(list_docs())"

Summary by CodeRabbit

  • New Features

    • Added standardized JSON schema for DiffGraph v2.0 output format with strict validation rules
    • Launched MCP server exposing tools for running diff analysis with configurable output, searching documentation, and retrieving docs
  • Documentation

    • Added comprehensive getting started guide covering installation, configuration, and CLI usage examples
    • Added LLM context documentation file for AI integration

Nia (Avikalp's assistant) added 2 commits June 8, 2026 22:03
Adds diffgraph/schema/diffgraph-v2.schema.json — the JSON Schema 2020-12
draft that operationalises the v2 output contract from design/JSON-SCHEMA.md.

Covers: FileEntry, SymbolEntry, RelationshipEntry, SummaryEntry, Evidence,
Metadata, Warning, AnalysisSource. Required fields enforced; inferred claims
must carry evidence + confidence. privacy_tier is a top-level required
metadata field. Consumers can use this for validation in CI, typed generation,
and VS Code schema hints.

This satisfies one of the four schema ratification criteria in JSON-SCHEMA.md
(the machine-readable file). Still needs: Avikalp sign-off on sub-questions,
one end-to-end worked example validated, PR #11 updated to target this schema.

- llms.txt at repo root: compact context for LLMs crawling the repo
- skill.md: agent skill file so AI agents know how to use wild CLI
- mcp_server.py: MCP server exposing run_wild_diff, list_docs, get_docs,
  search_docs tools and a wildestai://llms.txt resource

Replaces the need for a paid DocsALot subscription — all AI
discoverability infrastructure built in-house.

coderabbitai Bot commented Jun 11, 2026

Contributor

Review Change Stack

Walkthrough

This PR establishes the complete foundation for DiffGraph v2.0: a canonical JSON Schema specifying the artifact structure, an MCP server implementing programmatic access to diff analysis and documentation, and user guides for both CLI and agent-based usage.

Changes

DiffGraph v2.0 Infrastructure

| Layer | File(s) | Summary |
| --- | --- | --- |
| DiffGraph v2.0 Canonical Schema | diffgraph/schema/diffgraph-v2.schema.json | Defines the JSON Schema for DiffGraph v2.0 artifacts, specifying top-level required fields (schema_version, generated_at, wild_version, diff_ref, files, symbols, relationships, metadata), shared type definitions (AnalysisSource, Evidence), and entity structures (FileEntry, SymbolEntry, RelationshipEntry, SummaryEntry) with conditional validation rules requiring evidence when analysis_source is inferred. |
| MCP Server Implementation | mcp_server.py | Implements FastMCP server exposing run_wild_diff tool (validates repo, executes wild diff subprocess with optional output, enforces 120s timeout), list_docs tool (aggregates markdown pages from slug set and docs directory), get_docs tool (resolves document names via slug map or file path), search_docs tool (case-insensitive substring search with line context), and llms_txt resource (embedded or read from website directory). |
| User and Agent Documentation | llms.txt, skill.md | Describes project purpose (wild diff wraps git for AI-powered semantic analysis), installation/quick start, available MCP tools, CLI command variants, output behavior (diffgraph.html generation), environment configuration (OPENAI_API_KEY), and usage notes for scoping diffs and performance considerations. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant MCPServer
  participant FileSystem
  participant WildDiff as wild diff<br/>Subprocess
  Client->>MCPServer: run_wild_diff(repo_path, args)
  activate MCPServer
  MCPServer->>FileSystem: validate .git directory exists
  MCPServer->>WildDiff: execute wild diff --no-open
  activate WildDiff
  WildDiff->>FileSystem: generate diffgraph.html or custom output
  WildDiff-->>MCPServer: stdout/stderr, return code
  deactivate WildDiff
  MCPServer-->>Client: success, returncode, output_path
  deactivate MCPServer
  Client->>MCPServer: get_docs(name)
  activate MCPServer
  MCPServer->>FileSystem: resolve via slug map or file path
  MCPServer-->>Client: document content
  deactivate MCPServer
  Client->>MCPServer: search_docs(query)
  activate MCPServer
  MCPServer->>FileSystem: scan markdown files, search content
  MCPServer-->>Client: matches with line context
  deactivate MCPServer
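The run_wild_diff leg of the diagram (check for a .git directory, run the subprocess, return the exit status) can be sketched with the standard library alone. This is a hedged illustration, not the PR's implementation: `run_in_repo` and its return keys are hypothetical names, the default command is copied from the diagram, and the 120-second default mirrors the timeout mentioned in the walkthrough.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def run_in_repo(repo_path: str, cmd: Optional[List[str]] = None, timeout: int = 120) -> Dict[str, object]:
    """Run a command inside a git repository and report the outcome (hypothetical helper)."""
    repo = Path(repo_path).expanduser().resolve()
    if not (repo / ".git").exists():
        return {"success": False, "error": f"not a git repository: {repo}"}
    # Default command taken from the sequence diagram; callers may pass another one.
    cmd = cmd or ["wild", "diff", "--no-open"]
    try:
        proc = subprocess.run(
            cmd, cwd=str(repo), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"success": False, "error": f"timed out after {timeout}s"}
    except FileNotFoundError:
        return {"success": False, "error": f"command not found: {cmd[0]}"}
    return {
        "success": proc.returncode == 0,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Passing the command as a list (never a shell string) keeps the shell out of the picture, although it does not by itself validate the individual arguments.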

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding AI discoverability infrastructure (llms.txt, skill.md, MCP server) that replaces paid SaaS tooling. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (2)
mcp_server.py (2)

131-137: 💤 Low value

External path references may fail in standalone deployments.

Lines 131-132 reference ../../wildestai/docs/DiffGraph-CLI/ which assumes a specific directory structure outside this repository. If the MCP server is deployed standalone or the monorepo structure differs, these docs will silently be unavailable.

Consider making external doc paths configurable via environment variables, or documenting the expected directory layout.
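One way to act on this suggestion is to read the docs location from an environment variable and fall back to a bundled copy. The sketch below assumes a variable named `WILD_DOCS_DIR` and a helper called `resolve_docs_dir`; both are hypothetical and do not appear in the PR.

```python
import logging
import os
from pathlib import Path
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


def resolve_docs_dir(
    bundled: Path,
    env: Optional[Mapping[str, str]] = None,
    var: str = "WILD_DOCS_DIR",  # hypothetical variable name
) -> Path:
    """Prefer a configured docs directory; fall back to the bundled one."""
    env = os.environ if env is None else env
    configured = env.get(var, "").strip()
    if configured:
        candidate = Path(configured).expanduser().resolve()
        if candidate.is_dir():
            return candidate
        # Warn rather than fail so a standalone deployment still starts.
        logger.warning("%s=%s is not a directory; using bundled docs", var, configured)
    return bundled.resolve()
```

Logging a warning on a bad override addresses the "silently unavailable" concern while keeping the server usable.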

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp_server.py` around lines 131 - 137, The built-in pages list uses hardcoded
external relative paths (built_in entries and the loop that resolves full via
REPO_ROOT and full.is_relative_to) which will break in standalone deployments;
update the code to make these doc paths configurable (e.g., read from
environment variables or a config dict) and fall back to bundled/internal copies
if the external path does not exist, by replacing the hardcoded rel_path values
with configurable keys and checking env/config before resolving via REPO_ROOT;
ensure you update the path-resolution logic around REPO_ROOT, full.exists(), and
full.is_relative_to to prefer configured paths and log a warning if unavailable
so the server can run standalone.

46-50: 💤 Low value

Function signature includes output_file not documented in skill.md.

The documented contract in skill.md (line 69) specifies run_wild_diff(repo_path, args), but the implementation adds an output_file parameter. While the parameter has a default value making it backward compatible, the documentation should be updated to reflect the actual interface.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp_server.py` around lines 46 - 50, The implementation of run_wild_diff now
accepts an extra parameter output_file (see function run_wild_diff(repo_path:
str, args: str = "", output_file: str = "") in mcp_server.py) but skill.md still
documents run_wild_diff(repo_path, args); update skill.md to include the new
optional output_file parameter and its behavior (default value, effect when
provided) to match the actual function signature, or alternatively remove
output_file from the function if the documented contract must be
preserved—ensure the documentation and the run_wild_diff signature are
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@diffgraph/schema/diffgraph-v2.schema.json`:
- Around line 101-139: Add "additionalProperties": false to the Evidence object
and change the line number minima to match 1-indexing: set "minimum": 1 on the
"line_start" and "line_end" properties in the Evidence schema; update the
Evidence properties block (the "Evidence" object definition) to include
additionalProperties: false and change the "minimum" values for "line_start" and
"line_end" from 0 to 1 so validation is strict and consistent with the
description.
- Around line 341-366: SummaryEntry currently allows omission of the evidence
field even though analysis_source is const "inferred" and the description
requires evidence; update the SummaryEntry schema so "evidence" is included in
the required array (alongside "text" and "analysis_source") and ensure the
existing "evidence" property continues to reference "#/$defs/Evidence" so
inferred summaries must supply evidence entries (including at least one
llm_inference and one structural_basis per the Evidence definition).

In `@llms.txt`:
- Around line 35-37: Replace the external links in llms.txt: keep the GitHub
entry ("https://github.com/WildestAI/DiffGraph-CLI") as-is, change the Website
entry from "https://wildest.ai" to the official "https://wildestai.com", and
update the "Full context" URL from "https://wildest.ai/llms-full.txt" to the
repository's full-context resource (use the repo raw file URL, e.g.
"https://raw.githubusercontent.com/WildestAI/DiffGraph-CLI/main/llms-full.txt")
so all three links point to the correct official resources.

In `@mcp_server.py`:
- Around line 204-208: Replace the broad except Exception in get_docs and
search_docs with targeted exception handlers: catch OSError for filesystem/read
errors and UnicodeDecodeError for encoding issues when calling
target.read_text(), and use the module logger (or processLogger) to log the full
exception details (including stack/str) before returning the error payload; keep
the returned structure the same but populate "error" with the logged exception
message for easier debugging.
- Around line 76-80: The code appends unsanitized args to cmd (variable cmd)
allowing CLI injection; implement a sanitize_args(args: str) helper that
enforces an allowlist (e.g., ALLOWED_FLAGS and ALLOWED_PREFIXES) and rejects
unknown flags, then replace the direct args.split() usage with cmd +=
sanitize_args(args). Ensure sanitize_args disallows bare "--output" (or only
permits "--output=" form if desired) so the explicit output_file handling cannot
be overridden, and raise/return an error for disallowed parts instead of
appending them.
- Around line 185-194: The code builds filesystem paths from the user-controlled
variable name (used in candidate/candidate2) without ensuring the resolved path
remains inside allowed roots (REPO_ROOT or DOCS_DIR), enabling path traversal;
fix by resolving the candidate paths and explicitly checking that each resolved
path is a descendant of its allowed base before assigning to target — e.g.,
after computing candidate = (REPO_ROOT / name).resolve() verify
candidate.is_relative_to(REPO_ROOT.resolve()) (or fall back to comparing string
prefixes of resolved paths) and only accept it if true, and do the same for
candidate2 against DOCS_DIR; if neither check passes, reject the request or
return an error.
- Around line 77-78: The code appends user-supplied output_file directly to cmd,
allowing path traversal; validate and constrain output_file to a designated
output directory (or the repository root) before adding "--output". Implement:
define a base output dir (e.g., OUTPUT_DIR or use the existing repo path
variable), join output_file with that base using os.path.join, resolve with
os.path.abspath/os.path.realpath, and verify with os.path.commonpath that the
resolved path is inside the base; if not, raise/return an error. Ensure the code
in the block that builds cmd (where output_file is checked) replaces the raw
value with the sanitized/resolved path and creates parent directories
(os.makedirs(..., exist_ok=True)) before appending to cmd.

In `@skill.md`:
- Around line 74-86: skill.md currently states "Works with Python 3.8+" but
setup.py's python_requires=">=3.7"; update the documentation to match the
package metadata by changing the text in skill.md from "Works with Python 3.8+"
to "Works with Python 3.7+" (or alternatively, if you intend to require 3.8+,
change setup.py's python_requires to ">=3.8"); locate the string in skill.md and
the python_requires in setup.py to keep both consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0f994be-218f-4c85-a648-569d1b929022

📥 Commits

Reviewing files that changed from the base of the PR and between a43abac and b703935.

📒 Files selected for processing (4)
  • diffgraph/schema/diffgraph-v2.schema.json
  • llms.txt
  • mcp_server.py
  • skill.md

Comment on lines +101 to +139
    "Evidence": {
      "type": "object",
      "description": "Pointer to what produced a claim. kind determines which fields are present.",
      "required": ["kind"],
      "properties": {
        "kind": {
          "type": "string",
          "enum": [
            "git_diff_stat",
            "git_diff_name_status",
            "path_pattern",
            "ast_parse",
            "import_statement",
            "call_site",
            "llm_inference",
            "structural_basis"
          ]
        },
        "file": { "type": "string", "description": "Relevant for ast_parse, import_statement, call_site." },
        "line_start": { "type": "integer", "minimum": 0, "description": "1-indexed line number." },
        "line_end": { "type": "integer", "minimum": 0 },
        "snippet": { "type": "string", "description": "Short source excerpt (signature line or import statement)." },
        "pattern": { "type": "string", "description": "Glob/regex pattern (kind=path_pattern)." },
        "detail": { "type": "string", "description": "Free-text detail (kind=git_diff_stat/name_status)." },
        "model": { "type": "string", "description": "LLM model id (kind=llm_inference)." },
        "prompt_ref": { "type": "string", "description": "Internal prompt template reference (kind=llm_inference)." },
        "temperature": { "type": "number", "minimum": 0, "maximum": 2, "description": "(kind=llm_inference)." },
        "symbol_ids": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Symbol IDs that grounded this inferred claim (kind=structural_basis)."
        },
        "file_ids": {
          "type": "array",
          "items": { "type": "string" },
          "description": "File IDs that grounded this inferred claim (kind=structural_basis)."
        }
      }
    },

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Evidence lacks additionalProperties: false and has inconsistent line number constraints.

Two issues in the Evidence definition:

  1. Unlike all other type definitions in this schema, Evidence does not specify additionalProperties: false. This breaks the strict validation pattern and allows arbitrary extra fields.

  2. line_start and line_end have minimum: 0, but the description states "1-indexed line number". For 1-indexed values, minimum should be 1.

Proposed fix
     "Evidence": {
       "type": "object",
       "description": "Pointer to what produced a claim. kind determines which fields are present.",
       "required": ["kind"],
+      "additionalProperties": false,
       "properties": {
         "kind": {
           "type": "string",
           ...
         },
         "file": { "type": "string", "description": "Relevant for ast_parse, import_statement, call_site." },
-        "line_start": { "type": "integer", "minimum": 0, "description": "1-indexed line number." },
-        "line_end": { "type": "integer", "minimum": 0 },
+        "line_start": { "type": "integer", "minimum": 1, "description": "1-indexed line number." },
+        "line_end": { "type": "integer", "minimum": 1 },
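To make the effect of the two changes concrete, the following plain-Python sketch mimics what the tightened Evidence definition would reject: unknown fields and line numbers below 1. It is an illustration only, not a substitute for a JSON Schema validator; the helper name `evidence_problems` is hypothetical, and the property names are copied from the schema fragment quoted above.

```python
from typing import Any, Dict, List

# Property names copied from the Evidence definition quoted in this comment.
EVIDENCE_KEYS = {
    "kind", "file", "line_start", "line_end", "snippet", "pattern", "detail",
    "model", "prompt_ref", "temperature", "symbol_ids", "file_ids",
}


def evidence_problems(entry: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for one Evidence object (hypothetical helper)."""
    problems: List[str] = []
    if "kind" not in entry:
        problems.append("missing required field: kind")
    # Mirrors "additionalProperties": false.
    extra = sorted(set(entry) - EVIDENCE_KEYS)
    if extra:
        problems.append("unknown fields: " + ", ".join(extra))
    # Mirrors "minimum": 1 on the 1-indexed line numbers.
    for key in ("line_start", "line_end"):
        if key in entry:
            value = entry[key]
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                problems.append(f"{key} must be an integer >= 1")
    return problems
```

With the original schema, an entry such as `{"kind": "ast_parse", "line_start": 0}` validates even though the description says line numbers are 1-indexed; under the proposed fix it would not.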
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    "Evidence": {
      "type": "object",
      "description": "Pointer to what produced a claim. kind determines which fields are present.",
      "required": ["kind"],
      "additionalProperties": false,
      "properties": {
        "kind": {
          "type": "string",
          "enum": [
            "git_diff_stat",
            "git_diff_name_status",
            "path_pattern",
            "ast_parse",
            "import_statement",
            "call_site",
            "llm_inference",
            "structural_basis"
          ]
        },
        "file": { "type": "string", "description": "Relevant for ast_parse, import_statement, call_site." },
        "line_start": { "type": "integer", "minimum": 1, "description": "1-indexed line number." },
        "line_end": { "type": "integer", "minimum": 1 },
        "snippet": { "type": "string", "description": "Short source excerpt (signature line or import statement)." },
        "pattern": { "type": "string", "description": "Glob/regex pattern (kind=path_pattern)." },
        "detail": { "type": "string", "description": "Free-text detail (kind=git_diff_stat/name_status)." },
        "model": { "type": "string", "description": "LLM model id (kind=llm_inference)." },
        "prompt_ref": { "type": "string", "description": "Internal prompt template reference (kind=llm_inference)." },
        "temperature": { "type": "number", "minimum": 0, "maximum": 2, "description": "(kind=llm_inference)." },
        "symbol_ids": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Symbol IDs that grounded this inferred claim (kind=structural_basis)."
        },
        "file_ids": {
          "type": "array",
          "items": { "type": "string" },
          "description": "File IDs that grounded this inferred claim (kind=structural_basis)."
        }
      }
    },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diffgraph/schema/diffgraph-v2.schema.json` around lines 101 - 139, Add
"additionalProperties": false to the Evidence object and change the line number
minima to match 1-indexing: set "minimum": 1 on the "line_start" and "line_end"
properties in the Evidence schema; update the Evidence properties block (the
"Evidence" object definition) to include additionalProperties: false and change
the "minimum" values for "line_start" and "line_end" from 0 to 1 so validation
is strict and consistent with the description.

Comment on lines +341 to +366
    "SummaryEntry": {
      "type": "object",
      "required": ["text", "analysis_source"],
      "additionalProperties": false,
      "properties": {
        "text": {
          "type": "string",
          "description": "Human-readable summary of the change."
        },
        "analysis_source": {
          "type": "string",
          "const": "inferred",
          "description": "Summaries are always inferred (require LLM interpretation)."
        },
        "confidence": {
          "type": ["number", "null"],
          "minimum": 0,
          "maximum": 1
        },
        "evidence": {
          "type": "array",
          "items": { "$ref": "#/$defs/Evidence" },
          "description": "Must include at least one llm_inference entry and one structural_basis entry."
        }
      }
    },

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SummaryEntry does not enforce required evidence despite the description.

SummaryEntry has analysis_source as a const "inferred", meaning it's always inferred. Per the pattern established by SymbolEntry and RelationshipEntry, inferred claims must have evidence. The description on line 363 states evidence "Must include at least one llm_inference entry", but evidence is not in the required array, so the schema allows SummaryEntry without any evidence.

Proposed fix
     "SummaryEntry": {
       "type": "object",
-      "required": ["text", "analysis_source"],
+      "required": ["text", "analysis_source", "evidence"],
       "additionalProperties": false,
       "properties": {
         ...
         "evidence": {
           "type": "array",
           "items": { "$ref": "#/$defs/Evidence" },
-          "description": "Must include at least one llm_inference entry and one structural_basis entry."
+          "description": "Must include at least one llm_inference entry and one structural_basis entry.",
+          "minItems": 1
         }
       }
     },
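Note that, as far as the quoted fragment shows, the proposed fix guarantees only a non-empty evidence array; the rule "at least one llm_inference entry and one structural_basis entry" still lives in the description text. In JSON Schema 2020-12 that rule could be expressed with two `contains` constraints combined under `allOf`. Until then, a consumer-side check along these lines could catch the gap (a hedged sketch; `missing_evidence_kinds` is a hypothetical helper, not part of the PR):

```python
from typing import Any, Dict, List

# The two kinds named in the SummaryEntry evidence description.
REQUIRED_KINDS = ("llm_inference", "structural_basis")


def missing_evidence_kinds(summary: Dict[str, Any]) -> List[str]:
    """List the required evidence kinds that a SummaryEntry does not carry."""
    evidence = summary.get("evidence") or []
    present = {item.get("kind") for item in evidence if isinstance(item, dict)}
    return [kind for kind in REQUIRED_KINDS if kind not in present]
```

An empty result means both kinds are present; anything else names what is missing.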
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    "SummaryEntry": {
      "type": "object",
      "required": ["text", "analysis_source", "evidence"],
      "additionalProperties": false,
      "properties": {
        "text": {
          "type": "string",
          "description": "Human-readable summary of the change."
        },
        "analysis_source": {
          "type": "string",
          "const": "inferred",
          "description": "Summaries are always inferred (require LLM interpretation)."
        },
        "confidence": {
          "type": ["number", "null"],
          "minimum": 0,
          "maximum": 1
        },
        "evidence": {
          "type": "array",
          "items": { "$ref": "#/$defs/Evidence" },
          "description": "Must include at least one llm_inference entry and one structural_basis entry.",
          "minItems": 1
        }
      }
    },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diffgraph/schema/diffgraph-v2.schema.json` around lines 341 - 366,
SummaryEntry currently allows omission of the evidence field even though
analysis_source is const "inferred" and the description requires evidence;
update the SummaryEntry schema so "evidence" is included in the required array
(alongside "text" and "analysis_source") and ensure the existing "evidence"
property continues to reference "#/$defs/Evidence" so inferred summaries must
supply evidence entries (including at least one llm_inference and one
structural_basis per the Evidence definition).

Comment thread llms.txt
Comment on lines +35 to +37
- GitHub: https://github.com/WildestAI/DiffGraph-CLI
- Website: https://wildest.ai
- Full context: https://wildest.ai/llms-full.txt

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Is the GitHub repository accessible at https://github.com/WildestAI/DiffGraph-CLI? Is the website accessible at https://wildest.ai? Does the endpoint https://wildest.ai/llms-full.txt exist and serve the full context for LLMs?

💡 Result:

Yes, the GitHub repository at https://github.com/WildestAI/DiffGraph-CLI is accessible [1][2][3]. Regarding the website https://wildest.ai, it is important to distinguish between potential naming collisions. The domain https://wildest.ai is not the official website for the entity "Wildest AI," which hosts the DiffGraph-CLI repository; the organization Wildest AI lists its website as https://wildestai.com [1]. The domain https://wilds.ai (plural) is a separate platform for AI games and characters [4][5]. The endpoint https://wildest.ai/llms-full.txt does not appear to exist or serve documentation for the Wildest AI organization. A file is accessible at https://wilds.ai/llms-full.txt, which serves content related to the "wilds.ai" platform [6], but this is a distinct entity from the Wildest AI organization associated with DiffGraph-CLI [1].

Citations:


Fix external link URLs in llms.txt (lines 35–37).

  • GitHub link (https://github.com/WildestAI/DiffGraph-CLI) is reachable.
  • Website link should use the official Wildest AI domain: https://wildestai.com (the https://wildest.ai domain appears to be different/not official).
  • “Full context” link currently targets https://wildest.ai/llms-full.txt, which doesn’t appear to exist for the Wildest AI project—update it to the correct full-context URL for this repo.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@llms.txt` around lines 35 - 37, Replace the external links in llms.txt: keep
the GitHub entry ("https://github.com/WildestAI/DiffGraph-CLI") as-is, change
the Website entry from "https://wildest.ai" to the official
"https://wildestai.com", and update the "Full context" URL from
"https://wildest.ai/llms-full.txt" to the repository's full-context resource
(use the repo raw file URL, e.g.
"https://raw.githubusercontent.com/WildestAI/DiffGraph-CLI/main/llms-full.txt")
so all three links point to the correct official resources.

Comment thread mcp_server.py
Comment on lines +76 to +80
cmd = ["wild", "diff", "--no-open"]
if output_file:
    cmd += ["--output", output_file]
if args:
    cmd += args.split()

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Command argument injection via unsanitized args parameter.

The args parameter is split and appended directly to the subprocess command without validation. An attacker-controlled agent input could inject arbitrary CLI flags:

  • args="--output /etc/cron.d/malicious" could overwrite sensitive files (bypassing the output_file parameter entirely)
  • args="../../sensitive/repo" could manipulate path-based arguments

Consider implementing an allowlist of permitted argument patterns or parsing args to extract only known safe flags.

Proposed mitigation approach
# Define allowed arguments
ALLOWED_FLAGS = {"--staged", "--no-open", "--json"}
ALLOWED_PREFIXES = {"--output=", "--format="}

def sanitize_args(args: str) -> list[str]:
    """Parse and validate args, rejecting unknown flags."""
    if not args:
        return []
    parts = args.split()
    sanitized = []
    for part in parts:
        if part in ALLOWED_FLAGS:
            sanitized.append(part)
        elif any(part.startswith(p) for p in ALLOWED_PREFIXES):
            sanitized.append(part)
        elif not part.startswith("-"):
            # Likely a commit ref or path - validate further
            sanitized.append(part)
        else:
            raise ValueError(f"Disallowed argument: {part}")
    return sanitized
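A stricter variant of the same idea is sketched below, under stated assumptions. It tokenises with `shlex.split` so quoted values are handled predictably, refuses `--output` in every form so the dedicated output_file parameter cannot be bypassed, and rejects positional values that look like absolute paths or contain `..` path segments. The allowlist contents are placeholders borrowed from the example above and have not been checked against the real wild CLI; `parse_extra_args` is a hypothetical name. `typing.List` is used instead of `list[str]` because the review notes that setup.py declares python_requires ">=3.7".

```python
import shlex
from typing import List

# Placeholder allowlist borrowed from the reviewer's example; not verified
# against the actual flags accepted by `wild diff`.
ALLOWED_FLAGS = {"--staged", "--json"}


def parse_extra_args(args: str) -> List[str]:
    """Tokenise and validate extra CLI arguments, raising ValueError on anything unexpected."""
    safe: List[str] = []
    for part in shlex.split(args or ""):
        if part.startswith("-"):
            # "--output" and "--output=..." are not in the allowlist, so they are rejected here.
            if part not in ALLOWED_FLAGS:
                raise ValueError(f"disallowed argument: {part}")
        elif part.startswith("/") or ".." in part.split("/"):
            # Positional values are expected to be refs such as HEAD~1..HEAD, not escaping paths.
            raise ValueError(f"disallowed path-like argument: {part}")
        safe.append(part)
    return safe
```

For example, `parse_extra_args("--staged HEAD~1..HEAD")` returns `["--staged", "HEAD~1..HEAD"]`, while either of the two attack strings listed in this comment raises `ValueError`.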
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp_server.py` around lines 76 - 80, The code appends unsanitized args to cmd
(variable cmd) allowing CLI injection; implement a sanitize_args(args: str)
helper that enforces an allowlist (e.g., ALLOWED_FLAGS and ALLOWED_PREFIXES) and
rejects unknown flags, then replace the direct args.split() usage with cmd +=
sanitize_args(args). Ensure sanitize_args disallows bare "--output" (or only
permits "--output=" form if desired) so the explicit output_file handling cannot
be overridden, and raise/return an error for disallowed parts instead of
appending them.
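
A minimal sketch of the stricter variant described above, assuming that `--output` should be rejected in every form so that only the `output_file` parameter controls the destination. The allowlist content is an assumption: `--staged` is borrowed from the proposed mitigation and has not been checked against the real `wild diff` CLI. It also tokenizes with `shlex.split` instead of `str.split`, so quoted values stay intact.

```python
import shlex
from pathlib import PurePosixPath
from typing import List

# Assumed allowlist: "--staged" comes from the proposal above and is not
# verified against the actual `wild diff` flags.
ALLOWED_FLAGS = {"--staged"}


def sanitize_args(args: str) -> List[str]:
    """Tokenize args shell-style; reject unknown flags and escaping paths."""
    sanitized = []
    for part in shlex.split(args or ""):
        if part.startswith("-"):
            # Any flag not on the allowlist is refused, including
            # "--output" and "--output=<path>".
            if part not in ALLOWED_FLAGS:
                raise ValueError(f"Disallowed argument: {part}")
        else:
            # Positional values (commit refs, relative paths) must not be
            # absolute and must not contain a ".." path component.
            path = PurePosixPath(part)
            if path.is_absolute() or ".." in path.parts:
                raise ValueError(f"Disallowed path argument: {part}")
        sanitized.append(part)
    return sanitized
```

With this helper, the call site would become `cmd += sanitize_args(args)`, and a `ValueError` can be turned into the tool's usual error payload. A commit range such as `HEAD~1..HEAD` still passes, because the `..` there is not a separate path component.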

Source: Linters/SAST tools

Comment thread mcp_server.py
Comment on lines +77 to +78
if output_file:
    cmd += ["--output", output_file]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

output_file allows arbitrary filesystem write (path traversal).

The output_file parameter is passed directly to --output without path validation. A malicious input like output_file="/etc/cron.d/backdoor" could write to sensitive system locations.

Constrain output_file to be within the target repository or a designated output directory.

Proposed fix
+def _validate_output_path(output_file: str, repo: Path) -> Path:
+    """Ensure output_file is within repo or use default."""
+    if not output_file:
+        return repo / "diffgraph.html"
+    out_path = Path(output_file).expanduser().resolve()
+    # Must be within the repo directory
+    if not out_path.is_relative_to(repo):
+        raise ValueError(f"output_file must be within repo: {repo}")
+    return out_path

 @mcp.tool()
 def run_wild_diff(...) -> dict:
     ...
+    try:
+        validated_output = _validate_output_path(output_file, repo)
+    except ValueError as e:
+        return {"success": False, "error": str(e)}
+
     cmd = ["wild", "diff", "--no-open"]
-    if output_file:
-        cmd += ["--output", output_file]
+    cmd += ["--output", str(validated_output)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp_server.py` around lines 77 - 78, The code appends user-supplied
output_file directly to cmd, allowing path traversal; validate and constrain
output_file to a designated output directory (or the repository root) before
adding "--output". Implement: define a base output dir (e.g., OUTPUT_DIR or use
the existing repo path variable), join output_file with that base using
os.path.join, resolve with os.path.abspath/os.path.realpath, and verify with
os.path.commonpath that the resolved path is inside the base; if not,
raise/return an error. Ensure the code in the block that builds cmd (where
output_file is checked) replaces the raw value with the sanitized/resolved path
and creates parent directories (os.makedirs(..., exist_ok=True)) before
appending to cmd.
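
The `os.path`-based steps in the prompt above can be sketched roughly as follows. The helper name `resolve_output_path` and its `base_dir` parameter are hypothetical and do not exist in `mcp_server.py`; the base directory could be the repository path or a dedicated output directory.

```python
import os


def resolve_output_path(output_file: str, base_dir: str) -> str:
    """Return an absolute path for output_file that stays inside base_dir."""
    base = os.path.realpath(base_dir)
    # os.path.join discards `base` when output_file is absolute, so the
    # containment check below also rejects absolute paths outside the base.
    candidate = os.path.realpath(os.path.join(base, output_file))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"output_file must be within {base}")
    # Create the parent directory so the CLI can write the file.
    os.makedirs(os.path.dirname(candidate), exist_ok=True)
    return candidate
```

The returned string is what would be passed to `--output`. Using `realpath` on both sides means symlinks are resolved before the comparison, so a symlink inside the base that points elsewhere is also rejected.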

Comment thread mcp_server.py
Comment on lines +185 to +194
else:
    # Try as relative path from repo root
    candidate = (REPO_ROOT / name).resolve()
    if candidate.exists():
        target = candidate
    else:
        # Try docs subdir
        candidate2 = (DOCS_DIR / name).resolve()
        if candidate2.exists():
            target = candidate2

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Path traversal vulnerability allows reading arbitrary files.

The name parameter is user-controlled and used directly to construct file paths without validating the resolved path stays within allowed directories. An input like name="../../../../etc/passwd" would resolve to /etc/passwd and be read.

Proposed fix
+def _is_safe_path(path: Path, allowed_roots: list[Path]) -> bool:
+    """Check if resolved path is within any allowed root."""
+    resolved = path.resolve()
+    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)

 @mcp.tool()
 def get_docs(name: str) -> dict:
     ...
     if name in slug_map:
         target = slug_map[name]
     else:
         # Try as relative path from repo root
         candidate = (REPO_ROOT / name).resolve()
-        if candidate.exists():
+        if candidate.exists() and _is_safe_path(candidate, [REPO_ROOT]):
             target = candidate
         else:
             # Try docs subdir
             candidate2 = (DOCS_DIR / name).resolve()
-            if candidate2.exists():
+            if candidate2.exists() and _is_safe_path(candidate2, [DOCS_DIR]):
                 target = candidate2
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-else:
-    # Try as relative path from repo root
-    candidate = (REPO_ROOT / name).resolve()
-    if candidate.exists():
-        target = candidate
-    else:
-        # Try docs subdir
-        candidate2 = (DOCS_DIR / name).resolve()
-        if candidate2.exists():
-            target = candidate2
+else:
+    # Try as relative path from repo root
+    candidate = (REPO_ROOT / name).resolve()
+    if candidate.exists() and _is_safe_path(candidate, [REPO_ROOT]):
+        target = candidate
+    else:
+        # Try docs subdir
+        candidate2 = (DOCS_DIR / name).resolve()
+        if candidate2.exists() and _is_safe_path(candidate2, [DOCS_DIR]):
+            target = candidate2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp_server.py` around lines 185 - 194, The code builds filesystem paths from
the user-controlled variable name (used in candidate/candidate2) without
ensuring the resolved path remains inside allowed roots (REPO_ROOT or DOCS_DIR),
enabling path traversal; fix by resolving the candidate paths and explicitly
checking that each resolved path is a descendant of its allowed base before
assigning to target — e.g., after computing candidate = (REPO_ROOT /
name).resolve() verify candidate.is_relative_to(REPO_ROOT.resolve()) (or fall
back to comparing string prefixes of resolved paths) and only accept it if true,
and do the same for candidate2 against DOCS_DIR; if neither check passes, reject
the request or return an error.
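
One caveat on the containment check: `Path.is_relative_to` was added in Python 3.9, so it is unavailable on older interpreters. A plain string-prefix comparison is also risky, because a sibling directory such as `/repo-evil` starts with `/repo`. A possible fallback, sketched here with a hypothetical helper name `is_within`, relies on `Path.relative_to`, which raises `ValueError` when the path is not under the root.

```python
from pathlib import Path


def is_within(path: Path, root: Path) -> bool:
    """Return True if path resolves to root itself or to a descendant of it."""
    try:
        # relative_to compares whole path components, so "/repo-evil"
        # is not treated as being inside "/repo".
        path.resolve().relative_to(root.resolve())
    except ValueError:
        return False
    return True
```

In `get_docs`, each candidate would be accepted only if `is_within(candidate, REPO_ROOT)` (or `DOCS_DIR` for the second lookup) returns `True`.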

Comment thread mcp_server.py
Comment on lines +204 to +208
try:
    content = target.read_text(encoding="utf-8")
    return {"found": True, "name": name, "content": content, "error": ""}
except Exception as e:
    return {"found": False, "name": name, "content": "", "error": str(e)}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Overly broad except Exception in both get_docs and search_docs.

Both functions (get_docs at line 207, search_docs at line 250) catch Exception, which masks the specific cause of a failure and makes debugging difficult. Catch the specific exception types (OSError, UnicodeDecodeError) instead and log the error.

🧰 Tools
🪛 Ruff (0.15.15)

[warning] 207-207: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp_server.py` around lines 204 - 208, Replace the broad except Exception in
get_docs and search_docs with targeted exception handlers: catch OSError for
filesystem/read errors and UnicodeDecodeError for encoding issues when calling
target.read_text(), and use the module logger (or processLogger) to log the full
exception details (including stack/str) before returning the error payload; keep
the returned structure the same but populate "error" with the logged exception
message for easier debugging.
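
A minimal sketch of the narrower handler, assuming the read logic is factored into a helper. The function name `read_doc` and the logger name are hypothetical; the returned dictionary keeps the same shape as the existing code.

```python
import logging
from pathlib import Path

# Hypothetical logger name; the module may already define its own logger.
logger = logging.getLogger("mcp_server")


def read_doc(target: Path, name: str) -> dict:
    """Read a doc file, reporting only filesystem and decoding failures."""
    try:
        content = target.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        # logger.exception records the message together with the traceback.
        logger.exception("Failed to read doc %r from %s", name, target)
        return {"found": False, "name": name, "content": "", "error": str(exc)}
    return {"found": True, "name": name, "content": content, "error": ""}
```

Any other exception type now propagates to the caller instead of being silently converted into an error payload, which is usually what makes unexpected bugs visible.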

Source: Linters/SAST tools

Comment thread skill.md
Comment on lines +74 to +86
## Configuration

Environment variables:
- `OPENAI_API_KEY` — required for AI analysis
- Copy `.env.example` to `.env` to set locally

## Notes

- Works with Python 3.8+
- Tested on macOS and Linux
- The CLI wraps `git` — it must be run inside a git repository
- Large diffs may be slow; consider scoping with file paths or commit ranges
- The `.env` file is git-ignored — never commit API keys

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for .env.example file
if [ -f ".env.example" ]; then
  echo ".env.example found ✓"
else
  echo ".env.example NOT found ✗"
fi

# Check .gitignore for .env entry
if grep -q "^\.env$" .gitignore 2>/dev/null; then
  echo ".env in .gitignore ✓"
else
  echo ".env NOT in .gitignore ✗"
fi

# Look for Python version requirement
fd -t f '(pyproject.toml|setup.py|setup.cfg|.python-version)' && \
  echo "Checking Python version requirement:" && \
  rg -A2 "requires-python|python_requires|target.version" --type toml --type python || echo "No explicit Python version found"

Repository: WildestAI/DiffGraph-CLI

Length of output: 237


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== skill.md (lines 70-95) =="
if [ -f "skill.md" ]; then
  sed -n '70,95p' skill.md | nl -ba
else
  echo "skill.md not found"
fi

echo
echo "== Python version declarations (repo-wide) =="
rg -n --hidden --no-ignore-vcs -S \
  'requires-python|python_requires|target\.version|\.python-version|setup-python|actions/setup-python|Programming Language :: Python ::' .

echo
echo "== setup.py python_requires excerpt =="
if [ -f "setup.py" ]; then
  rg -n 'python_requires' setup.py || true
  rg -n --context 3 'classifiers\s*=|Programming Language :: Python ::' setup.py || true
else
  echo "setup.py not found"
fi

echo
echo "== pyproject.toml requires-python excerpt =="
if [ -f "pyproject.toml" ]; then
  rg -n 'requires-python' pyproject.toml || true
  rg -n --context 3 'tool\.poetry|project\.requires-python|project\s*\{' pyproject.toml || true
else
  echo "pyproject.toml not found"
fi

echo
echo "== .python-version file =="
if [ -f ".python-version" ]; then
  echo ".python-version exists: $(cat .python-version)"
else
  echo ".python-version not found"
fi

echo
echo "== CI (workflows) setup-python versions =="
if [ -d ".github/workflows" ]; then
  rg -n --hidden --no-ignore-vcs -S 'setup-python|python-version' .github/workflows || true
else
  echo ".github/workflows not found"
fi

Repository: WildestAI/DiffGraph-CLI

Length of output: 157


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== skill.md (lines 70-95) =="
if [ -f "skill.md" ]; then
  awk 'NR>=70 && NR<=95 {printf "%d:%s\n", NR, $0}' skill.md
else
  echo "skill.md not found"
fi

echo
echo "== .gitignore (.env entry) =="
if [ -f ".gitignore" ]; then
  rg -n '^\s*\.env\s*$' .gitignore || true
  echo "-- matches for '.env' in .gitignore:"
  rg -n '\.env' .gitignore || true
else
  echo ".gitignore not found"
fi

echo
echo "== Check for python minimum version declarations =="
for f in pyproject.toml setup.py setup.cfg .python-version .github/workflows/*.yml .github/workflows/*.yaml; do
  if [ -e "$f" ]; then
    echo "-- $f"
    rg -n 'requires-python|python_requires|\.python-version|setup-python|python-version|Programming Language :: Python ::' "$f" || true
  fi
done

echo
echo "== List likely python config files found =="
fd -t f -d 4 -g 'pyproject.toml' -g 'setup.py' -g 'setup.cfg' -g 'requirements*.txt' -g '.python-version' . 2>/dev/null || true
fd -t f -d 5 -g '*.yml' -g '*.yaml' .github/workflows 2>/dev/null || true

Repository: WildestAI/DiffGraph-CLI

Length of output: 1196


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Repo root python config presence =="
ls -la | sed -n '1,200p' | rg -n 'pyproject\.toml|setup\.py|setup\.cfg|\.python-version|requirements.*\.txt|Pipfile|poetry\.lock' || true

echo
echo "== Find pyproject/setup.cfg/.python-version =="
fd -t f -d 4 -g 'pyproject.toml' -g 'setup.cfg' -g '.python-version' . 2>/dev/null || true

echo
echo "== Find CI python versions (workflows) =="
fd -t f -d 6 -g '*.yml' -g '*.yaml' .github/workflows 2>/dev/null || true
rg -n 'setup-python|python-version|actions/setup-python' .github/workflows 2>/dev/null || true

echo
echo "== Repo-wide python version hints =="
rg -n 'requires-python|python_requires|\.python-version|setup-python|python-version|Programming Language :: Python ::' . || true

Repository: WildestAI/DiffGraph-CLI

Length of output: 724


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== .env.example existence =="
if [ -f ".env.example" ]; then
  echo ".env.example found"
  sed -n '1,120p' .env.example | awk '{print NR":"$0}'
else
  echo ".env.example NOT found"
fi

Repository: WildestAI/DiffGraph-CLI

Length of output: 261


Fix Python version requirement mismatch in skill.md

  • .env.example exists at the repo root, and .gitignore ignores .env (while keeping !.env.example), so that guidance is correct.
  • skill.md says “Works with Python 3.8+” (line 82), but setup.py declares python_requires=">=3.7" (line 15); align these by updating skill.md to 3.7+ or tightening setup.py to 3.8+.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skill.md` around lines 74 - 86, skill.md currently states "Works with Python
3.8+" but setup.py's python_requires=">=3.7"; update the documentation to match
the package metadata by changing the text in skill.md from "Works with Python
3.8+" to "Works with Python 3.7+" (or alternatively, if you intend to require
3.8+, change setup.py's python_requires to ">=3.8"); locate the string in
skill.md and the python_requires in setup.py to keep both consistent.
