feat: add AI discoverability layer (llms.txt, skill.md, MCP server) #17
avikalpg wants to merge 2 commits into
Adds diffgraph/schema/diffgraph-v2.schema.json — the JSON Schema 2020-12 draft that operationalises the v2 output contract from design/JSON-SCHEMA.md. Covers: FileEntry, SymbolEntry, RelationshipEntry, SummaryEntry, Evidence, Metadata, Warning, AnalysisSource. Required fields enforced; inferred claims must carry evidence + confidence. privacy_tier is a top-level required metadata field. Consumers can use this for validation in CI, typed generation, and VS Code schema hints. This satisfies one of the four schema ratification criteria in JSON-SCHEMA.md (the machine-readable file). Still needs: Avikalp sign-off on sub-questions, one end-to-end worked example validated, PR #11 updated to target this schema.
- llms.txt at repo root: compact context for LLMs crawling the repo
- skill.md: agent skill file so AI agents know how to use the wild CLI
- mcp_server.py: MCP server exposing run_wild_diff, list_docs, get_docs, search_docs tools and a wildestai://llms.txt resource

Replaces the need for a paid DocsALot subscription; all AI discoverability infrastructure is built in-house.
Walkthrough
This PR establishes the complete foundation for DiffGraph v2.0: a canonical JSON Schema specifying the artifact structure, an MCP server implementing programmatic access to diff analysis and documentation, and user guides for both CLI and agent-based usage.
Changes
DiffGraph v2.0 Infrastructure
Sequence Diagram
sequenceDiagram
participant Client
participant MCPServer
participant FileSystem
participant WildDiff as wild diff<br/>Subprocess
Client->>MCPServer: run_wild_diff(repo_path, args)
activate MCPServer
MCPServer->>FileSystem: validate .git directory exists
MCPServer->>WildDiff: execute wild diff --no-open
activate WildDiff
WildDiff->>FileSystem: generate diffgraph.html or custom output
WildDiff-->>MCPServer: stdout/stderr, return code
deactivate WildDiff
MCPServer-->>Client: success, returncode, output_path
deactivate MCPServer
Client->>MCPServer: get_docs(name)
activate MCPServer
MCPServer->>FileSystem: resolve via slug map or file path
MCPServer-->>Client: document content
deactivate MCPServer
Client->>MCPServer: search_docs(query)
activate MCPServer
MCPServer->>FileSystem: scan markdown files, search content
MCPServer-->>Client: matches with line context
deactivate MCPServer
Estimated code review effort: 3 (Moderate), ~22 minutes
Pre-merge checks: 5 passed
Actionable comments posted: 8
🧹 Nitpick comments (2)
mcp_server.py (2)
131-137: 💤 Low value. External path references may fail in standalone deployments.
Lines 131-132 reference ../../wildestai/docs/DiffGraph-CLI/, which assumes a specific directory structure outside this repository. If the MCP server is deployed standalone or the monorepo structure differs, these docs will silently be unavailable.
Consider making external doc paths configurable via environment variables, or documenting the expected directory layout.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mcp_server.py` around lines 131 - 137, The built-in pages list uses hardcoded external relative paths (built_in entries and the loop that resolves full via REPO_ROOT and full.is_relative_to) which will break in standalone deployments; update the code to make these doc paths configurable (e.g., read from environment variables or a config dict) and fall back to bundled/internal copies if the external path does not exist, by replacing the hardcoded rel_path values with configurable keys and checking env/config before resolving via REPO_ROOT; ensure you update the path-resolution logic around REPO_ROOT, full.exists(), and full.is_relative_to to prefer configured paths and log a warning if unavailable so the server can run standalone.
46-50: 💤 Low value. Function signature includes output_file, which is not documented in skill.md.
The documented contract in skill.md (line 69) specifies run_wild_diff(repo_path, args), but the implementation adds an output_file parameter. While the parameter has a default value making it backward compatible, the documentation should be updated to reflect the actual interface.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mcp_server.py` around lines 46 - 50, The implementation of run_wild_diff now accepts an extra parameter output_file (see function run_wild_diff(repo_path: str, args: str = "", output_file: str = "") in mcp_server.py) but skill.md still documents run_wild_diff(repo_path, args); update skill.md to include the new optional output_file parameter and its behavior (default value, effect when provided) to match the actual function signature, or alternatively remove output_file from the function if the documented contract must be preserved—ensure the documentation and the run_wild_diff signature are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@diffgraph/schema/diffgraph-v2.schema.json`:
- Around line 101-139: Add "additionalProperties": false to the Evidence object
and change the line number minima to match 1-indexing: set "minimum": 1 on the
"line_start" and "line_end" properties in the Evidence schema; update the
Evidence properties block (the "Evidence" object definition) to include
additionalProperties: false and change the "minimum" values for "line_start" and
"line_end" from 0 to 1 so validation is strict and consistent with the
description.
- Around line 341-366: SummaryEntry currently allows omission of the evidence
field even though analysis_source is const "inferred" and the description
requires evidence; update the SummaryEntry schema so "evidence" is included in
the required array (alongside "text" and "analysis_source") and ensure the
existing "evidence" property continues to reference "#/$defs/Evidence" so
inferred summaries must supply evidence entries (including at least one
llm_inference and one structural_basis per the Evidence definition).
In `@llms.txt`:
- Around line 35-37: Replace the external links in llms.txt: keep the GitHub
entry ("https://github.com/WildestAI/DiffGraph-CLI") as-is, change the Website
entry from "https://wildest.ai" to the official "https://wildestai.com", and
update the "Full context" URL from "https://wildest.ai/llms-full.txt" to the
repository's full-context resource (use the repo raw file URL, e.g.
"https://raw.githubusercontent.com/WildestAI/DiffGraph-CLI/main/llms-full.txt")
so all three links point to the correct official resources.
In `@mcp_server.py`:
- Around line 204-208: Replace the broad except Exception in get_docs and
search_docs with targeted exception handlers: catch OSError for filesystem/read
errors and UnicodeDecodeError for encoding issues when calling
target.read_text(), and use the module logger (or processLogger) to log the full
exception details (including stack/str) before returning the error payload; keep
the returned structure the same but populate "error" with the logged exception
message for easier debugging.
- Around line 76-80: The code appends unsanitized args to cmd (variable cmd)
allowing CLI injection; implement a sanitize_args(args: str) helper that
enforces an allowlist (e.g., ALLOWED_FLAGS and ALLOWED_PREFIXES) and rejects
unknown flags, then replace the direct args.split() usage with cmd +=
sanitize_args(args). Ensure sanitize_args disallows bare "--output" (or only
permits "--output=" form if desired) so the explicit output_file handling cannot
be overridden, and raise/return an error for disallowed parts instead of
appending them.
- Around line 185-194: The code builds filesystem paths from the user-controlled
variable name (used in candidate/candidate2) without ensuring the resolved path
remains inside allowed roots (REPO_ROOT or DOCS_DIR), enabling path traversal;
fix by resolving the candidate paths and explicitly checking that each resolved
path is a descendant of its allowed base before assigning to target — e.g.,
after computing candidate = (REPO_ROOT / name).resolve() verify
candidate.is_relative_to(REPO_ROOT.resolve()) (or fall back to comparing string
prefixes of resolved paths) and only accept it if true, and do the same for
candidate2 against DOCS_DIR; if neither check passes, reject the request or
return an error.
- Around line 77-78: The code appends user-supplied output_file directly to cmd,
allowing path traversal; validate and constrain output_file to a designated
output directory (or the repository root) before adding "--output". Implement:
define a base output dir (e.g., OUTPUT_DIR or use the existing repo path
variable), join output_file with that base using os.path.join, resolve with
os.path.abspath/os.path.realpath, and verify with os.path.commonpath that the
resolved path is inside the base; if not, raise/return an error. Ensure the code
in the block that builds cmd (where output_file is checked) replaces the raw
value with the sanitized/resolved path and creates parent directories
(os.makedirs(..., exist_ok=True)) before appending to cmd.
In `@skill.md`:
- Around line 74-86: skill.md currently states "Works with Python 3.8+" but
setup.py's python_requires=">=3.7"; update the documentation to match the
package metadata by changing the text in skill.md from "Works with Python 3.8+"
to "Works with Python 3.7+" (or alternatively, if you intend to require 3.8+,
change setup.py's python_requires to ">=3.8"); locate the string in skill.md and
the python_requires in setup.py to keep both consistent.
---
Nitpick comments:
In `@mcp_server.py`:
- Around line 131-137: The built-in pages list uses hardcoded external relative
paths (built_in entries and the loop that resolves full via REPO_ROOT and
full.is_relative_to) which will break in standalone deployments; update the code
to make these doc paths configurable (e.g., read from environment variables or a
config dict) and fall back to bundled/internal copies if the external path does
not exist, by replacing the hardcoded rel_path values with configurable keys and
checking env/config before resolving via REPO_ROOT; ensure you update the
path-resolution logic around REPO_ROOT, full.exists(), and full.is_relative_to
to prefer configured paths and log a warning if unavailable so the server can
run standalone.
- Around line 46-50: The implementation of run_wild_diff now accepts an extra
parameter output_file (see function run_wild_diff(repo_path: str, args: str =
"", output_file: str = "") in mcp_server.py) but skill.md still documents
run_wild_diff(repo_path, args); update skill.md to include the new optional
output_file parameter and its behavior (default value, effect when provided) to
match the actual function signature, or alternatively remove output_file from
the function if the documented contract must be preserved—ensure the
documentation and the run_wild_diff signature are consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0f994be-218f-4c85-a648-569d1b929022
📒 Files selected for processing (4)
- diffgraph/schema/diffgraph-v2.schema.json
- llms.txt
- mcp_server.py
- skill.md

"Evidence": {
  "type": "object",
  "description": "Pointer to what produced a claim. kind determines which fields are present.",
  "required": ["kind"],
  "properties": {
    "kind": {
      "type": "string",
      "enum": [
        "git_diff_stat",
        "git_diff_name_status",
        "path_pattern",
        "ast_parse",
        "import_statement",
        "call_site",
        "llm_inference",
        "structural_basis"
      ]
    },
    "file": { "type": "string", "description": "Relevant for ast_parse, import_statement, call_site." },
    "line_start": { "type": "integer", "minimum": 0, "description": "1-indexed line number." },
    "line_end": { "type": "integer", "minimum": 0 },
    "snippet": { "type": "string", "description": "Short source excerpt (signature line or import statement)." },
    "pattern": { "type": "string", "description": "Glob/regex pattern (kind=path_pattern)." },
    "detail": { "type": "string", "description": "Free-text detail (kind=git_diff_stat/name_status)." },
    "model": { "type": "string", "description": "LLM model id (kind=llm_inference)." },
    "prompt_ref": { "type": "string", "description": "Internal prompt template reference (kind=llm_inference)." },
    "temperature": { "type": "number", "minimum": 0, "maximum": 2, "description": "(kind=llm_inference)." },
    "symbol_ids": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Symbol IDs that grounded this inferred claim (kind=structural_basis)."
    },
    "file_ids": {
      "type": "array",
      "items": { "type": "string" },
      "description": "File IDs that grounded this inferred claim (kind=structural_basis)."
    }
  }
},
Evidence lacks additionalProperties: false and has inconsistent line number constraints.
Two issues in the Evidence definition:
- Unlike all other type definitions in this schema, Evidence does not specify additionalProperties: false. This breaks the strict validation pattern and allows arbitrary extra fields.
- line_start and line_end have minimum: 0, but the description states "1-indexed line number". For 1-indexed values, minimum should be 1.
Proposed fix
 "Evidence": {
   "type": "object",
   "description": "Pointer to what produced a claim. kind determines which fields are present.",
   "required": ["kind"],
+  "additionalProperties": false,
   "properties": {
     "kind": {
       "type": "string",
       ...
     },
     "file": { "type": "string", "description": "Relevant for ast_parse, import_statement, call_site." },
-    "line_start": { "type": "integer", "minimum": 0, "description": "1-indexed line number." },
-    "line_end": { "type": "integer", "minimum": 0 },
+    "line_start": { "type": "integer", "minimum": 1, "description": "1-indexed line number." },
+    "line_end": { "type": "integer", "minimum": 1 },
📝 Committable suggestion
"Evidence": {
  "type": "object",
  "description": "Pointer to what produced a claim. kind determines which fields are present.",
  "required": ["kind"],
  "additionalProperties": false,
  "properties": {
    "kind": {
      "type": "string",
      "enum": [
        "git_diff_stat",
        "git_diff_name_status",
        "path_pattern",
        "ast_parse",
        "import_statement",
        "call_site",
        "llm_inference",
        "structural_basis"
      ]
    },
    "file": { "type": "string", "description": "Relevant for ast_parse, import_statement, call_site." },
    "line_start": { "type": "integer", "minimum": 1, "description": "1-indexed line number." },
    "line_end": { "type": "integer", "minimum": 1 },
    "snippet": { "type": "string", "description": "Short source excerpt (signature line or import statement)." },
    "pattern": { "type": "string", "description": "Glob/regex pattern (kind=path_pattern)." },
    "detail": { "type": "string", "description": "Free-text detail (kind=git_diff_stat/name_status)." },
    "model": { "type": "string", "description": "LLM model id (kind=llm_inference)." },
    "prompt_ref": { "type": "string", "description": "Internal prompt template reference (kind=llm_inference)." },
    "temperature": { "type": "number", "minimum": 0, "maximum": 2, "description": "(kind=llm_inference)." },
    "symbol_ids": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Symbol IDs that grounded this inferred claim (kind=structural_basis)."
    },
    "file_ids": {
      "type": "array",
      "items": { "type": "string" },
      "description": "File IDs that grounded this inferred claim (kind=structural_basis)."
    }
  }
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@diffgraph/schema/diffgraph-v2.schema.json` around lines 101 - 139, Add
"additionalProperties": false to the Evidence object and change the line number
minima to match 1-indexing: set "minimum": 1 on the "line_start" and "line_end"
properties in the Evidence schema; update the Evidence properties block (the
"Evidence" object definition) to include additionalProperties: false and change
the "minimum" values for "line_start" and "line_end" from 0 to 1 so validation
is strict and consistent with the description.
"SummaryEntry": {
  "type": "object",
  "required": ["text", "analysis_source"],
  "additionalProperties": false,
  "properties": {
    "text": {
      "type": "string",
      "description": "Human-readable summary of the change."
    },
    "analysis_source": {
      "type": "string",
      "const": "inferred",
      "description": "Summaries are always inferred (require LLM interpretation)."
    },
    "confidence": {
      "type": ["number", "null"],
      "minimum": 0,
      "maximum": 1
    },
    "evidence": {
      "type": "array",
      "items": { "$ref": "#/$defs/Evidence" },
      "description": "Must include at least one llm_inference entry and one structural_basis entry."
    }
  }
},
SummaryEntry does not enforce required evidence despite the description.
SummaryEntry has analysis_source as a const "inferred", meaning it's always inferred. Per the pattern established by SymbolEntry and RelationshipEntry, inferred claims must have evidence. The description on line 363 states evidence "Must include at least one llm_inference entry", but evidence is not in the required array, so the schema allows SummaryEntry without any evidence.
Proposed fix
 "SummaryEntry": {
   "type": "object",
-  "required": ["text", "analysis_source"],
+  "required": ["text", "analysis_source", "evidence"],
   "additionalProperties": false,
   "properties": {
     ...
     "evidence": {
       "type": "array",
       "items": { "$ref": "#/$defs/Evidence" },
-      "description": "Must include at least one llm_inference entry and one structural_basis entry."
+      "description": "Must include at least one llm_inference entry and one structural_basis entry.",
+      "minItems": 1
     }
   }
 },
📝 Committable suggestion
"SummaryEntry": {
  "type": "object",
  "required": ["text", "analysis_source", "evidence"],
  "additionalProperties": false,
  "properties": {
    "text": {
      "type": "string",
      "description": "Human-readable summary of the change."
    },
    "analysis_source": {
      "type": "string",
      "const": "inferred",
      "description": "Summaries are always inferred (require LLM interpretation)."
    },
    "confidence": {
      "type": ["number", "null"],
      "minimum": 0,
      "maximum": 1
    },
    "evidence": {
      "type": "array",
      "items": { "$ref": "#/$defs/Evidence" },
      "description": "Must include at least one llm_inference entry and one structural_basis entry.",
      "minItems": 1
    }
  }
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@diffgraph/schema/diffgraph-v2.schema.json` around lines 341 - 366,
SummaryEntry currently allows omission of the evidence field even though
analysis_source is const "inferred" and the description requires evidence;
update the SummaryEntry schema so "evidence" is included in the required array
(alongside "text" and "analysis_source") and ensure the existing "evidence"
property continues to reference "#/$defs/Evidence" so inferred summaries must
supply evidence entries (including at least one llm_inference and one
structural_basis per the Evidence definition).
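Note that `minItems: 1` only guarantees a non-empty array; on its own it does not check that both an llm_inference entry and a structural_basis entry are present, as the description requires. A minimal sketch of that check in plain Python follows, assuming a summary has already been parsed into a dict shaped like SummaryEntry. The helper name `missing_evidence_kinds` is hypothetical and not part of this PR.

```python
from typing import Any, Dict, List

# Evidence kinds the SummaryEntry description says must both be present.
REQUIRED_SUMMARY_KINDS = ("llm_inference", "structural_basis")


def missing_evidence_kinds(summary: Dict[str, Any]) -> List[str]:
    """Return the required evidence kinds that are absent from a summary.

    Hypothetical helper: assumes `summary` is a parsed SummaryEntry whose
    optional "evidence" value is a list of dicts carrying a "kind" key.
    """
    evidence = summary.get("evidence") or []
    present = {item.get("kind") for item in evidence if isinstance(item, dict)}
    return [kind for kind in REQUIRED_SUMMARY_KINDS if kind not in present]
```

An empty return value means both kinds are present. The same rule could probably be expressed in the schema itself with the JSON Schema `contains` keyword, which would keep the check next to the definition.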
- GitHub: https://github.com/WildestAI/DiffGraph-CLI
- Website: https://wildest.ai
- Full context: https://wildest.ai/llms-full.txt
🧩 Analysis chain
🌐 Web query:
Is the GitHub repository accessible at https://github.com/WildestAI/DiffGraph-CLI? Is the website accessible at https://wildest.ai? Does the endpoint https://wildest.ai/llms-full.txt exist and serve the full context for LLMs?
💡 Result:
Yes, the GitHub repository at https://github.com/WildestAI/DiffGraph-CLI is accessible [1][2][3]. Regarding the website https://wildest.ai, it is important to distinguish between potential naming collisions. The domain https://wildest.ai is not the official website for the entity "Wildest AI," which hosts the DiffGraph-CLI repository; the organization Wildest AI lists its website as https://wildestai.com [1]. The domain https://wilds.ai (plural) is a separate platform for AI games and characters [4][5]. The endpoint https://wildest.ai/llms-full.txt does not appear to exist or serve documentation for the Wildest AI organization. A file is accessible at https://wilds.ai/llms-full.txt, which serves content related to the "wilds.ai" platform [6], but this is a distinct entity from the Wildest AI organization associated with DiffGraph-CLI [1].
Citations:
- 1: https://github.com/WildestAI
- 2: https://github.com/WildestAI/DiffGraph-CLI/blob/main/.env.example
- 3: https://github.com/WildestAI/DiffGraph-CLI/issues
- 4: https://wilds.ai/
- 5: https://wilds.ai/about
- 6: https://wilds.ai/llms-full.txt
Fix external link URLs in llms.txt (lines 35–37).
- GitHub link (https://github.com/WildestAI/DiffGraph-CLI) is reachable.
- Website link should use the official Wildest AI domain: https://wildestai.com (the https://wildest.ai domain appears to be different/not official).
- "Full context" link currently targets https://wildest.ai/llms-full.txt, which doesn't appear to exist for the Wildest AI project. Update it to the correct full-context URL for this repo.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@llms.txt` around lines 35 - 37, Replace the external links in llms.txt: keep
the GitHub entry ("https://github.com/WildestAI/DiffGraph-CLI") as-is, change
the Website entry from "https://wildest.ai" to the official
"https://wildestai.com", and update the "Full context" URL from
"https://wildest.ai/llms-full.txt" to the repository's full-context resource
(use the repo raw file URL, e.g.
"https://raw.githubusercontent.com/WildestAI/DiffGraph-CLI/main/llms-full.txt")
so all three links point to the correct official resources.
cmd = ["wild", "diff", "--no-open"]
if output_file:
    cmd += ["--output", output_file]
if args:
    cmd += args.split()
Command argument injection via unsanitized args parameter.
The args parameter is split and appended directly to the subprocess command without validation. An attacker-controlled agent input could inject arbitrary CLI flags:
- args="--output /etc/cron.d/malicious" could overwrite sensitive files (bypassing the output_file parameter entirely)
- args="../../sensitive/repo" could manipulate path-based arguments
Consider implementing an allowlist of permitted argument patterns or parsing args to extract only known safe flags.
Proposed mitigation approach
# Define allowed arguments
ALLOWED_FLAGS = {"--staged", "--no-open", "--json"}
ALLOWED_PREFIXES = {"--output=", "--format="}

def sanitize_args(args: str) -> list[str]:
    """Parse and validate args, rejecting unknown flags."""
    if not args:
        return []
    parts = args.split()
    sanitized = []
    for part in parts:
        if part in ALLOWED_FLAGS:
            sanitized.append(part)
        elif any(part.startswith(p) for p in ALLOWED_PREFIXES):
            sanitized.append(part)
        elif not part.startswith("-"):
            # Likely a commit ref or path - validate further
            sanitized.append(part)
        else:
            raise ValueError(f"Disallowed argument: {part}")
    return sanitized
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mcp_server.py` around lines 76 - 80, The code appends unsanitized args to cmd
(variable cmd) allowing CLI injection; implement a sanitize_args(args: str)
helper that enforces an allowlist (e.g., ALLOWED_FLAGS and ALLOWED_PREFIXES) and
rejects unknown flags, then replace the direct args.split() usage with cmd +=
sanitize_args(args). Ensure sanitize_args disallows bare "--output" (or only
permits "--output=" form if desired) so the explicit output_file handling cannot
be overridden, and raise/return an error for disallowed parts instead of
appending them.
Source: Linters/SAST tools
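A stricter variant of the allowlist idea is sketched below under stated assumptions. It splits with `shlex.split` so quoted values stay intact, accepts flags only on exact match (so both `--output` and `--output=...` are rejected and the explicit output_file handling cannot be overridden), and refuses positional values that contain a `..` path component. The flag set is an assumption taken from the proposed mitigation above, not a verified list of `wild diff` options.

```python
import shlex
from typing import List

# Assumed allowlist for illustration; the real set of flags should be
# taken from the CLI's own help output.
ALLOWED_FLAGS = {"--staged", "--json"}


def sanitize_args(args: str) -> List[str]:
    """Split `args` shell-style and reject anything not explicitly allowed."""
    sanitized = []
    for part in shlex.split(args or ""):
        if part.startswith("-"):
            # Exact match only, so "--output" and "--output=/x" both fail.
            if part not in ALLOWED_FLAGS:
                raise ValueError(f"Disallowed argument: {part}")
        elif ".." in part.split("/"):
            # Positional values (refs or paths) must not climb directories.
            raise ValueError(f"Disallowed path component in: {part}")
        sanitized.append(part)
    return sanitized
```

With this sketch, `sanitize_args("--staged HEAD~1")` passes through unchanged, while either attack string from the comment above raises `ValueError` before the subprocess is built.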
if output_file:
    cmd += ["--output", output_file]
output_file allows arbitrary filesystem write (path traversal).
The output_file parameter is passed directly to --output without path validation. A malicious input like output_file="/etc/cron.d/backdoor" could write to sensitive system locations.
Constrain output_file to be within the target repository or a designated output directory.
Proposed fix
+def _validate_output_path(output_file: str, repo: Path) -> Path:
+    """Ensure output_file is within repo or use default."""
+    if not output_file:
+        return repo / "diffgraph.html"
+    out_path = Path(output_file).expanduser().resolve()
+    # Must be within the repo directory
+    if not out_path.is_relative_to(repo):
+        raise ValueError(f"output_file must be within repo: {repo}")
+    return out_path
 @mcp.tool()
 def run_wild_diff(...) -> dict:
     ...
+    try:
+        validated_output = _validate_output_path(output_file, repo)
+    except ValueError as e:
+        return {"success": False, "error": str(e)}
+
     cmd = ["wild", "diff", "--no-open"]
-    if output_file:
-        cmd += ["--output", output_file]
+    cmd += ["--output", str(validated_output)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mcp_server.py` around lines 77 - 78, The code appends user-supplied
output_file directly to cmd, allowing path traversal; validate and constrain
output_file to a designated output directory (or the repository root) before
adding "--output". Implement: define a base output dir (e.g., OUTPUT_DIR or use
the existing repo path variable), join output_file with that base using
os.path.join, resolve with os.path.abspath/os.path.realpath, and verify with
os.path.commonpath that the resolved path is inside the base; if not,
raise/return an error. Ensure the code in the block that builds cmd (where
output_file is checked) replaces the raw value with the sanitized/resolved path
and creates parent directories (os.makedirs(..., exist_ok=True)) before
appending to cmd.
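The `os.path` approach described in the prompt above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function name `resolve_output_path` and the `base_dir` parameter are hypothetical, and the caller is assumed to pass the repository root or a dedicated output directory.

```python
import os


def resolve_output_path(output_file: str, base_dir: str) -> str:
    """Resolve `output_file` under `base_dir`, refusing paths that escape it."""
    base = os.path.realpath(base_dir)
    # os.path.join discards `base` when output_file is absolute, so absolute
    # inputs are caught by the containment check below.
    candidate = os.path.realpath(os.path.join(base, output_file))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"output_file must be inside {base}")
    if candidate == base:
        raise ValueError("output_file must name a file, not the base directory")
    # Create the parent directory so the CLI can write the file.
    os.makedirs(os.path.dirname(candidate), exist_ok=True)
    return candidate
```

The returned string is what would be appended after `--output`. Because it relies only on `os.path`, this variant does not need `Path.is_relative_to`, which was added in Python 3.9.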
else:
    # Try as relative path from repo root
    candidate = (REPO_ROOT / name).resolve()
    if candidate.exists():
        target = candidate
    else:
        # Try docs subdir
        candidate2 = (DOCS_DIR / name).resolve()
        if candidate2.exists():
            target = candidate2
Path traversal vulnerability allows reading arbitrary files.
The name parameter is user-controlled and used directly to construct file paths without validating the resolved path stays within allowed directories. An input like name="../../../../etc/passwd" would resolve to /etc/passwd and be read.
Proposed fix
+def _is_safe_path(path: Path, allowed_roots: list[Path]) -> bool:
+    """Check if resolved path is within any allowed root."""
+    resolved = path.resolve()
+    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)
 @mcp.tool()
 def get_docs(name: str) -> dict:
     ...
     if name in slug_map:
         target = slug_map[name]
     else:
         # Try as relative path from repo root
         candidate = (REPO_ROOT / name).resolve()
-        if candidate.exists():
+        if candidate.exists() and _is_safe_path(candidate, [REPO_ROOT]):
             target = candidate
         else:
             # Try docs subdir
             candidate2 = (DOCS_DIR / name).resolve()
-            if candidate2.exists():
+            if candidate2.exists() and _is_safe_path(candidate2, [DOCS_DIR]):
                 target = candidate2
📝 Committable suggestion
else:
    # Try as relative path from repo root
    candidate = (REPO_ROOT / name).resolve()
    if candidate.exists() and _is_safe_path(candidate, [REPO_ROOT]):
        target = candidate
    else:
        # Try docs subdir
        candidate2 = (DOCS_DIR / name).resolve()
        if candidate2.exists() and _is_safe_path(candidate2, [DOCS_DIR]):
            target = candidate2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mcp_server.py` around lines 185 - 194, The code builds filesystem paths from
the user-controlled variable name (used in candidate/candidate2) without
ensuring the resolved path remains inside allowed roots (REPO_ROOT or DOCS_DIR),
enabling path traversal; fix by resolving the candidate paths and explicitly
checking that each resolved path is a descendant of its allowed base before
assigning to target — e.g., after computing candidate = (REPO_ROOT /
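name).resolve() verify candidate.is_relative_to(REPO_ROOT.resolve()) (Python
3.9+; on older versions call candidate.relative_to(REPO_ROOT.resolve()) and
treat ValueError as a rejection, since a plain string-prefix comparison would
also accept sibling directories that merely share the prefix) and only accept
it if true, and do the same for candidate2 against DOCS_DIR; if neither check
passes, reject the request or return an error.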
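A minimal sketch of the containment check, assuming the lookup can be expressed as "first existing file under one of several roots". `is_within` and `find_doc` are hypothetical names, not functions from `mcp_server.py`. The sketch uses `Path.relative_to`, which is available on older interpreters, whereas `Path.is_relative_to` was only added in Python 3.9.

```python
from pathlib import Path
from typing import Iterable, Optional


def is_within(path: Path, root: Path) -> bool:
    """True if the resolved path is root itself or a descendant of it."""
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        # relative_to raises ValueError when path is not under root.
        return False


def find_doc(name: str, roots: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file `name` under one of `roots`, else None.

    Hypothetical helper for illustration only.
    """
    for root in roots:
        # An absolute `name` replaces `root` here; is_within rejects it.
        candidate = (root / name).resolve()
        if is_within(candidate, root) and candidate.is_file():
            return candidate
    return None
```

Comparing path components rather than string prefixes matters here: a prefix test on `/srv/repo` would also accept `/srv/repo-private/secret.txt`.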
    try:
        content = target.read_text(encoding="utf-8")
        return {"found": True, "name": name, "content": content, "error": ""}
    except Exception as e:
        return {"found": False, "name": name, "content": "", "error": str(e)}
Overly broad except Exception handlers in both get_docs and search_docs.
Both functions (get_docs at line 207, search_docs at line 250) catch Exception, which hides the specific cause of a failure and makes debugging difficult. Catch the specific exception types that read_text() can raise (OSError, UnicodeDecodeError) instead, and log the error.
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 207-207: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mcp_server.py` around lines 204 - 208, Replace the broad except Exception in
get_docs and search_docs with targeted exception handlers: catch OSError for
filesystem/read errors and UnicodeDecodeError for encoding issues when calling
target.read_text(), and use the module logger (or processLogger) to log the full
exception details (including stack/str) before returning the error payload; keep
the returned structure the same but populate "error" with the logged exception
message for easier debugging.
Source: Linters/SAST tools
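One way the narrower handler might look, shown as a sketch rather than the PR's actual code. `read_doc` is a hypothetical helper, and the module-level `logger` is an assumption; the returned dictionary keeps the shape used by `get_docs` above.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)  # assumed module-level logger


def read_doc(target: Path, name: str) -> dict:
    """Read a doc file and return the same payload shape as get_docs."""
    try:
        content = target.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        # logger.exception records the traceback alongside the message.
        logger.exception("Failed to read doc %r from %s", name, target)
        return {"found": False, "name": name, "content": "", "error": str(exc)}
    return {"found": True, "name": name, "content": content, "error": ""}
```

Any other exception type now propagates to the caller instead of being silently converted into an error payload.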
## Configuration

Environment variables:
- `OPENAI_API_KEY` — required for AI analysis
- Copy `.env.example` to `.env` to set locally

## Notes

- Works with Python 3.8+
- Tested on macOS and Linux
- The CLI wraps `git` — it must be run inside a git repository
- Large diffs may be slow; consider scoping with file paths or commit ranges
- The `.env` file is git-ignored — never commit API keys
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for .env.example file
if [ -f ".env.example" ]; then
echo ".env.example found ✓"
else
echo ".env.example NOT found ✗"
fi
# Check .gitignore for .env entry
if grep -q "^\.env$" .gitignore 2>/dev/null; then
echo ".env in .gitignore ✓"
else
echo ".env NOT in .gitignore ✗"
fi
# Look for Python version requirement
fd -t f '(pyproject.toml|setup.py|setup.cfg|.python-version)' && \
echo "Checking Python version requirement:" && \
rg -A2 "requires-python|python_requires|target.version" --type toml --type python || echo "No explicit Python version found"
Repository: WildestAI/DiffGraph-CLI
Length of output: 237
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== skill.md (lines 70-95) =="
if [ -f "skill.md" ]; then
sed -n '70,95p' skill.md | nl -ba
else
echo "skill.md not found"
fi
echo
echo "== Python version declarations (repo-wide) =="
rg -n --hidden --no-ignore-vcs -S \
'requires-python|python_requires|target\.version|\.python-version|setup-python|actions/setup-python|Programming Language :: Python ::' .
echo
echo "== setup.py python_requires excerpt =="
if [ -f "setup.py" ]; then
rg -n 'python_requires' setup.py || true
rg -n --context 3 'classifiers\s*=|Programming Language :: Python ::' setup.py || true
else
echo "setup.py not found"
fi
echo
echo "== pyproject.toml requires-python excerpt =="
if [ -f "pyproject.toml" ]; then
rg -n 'requires-python' pyproject.toml || true
rg -n --context 3 'tool\.poetry|project\.requires-python|project\s*\{' pyproject.toml || true
else
echo "pyproject.toml not found"
fi
echo
echo "== .python-version file =="
if [ -f ".python-version" ]; then
echo ".python-version exists: $(cat .python-version)"
else
echo ".python-version not found"
fi
echo
echo "== CI (workflows) setup-python versions =="
if [ -d ".github/workflows" ]; then
rg -n --hidden --no-ignore-vcs -S 'setup-python|python-version' .github/workflows || true
else
echo ".github/workflows not found"
fi
Repository: WildestAI/DiffGraph-CLI
Length of output: 157
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== skill.md (lines 70-95) =="
if [ -f "skill.md" ]; then
awk 'NR>=70 && NR<=95 {printf "%d:%s\n", NR, $0}' skill.md
else
echo "skill.md not found"
fi
echo
echo "== .gitignore (.env entry) =="
if [ -f ".gitignore" ]; then
rg -n '^\s*\.env\s*$' .gitignore || true
echo "-- matches for '.env' in .gitignore:"
rg -n '\.env' .gitignore || true
else
echo ".gitignore not found"
fi
echo
echo "== Check for python minimum version declarations =="
for f in pyproject.toml setup.py setup.cfg .python-version .github/workflows/*.yml .github/workflows/*.yaml; do
if [ -e "$f" ]; then
echo "-- $f"
rg -n 'requires-python|python_requires|\.python-version|setup-python|python-version|Programming Language :: Python ::' "$f" || true
fi
done
echo
echo "== List likely python config files found =="
fd -t f -d 4 -g 'pyproject.toml' -g 'setup.py' -g 'setup.cfg' -g 'requirements*.txt' -g '.python-version' . 2>/dev/null || true
fd -t f -d 5 -g '*.yml' -g '*.yaml' .github/workflows 2>/dev/null || true
Repository: WildestAI/DiffGraph-CLI
Length of output: 1196
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Repo root python config presence =="
ls -la | sed -n '1,200p' | rg -n 'pyproject\.toml|setup\.py|setup\.cfg|\.python-version|requirements.*\.txt|Pipfile|poetry\.lock' || true
echo
echo "== Find pyproject/setup.cfg/.python-version =="
fd -t f -d 4 -g 'pyproject.toml' -g 'setup.cfg' -g '.python-version' . 2>/dev/null || true
echo
echo "== Find CI python versions (workflows) =="
fd -t f -d 6 -g '*.yml' -g '*.yaml' .github/workflows 2>/dev/null || true
rg -n 'setup-python|python-version|actions/setup-python' .github/workflows 2>/dev/null || true
echo
echo "== Repo-wide python version hints =="
rg -n 'requires-python|python_requires|\.python-version|setup-python|python-version|Programming Language :: Python ::' . || true
Repository: WildestAI/DiffGraph-CLI
Length of output: 724
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== .env.example existence =="
if [ -f ".env.example" ]; then
echo ".env.example found"
sed -n '1,120p' .env.example | awk '{print NR":"$0}'
else
echo ".env.example NOT found"
fi
Repository: WildestAI/DiffGraph-CLI
Length of output: 261
Fix Python version requirement mismatch in skill.md
- `.env.example` exists at the repo root, and `.gitignore` ignores `.env` (while keeping `!.env.example`), so that guidance is correct.
- `skill.md` says “Works with Python 3.8+” (line 82), but `setup.py` declares `python_requires=">=3.7"` (line 15); align these by updating `skill.md` to 3.7+ or tightening `setup.py` to 3.8+.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@skill.md` around lines 74 - 86, skill.md currently states "Works with Python
3.8+" but setup.py's python_requires=">=3.7"; update the documentation to match
the package metadata by changing the text in skill.md from "Works with Python
3.8+" to "Works with Python 3.7+" (or alternatively, if you intend to require
3.8+, change setup.py's python_requires to ">=3.8"); locate the string in
skill.md and the python_requires in setup.py to keep both consistent.
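To keep the two declarations from drifting apart again, a small consistency check could be run in CI. The sketch below is only an illustration: the function names are hypothetical, and the regular expressions assume the exact phrasings quoted in this review (`python_requires=">=X.Y"` and “Python X.Y+”).

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def declared_min_python(setup_text: str) -> Optional[Version]:
    """Extract (major, minor) from a python_requires=">=X.Y" declaration."""
    m = re.search(r"python_requires\s*=\s*[\"']\s*>=\s*(\d+)\.(\d+)", setup_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def documented_min_python(doc_text: str) -> Optional[Version]:
    """Extract (major, minor) from a phrase such as 'Python 3.8+'."""
    m = re.search(r"Python\s+(\d+)\.(\d+)\+", doc_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def versions_agree(setup_text: str, doc_text: str) -> bool:
    """True only if both files state a minimum version and the two match."""
    declared = declared_min_python(setup_text)
    documented = documented_min_python(doc_text)
    return declared is not None and declared == documented
```

In practice the two arguments would be the contents of `setup.py` and `skill.md`, read with `Path.read_text()`.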
What
Adds the full AI discoverability infrastructure for WildestAI, without any third-party SaaS dependency.
Files added
- `llms.txt`: compact context for LLMs crawling the repo
- `skill.md`: agent skill file so AI agents know how to use the `wild` CLI
- `mcp_server.py`: MCP server exposing the `run_wild_diff`, `list_docs`, `get_docs`, `search_docs` tools
MCP server tools
- `run_wild_diff(repo_path, args)`: runs `wild diff` on any repo and returns structured results
- `list_docs()`: indexes all 9 documentation pages
- `get_docs(name)`: fetches any doc by slug
- `search_docs(query)`: full-text search across all docs
Also exposes a `wildestai://llms.txt` MCP resource.
Why
Replaces the need for DocsALot ($100/month) — same AI discoverability features built in-house.
Testing
Summary by CodeRabbit
New Features
Documentation