diff --git a/AGENTS.md b/AGENTS.md index e3fbf68..c62ce9d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -583,6 +583,19 @@ Canonical state file for AskUserQuestion. Atomic write via tmp+rename. Schema: ### MCP HTTP transport blocklist `codec_config._HTTP_BLOCKED`: `python_exec`, `terminal`, `process_manager`, `pm2_control`, `ax_control`. These skills are NEVER exposed over HTTP MCP. They remain available locally (voice, chat) and over stdio MCP only. +### Skill creation flow — review-and-approve only (Phase 1 Wave 1, PR-1B — closes D-2 + D-3) + +Skill creation is exclusively via the review-and-approve flow: + + `POST /api/skill/review` → stages code for human review (no disk write) + `POST /api/skill/approve` → writes to disk after explicit operator approval; runs `is_dangerous_skill_code` as the write-time gate + +The legacy direct-write endpoints `/api/save_skill` and `/api/forge` were **removed in PR-1B**. Both were CRITICAL RCE-enabling paths per `docs/audits/PHASE-1-SECURITY.md`: +- `/api/save_skill` (**D-3**) wrote user/LLM-supplied code straight to `/.py` after only a substring blocker. +- `/api/forge` (**D-2**) fetched arbitrary URLs (SSRF), passed the response to the LLM, and wrote the LLM's output directly to disk. + +The Skill Forge UI in `codec_vibe.html` (modal, toolbar buttons, JS handlers) was removed alongside. The URL-fetch capability is intentionally dropped — anyone wanting to import code from a URL now pastes the source into the editor and goes through the review-and-approve flow like any other skill. + ### Skill load-time safety gate (Phase 1 Wave 1, PR-1A — closes D-1) `SkillRegistry.load` (`codec_skill_registry.py`) runs a two-stage check on every skill load — BEFORE `spec.loader.exec_module(mod)` — so a malicious `.py` file dropped in `~/.codec/skills/` cannot execute regardless of how it reached disk: diff --git a/FEATURES.md b/FEATURES.md index aa31e10..c9ccf6f 100644 --- a/FEATURES.md +++ b/FEATURES.md @@ -149,17 +149,14 @@ release, CODEC is now a **9-product system**. | 7 | Inspect mode for element inspection | | 8 | Save file to disk | | 9 | Copy code to clipboard | -| 10 | Save as CODEC Skill | -| 11 | Test Skill (invoke run() function) | -| 12 | Skill Forge modal (3 modes: Paste Code, GitHub URL, Describe) | -| 13 | Project management sidebar (sessions) | -| 14 | Resizable panels (drag handle) | -| 15 | Output console panel | -| 16 | DOMPurify sanitization on all rendered content | -| 17 | Server webcam photo + live PIP | -| 18 | Light/dark theme toggle (syncs Monaco theme) | -| 19 | Skill review + approval workflow (human-in-the-loop) | -| 20 | URL import in Skill Forge (fetch code from GitHub raw URLs) | +| 10 | Test Skill (invoke run() function) | +| 11 | Project management sidebar (sessions) | +| 12 | Resizable panels (drag handle) | +| 13 | Output console panel | +| 14 | DOMPurify sanitization on all rendered content | +| 15 | Server webcam photo + live PIP | +| 16 | Light/dark theme toggle (syncs Monaco theme) | +| 17 | Skill review + approval workflow (human-in-the-loop, exclusive path) | --- diff --git a/README.md b/README.md index 221c88e..aae01fc 100644 --- a/README.md +++ b/README.md @@ -50,7 +50,7 @@ No cloud dependency. No data leaving the machine unless you choose. No subscript | 2 | **CODEC Dictate** | Hold, speak, paste — hands-free F5 live typing at cursor, draft refinement, floating overlays | | 3 | **CODEC Instant** | Right-click → 8 AI services system-wide — proofread, translate, reply, explain | | 4 | **CODEC Chat** | 250K-context conversational AI + 12 autonomous agent crews | -| 5 | **CODEC Vibe** | Browser IDE with Monaco editor + Skill Forge — the framework writes its own plugins | +| 5 | **CODEC Vibe** | Browser IDE with Monaco editor + live preview — new skills land through a human-review approval flow | | 6 | **CODEC Voice** | Real-time voice calls with interrupt detection, screen analysis mid-call | | 7 | **CODEC Overview** | Dashboard + Cortex nerve center + full audit trail — accessible from any device | | 8 | **CODEC Pilot** | Browser automation you can *teach* — record once, replay forever with XPath→CSS→LLM rescue | @@ -121,11 +121,11 @@ The multi-agent framework is under 800 lines. Zero dependencies. No CrewAI. No L **Phase 3 substrate** (autonomous agents) reuses Phase 1+2 components: unified audit envelope (Step 1), plugin lifecycle hooks (Step 2), AskUser + strict-consent + step budget (Step 3), continuous observation loop (Step 5), trigger system (Step 6), end-of-day shift report (Step 7). Per-agent state at `~/.codec/agents//`. Global allowlist tier at `~/.codec/agent_global_grants.json`. 17 new audit events. See `docs/PHASE3-COMPLETE.md` for the full sign-off. -### 5. CODEC Vibe — AI Coding IDE + Skill Forge +### 5. CODEC Vibe — AI Coding IDE Split-screen in the browser. Monaco editor on the left (same engine as VS Code, v0.45.0). AI chat on the right. Describe what's needed — CODEC writes it, click Apply, run it, live preview in browser. -Skill Forge takes it further: three modes — paste code, import from GitHub URL, or describe a capability in plain English. CODEC converts it into a working plugin. The framework writes its own extensions. DOMPurify sanitization on all rendered content. +New skills land through the human-review approval flow (`/api/skill/review` → `/api/skill/approve`) — staged, previewed, then written to disk only after explicit operator approval. Defense in depth pairs with the load-time AST gate in `SkillRegistry.load`. DOMPurify sanitization on all rendered content. ### 6. CODEC Voice — Live Voice Calls @@ -311,7 +311,7 @@ Three smart agents ship built-in: Daily Briefing, Restaurant Decider (location-a | Voice-to-voice calls | WebSocket, real-time | Yes but cloud | Yes but cloud | | Multi-agent workflows | 12 crews, local LLM | No | Limited | | Right-click AI services | 8 system-wide services | No | No | -| Writes its own plugins | Skill Forge | No | No | +| Writes its own plugins | Yes, via review-and-approve flow | No | No | | Hands-free live typing at cursor | Dictate F5 | No | No | | Process watchdog | Auto-kills stuck processes | No | No | | Full audit trail | 16 event categories | No | No | @@ -325,7 +325,7 @@ Three smart agents ship built-in: Daily Briefing, Restaurant Decider (location-a | Pipecat | **Voice** — own WebSocket pipeline | | CrewAI + LangChain | **Chat** — 795-line agent framework, zero dependencies | | SuperWhisper / Apple Dictation | **Dictate** — free, open source, F5 hands-free live typing, 100% local | -| Cursor / Windsurf | **Vibe** — Monaco + AI + Skill Forge | +| Cursor / Windsurf | **Vibe** — Monaco + AI + review-gated skill creation | | Google Assistant / Siri | **Core** — actually controls the computer | | Grammarly | **Instant** — right-click services via local LLM | | ChatGPT | **Chat** — 250K context, fully local | @@ -630,7 +630,7 @@ codec_voice.html — Voice call UI codec_dashboard.py — Web API + dashboard (135+ endpoints across routes/) codec_dashboard.html — Dashboard UI (Flash Chat, History, Audit, Settings, Stats, Skills) codec_chat.html — Chat UI (agents, file upload, voice input) -codec_vibe.html — Vibe Code IDE (Monaco + Skill Forge) +codec_vibe.html — Vibe Code IDE (Monaco editor + live preview) codec_cortex.html — Cortex system overview (neural map, product grid) codec_audit.html — Audit log viewer (16 categories, filterable) codec_audit.py — Audit logger (JSON-line, 50MB rotation, thread-safe) diff --git a/codec_chat.html b/codec_chat.html index 3334f0e..287f192 100644 --- a/codec_chat.html +++ b/codec_chat.html @@ -471,7 +471,7 @@

CODEC

"3. **CODEC Dictate** \u2014 Hold-to-speak dictation that transcribes and refines text with LLM grammar/tone correction in any macOS app.\n" + "4. **CODEC Instant** \u2014 Right-click AI services (Proofread, Elevate, Explain, Translate, Reply, Read Aloud) on any selected text system-wide.\n" + "5. **CODEC Chat** \u2014 That's YOU. Conversational AI with 250K context, file uploads, image analysis, web search, and 12 autonomous agent crews.\n" + -"6. **CODEC Vibe** \u2014 AI coding IDE with Skill Forge that auto-generates and deploys new plugins from natural language.\n" + +"6. **CODEC Vibe** \u2014 AI coding IDE with Monaco editor + live preview. New skills land through the human-review approval flow.\n" + "7. **CODEC Voice** \u2014 Real-time voice-to-voice calls with live transcription and mid-call screen analysis.\n\n" + "Supporting systems: Dashboard (remote access via Cloudflare/Tailscale), Skill Marketplace, MCP Server (exposes skills to Claude/Cursor/VS Code), Task Scheduler, and Memory.\n\n" + "### IDENTITY & PERSONA\n" + diff --git a/codec_config.py b/codec_config.py index 1af01ec..e79b992 100644 --- a/codec_config.py +++ b/codec_config.py @@ -1,6 +1,14 @@ """CODEC Configuration — loads ~/.codec/config.json and exposes all constants""" import os, json -from pynput import keyboard + +# pynput requires a display (X11 / AppKit / win32). On headless CI runners +# (Linux GitHub Actions, Docker) the import raises ImportError. Other modules +# import codec_config for is_dangerous_skill_code, DANGEROUS_PATTERNS, etc. +# without needing the keyboard subsystem — fail gracefully so those imports work. +try: + from pynput import keyboard +except ImportError: + keyboard = None CONFIG_PATH = os.path.expanduser("~/.codec/config.json") DRY_RUN = False @@ -258,6 +266,10 @@ def needs_screen(t): # Key resolution def _resolve_key(name): name = name.lower().strip() + if keyboard is None: + # Headless: codec_config is being imported for its constants only, + # not for live keyboard listening. Skip key resolution. + return None if name.startswith('f') and name[1:].isdigit(): return getattr(keyboard.Key, name, None) if len(name) == 1: diff --git a/codec_cortex.html b/codec_cortex.html index 8b3269b..804a3eb 100644 --- a/codec_cortex.html +++ b/codec_cortex.html @@ -857,11 +857,11 @@

Live Activity

features:['250K Context','File Upload','Image Analysis','Web Search','12 Agent Crews','Streaming','Skill Execution','Voice Reply','Memory Search','Code Blocks','Thinking Mode'], files:['codec_chat.html','codec_dashboard.py'],status:'chat', details:{agents:'Deep Research (10,000-word report → Google Docs), Daily Briefing (morning news + calendar), Competitor Analysis (SWOT + positioning), Trip Planner (full itinerary), Email Handler (triage inbox, draft replies), Social Media (posts for Twitter, LinkedIn, Instagram), Code Review (bugs + security), Data Analysis (trends + insights), Content Writer (blog posts, articles), Meeting Summarizer (action items), Invoice Generator (professional invoices), Custom Agent (define your own role, tools, task)',features:'250K context, file uploads (PDF/image/text), web search with Serper API, skill execution, streaming responses, voice reply toggle, thinking mode toggle, memory search. Multi-agent framework under 800 lines. Zero dependencies. No CrewAI. No LangChain.',scheduling:'Schedule any crew: "Run competitor analysis every Monday at 9am"',port:'8090 (/chat)',files:['codec_chat.html','codec_dashboard.py','routes/agents.py','codec_agents.py']}}, - {name:'CODEC Vibe',subtitle:'AI Coding IDE + Skill Forge',icon:'🎨',color:'#f59e0b', - desc:'Split-screen in the browser. Monaco editor on the left (same engine as VS Code). AI chat on the right. Describe what you need — CODEC writes it, click Apply, run it, live preview. Skill Forge: describe a new capability in plain English, CODEC converts it into a working plugin.', - features:['Monaco Editor','Live Preview','Skill Forge','Auto Deploy','Code Generation','Apply Code','HTML/CSS/JS','Template Library'], + {name:'CODEC Vibe',subtitle:'AI Coding IDE',icon:'🎨',color:'#f59e0b', + desc:'Split-screen in the browser. Monaco editor on the left (same engine as VS Code). AI chat on the right. Describe what you need — CODEC writes it, click Apply, run it, live preview. New skills land through the review-and-approve flow (/api/skill/review → /api/skill/approve) — human-in-the-loop, audit-gated.', + features:['Monaco Editor','Live Preview','Code Generation','Apply Code','HTML/CSS/JS','Template Library','Human Review Gate','Test skill run()'], files:['codec_vibe.html'],status:'vibe', - details:{how:'Natural language → LLM generates HTML/CSS/JS → Live preview in iframe → One-click deploy as skill. Skill Forge takes it further: describe a new capability in plain English, CODEC converts it into a working plugin. The framework writes its own extensions.',port:'8090 (/vibe)',files:['codec_vibe.html']}}, + details:{how:'Natural language → LLM generates HTML/CSS/JS → Live preview in iframe → Skill staging via /api/skill/review → user approves → /api/skill/approve writes to ~/.codec/skills/. Defense in depth with the load-time AST gate added in PR-1A.',port:'8090 (/vibe)',files:['codec_vibe.html']}}, {name:'CODEC Voice',subtitle:'Live Voice Calls',icon:'📞',color:'#34d399', desc:'Real-time voice-to-voice conversations with the AI. WebSocket pipeline — no Pipecat, no external dependencies. Mid-call say "check my screen" — it screenshots, analyzes, and speaks back. Full transcript saved to memory.', features:['WebSocket','VAD','Interruption','Live Transcript','Screen Analysis','Streaming TTS','Low Latency','Memory Save','Searchable History'], diff --git a/codec_vibe.html b/codec_vibe.html index ea3466b..7a2d9f8 100644 --- a/codec_vibe.html +++ b/codec_vibe.html @@ -122,30 +122,6 @@ ::-webkit-scrollbar-track { background: transparent; } ::-webkit-scrollbar-thumb { background: var(--border); border-radius: 2px; } ::-webkit-scrollbar-thumb:hover { background: var(--text-muted); } -/* ── Forge Modal ── */ -.fm-overlay{display:none;position:fixed;inset:0;background:rgba(0,0,0,.7);z-index:50;align-items:center;justify-content:center}.fm-overlay.sh{display:flex} -.fm-box{background:var(--surface);border:1px solid var(--border);border-radius:12px;width:520px;max-width:95vw;max-height:90vh;overflow:hidden;display:flex;flex-direction:column;box-shadow:0 8px 40px rgba(0,0,0,.6)} -.fm-head{display:flex;align-items:center;justify-content:space-between;padding:16px 20px;border-bottom:1px solid var(--border)} -.fm-head h2{font-family:var(--mono);font-size:15px;color:var(--accent);font-weight:700} -.fm-close{background:none;border:none;color:var(--text-muted);font-size:20px;cursor:pointer;width:28px;height:28px;display:flex;align-items:center;justify-content:center;border-radius:4px} -.fm-close:hover{color:var(--text);background:var(--surface-2)} -.fm-tabs{display:flex;border-bottom:1px solid var(--border);padding:0 20px;gap:4px} -.fm-tab{padding:10px 14px;font-size:12px;font-weight:600;color:var(--text-muted);background:none;border:none;cursor:pointer;border-bottom:2px solid transparent;transition:all .2s;font-family:var(--font)} -.fm-tab.active{color:var(--accent);border-bottom-color:var(--accent)} -.fm-tab:hover{color:var(--text)} -.fm-body{padding:20px;display:flex;flex-direction:column;gap:12px;flex:1} -.fm-label{font-size:11px;font-weight:600;color:var(--text-muted);letter-spacing:.05em;text-transform:uppercase;margin-bottom:4px} -.fm-ta{width:100%;height:180px;background:var(--surface-2);border:1px solid var(--border);border-radius:6px;padding:12px;color:var(--text);font-family:var(--mono);font-size:12px;resize:vertical;outline:none;line-height:1.5} -.fm-ta:focus{border-color:var(--accent);box-shadow:0 0 0 2px rgba(232,113,26,.15)} -.fm-input{width:100%;background:var(--surface-2);border:1px solid var(--border);border-radius:6px;padding:10px 14px;color:var(--text);font-family:var(--mono);font-size:13px;outline:none} -.fm-input:focus{border-color:var(--accent);box-shadow:0 0 0 2px rgba(232,113,26,.15)} -.fm-hint{font-size:11px;color:var(--text-muted);margin-top:2px} -.fm-foot{display:flex;align-items:center;gap:10px;padding:14px 20px;border-top:1px solid var(--border);background:var(--surface)} -.fm-forge{padding:9px 20px;background:linear-gradient(135deg,#7c3aed,#5b21b6);color:#fff;border:none;border-radius:6px;font-size:13px;font-weight:700;cursor:pointer;transition:opacity .2s;font-family:var(--font)} -.fm-forge:hover{opacity:.9} -.fm-forge:disabled{opacity:.5;cursor:not-allowed} -.fm-status{font-size:12px;color:var(--text-muted);font-family:var(--mono)} -.fm-panel{display:none}.fm-panel.active{display:block} /* ── Inspect Mode ── */ #inspectBtn.active { background: #E8711A !important; @@ -213,40 +189,6 @@ - -
-
-
-

Skill Forge

- -
-
- - - -
-
-
-
Paste Python or JavaScript code
- -
-
-
Raw file URL
- -
Works with any raw .py or .js URL (GitHub, Gist, Pastebin…)
-
-
-
Describe what you want
- -
CODEC will generate the skill from scratch based on your description
-
-
-
- - -
-
-

PROJECTS

@@ -321,7 +263,7 @@

CODEC

Vibe Coding
Tell me what to build!
-
+
👁 Live Preview
Vibe Coding
@@ -471,7 +413,7 @@

CODEC

"- BUDDY SYSTEM: Maintain a persistent digital familiar. Reflect its mood based on code quality — success = high patience/wisdom, linting errors = high chaos. Use occasional snark if the user ignores best practices.\n" + "- AUTO-MODE: You have a trust score. If a command is standard for the current task, execute without asking. If destructive or unknown, request permission.\n\n" + "### SKILLS & TOOLS\n" + -"You have access to 50+ CODEC skills including: file system, web search, Google Drive/Docs/Sheets, calendar, email, browser automation, screenshot OCR, terminal commands, and the Skill Forge (auto-generate new plugins). Use them when relevant.\n\n" + +"You have access to 50+ CODEC skills including: file system, web search, Google Drive/Docs/Sheets, calendar, email, browser automation, screenshot OCR, and terminal commands. Use them when relevant.\n\n" + "### MEMORY\n" + "All sessions are saved to CODEC shared memory (FTS5 indexed). Reference previous conversations naturally. If the user mentions past work, search memory first."; @@ -795,140 +737,6 @@

CODEC

.catch(function(e) { addOut('er', 'Save error: ' + e.message); }); } -function doSkill() { - var code = ed.getValue(); - if (!code.includes('SKILL_TRIGGERS')) { - addMsg('assistant', 'Not a CODEC skill. Ask me to convert it!'); - return; - } - var fn = document.getElementById('fn').value; - if (!fn.endsWith('.py')) fn += '.py'; - fetch('/api/save_skill', { - method: 'POST', - headers: {'Content-Type': 'application/json'}, - body: JSON.stringify({filename: fn, content: code}) - }) - .then(function(r) { return r.json(); }) - .then(function(d) { addOut('sy', 'Skill saved: ' + d.path); addMsg('assistant', 'Skill saved! Run: pm2 restart ava-autopilot'); }) - .catch(function(e) { addOut('er', e.message); }); -} - -/* ── Forge Modal ── */ -var _forgeMode = 'code'; - -function doForge() { - // Open modal — clear previous inputs - document.getElementById('forgeCodeInput').value = ''; - document.getElementById('forgeUrlInput').value = ''; - document.getElementById('forgeDescInput').value = ''; - document.getElementById('fmStatus').textContent = ''; - document.getElementById('fmForgeBtn').disabled = false; - setForgeMode('code'); - document.getElementById('fmOverlay').classList.add('sh'); -} - -function closeForgeModal() { - document.getElementById('fmOverlay').classList.remove('sh'); -} - -function setForgeMode(mode) { - _forgeMode = mode; - ['code','url','describe'].forEach(function(m) { - var suffix = m === 'describe' ? 'Desc' : m.charAt(0).toUpperCase() + m.slice(1); - document.getElementById('ftCode').classList.toggle('active', m === 'code' && mode === 'code'); - document.getElementById('ftUrl').classList.toggle('active', m === 'url' && mode === 'url'); - document.getElementById('ftDesc').classList.toggle('active', m === 'describe' && mode === 'describe'); - document.getElementById('fmPanel' + (m === 'describe' ? 'Desc' : m.charAt(0).toUpperCase() + m.slice(1))).classList.toggle('active', mode === m); - }); -} - -function submitForge() { - var payload = {}; - if (_forgeMode === 'code') { - var code = document.getElementById('forgeCodeInput').value.trim(); - if (!code) { document.getElementById('fmStatus').textContent = 'Paste some code first'; return; } - payload = {code: code}; - } else if (_forgeMode === 'url') { - var url = document.getElementById('forgeUrlInput').value.trim(); - if (!url) { document.getElementById('fmStatus').textContent = 'Enter a URL first'; return; } - payload = {code: url}; - } else { - var desc = document.getElementById('forgeDescInput').value.trim(); - if (!desc) { document.getElementById('fmStatus').textContent = 'Describe the skill first'; return; } - payload = {description: desc}; - } - - var btn = document.getElementById('fmForgeBtn'); - var status = document.getElementById('fmStatus'); - btn.disabled = true; - status.textContent = 'Forging skill…'; - document.getElementById('rs').textContent = 'Forging...'; - document.getElementById('rs').style.color = '#a78bfa'; - - fetch('/api/forge', { - method: 'POST', - headers: {'Content-Type': 'application/json'}, - body: JSON.stringify(payload) - }) - .then(function(r) { return r.json(); }) - .then(function(d) { - btn.disabled = false; - document.getElementById('rs').textContent = ''; - if (d.error) { - status.textContent = 'Error: ' + d.error; - addOut('er', 'Forge error: ' + d.error); - return; - } - // Load into editor for review - ed.setValue(d.code); - document.getElementById('fn').value = d.skill_name + '.py'; - monaco.editor.setModelLanguage(ed.getModel(), 'python'); - closeForgeModal(); - - // ── Human review gate: stage code for review before writing to disk ── - var skillFilename = d.skill_name + '.py'; - fetch('/api/skill/review', { - method: 'POST', - headers: {'Content-Type': 'application/json'}, - body: JSON.stringify({code: d.code, filename: skillFilename}) - }) - .then(function(r) { return r.json(); }) - .then(function(rev) { - if (rev.error) { addOut('er', 'Review error: ' + rev.error); return; } - var preview = d.code.length > 300 ? d.code.substring(0, 300) + '\n…(truncated)' : d.code; - var approved = confirm('Review forged skill "' + skillFilename + '":\n\n' + preview + '\n\nApprove and save to disk?'); - if (!approved) { - addOut('sy', 'Skill forge cancelled — code is in editor but NOT saved to disk.'); - addMsg('assistant', 'Skill forge cancelled. The code is still in the editor for review.'); - return; - } - // User approved — write to disk - fetch('/api/skill/approve', { - method: 'POST', - headers: {'Content-Type': 'application/json'}, - body: JSON.stringify({review_id: rev.review_id}) - }) - .then(function(r2) { return r2.json(); }) - .then(function(saved) { - if (saved.error) { addOut('er', 'Save error: ' + saved.error); return; } - addOut('sy', 'Forged & approved: ' + saved.path); - addMsg('assistant', - '🔨 Skill ' + esc(d.skill_name) + ' forged & approved!
' + - 'Saved to ' + esc(saved.path) + '. Run: pm2 restart ava-autopilot' - ); - }) - .catch(function(e2) { addOut('er', 'Approve failed: ' + e2.message); }); - }) - .catch(function(e) { addOut('er', 'Review request failed: ' + e.message); }); - }) - .catch(function(e) { - btn.disabled = false; - document.getElementById('rs').textContent = ''; - status.textContent = 'Failed: ' + e.message; - addOut('er', 'Forge failed: ' + e.message); - }); -} - function doTestSkill() { var code = ed.getValue().trim(); if (!code.includes('SKILL_TRIGGERS')) { @@ -1324,14 +1132,6 @@

CODEC

} } -// Close forge modal on overlay click or Escape -document.getElementById('fmOverlay').addEventListener('click', function(e) { - if (e.target === this) closeForgeModal(); -}); -document.addEventListener('keydown', function(e) { - if (e.key === 'Escape') closeForgeModal(); -}); - // Events document.getElementById('sndb').addEventListener('click', doSend); document.getElementById('vi').addEventListener('keydown', function(e) { diff --git a/docs/API.md b/docs/API.md index c249c72..daf62f4 100644 --- a/docs/API.md +++ b/docs/API.md @@ -89,20 +89,20 @@ List all installed skills with metadata. curl -H "Authorization: Bearer $TOKEN" http://localhost:8090/api/skills ``` -### POST /api/forge -Convert code to a CODEC skill via Skill Forge. +### POST /api/skill/review +Stage a skill for review before saving. The only sanctioned write path for new skills — defense in depth with the load-time AST gate (`SkillRegistry.load`) added in PR-1A. Direct-write endpoints (`/api/save_skill`, `/api/forge`) were removed in PR-1B (see `docs/audits/PHASE-1-SECURITY.md` D-2 + D-3). + ```bash curl -X POST -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ - -d '{"code": "def run(task): return str(2+2)", "name": "calculator"}' \ - http://localhost:8090/api/forge + -d '{"code": "SKILL_NAME=\"x\"\nSKILL_DESCRIPTION=\"...\"\nSKILL_TRIGGERS=[\"x\"]\ndef run(t,a=\"\",c=\"\"):return \"ok\"", "filename": "x.py"}' \ + http://localhost:8090/api/skill/review ``` -### POST /api/skill/review -Stage a skill for review before saving. +Returns `{"review_id": "...", "code": "...", "filename": "..."}`. ### POST /api/skill/approve -Approve and save a reviewed skill. +Approve a pending review by `review_id` and write the file to `/`. Runs `is_dangerous_skill_code` as the write-time gate. --- diff --git a/docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md b/docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md index 230f522..2465066 100644 --- a/docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md +++ b/docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md @@ -143,9 +143,9 @@ If you do all 8: you have layered defenses against D-1 / D-2 / D-3 / D-4 / D-7 u | ID | Audit | Title | Wave | |---|---|---|---| -| **D-1** | Security | Skill registry lazy-load = RCE for anyone who can drop a `.py` file | **W1 — CLOSED (PR-1A)** | -| **D-2** | Security | `/api/forge` fetches arbitrary URL → LLM → writes skill, no review gate | **W1** | -| **D-3** | Security | `/api/save_skill` writes directly to skills/ with only substring check | **W1** | +| **D-1** | Security | Skill registry lazy-load = RCE for anyone who can drop a `.py` file | **W1 — CLOSED ([#42](https://github.com/AVADSA25/codec/pull/42), `48ec5d5`)** | +| **D-2** | Security | `/api/forge` fetches arbitrary URL → LLM → writes skill, no review gate | **W1 — CLOSED (PR-1B)** | +| **D-3** | Security | `/api/save_skill` writes directly to skills/ with only substring check | **W1 — CLOSED (PR-1B)** | | **D-4** | Security | `file_write` skill (MCP-exposed) can write to `~/.codec/skills/` | **W1** | | **D-5** | Security | `permission_gate` accepts path-traversal via `fnmatch` (no realpath) | **W1** | | C-1 | Reliability | `codec.py` daemon ignores SIGINT/SIGTERM; leaks sox + tkinter on every restart | W4 | diff --git a/docs/audits/PHASE-1-SECURITY.md b/docs/audits/PHASE-1-SECURITY.md index 1aeef85..0674f1d 100644 --- a/docs/audits/PHASE-1-SECURITY.md +++ b/docs/audits/PHASE-1-SECURITY.md @@ -61,10 +61,12 @@ No exploits were executed. All bypasses were analyzed against source as text-onl **Recommended fix:** Run `is_dangerous_skill_code` at load-time in `SkillRegistry.load`, BEFORE `exec_module`. Better: route every load through a stable signature check (manifest of approved hash → set of approved files), maintained in a separate file written only by the approve gate. Best: run skills in a sandboxed subprocess by default (the wrapper exists in `codec_sandbox.py` but most paths don't use it — only `SkillRegistry.run(..., sandboxed=True)`). **Effort:** medium (single function change to add AST check; sandbox-by-default is larger surgery). -> **Closed by PR-1A** (branch `fix/pr1a-skill-load-ast-check`). `SkillRegistry.load` now runs a two-stage gate before `spec.loader.exec_module`: (1) sha256 match against `/.manifest.json` for built-ins (hash-pinned at PR-review time via `tools/generate_skill_manifest.py`); (2) `is_dangerous_skill_code` AST check for everything else. Refusals emit `skill_load_blocked` to `~/.codec/audit.log`. The chokepoint also fails closed against D-2, D-3, D-4, D-5 — even if an attacker successfully writes via any of those paths, the file's hash won't match the manifest and the AST check refuses it. 13 tests in `tests/test_skill_registry.py` cover both stages. AGENTS.md §7 documents the contributor workflow for regenerating the manifest after legitimate skill edits. +> **Closed by PR-1A** ([#42](https://github.com/AVADSA25/codec/pull/42), merged as `48ec5d5`). `SkillRegistry.load` now runs a two-stage gate before `spec.loader.exec_module`: (1) sha256 match against `/.manifest.json` for built-ins (hash-pinned at PR-review time via `tools/generate_skill_manifest.py`); (2) `is_dangerous_skill_code` AST check for everything else. Refusals emit `skill_load_blocked` to `~/.codec/audit.log`. The chokepoint also fails closed against D-2, D-3, D-4, D-5 — even if an attacker successfully writes via any of those paths, the file's hash won't match the manifest and the AST check refuses it. 13 tests in `tests/test_skill_registry.py` cover both stages. AGENTS.md §7 documents the contributor workflow for regenerating the manifest after legitimate skill edits. ### D-2 — `/api/forge` endpoint fetches arbitrary URL → LLM → writes skill, no review gate [CRITICAL] **Location:** `routes/skills.py:71-201` (`forge_skill`) + +> **Closed by PR-1B** (branch `fix/pr1b-remove-save-skill-and-forge`). The `/api/forge` endpoint and its handler `forge_skill` were removed entirely from `routes/skills.py`. Skill creation now routes exclusively through `/api/skill/review` → `/api/skill/approve` (the human-review-and-approve flow). The URL-fetch capability is intentionally dropped per Mickael decision Q1 — users wanting to import code from a URL paste the source into the editor and go through the review-and-approve flow. The Skill Forge modal + toolbar button + JS handlers in `codec_vibe.html` were removed alongside. 9 tests in `tests/test_skill_routes.py` verify the endpoint returns 404 and that the replacement flow remains functional. AGENTS.md §7 documents the contributor / operator workflow. **CWE / OWASP:** CWE-94 Code Injection / CWE-918 SSRF / OWASP A10 SSRF / Agentic A01 Memory Poisoning, A02 Tool Misuse **Description:** Endpoint accepts `code` field; if it starts with `http://` or `https://`, fetches the URL via `requests.get(code, timeout=15)` with no localhost / internal IP filtering, no domain allowlist, no MIME check. The fetched body is then passed to the LLM. The LLM's output is filtered through a 10-pattern substring blocklist (line 170-173: `os.system(`, `subprocess.`, `eval(`, `exec(`, `__import__`, `importlib`, `shutil.rmtree`, `open('/etc`, `open('/dev`, `ctypes`) — bypassable in dozens of ways: split string concatenation (`'os.sys' + 'tem('`), getattr indirection (`getattr(__builtins__, chr(95)*2+'import'+chr(95)*2)`), or simply double-quote variants of the `open('/...` patterns (the blocker only matches single-quote `open('/etc`). After substring check + AST compile (which only catches syntax errors), the code is **written directly to `/skills/.py` AND `~/.codec/skills/.py`** (line 186-190). No `/api/skill/review` staging. @@ -81,6 +83,8 @@ After substring check + AST compile (which only catches syntax errors), the code ### D-3 — `/api/save_skill` writes directly to `skills/` with only substring check [CRITICAL] **Location:** `routes/skills.py:14-28` (`save_skill`) + +> **Closed by PR-1B** (branch `fix/pr1b-remove-save-skill-and-forge`). The `/api/save_skill` endpoint and its handler `save_skill` were removed entirely from `routes/skills.py`. Skill creation now routes exclusively through `/api/skill/review` → `/api/skill/approve` (the human-review-and-approve flow), which already runs `is_dangerous_skill_code` at the approve gate. The "Skill" toolbar button in `codec_vibe.html` and the `doSkill()` JS function were removed alongside. Defense-in-depth with PR-1A: even if a malicious file reaches `~/.codec/skills/` via some other path, `SkillRegistry.load` refuses it at load time. Same 9 tests in `tests/test_skill_routes.py` cover D-2 + D-3. **CWE / OWASP:** CWE-94 Code Injection / OWASP A04 Insecure Design **Description:** `/api/save_skill` writes the request body's `content` field directly to `/` after only: 1. Checking that "SKILL_DESCRIPTION" and "def run(" are in the content (presence check, not security). diff --git a/routes/skills.py b/routes/skills.py index 7c655f1..89fcf9f 100644 --- a/routes/skills.py +++ b/routes/skills.py @@ -1,33 +1,31 @@ -"""CODEC Dashboard -- Skill-related routes (save, review, approve, forge, list).""" -import os, json +"""CODEC Dashboard -- Skill-related routes (review, approve, list, triggers). + +Skill creation is exclusively via the review-and-approve flow: + POST /api/skill/review → stages code for human review (no disk write) + POST /api/skill/approve → writes to disk after explicit approval + +The legacy direct-write endpoints `/api/save_skill` (D-3) and `/api/forge` +(D-2) were removed in PR-1B (see `docs/audits/PHASE-1-SECURITY.md`). Both +were CRITICAL RCE-enabling paths: save_skill wrote LLM/user-supplied code +straight to `/.py` after only a substring blocker; forge +fetched arbitrary URLs (SSRF) and turned the response into a skill via the +LLM. Defense in depth pairs with PR-1A's `SkillRegistry.load` AST gate — +even if a malicious file reached disk via some other path, the load-time +hash + AST check refuses it. +""" +import json +import os from fastapi import APIRouter, Request from fastapi.responses import JSONResponse from routes._shared import ( - log, DASHBOARD_DIR, CONFIG_PATH, _get_skills_dir, _pending_skills, + log, _get_skills_dir, _pending_skills, ) router = APIRouter() -@router.post("/api/save_skill") -async def save_skill(request: Request): - body = await request.json() - filename = os.path.basename(body.get("filename", "custom_skill.py")) - if not filename.endswith(".py"): filename += ".py" - content = body.get("content", "") - if "SKILL_DESCRIPTION" not in content or "def run(" not in content: - return JSONResponse({"error": "Invalid skill: must contain SKILL_DESCRIPTION and def run()"}, status_code=400) - from codec_config import is_dangerous_skill_code - dangerous, reason = is_dangerous_skill_code(content) - if dangerous: - return JSONResponse({"error": f"Blocked: {reason}"}, status_code=400) - path = os.path.join(_get_skills_dir(), filename) - with open(path, "w") as f: f.write(content) - return {"path": path, "skill": filename, "size": len(content)} - - @router.post("/api/skill/review") async def skill_review(request: Request): """Stage LLM-generated skill code for human review -- does NOT write to disk.""" @@ -68,139 +66,6 @@ async def skill_approve(request: Request): return {"path": path, "skill": filename, "size": len(code)} -@router.post("/api/forge") -async def forge_skill(request: Request): - """Convert arbitrary code (or a URL to code) into a CODEC skill using the LLM""" - import re as _re - body = await request.json() - code = body.get("code", "").strip() - if not code or len(code) < 4: - return JSONResponse({"error": "No code provided"}, status_code=400) - - # URL import: if code is a URL, fetch the source first - source_url = None - if code.startswith(("http://", "https://")): - try: - import requests as _rq_url - resp = _rq_url.get(code, timeout=15, headers={"User-Agent": "CODEC-Forge/1.0"}) - if resp.status_code != 200: - return JSONResponse({"error": f"URL fetch failed: {resp.status_code} {code}"}, status_code=400) - source_url = code - code = resp.text.strip() - if not code: - return JSONResponse({"error": "URL returned empty content"}, status_code=400) - except Exception as e: - return JSONResponse({"error": f"URL fetch error: {e}"}, status_code=400) - - cfg = {} - try: - with open(CONFIG_PATH) as f: cfg = json.load(f) - except Exception as e: - log.warning(f"Non-critical error: {e}") - - base_url = cfg.get("llm_base_url", "http://localhost:8081/v1") - model = cfg.get("llm_model", "") - api_key = cfg.get("llm_api_key", "") - kwargs = {k: v for k, v in cfg.get("llm_kwargs", {}).items() if k != "enable_thinking"} - - headers = {"Content-Type": "application/json"} - if api_key: headers["Authorization"] = "Bearer " + api_key - - url_note = f"\n(Fetched from: {source_url})" if source_url else "" - - prompt = f"""Convert the following code into a CODEC skill Python file. - -CRITICAL: Convert THIS EXACT CODE below. Do NOT invent a weather skill or any other unrelated skill. -Base the skill NAME, DESCRIPTION, TRIGGERS, and implementation ENTIRELY on the actual code provided.{url_note} - -OUTPUT ONLY the Python file content — no markdown, no backticks, no explanation. - -EXACT FORMAT REQUIRED: -\"\"\"CODEC Skill: [Name derived from the actual code]\"\"\" -SKILL_NAME = "[lowercase_name_matching_what_the_code_does]" -SKILL_DESCRIPTION = "[One line describing what THIS code actually does]" -SKILL_TRIGGERS = ["phrase 1", "phrase 2", "phrase 3", "phrase 4"] - -import os, json # only imports actually needed - -def run(task, app="", ctx=""): - # Wrap the actual code logic here - return "result string" # must return a string - -RULES: -- SKILL_NAME: lowercase, underscores only — name it after what the code ACTUALLY does -- SKILL_TRIGGERS: natural phrases a user would say to run THIS specific skill -- run() must always return a string -- Preserve the core logic of the original code -- Add error handling around external calls - -CODE TO CONVERT: -{code}""" - - try: - import requests as rq_forge - payload = {"model": model, "messages": [{"role": "user", "content": prompt}], - "max_tokens": 1500, "temperature": 0.1, - "chat_template_kwargs": {"enable_thinking": False}} - payload.update(kwargs) - r = rq_forge.post(base_url + "/chat/completions", json=payload, headers=headers, timeout=90) - if r.status_code != 200: - return JSONResponse({"error": f"LLM returned {r.status_code}"}, status_code=502) - - raw = r.json()["choices"][0]["message"].get("content", "").strip() - raw = _re.sub(r'[\s\S]*?', '', raw).strip() - raw = _re.sub(r'^```[\w]*\n?', '', raw).strip() - raw = _re.sub(r'\n?```$', '', raw).strip() - - # Title line: if first line isn't valid Python, wrap it as a docstring - lines = raw.split('\n') - if lines: - first = lines[0].strip() - valid_starts = ('"""', "'''", 'import ', 'from ', 'SKILL_', '#', 'def ', 'class ', '@') - if first and not any(first.startswith(s) for s in valid_starts): - lines[0] = '"""' + first + '"""' - raw = '\n'.join(lines) - - if "SKILL_NAME" not in raw or "def run" not in raw: - return JSONResponse({"error": "LLM output is not a valid skill", "raw": raw}, status_code=422) - - name_match = _re.search(r'SKILL_NAME\s*=\s*["\'](\w+)["\']', raw) - skill_name = name_match.group(1) if name_match else "forged_skill" - - BLOCKED_IN_SKILLS = [ - "os.system(", "subprocess.", "eval(", "exec(", "__import__", - "importlib", "shutil.rmtree", "open('/etc", "open('/dev", "ctypes", - ] - for blocked in BLOCKED_IN_SKILLS: - if blocked in raw: - return JSONResponse({"error": f"Blocked pattern in forged skill: {blocked}", "raw": raw}, status_code=403) - - try: - compile(raw, f"{skill_name}.py", "exec") - except SyntaxError as e: - return JSONResponse({"error": f"Syntax error in generated skill: {e}", "raw": raw}, status_code=422) - - skills_dir = _get_skills_dir() - os.makedirs(skills_dir, exist_ok=True) - filepath = os.path.join(skills_dir, f"{skill_name}.py") - with open(filepath, "w") as f: f.write(raw) - - repo_skills = os.path.join(DASHBOARD_DIR, "skills") - if os.path.isdir(repo_skills): - with open(os.path.join(repo_skills, f"{skill_name}.py"), "w") as f: f.write(raw) - - msg = f"Skill '{skill_name}' forged!" - if source_url: - msg += f" (imported from URL)" - msg += " Run: pm2 restart ava-autopilot" - return {"skill_name": skill_name, "path": filepath, "code": raw, - "source_url": source_url, "message": msg} - - except Exception as e: - import traceback; traceback.print_exc() - return JSONResponse({"error": str(e)}, status_code=500) - - @router.get("/api/skills") async def skills(): """List installed skills""" diff --git a/scripts/feature_audit.py b/scripts/feature_audit.py index f0b1146..947facc 100644 --- a/scripts/feature_audit.py +++ b/scripts/feature_audit.py @@ -768,25 +768,28 @@ def test_vibe(): SKIP(A, 9, "Copy code to clipboard", "UI-only") - # 10-11: Skill save/test + # 10-11: Skill creation goes through review-and-approve flow only. + # /api/save_skill and /api/forge were removed in PR-1B (closes D-2 + D-3, + # see docs/audits/PHASE-1-SECURITY.md). The audit confirms the removal + # by asserting both endpoints now return 404. ok, detail = check_endpoint("/api/save_skill", method="POST") - if "400" in detail or "422" in detail or ok or "skill" in detail.lower(): - PASS(A, 10, "Save as CODEC Skill", "Save skill endpoint responsive") + if "404" in detail: + PASS(A, 10, "Save as CODEC Skill", "Endpoint removed (PR-1B, D-3) — review-and-approve only") else: - FAIL(A, 10, "Save as CODEC Skill", detail) + FAIL(A, 10, "Save as CODEC Skill", f"Expected 404 (PR-1B removal), got: {detail}") ok, detail = check_endpoint("/api/skill/review", method="POST") if "400" in detail or "422" in detail or ok: - PASS(A, 11, "Test Skill", "Skill review endpoint responsive") + PASS(A, 11, "Test Skill (review gate)", "Skill review endpoint responsive") else: - FAIL(A, 11, "Test Skill", detail) + FAIL(A, 11, "Test Skill (review gate)", detail) - # 12: Skill Forge + # 12: /api/forge — removed in PR-1B alongside the Skill Forge UI. ok, detail = check_endpoint("/api/forge", method="POST") - if "400" in detail or "422" in detail or ok or "mode" in detail.lower(): - PASS(A, 12, "Skill Forge modal", "Forge endpoint responsive") + if "404" in detail: + PASS(A, 12, "Skill Forge endpoint", "Endpoint removed (PR-1B, D-2) — SSRF + RCE-write path closed") else: - FAIL(A, 12, "Skill Forge modal", detail) + FAIL(A, 12, "Skill Forge endpoint", f"Expected 404 (PR-1B removal), got: {detail}") # 13: Project management ok, detail = check_endpoint("/api/vibe/sessions") diff --git a/setup_codec.py b/setup_codec.py index 2710344..128f3c2 100644 --- a/setup_codec.py +++ b/setup_codec.py @@ -454,7 +454,7 @@ def main(): print(f" {O}CODEC Instant{X} — 8 right-click text services (Proofread, Elevate, Explain, Prompt, Translate, Reply, Read Aloud, Save)") print(f" {O}CODEC Dictate{X} — Hold right CMD to dictate text anywhere") print(f" {O}CODEC Chat{X} — Deep Chat with 250K context + AI Agents") - print(f" {O}CODEC Vibe{X} — AI-powered IDE with Skill Forge") + print(f" {O}CODEC Vibe{X} — AI-powered IDE (Monaco + live preview)") print(f" {O}CODEC Voice{X} — Live voice calls with skill dispatch") print(f" {O}CODEC Overview{X} — Your AI dashboard — every tool, every agent, one screen") print() diff --git a/tests/test_critical_fixes.py b/tests/test_critical_fixes.py index f0b2045..0aef231 100644 --- a/tests/test_critical_fixes.py +++ b/tests/test_critical_fixes.py @@ -94,67 +94,16 @@ def test_pin_success_resets_counter(self): assert ip not in codec_dashboard._pin_attempts -# ── Fix 4: Skill Forge semantic validation ── - -class TestSkillForgeBlocklist: - """Skill code must be scanned for dangerous imports/calls.""" - - DANGEROUS_SKILL_PATTERNS = [ - "os.system('rm -rf /')", - "subprocess.Popen(['ls'])", - "subprocess.run(['ls'])", - "eval(user_input)", - "exec(code_string)", - "__import__('os').system('id')", - "import importlib; importlib.import_module('os')", - "shutil.rmtree('/important')", - "open('/etc/passwd')", - "open('/dev/sda', 'w')", - "import ctypes; ctypes.CDLL('libc.so.6')", - ] - - SAFE_SKILL_CODE = ''' -SKILL_NAME = "test_skill" -SKILL_DESCRIPTION = "A safe test skill" -SKILL_TRIGGERS = ["test"] - -def run(task, app=None, ctx=None): - return "Hello from test skill" -''' - - @pytest.mark.parametrize("dangerous_code", DANGEROUS_SKILL_PATTERNS) - def test_blocked_patterns_detected(self, dangerous_code): - """Each dangerous pattern must be caught by the blocklist.""" - BLOCKED_IN_SKILLS = [ - "os.system(", "subprocess.", "eval(", "exec(", "__import__", - "importlib", "shutil.rmtree", "open('/etc", "open('/dev", "ctypes", - ] - skill_code = f'SKILL_DESCRIPTION = "test"\ndef run(task):\n {dangerous_code}' - found = any(b in skill_code for b in BLOCKED_IN_SKILLS) - assert found, f"Dangerous pattern not caught: {dangerous_code}" - - def test_safe_skill_passes(self): - """Normal skill code should not be flagged.""" - BLOCKED_IN_SKILLS = [ - "os.system(", "subprocess.", "eval(", "exec(", "__import__", - "importlib", "shutil.rmtree", "open('/etc", "open('/dev", "ctypes", - ] - found = any(b in self.SAFE_SKILL_CODE for b in BLOCKED_IN_SKILLS) - assert not found, "Safe skill code was incorrectly flagged" - - def test_blocklist_has_minimum_patterns(self): - """Blocklist should have at least 10 patterns (expanded from original 5).""" - # Read the actual source to verify - import inspect - import codec_dashboard - source = inspect.getsource(codec_dashboard.save_skill) - # Count blocked patterns by finding the list - assert "subprocess." in source, "subprocess. pattern missing (was subprocess.Popen only)" - assert "importlib" in source, "importlib pattern missing" - assert "shutil.rmtree" in source, "shutil.rmtree pattern missing" - assert "ctypes" in source, "ctypes pattern missing" - assert "open('/etc" in source, "open('/etc pattern missing" - assert "open('/dev" in source, "open('/dev pattern missing" +# ── Fix 4 (former): Skill Forge substring blocker (removed in PR-1B) ── +# +# The TestSkillForgeBlocklist class was deleted in PR-1B alongside the +# /api/save_skill and /api/forge endpoints. The substring blocker it tested +# was the weak validation those endpoints used before writing to disk. +# Coverage of dangerous-pattern detection now lives in +# tests/test_skill_registry.py (AST-based, runs at load time on every skill, +# closes D-1) and in routes/skills.py:skill_approve (AST check at write time +# via the review-and-approve flow). See docs/audits/PHASE-1-SECURITY.md +# findings D-1, D-2, D-3 for the full closure trail. # ── Fix 5: Thread-safe _auth_sessions ── diff --git a/tests/test_dashboard.py b/tests/test_dashboard.py index 9057c83..16cc3ea 100644 --- a/tests/test_dashboard.py +++ b/tests/test_dashboard.py @@ -66,7 +66,20 @@ def test_memory_sessions_api(): assert r.status_code in (200, 401) # 401 valid when auth enabled -def test_forge_requires_input(): +def test_forge_endpoint_removed(): + """PR-1B removed /api/forge (closes D-2). Endpoint must return 404 now. + Replacement: /api/skill/review → /api/skill/approve.""" r = requests.post(f"{BASE}/api/forge", json={}, timeout=5) _skip_if_auth(r) - assert r.status_code in [400, 401, 422] # 401 valid when auth enabled + assert r.status_code == 404, f"/api/forge must be removed, got {r.status_code}" + + +def test_save_skill_endpoint_removed(): + """PR-1B removed /api/save_skill (closes D-3). Endpoint must return 404 now.""" + r = requests.post( + f"{BASE}/api/save_skill", + json={"filename": "x.py", "content": "..."}, + timeout=5, + ) + _skip_if_auth(r) + assert r.status_code == 404, f"/api/save_skill must be removed, got {r.status_code}" diff --git a/tests/test_full_product_audit.py b/tests/test_full_product_audit.py index 3113976..dadb564 100644 --- a/tests/test_full_product_audit.py +++ b/tests/test_full_product_audit.py @@ -267,14 +267,14 @@ def test_skill_review_gate_blocks_dangerous(self): # If it staged it, that's OK — it goes through human review assert "review_id" in data or "staged" in str(data).lower() - def test_forge_endpoint(self): - """Forge endpoint should accept valid requests.""" + def test_forge_endpoint_removed(self): + """PR-1B removed /api/forge (closes D-2 in PHASE-1-SECURITY.md). The + replacement flow is /api/skill/review → /api/skill/approve.""" r = api_post("/api/forge", json={ "code": "print('hello')", - "description": "test skill" + "description": "test skill", }) - # 200 = success, 422 = missing fields (valid rejection) - assert r.status_code in [200, 422], f"Forge returned unexpected {r.status_code}" + assert r.status_code == 404, f"/api/forge must return 404, got {r.status_code}" @requires_dashboard diff --git a/tests/test_security.py b/tests/test_security.py index 41559d5..9fd3b54 100644 --- a/tests/test_security.py +++ b/tests/test_security.py @@ -197,15 +197,12 @@ def test_memory_has_cleanup(): assert "VACUUM" in content, "cleanup should VACUUM the database" -# ── save_skill validation ────────────────────────────────────────────────── - -def test_save_skill_validates_content(): - """save_skill must validate skill structure and block dangerous patterns""" - REPO = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) - content = open(os.path.join(REPO, "codec_dashboard.py")).read() - skill_fn = content.split("async def save_skill")[1].split("\nasync def ")[0] if "async def save_skill" in content else "" - assert "SKILL_DESCRIPTION" in skill_fn, "save_skill must validate SKILL_DESCRIPTION presence" - assert "os.system" in skill_fn or "BLOCKED_IN_SKILLS" in skill_fn, "save_skill must block dangerous patterns" +# ── save_skill validation (REMOVED) ──────────────────────────────────────── +# test_save_skill_validates_content was deleted in PR-1B. The /api/save_skill +# endpoint and its weak substring blocker are gone — see routes/skills.py and +# docs/audits/PHASE-1-SECURITY.md D-3. Skill validation now happens at write +# time via /api/skill/approve (AST check) and at load time via +# SkillRegistry.load (manifest + AST gate, see tests/test_skill_registry.py). # ── Preview frame CSP ────────────────────────────────────────────────────── diff --git a/tests/test_skill_routes.py b/tests/test_skill_routes.py new file mode 100644 index 0000000..4a6c160 --- /dev/null +++ b/tests/test_skill_routes.py @@ -0,0 +1,150 @@ +"""Tests for routes/skills.py — verify /api/save_skill and /api/forge are +removed (closes D-2 + D-3), and that /api/skill/review + /api/skill/approve +remain functional as the replacement flow. + +Reference: docs/audits/PHASE-1-SECURITY.md findings D-2, D-3. +""" +from __future__ import annotations + +import inspect +import sys +from pathlib import Path + +from fastapi import FastAPI +from fastapi.testclient import TestClient + +REPO = Path(__file__).resolve().parent.parent +sys.path.insert(0, str(REPO)) + +from routes import skills as skills_routes # noqa: E402 + + +# ── Static checks (read the source) ─────────────────────────────────────────── + + +def test_save_skill_handler_removed(): + """The save_skill function name must not exist in the routes module.""" + assert not hasattr(skills_routes, "save_skill"), ( + "save_skill must be removed (D-3 — direct-write endpoint with weak validation)" + ) + + +def test_forge_skill_handler_removed(): + """The forge_skill function name must not exist in the routes module.""" + assert not hasattr(skills_routes, "forge_skill"), ( + "forge_skill must be removed (D-2 — URL-fetch → LLM → write, no review gate)" + ) + + +def test_no_route_decorator_strings_for_removed_endpoints(): + """The route-decorator strings must not appear in the module source.""" + src = inspect.getsource(skills_routes) + assert '"/api/save_skill"' not in src, ( + "Route string '/api/save_skill' must be gone from routes/skills.py" + ) + assert '"/api/forge"' not in src, ( + "Route string '/api/forge' must be gone from routes/skills.py" + ) + + +def test_replacement_handlers_present(): + """The /api/skill/review + /api/skill/approve handlers must remain.""" + assert hasattr(skills_routes, "skill_review"), "skill_review handler must remain" + assert hasattr(skills_routes, "skill_approve"), "skill_approve handler must remain" + + +# ── Live TestClient checks (verify the FastAPI app behavior) ────────────────── + + +def _make_client() -> TestClient: + app = FastAPI() + app.include_router(skills_routes.router) + return TestClient(app) + + +def test_post_save_skill_returns_404(): + """A POST to the removed endpoint must return 404 (route not registered).""" + client = _make_client() + r = client.post("/api/save_skill", json={"filename": "x.py", "content": "..."}) + assert r.status_code == 404, ( + f"Expected 404 (route removed), got {r.status_code}: {r.text[:200]}" + ) + + +def test_post_forge_returns_404(): + """A POST to the removed endpoint must return 404 (route not registered).""" + client = _make_client() + r = client.post("/api/forge", json={"code": "http://example.com"}) + assert r.status_code == 404, ( + f"Expected 404 (route removed), got {r.status_code}: {r.text[:200]}" + ) + + +def test_post_skill_review_still_accepts_valid_body(): + """The replacement /api/skill/review endpoint must still accept valid + payloads — same shape as save_skill's content+filename.""" + client = _make_client() + valid_skill = ( + 'SKILL_NAME = "test_review"\n' + 'SKILL_DESCRIPTION = "Probe that /api/skill/review still accepts valid input"\n' + 'SKILL_TRIGGERS = ["test review"]\n' + '\n' + 'def run(task, app="", ctx=""):\n' + ' return "ok"\n' + ) + r = client.post( + "/api/skill/review", + json={"code": valid_skill, "filename": "test_review.py"}, + ) + assert r.status_code == 200, ( + f"/api/skill/review must accept valid input — got {r.status_code}: {r.text[:200]}" + ) + body = r.json() + assert "review_id" in body, "Response must include a review_id for the approve step" + assert body.get("filename") == "test_review.py" + + +def test_skill_review_rejects_empty_body(): + """The replacement endpoint still validates input (sanity check on + semantics — we didn't accidentally widen its contract).""" + client = _make_client() + r = client.post("/api/skill/review", json={}) + assert r.status_code == 400 + + +def test_skill_approve_writes_only_after_review(tmp_path, monkeypatch): + """Full review → approve flow must write the skill file to disk only + after the explicit approve step, not at review.""" + # Redirect _get_skills_dir to a tmp_path so the test doesn't touch + # ~/.codec/skills/. + skills_dir = tmp_path / "skills" + skills_dir.mkdir() + monkeypatch.setattr(skills_routes, "_get_skills_dir", lambda: str(skills_dir)) + + client = _make_client() + valid_skill = ( + 'SKILL_NAME = "test_approve"\n' + 'SKILL_DESCRIPTION = "Probe that /api/skill/approve writes after approval"\n' + 'SKILL_TRIGGERS = ["test approve"]\n' + '\n' + 'def run(task, app="", ctx=""):\n' + ' return "ok"\n' + ) + + # Step 1: review — file MUST NOT be on disk yet. + r1 = client.post( + "/api/skill/review", + json={"code": valid_skill, "filename": "test_approve.py"}, + ) + assert r1.status_code == 200 + review_id = r1.json()["review_id"] + assert not (skills_dir / "test_approve.py").exists(), ( + "Review step must NOT write to disk" + ) + + # Step 2: approve — file MUST be on disk now. + r2 = client.post("/api/skill/approve", json={"review_id": review_id}) + assert r2.status_code == 200 + written = skills_dir / "test_approve.py" + assert written.exists(), "Approve step must write the skill to disk" + assert "test_approve" in written.read_text(encoding="utf-8")