From 082f913e5d16549042ed6f8f0d5681e15f8a54c6 Mon Sep 17 00:00:00 2001 From: saagpatel Date: Sun, 7 Jun 2026 14:38:23 -0700 Subject: [PATCH] chore: remove agent scratchpad and internal planning files from public repo --- .codex/verify.commands | 3 - AGENTS.md | 81 --------- CASE-STUDY.md | 248 -------------------------- DEMO-PLAN.md | 95 ---------- DEMO-SCRIPT.md | 64 ------- IMPLEMENTATION-ROADMAP.md | 365 -------------------------------------- 6 files changed, 856 deletions(-) delete mode 100644 .codex/verify.commands delete mode 100644 AGENTS.md delete mode 100644 CASE-STUDY.md delete mode 100644 DEMO-PLAN.md delete mode 100644 DEMO-SCRIPT.md delete mode 100644 IMPLEMENTATION-ROADMAP.md diff --git a/.codex/verify.commands b/.codex/verify.commands deleted file mode 100644 index aaa9499..0000000 --- a/.codex/verify.commands +++ /dev/null @@ -1,3 +0,0 @@ -# codex-os-managed -python3 -m pytest -q -p no:cacheprovider -ruff check src/ tests/ diff --git a/AGENTS.md b/AGENTS.md deleted file mode 100644 index 1be1e7b..0000000 --- a/AGENTS.md +++ /dev/null @@ -1,81 +0,0 @@ - -# Portfolio Context - -## What This Project Is - -A portfolio audit and operator tool — a "GitHub portfolio operating system" — for -developers with many repositories. It clones every repo on a GitHub account, runs 12 -analyzers across completeness and interest dimensions, assigns letter grades, achievement -badges, and dual-axis scores, preserves historical state in SQLite, and generates aligned -JSON / Markdown / HTML / Excel-workbook / control-center surfaces so you can decide what to -finish, fix, or safely ignore. Published on PyPI as `github-repo-auditor`. The day-to-day -operating surfaces are the Excel workbook and the read-only `audit triage --control-center` -queue. - -## Current State - -The audit tool, portfolio-truth layer, risk/security overlay, Action Sync proposal -lane, and local `audit serve`/desktop-consumer surfaces are active. 
The public -remote is `canonical`; `origin` remains a stale private archive and should not be -used for PRs. Do not trust hardcoded status or test-count claims in handoff text: -rerun the local gates and inspect `output/portfolio-truth-latest.json` for the -current Portfolio OS state. - -## Stack - -- Language: Python 3.11+ -- GitHub API: REST v3 + GraphQL (raw `requests`) -- Excel: `openpyxl` + committed workbook template; PDF: `fpdf2` -- AI narrative: Anthropic Claude API; complexity analysis: Radon; CLI output: Rich -- Storage: SQLite history warehouse -- `pyproject.toml` is the canonical dependency definition (`requirements.txt` is a synced mirror) - -## How To Run - -```bash -# install (editable, dev + optional extras) -pip install -e ".[dev,serve,semantic,config]" - -# core operator loop -audit run --doctor # preflight diagnostics -audit run --html # full audit + workbook + dashboard -audit triage --control-center # read-only operator queue -audit report --portfolio-truth # regenerate workspace truth layer -python -m src.cli --portfolio-truth --portfolio-truth-include-security # demo truth + security overlay -audit serve # local web UI at http://127.0.0.1:8080/ - -# tests + gates -python3 -m pytest -q -p no:cacheprovider # full suite -python3 -m ruff check src/ tests/ # lint -make demo # token-free sample run from fixture -make workbook-gate # workbook invariant check (workbook code only) -``` - -## Known Risks - -- **Dual remote**: push/PR to the `canonical` remote (public), NOT `origin` — `origin` is a - stale private archive with unrelated history, so PRs against it fail with "no common - history". Solo repo, so merges land with `gh ... --admin`. -- **The context-quality metric is gameable**: injecting generic-filler context blocks zeros - the flag while lying about resumability. Only real harvested content counts — this block - was hand-authored for exactly that reason. 
-- **Five parallel render surfaces** (Excel workbook, Markdown, HTML, review-pack, handoff) - carry a parity tax: every new signal must be threaded through all five (the motivation for - the deferred Arc F renderer simplification). -- **Partial reruns fail closed**: `--repos` / `--incremental` require a compatible full - baseline; the stored baseline contract rejects mismatched portfolio context rather than - emitting a misleading partial. -- **Manual workbook signoff**: the final release step is opening the generated `standard` - workbook in desktop Excel and recording the outcome with `make workbook-signoff`. - -## Next Recommended Move - -For Portfolio OS demo readiness, refresh `portfolio-truth-latest.json` with -`--portfolio-truth-include-security`, refresh `audit triage --control-center`, -then launch PortfolioCommandCenter with `pnpm demo:desktop`. Current proof -points: 129 projects, 63 open high/critical Dependabot-alert repos, and Weekly -Digest says to start with codexkit. After demo readiness is settled, continue -with the highest-signal live queue item from the current control-center output -rather than reviving old roadmap counts. - - diff --git a/CASE-STUDY.md b/CASE-STUDY.md deleted file mode 100644 index c4c3f4b..0000000 --- a/CASE-STUDY.md +++ /dev/null @@ -1,248 +0,0 @@ -# Operator OS — A Multi-Agent Control Plane for a 129-Repo Portfolio - -> A case study in turning portfolio sprawl into a single source of truth, and -> coordinating two autonomous coding agents against it without stepping on each -> other. - -This document describes a working system — six local services plus a two-agent -coordination model — that I run over my own development portfolio. The metrics -below are pulled verbatim from a real `portfolio-truth-latest.json` snapshot -(schema `0.5.0`), not illustrative numbers. - ---- - -## The problem: `git log` lies about your portfolio - -If you ship fast and start often, you accumulate repositories faster than you can -remember them. 
The naive way to take inventory — walk every repo and read its -`git log` — answers the wrong question. A recent commit tells you *something -happened*; it does not tell you whether the project is **healthy, drifting, -blocked, or safe to ignore**. - -Concretely, `git log` across 100+ repos can't answer: - -- Which repos have **open high/critical security alerts** right now? -- Which ones are **ship-ready but haven't shipped**? -- Which have **tests and CI**, and which are one bad refactor from silent breakage? -- Which were last touched by **me**, by **Claude Code**, or by **Codex** — and is - that drift expected? -- Which are genuinely **stale** versus merely quiet between releases? - -A timestamp is a fact with no judgement attached. The portfolio needed a layer -that turns raw git/GitHub facts into a *graded, precedence-resolved, historical* -picture — one trustworthy artifact every other tool can consume. That artifact is -the spine of the whole system. - ---- - -## The system: truth → state → events → surface - -The Operator OS is five data services and one desktop shell, arranged as a -one-directional pipeline. Each layer has exactly one job and a clean contract with -the next. - -```mermaid -flowchart TB - subgraph AGENTS["Coordination layer — two builder lanes + a dispatcher"] - CAI["Claude.ai
dispatcher / PM
(handoffs only)"] - CC["Claude Code
builder lane"] - CX["Codex
builder lane"] - end - - subgraph L1["1 · TRUTH"] - AUD["GithubRepoAuditor
Python · 12 analyzers · precedence matrix"] - TJSON[("portfolio-truth-latest.json
+ dated history snapshots")] - end - - subgraph L2["2 · SHARED STATE"] - BDB["bridge-db
SQLite (WAL) · 23 MCP tools
caller-owned writes"] - end - - subgraph L3["3 · EVENTS"] - NH["notification-hub
FastAPI · classify → suppress → route"] - OUT["macOS push · Slack · JSONL log"] - end - - subgraph L4["4 · SURFACES"] - PCC["PortfolioCommandCenter
Tauri 2 desktop"] - PH["portfolio-health
MCP overlay"] - CT["cost-tracker
MCP overlay"] - end - - CC -->|work across 129 repos| AUD - CX -->|work across 129 repos| AUD - AUD --> TJSON - TJSON --> PCC - - CC -->|log activity / pick up handoff| BDB - CX -->|log activity / pick up handoff| BDB - CAI -->|dispatch handoff| BDB - BDB -->|assigned work| CC - BDB -->|assigned work| CX - - BDB -->|watched activity| NH - NH --> OUT - - BDB -->|activity join| PH - PH --> PCC - CT -->|record cost| BDB -``` - -### The components - -| Layer | Component | Stack | One job | -|---|---|---|---| -| **Truth** | **GithubRepoAuditor** | Python 3.11+, SQLite history warehouse, Rich CLI | Scan every repo, run 12 analyzers, resolve a precedence matrix, emit one canonical `portfolio-truth-latest.json` + dated history. | -| **Shared state** | **bridge-db** | SQLite (WAL), MCP over stdio, FTS5 | Single store for cross-agent state: activity, handoffs, snapshots, cost, long-lived context. 23 tools; every write is ownership-gated by `caller`. | -| **Events** | **notification-hub** | Python 3.12, FastAPI, localhost-only | Turn agent/tool events into *routed* notifications: deterministic classify → dedup/quiet-hours/rate-limit suppress → deliver. | -| **Surface** | **PortfolioCommandCenter** | Tauri 2 (Rust shell) + React 18 + TS strict + Vite 6 | A signed desktop app that reads the truth snapshot read-only and renders the portfolio, weekly digest, and security burndown. | -| **Overlay** | **portfolio-health** | Python, MCP, SQLite FTS5 | Join project memory against bridge-db activity to answer "what's active / stale / ship-ready-but-unshipped." | -| **Overlay** | **cost-tracker** | Python, MCP, `ccusage` | Live agent spend: today, per-session, monthly trend, top projects, threshold alerts — persisted back into bridge-db. | - -**Why this shape works:** the auditor is the *only* writer of truth, so every -surface agrees by construction. bridge-db is the *only* writer of shared agent -state, so two agents never disagree about who owns what. 
There is no shared -daemon — each MCP client spawns its own bridge-db process over stdio, and SQLite -WAL mode plus a busy-timeout makes concurrent writes safe without a coordinator. - ---- - -## Real metrics from the truth snapshot - -Every number here is read directly from the canonical -`portfolio-truth-latest.json` (schema `0.5.0`). It is regenerated on demand; this -is one real snapshot. - -### Portfolio shape — 129 projects - -| Dimension | Breakdown | -|---|---| -| **Total projects** | **129** (128 git repos, 1 non-git working dir) | -| **Activity status** | 21 recent · 91 active · 5 stale · 12 archived | -| **Lifecycle** | 107 active · 6 maintenance · 3 dormant · 12 archived · 1 uncataloged | -| **Recency** | 91 repos touched in the last 7 days · 123 within 30 days · 127 within 90 days · median **4 days** since last meaningful activity | - -The recency curve is the punchline: **only 2 of 129 repos** are older than 90 days. -This isn't a graveyard of abandoned projects — it's an actively churning portfolio, -which is *exactly* why a timestamp-only view is useless. Almost everything looks -"recent." The auditor's job is to grade what "recent" actually means. - -### Health & risk - -| Dimension | Breakdown | -|---|---| -| **Risk tier** | 61 baseline · 27 moderate · 29 elevated · 12 deferred | -| **Security posture** | **63 repos** carry open high/critical Dependabot alerts; **49 repos** are currently classified with security risk | -| **Tests present** | 101 / 129 (78%) | -| **CI present** | 81 / 129 (63%) | -| **License present** | 100 / 129 (78%) | -| **Context quality** | 67 minimum-viable · 29 standard · 16 full · 17 boilerplate | - -That **63** is the single most valuable number the desktop demo puts on screen -and the one `git log` can never give you: a precise, current count of repos with -live high/critical security exposure, ready to be burned down. 
The truth layer -separately marks **49** repos as active security-risk items after applying its -portfolio risk rules. - -### Agent attribution — who built what - -The truth file records a `tool_provenance` for each repo. Across 129 projects: - -| Builder | Repos attributed | -|---|---| -| **Claude Code** | **53** | -| **Codex** | **22** | -| GPT (other) | 12 | -| Unknown / human-seeded | 42 | - -**75 of 129 repos** are attributable to the two autonomous coding agents this -control plane coordinates. That coordination is the other half of the story. - ---- - -## The coordination model: two agents, one control plane - -Claude Code and Codex both write code across the same 129-repo portfolio. Left -uncoordinated, two autonomous agents on a shared filesystem are a merge-conflict -machine. The Operator OS keeps them out of each other's way with three rules. - -### 1 · Lanes — ownership by area, enforced at the write boundary - -Work is partitioned into **lanes**, and bridge-db enforces lane ownership at the -data layer: every write tool checks the `caller` and rejects writes to state the -caller doesn't own. The recognized writers are `cc` (Claude Code), `codex`, -`claude_ai`, and two ops services. A repo's build provenance, CI workflows, and -sync code belong to one lane; another agent reads them but does not mutate them. -The boundary is structural, not a polite convention — an agent *cannot* clobber -another lane's state even if it tries. - -### 2 · Handoffs — a dispatcher hands work down, builders pick it up - -The handoff protocol mirrors a PM-and-engineers org: - -- **Claude.ai dispatches.** Only the `claude_ai` caller may `create_handoff` — it - is the planning/PM seat and never writes code directly. -- **Builders pick up.** `cc` or `codex` calls `pick_up_handoff` to claim a unit of - work, then `clear_handoff` when it's done. 
-- **State is shared, not messaged.** Handoffs live in bridge-db, so a builder - starting a fresh session reads its assigned work from the store instead of - needing the originating conversation. Context survives session boundaries. - -### 3 · Push policy — feature branches, never `main`, merge server-side - -The hard rule across every repo: **agents never push to `main`/`master`.** It's -enforced by a pre-tool hook, not trusted to the model. The workflow: - -- Each unit of work happens on a **feature branch** (`docs/...`, `feat/...`, - `fix/...`). -- Commits are small, conventional, and verified (compile + test) before they land. -- When a branch is ready, it merges through a **server-side merge** (e.g. a - reviewed PR merge) rather than a local push to a protected branch — which also - keeps the push-to-main guard satisfied without weakening it. -- Repos can carry **distinct push targets** (a public mirror vs. a private - origin), so "where does this land" is per-repo, never assumed. - -The result: two agents, hundreds of branches, zero pushes to protected branches, -and a truth layer that tells you — after the fact — exactly which agent touched -which repo. - ---- - -## What this demonstrates - -Beyond the portfolio itself, the build exercises a set of platform-engineering -patterns: - -- **One-writer-per-fact architecture.** Truth has a single producer (the auditor); - shared state has a single mutation path with ownership gating (bridge-db). Every - consumer agrees by construction — no reconciliation logic anywhere downstream. -- **Contracts over coupling.** Layers communicate through versioned artifacts - (`schema_version`) and typed load commands, so the desktop shell can render a - snapshot it never has to understand how to compute. -- **Deterministic before probabilistic.** Notification urgency is decided by - keyword rules and explicit policy, not an LLM call — fast, free, and auditable. - The agents reason; the plumbing does not. 
-- **Safety enforced at the boundary, not requested politely.** No-push-to-main, - caller-owned writes, localhost-only daemons, and secrets read from the OS - keychain (never from repo files) are all structural guarantees. -- **Local-first and private by default.** Every service binds to loopback or runs - over stdio. Nothing in this control plane requires a hosted backend. - ---- - -## Component reference - -| Component | Role in the pipeline | Interface | -|---|---|---| -| GithubRepoAuditor | Produces canonical portfolio truth + history | CLI, JSON/HTML/Markdown/Excel outputs | -| bridge-db | Shared cross-agent state | MCP (stdio), 23 tools, SQLite WAL | -| notification-hub | Event classification + routed delivery | Localhost HTTP intake + bridge file watcher | -| PortfolioCommandCenter | Desktop visualization of truth | Tauri 2 app, read-only truth consumer | -| portfolio-health | Active/stale/unshipped overlay | MCP (stdio), 5 tools, FTS5 | -| cost-tracker | Agent spend visibility | MCP (stdio), 6 tools, `ccusage` + bridge-db | - ---- - -*Metrics in this document are drawn from a real `portfolio-truth-latest.json` -snapshot (schema 0.5.0). Paths are shown home-relative; this is a sanitized, -public write-up of a private local system.* diff --git a/DEMO-PLAN.md b/DEMO-PLAN.md deleted file mode 100644 index 7aa7503..0000000 --- a/DEMO-PLAN.md +++ /dev/null @@ -1,95 +0,0 @@ -# Demo Plan — Operator OS in 90 Seconds - -A shot-by-shot script for a screen recording that makes a hiring manager -*understand the system* — not just see a pretty dashboard. The throughline: -**`git log` can't grade a portfolio; this can — and two agents act on it.** - -The demo is driven entirely by **PortfolioCommandCenter** (the Tauri 2 desktop -shell), because it renders the truth artifact every other layer produces. Five -tabs, one header action, one closing line. - ---- - -## What the viewer should walk away knowing - -1. 
There's a **single source of truth** over 129 repos — graded, not just dated. -2. It surfaces the number `git log` can't: **63 repos with open high/critical - Dependabot alerts**, plus which package bump clears each advisory group. -3. The weekly digest gives one current decision: **start with codexkit**. - -If those three land in 90 seconds, the demo worked. - -Current screenshot proof for the five-tab local demo is archived in -[`docs/demo-proof/2026-06-07/`](docs/demo-proof/2026-06-07/). - ---- - -## Pre-record setup (off-camera) - -Do this before hitting record so the app opens warm and current: - -1. **Refresh the producer artifacts** so the snapshot is today's: - ```sh - # in the auditor repo — flags FIRST, then username, run via python -m - python -m src.cli --portfolio-truth --portfolio-truth-include-security - python -m src.cli triage --control-center - ``` -2. **Launch the desktop shell** with `pnpm demo:desktop` from - `../PortfolioCommandCenter`. -3. Confirm the header shows the correct **output directory** and a fresh - `generated_at`. -4. Set window to a **clean 1920×1080 capture**; hide the macOS menu bar clutter. - -> **Privacy callout (this is for a public audience):** the Portfolio tab lists -> real repo names. Before publishing, either (a) scroll/zoom to the **aggregate -> counts and risk columns** rather than individual rows, or (b) blur repo-name -> cells in post. Show the *shape* of the portfolio, not the contents. - ---- - -## The 90-second shot list - -| Time | Screen | Action | Line to land | -|---|---|---|---| -| **0:00–0:10** | App launch / **Portfolio** tab | Open cold. Let the full 129-row table paint. | *"Every repo I've ever started — 129 of them — in one graded view. Not a commit log. A judgement."* | -| **0:10–0:28** | **Portfolio** tab | Sort by risk tier; point at the columns: risk, context quality, registry status, **tool**, open high/critical alert count. 
| *"Each repo carries a risk tier, a context-quality grade, and who built it. `git log` gives you a timestamp; this tells you what the timestamp means."* | -| **0:28–0:48** | **Risk + Security** tab | Filter to elevated-risk; show the posture counts (117 scanned / 63 with open high-critical / 65 critical / 191 high). | *"63 of 129 repos have an open high or critical Dependabot alert. That's the number a timestamp can never give you."* | -| **0:48–1:02** | **Burndown** tab | Show the advisory-grouped fix list — one package bump → the repos it clears. | *"And it's actionable: each advisory is grouped by the single dependency bump that burns it down across every affected repo."* | -| **1:02–1:14** | **Trends** → **Weekly Digest** | Flash the risk/security drift chart across snapshots, then the digest's headline + decision + next-step. | *"It keeps history, so I can see drift over time — and today it says: start with codexkit."* | -| **1:14–1:26** | Header **Run auditor** action | Click **Run auditor** (fast); show the views reload on completion. | *"This isn't a static export. I regenerate the truth live, right from the app."* | -| **1:26–1:30** | Back on **Portfolio**, point at the **tool** column | Rest on the Claude Code / Codex attribution. | *"And it knows which agent built what — because two of them work this portfolio under one control plane."* | - -Total: **90 seconds**, six beats, one number that sticks (**63**). - ---- - -## Optional extended cut (~2:30) — the coordination story - -If the audience is technical and you have extra runway, append a second act that -shows the *control plane*, not just the dashboard: - -| Time | What to show | Point | -|---|---|---| -| +0:00–0:25 | A terminal split: Claude Code on a `feat/...` branch in one repo, Codex on a `fix/...` branch in another. | Two autonomous agents, different lanes, same portfolio. 
| -| +0:25–0:50 | bridge-db handoff flow: a dispatched handoff being **picked up**, then **cleared** (via the MCP tools or the bridge markdown). | Work is shared state, not chat history — it survives session boundaries. | -| +0:50–1:10 | A blocked push to `main` (the pre-tool guard firing), then the same work landing via a **server-side merge**. | Safety is enforced at the boundary, not requested politely. | -| +1:10–1:30 | A **notification-hub** event arriving (macOS push) after a session completes. | Events are classified and routed deterministically — no LLM in the plumbing. | - ---- - -## Recording checklist - -- [ ] Artifacts regenerated today (`generated_at` is current in the header). -- [ ] Window at 1920×1080, menu-bar/desktop clutter hidden. -- [ ] Individual repo names blurred or kept off-frame; show aggregates. -- [ ] No terminal scrollback exposing absolute home paths, tokens, or hostnames. -- [ ] The number **63** is on screen and called out by voice. -- [ ] Closing line names both agents (Claude Code + Codex) and "one control plane." -- [ ] Final cut ≤ 90 seconds for the core demo. - ---- - -*This plan drives PortfolioCommandCenter against a real -`portfolio-truth-latest.json` snapshot (schema 0.5.0). Keep individual repo names -out of the published frame — show the system's shape, not the portfolio's -contents.* diff --git a/DEMO-SCRIPT.md b/DEMO-SCRIPT.md deleted file mode 100644 index d827848..0000000 --- a/DEMO-SCRIPT.md +++ /dev/null @@ -1,64 +0,0 @@ -# Demo Voiceover Script — Operator OS (90 seconds) - -Record-ready teleprompter script for the demo defined in [DEMO-PLAN.md](DEMO-PLAN.md). -Drive **PortfolioCommandCenter**; read the **bold spoken lines**; do the -`[SCREEN]` action just before each line. Total spoken ≈ 180 words ≈ 72 s of -speech, leaving ~18 s of breathing room inside a 90 s cut. - -**Delivery:** conversational, confident, ~150 wpm. Pause on the dashes. 
Land hard -on the word **sixty-three** — that's the line that sells the whole system. - ---- - -### 0:00 — Cold open · app launch / Portfolio tab -`[SCREEN]` Open the app cold. Let the full 129-row table paint. Don't narrate the loading. - -> **"This is every repo I've ever started — a hundred and twenty-nine of them — in one graded view. Not a commit log. A judgment call on every single one."** - -### 0:10 — Portfolio columns -`[SCREEN]` Sort by risk tier. Slowly run the cursor across the columns: risk · context quality · registry status · tool · open-alert count. - -> **"Each repo carries a risk tier, a context-quality grade, and which agent built it. `git log` gives you a timestamp. This tells you what that timestamp actually means."** - -### 0:28 — Risk + Security tab -`[SCREEN]` Switch to Risk + Security. Filter to elevated risk; let the security posture counts fill the frame. - -> **"And here's the number a timestamp can never give you — sixty-three of these repos have a live, high-or-critical security alert. Right now."** - -### 0:48 — Burndown tab -`[SCREEN]` Switch to Burndown. Hover one advisory group so the "repos cleared by this bump" list expands. - -> **"It's not just a count — it's a fix list. Every advisory is grouped by the one dependency bump that clears it across every repo it touches."** - -### 1:02 — Trends → Weekly Digest -`[SCREEN]` Flash the Trends drift chart (2–3 s), then cut to the Weekly Digest: headline, decision, next-step. - -> **"It keeps history, so I can watch risk drift over time. And every week it hands me one headline, one decision, one next move."** - -### 1:14 — Run auditor (header action) -`[SCREEN]` Click **Run auditor** (fast mode). Let the views visibly reload on completion. - -> **"This isn't a static export. I regenerate the truth live, right from the app."** - -### 1:26 — Close · back on Portfolio, rest on the tool column -`[SCREEN]` Return to Portfolio. Rest the cursor on the Claude Code / Codex attribution column. 
Hold for the final beat. - -> **"And it knows which agent built what — because two of them work this portfolio, under one control plane."** - -`[SCREEN]` Hold 1 s on the full table, then cut. - ---- - -## Pickup lines (swap in if a beat runs long or you want a different close) - -- **Tighter cold open:** *"A hundred and twenty-nine repos. One question: which ones are actually worth finishing? This answers it."* -- **Alt security beat:** *"Sixty-three repos with live high-or-critical alerts — and the exact bump that clears each one."* -- **Alt close (coordination-forward):** *"Two autonomous agents, hundreds of branches, one source of truth keeping them honest."* - -## Numbers cheat-sheet (say these exactly) -- **129** total repos · **63** with open high/critical Dependabot alerts · **49** classified security-risk items -- Agent attribution: **Claude Code 53 · Codex 22** (the two coordinated lanes) -- Recency: **91** repos touched in the last 7 days — "almost everything looks recent, which is *why* a timestamp is useless" - -*Pairs with DEMO-PLAN.md. Keep individual repo names blurred or off-frame when -recording — show the system's shape, not the portfolio's contents.* diff --git a/IMPLEMENTATION-ROADMAP.md b/IMPLEMENTATION-ROADMAP.md deleted file mode 100644 index df1ef77..0000000 --- a/IMPLEMENTATION-ROADMAP.md +++ /dev/null @@ -1,365 +0,0 @@ -# GitHub Repo Auditor — Implementation Roadmap - -## Architecture - -### System Overview -``` -[CLI Entry] → [GitHub API Client] → [Repo Fetcher (clone)] → [Analyzer Engine] → [Report Generator] - ↓ ↓ ↓ ↓ - [Rate Limiter] [/tmp/audit-repos/] [Per-Repo Scores] [output/*.json + *.md] -``` - -**Flow:** -1. CLI accepts username + optional token -2. GitHub API fetches all repos (paginated, handles 100+ repos) -3. Each repo is shallow-cloned to a temp directory -4. Analyzer engine runs 10+ dimension checks per repo -5. Results aggregated into JSON + Markdown report -6. 
Temp clones cleaned up - -### File Structure -``` -github-repo-auditor/ -├── src/ -│ ├── __init__.py -│ ├── cli.py # argparse entry point -│ ├── github_client.py # API calls: list repos, get commit stats, get languages -│ ├── cloner.py # Shallow clone + cleanup -│ ├── analyzers/ -│ │ ├── __init__.py -│ │ ├── base.py # BaseAnalyzer abstract class -│ │ ├── readme.py # README quality scoring -│ │ ├── structure.py # Project structure analysis -│ │ ├── code_quality.py # TODO/FIXME counts, entry points, build configs -│ │ ├── testing.py # Test presence, framework detection -│ │ ├── cicd.py # GitHub Actions / CI detection -│ │ ├── dependencies.py # Lockfile detection, staleness signals -│ │ ├── activity.py # Commit recency, frequency (via API) -│ │ └── completeness.py # Overall completeness heuristic -│ ├── scorer.py # Aggregates analyzer results into per-repo score -│ └── reporter.py # Generates JSON + Markdown output -├── output/ # Generated reports land here -├── requirements.txt -├── CLAUDE.md -├── IMPLEMENTATION-ROADMAP.md -└── README.md -``` - -### Data Model - -No database. All data flows through Python dataclasses in memory and writes to JSON. 
- -```python -from dataclasses import dataclass, field -from datetime import datetime -from typing import Optional - -@dataclass -class RepoMetadata: - name: str - full_name: str - description: Optional[str] - language: Optional[str] - languages: dict[str, int] # language -> bytes - private: bool - fork: bool - archived: bool - created_at: datetime - updated_at: datetime - pushed_at: datetime - default_branch: str - stars: int - forks: int - open_issues: int - size_kb: int - html_url: str - clone_url: str - topics: list[str] - -@dataclass -class AnalyzerResult: - dimension: str # e.g., "readme", "testing", "structure" - score: float # 0.0 – 1.0 - max_score: float # always 1.0 - findings: list[str] # human-readable notes - details: dict # dimension-specific structured data - -@dataclass -class RepoAudit: - metadata: RepoMetadata - analyzer_results: list[AnalyzerResult] - overall_score: float # weighted composite 0.0 – 1.0 - completeness_tier: str # "shipped", "functional", "wip", "skeleton", "abandoned" - flags: list[str] # e.g., ["no-readme", "no-tests", "stale-2yr"] - -@dataclass -class AuditReport: - username: str - generated_at: datetime - total_repos: int - repos_audited: int # excludes forks if --skip-forks - tier_distribution: dict[str, int] # tier -> count - average_score: float - audits: list[RepoAudit] -``` - -### API Contracts - -**GitHub REST API v3:** - -| Endpoint | Method | Auth | Rate Limit | Purpose | -|----------|--------|------|------------|---------| -| `/users/{username}/repos` | GET | Token (optional) | 60/hr unauth, 5000/hr auth | List all public repos | -| `/user/repos` | GET | Token (required) | 5000/hr | List all repos including private | -| `/repos/{owner}/{repo}/languages` | GET | Token (optional) | 5000/hr | Language breakdown by bytes | -| `/repos/{owner}/{repo}/commits` | GET | Token (optional) | 5000/hr | Recent commit activity | -| `/repos/{owner}/{repo}/stats/commit_activity` | GET | Token (optional) | 5000/hr | Weekly commit counts 
(last year) |
-| `/repos/{owner}/{repo}/stats/contributors` | GET | Token (optional) | 5000/hr | Contributor commit counts |
-| `/repos/{owner}/{repo}/topics` | GET | Token (optional) | 5000/hr | Repo topics |
-
-**Pagination:** All list endpoints use `Link` header with `rel="next"`. Fetch pages until no `next` link.
-
-**Auth header:** `Authorization: token {GITHUB_TOKEN}` — read from `GITHUB_TOKEN` env var.
-
-**Rate limit handling:** Check `X-RateLimit-Remaining` header. If < 10, sleep until `X-RateLimit-Reset` timestamp.
-
-### Dependencies
-```bash
-pip install requests python-dateutil
-```
-
-That's it. Two dependencies. Everything else is stdlib.
-
----
-
-## Scope Boundaries
-
-**In scope:**
-- Fetch all repos (public + private with token) for a given GitHub username
-- Shallow clone each repo and run local file analysis
-- Score across 9 dimensions (see analyzer details below)
-- Classify each repo into a completeness tier
-- Generate JSON report (machine-readable, PCC-compatible)
-- Generate Markdown summary report (human-readable)
-- Handle 100+ repos gracefully with progress output
-- Skip forks optionally via `--skip-forks` flag
-
-**Out of scope:**
-- Web UI or dashboard (output is files only)
-- Running actual test suites or build commands
-- Dependency vulnerability scanning (just detect presence of lockfiles)
-- GitHub Actions run history analysis
-- Cross-repo dependency detection
-- Organization repos (user repos only)
-
-**Deferred:**
-- Integration with project-registry.md reconciliation (Phase 2)
-- PCC import format generation (Phase 2)
-- Historical trend tracking across multiple audit runs (future)
-
-## Security & Credentials
-- GitHub token read from `GITHUB_TOKEN` environment variable — never passed as CLI arg, never logged
-- Token is optional for public-only audits, required for private repos
-- Cloned repos are written to a temp directory and cleaned up after analysis
-- No data leaves the machine — all analysis is local
-
----
-
-## Analyzer Dimension Specifications
-
-Each analyzer scores 0.0–1.0. The overall score is a weighted average.
-
-### 1. README Quality (`readme.py`) — Weight: 15%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| README exists | 0.2 | `README.md` or `README` or `README.rst` in root |
-| Has project description (>50 chars first section) | 0.2 | Parse first heading + paragraph |
-| Has installation/setup instructions | 0.2 | Look for headings containing "install", "setup", "getting started", "usage" |
-| Has usage examples or screenshots | 0.2 | Look for code blocks or image references |
-| Length > 500 chars | 0.1 | Character count |
-| Has badges | 0.1 | `![` patterns in first 10 lines |
-
-### 2. Project Structure (`structure.py`) — Weight: 10%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| Has `.gitignore` | 0.2 | File exists |
-| Has `src/` or `lib/` or language-standard structure | 0.3 | Directory detection based on primary language |
-| Has config file (package.json, Cargo.toml, pyproject.toml, etc.) | 0.3 | File exists by known names |
-| Has LICENSE file | 0.1 | `LICENSE` or `LICENSE.md` in root |
-| Not a flat dump (>1 directory depth) | 0.1 | Directory tree depth analysis |
-
-### 3. Code Quality Signals (`code_quality.py`) — Weight: 15%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| Has identifiable entry point | 0.3 | `main.py`, `index.ts`, `src/main.rs`, `main.go`, `App.tsx`, etc. |
-| TODO/FIXME density < 5 per 1000 LOC | 0.2 | Grep + LOC count |
-| Has type definitions (if applicable) | 0.2 | `.ts` files, Python type hints, Rust types |
-| No large generated/vendored files | 0.15 | Detect `vendor/`, `node_modules/` committed, files >1MB |
-| Has meaningful commit messages (last 10) | 0.15 | Via API: check messages aren't all "update" or "fix" |
-
-### 4. Testing (`testing.py`) — Weight: 15%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| Test directory or test files exist | 0.4 | `test/`, `tests/`, `__tests__/`, `*_test.*`, `*_spec.*`, `test_*.*` |
-| Test framework configured | 0.3 | jest in package.json, pytest in pyproject.toml, etc. |
-| Test count > 0 (heuristic) | 0.3 | Count files matching test patterns |
-
-### 5. CI/CD (`cicd.py`) — Weight: 10%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| `.github/workflows/` exists with YAML files | 0.5 | Directory + file check |
-| Alternative CI config (`.travis.yml`, `Jenkinsfile`, `.circleci/`, `Dockerfile`) | 0.3 | File exists |
-| Has build script in package.json / Makefile | 0.2 | Parse for "build", "test" scripts |
-
-### 6. Dependency Management (`dependencies.py`) — Weight: 10%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| Has lockfile (`package-lock.json`, `yarn.lock`, `Cargo.lock`, `poetry.lock`, `Pipfile.lock`) | 0.4 | File exists |
-| Has dependency manifest (package.json, requirements.txt, Cargo.toml, go.mod) | 0.4 | File exists |
-| Dependencies count is reasonable (not 0, not 500+) | 0.2 | Parse manifest for dep count |
-
-### 7. Activity & Recency (`activity.py`) — Weight: 15%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| Last push within 6 months | 0.3 | `pushed_at` from API |
-| Last push within 1 year | 0.2 | `pushed_at` from API (if >6mo, partial credit) |
-| More than 10 commits total | 0.2 | Contributor stats API |
-| Commits in last 3 months | 0.2 | Commit activity API |
-| Not archived | 0.1 | `archived` field from API |
-
-### 8. Documentation Beyond README (`completeness.py`) — Weight: 5%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| Has `docs/` directory or wiki-style files | 0.3 | Directory check |
-| Has CHANGELOG or HISTORY file | 0.3 | File exists |
-| Has CONTRIBUTING guide | 0.2 | File exists |
-| Has inline code comments (sampling) | 0.2 | Sample 5 largest files, check comment density |
-
-### 9. Build/Run Readiness (`completeness.py`) — Weight: 5%
-| Check | Points | Detection |
-|-------|--------|-----------|
-| Has Dockerfile or docker-compose | 0.3 | File exists |
-| Has Makefile or build script | 0.3 | File exists |
-| Has environment example (.env.example, .env.sample) | 0.2 | File exists |
-| Has deployment config (Vercel, Netlify, fly.toml, etc.) | 0.2 | File exists |
-
----
-
-## Completeness Tier Classification
-
-Based on overall weighted score:
-
-| Tier | Score Range | Description |
-|------|-------------|-------------|
-| **Shipped** | 0.75 – 1.0 | Production-ready or clearly complete. README, tests, CI, recent activity. |
-| **Functional** | 0.55 – 0.74 | Works but rough edges. Missing tests or CI. Has clear entry point. |
-| **WIP** | 0.35 – 0.54 | Active development, partially built. Some structure, some code, incomplete. |
-| **Skeleton** | 0.15 – 0.34 | Scaffolded but barely started. Boilerplate only. |
-| **Abandoned** | 0.0 – 0.14 | No meaningful content, no recent activity, or just a README. |
-
-**Override rules:**
-- If `archived == true` and score > 0.5 → cap tier at "Functional" (archived = not actively shipped)
-- If `fork == true` → add flag `"forked"`, reduce activity weight to 5%
-- If last push > 2 years ago → add flag `"stale-2yr"`, cap tier at "WIP" regardless of score
-- If repo has 0 files beyond README → force tier to "Skeleton"
-
----
-
-## Phase 0: Foundation (Day 1)
-
-**Objective:** Working CLI that fetches repos from GitHub API, clones them, and outputs raw metadata JSON.
-
-**Tasks:**
-1. Scaffold project structure per file tree above — **Acceptance:** All directories and `__init__.py` files exist
-2. Implement `github_client.py` — list repos with pagination, rate limit handling — **Acceptance:** `python -m src.cli saagpatel` prints repo names to stdout
-3. Implement `cloner.py` — shallow clone to temp dir, cleanup after — **Acceptance:** Repos appear in `/tmp/audit-repos/`, are removed after script exits
-4. Implement `cli.py` with argparse — **Acceptance:** `python -m src.cli --help` shows usage; `python -m src.cli saagpatel --token $GITHUB_TOKEN` runs end-to-end
-5. Write `RepoMetadata` dataclass and populate from API response — **Acceptance:** `output/raw_metadata.json` contains all repos with all fields populated
-
-**Verification checklist:**
-- [ ] `python -m src.cli saagpatel` → prints list of all public repos
-- [ ] `python -m src.cli saagpatel --token $GITHUB_TOKEN` → includes private repos
-- [ ] `output/raw_metadata.json` exists and is valid JSON with all repos
-- [ ] No repos left in temp directory after script completes
-- [ ] Rate limit handling works (check `X-RateLimit-Remaining` logged)
-
-**Risks:**
-- GitHub stats endpoints return 202 (computing) on first call: Retry with exponential backoff (3 attempts, 2s/4s/8s)
-- Rate limit hit with 100+ repos: Implement sleep-until-reset using `X-RateLimit-Reset` header
-
----
-
-## Phase 1: Analyzer Engine (Day 1–2)
-
-**Objective:** All 9 analyzer dimensions implemented, producing per-repo scores.
-
-**Tasks:**
-1. Implement `BaseAnalyzer` abstract class with `analyze(repo_path: Path, metadata: RepoMetadata) -> AnalyzerResult` — **Acceptance:** Interface defined, type-checked
-2. Implement all 9 analyzers per dimension specs above — **Acceptance:** Each returns `AnalyzerResult` with score, findings, details
-3. Implement `scorer.py` — weighted aggregation + tier classification with override rules — **Acceptance:** `RepoAudit` objects have `overall_score` and `completeness_tier` populated
-4. Wire analyzers into CLI pipeline: fetch → clone → analyze → score — **Acceptance:** `python -m src.cli saagpatel` produces scored results for all repos
-5. Add `--verbose` flag that prints per-dimension scores per repo — **Acceptance:** Verbose output shows all 9 dimension scores per repo
-
-**Verification checklist:**
-- [ ] Run against 3 repos of varying quality → scores feel intuitive (high for complete, low for skeletons)
-- [ ] Override rules work: archived repos capped, stale repos flagged
-- [ ] `--verbose` shows per-dimension breakdown
-- [ ] No crashes on empty repos, repos with no code, or repos with unusual structures
-
-**Risks:**
-- Analyzer crashes on unexpected file structures: Wrap each analyzer in try/except, return score 0.0 with finding "analysis failed: {error}"
-- Large repos slow down analysis: Set max file scan limit (500 files per repo, skip binary files)
-
----
-
-## Phase 2: Report Generation (Day 2–3)
-
-**Objective:** Full JSON + Markdown reports with summary statistics, tier distribution, and per-repo breakdowns.
-
-**Tasks:**
-1. Implement JSON report output — **Acceptance:** `output/audit-report-{username}-{date}.json` matches `AuditReport` schema exactly
-2. Implement Markdown report with:
-   - Summary table (total repos, tier distribution, average score)
-   - Tier-grouped repo lists with scores and key flags
-   - Per-repo detail sections (expandable in Markdown viewers)
-   — **Acceptance:** `output/audit-report-{username}-{date}.md` renders cleanly in GitHub/VS Code preview
-3. Add `--skip-forks` flag — **Acceptance:** Fork repos excluded from analysis and report when flag set
-4. Add `--output-dir` flag — **Acceptance:** Reports written to specified directory
-5. Add progress bar using stderr prints — **Acceptance:** Shows `[12/47] Analyzing repo-name...` during run
-6. Add PCC-compatible JSON export — flat array of objects with fields matching PCC project schema (name, status, score, url, last_activity, tier, flags) — **Acceptance:** `output/pcc-import-{username}-{date}.json` is importable into PCC
-
-**Verification checklist:**
-- [ ] JSON report validates against `AuditReport` dataclass
-- [ ] Markdown report renders with proper tables and formatting
-- [ ] `--skip-forks` correctly excludes forked repos
-- [ ] `--output-dir /custom/path` writes reports there
-- [ ] Progress output shows on stderr (not mixed with stdout)
-- [ ] PCC import file has flat structure ready for dashboard import
-
-**Risks:**
-- Markdown table formatting breaks with long repo names: Truncate names to 40 chars in tables
-- JSON serialization fails on datetime objects: Use `.isoformat()` for all datetimes
-
----
-
-## Phase 3: Polish & Reconciliation (Day 3)
-
-**Objective:** Cross-reference with local project-registry.md, add summary stats, handle edge cases.
-
-**Tasks:**
-1. Add `--registry` flag accepting path to project-registry.md — **Acceptance:** Report includes "On GitHub but not in registry" and "In registry but not on GitHub" sections
-2. Registry parser: extract project names and statuses from markdown — **Acceptance:** Parses the registry format used at `~/Projects/project-registry.md`
-3. Add summary statistics to report: most active repos, most neglected, highest/lowest scored, language distribution — **Acceptance:** Summary section in Markdown report has all stats
-4. Handle edge cases: empty repos, repos with only a README, repos with >10k files, binary-only repos — **Acceptance:** No crashes, appropriate tier assignments
-5. Write README.md for the auditor tool itself — **Acceptance:** Complete with usage, examples, output format docs
-
-**Verification checklist:**
-- [ ] Full audit run against `saagpatel` completes without errors
-- [ ] Registry reconciliation correctly identifies gaps in both directions
-- [ ] Summary statistics are accurate (spot-check 3 repos manually)
-- [ ] README documents all CLI flags and output formats
-- [ ] Tool audits itself and scores > 0.6
-
-**Risks:**
-- Registry format varies: Build a lenient parser that handles common markdown table and list formats
-- Too many API calls for large accounts: Cache API responses to `output/.cache/` with 1-hour TTL
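
Reviewer note (not part of the deleted file): the last risk above proposes caching API responses under `output/.cache/` with a 1-hour TTL. A minimal sketch of that idea follows, assuming one JSON file per URL. The helper name `cached_fetch`, its parameters, and the on-disk record layout are hypothetical and are not taken from the removed roadmap or from the project's code.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Callable

CACHE_TTL_SECONDS = 3600  # the 1-hour TTL named in the risk note


def cached_fetch(
    url: str,
    fetch: Callable[[str], Any],
    cache_dir: Path = Path("output/.cache"),
    ttl: float = CACHE_TTL_SECONDS,
    now: Callable[[], float] = time.time,
) -> Any:
    """Return the cached JSON payload for `url`, refetching once it is `ttl` seconds old."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so slashes and query strings never end up in a filename.
    entry = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if entry.exists():
        record = json.loads(entry.read_text(encoding="utf-8"))
        if now() - record["fetched_at"] < ttl:
            return record["payload"]
    # `fetch` is a hypothetical callable supplied by the caller; it must
    # return JSON-serializable data (for example, a decoded API response).
    payload = fetch(url)
    record = {"fetched_at": now(), "payload": payload}
    entry.write_text(json.dumps(record), encoding="utf-8")
    return payload
```

Passing the clock in as `now` keeps expiry testable without sleeping. A real client would probably also want to skip caching for responses that are still being computed, such as the 202 replies mentioned under Phase 0.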