Quigleybits/README.md

Aidan Quigley

Agentic infrastructure engineer & toolsmith · London

I build the systems, standards, and tooling that make AI agents reliable in real engineering work — role-chained pipelines, a CLI conformance standard for MCP (mclip.dev), and a production RAG brain I run daily. The method is the moat: spec-first, with decision logs, adversarial review gates, and documented dead-ends in the open. Not chat demos — a dozen live apps across web, mobile, and games, with the reasoning shown.

Generation is cheap now; judgment isn't. This profile leads with the problems I chose, the options I rejected, and the risk I removed — how the work was done — and links to the record.


Currently

Building agent_os, a personal agentic operating system run as a private research workshop. It has three parts: an 8-role agent team (orchestrator, planner, paired researchers, coder, tester, always-on critic, doc-maintainer) deployed in my daily harness; a loop runtime with ~20 specialist roles on file but only 2 resident; and a 6-level taxonomy for memory that outlives sessions. The bet underneath: the harness is the moat, not the model. The same model's task success can double across harnesses, so the scaffold is where the engineering lives.

Working through:

  • Self-evolving scaffold — the harness should rewrite itself: run history and knowledge captures feed back into skills, agent definitions, and rules as versioned, measured artifacts.
  • Verification is the bottleneck: generation has outrun checking, so autonomous loops compound only through a three-part ratchet (safety veto, minimum-improvement delta, re-baselining on the same eval subset).
  • Agentic pipelines, not faster typing — agents owning end-to-end handoffs (spec → build → adversarial review → merge), with human review throughput as the binding constraint.
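The three-part ratchet named above can be sketched in a few lines. This is a minimal illustration, not the agent_os implementation: the names `Candidate`, `ratchet_accepts`, and `run_ratchet`, the `min_delta` default of 0.02, and the sample scores are all hypothetical, and it assumes a single "higher is better" score measured on a fixed eval subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A proposed change to the harness (hypothetical shape)."""
    name: str
    score: float      # score on the fixed eval subset, higher is better
    safety_ok: bool   # outcome of a separate safety check


def ratchet_accepts(candidate: Candidate, baseline: float, min_delta: float) -> bool:
    """Return True only if the candidate clears both gates."""
    if not candidate.safety_ok:
        return False  # 1. safety veto: overrides any score
    return candidate.score - baseline >= min_delta  # 2. minimum-improvement delta


def run_ratchet(candidates, baseline: float, min_delta: float = 0.02):
    """Apply candidates in order, re-baselining after each accepted one."""
    accepted = []
    for c in candidates:
        if ratchet_accepts(c, baseline, min_delta):
            accepted.append(c.name)
            baseline = c.score  # 3. re-baseline on the same eval subset
    return accepted, baseline


accepted, final_baseline = run_ratchet(
    [
        Candidate("tweak-prompt", 0.71, True),      # +0.01: below the delta
        Candidate("new-skill", 0.75, True),         # +0.05: accepted
        Candidate("unsafe-shortcut", 0.90, False),  # vetoed despite the score
        Candidate("small-followup", 0.76, True),    # +0.01 over the new baseline
    ],
    baseline=0.70,
)
print(accepted, final_baseline)
```

Re-baselining after every accepted change is what makes this a ratchet: a later candidate has to beat the improved score, not the original one.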

Selected work — judgment, not just artifacts

Chosen for how the work was done. Each entry names the non-obvious call and links to the record.

  • MCLIP — a CLI conformance standard for MCP. The MCP→CLI space was already crowded with non-portable wrappers, so rather than ship a ninth I standardised the translation: a normative spec with tagged rules, backed by 9 executable fixture servers and a Go verify harness that asserts response shape and exit codes. Deliberately scoped to mechanics-not-semantics — MCP doesn't standardise tool names, so the spec refuses to promise what it can't hold — and fronted by an agent-readable door (llms.txt + a machine-readable profile manifest), so the standard is consumable by the agents it governs.
  • 2nd_brain — a production personal RAG. Wrote an explicit trade-off hierarchy (capture-durability › always-on › simplicity …) as the tie-breaker for every design call — "no graph DB; relational tables on the free tier are enough." Shipped self-monitoring as discrete probes, each behind an adversarial-review pass, and captured failure modes as reusable rules instead of silently patching them.
  • Hymn_core — line-by-line hymn → scripture retrieval. Refused to let ranking scores pick the answer: five overlapping sources feed ~300 candidates, but an LLM selects on theological merit — and I kept the weak retrievers for coverage after the data showed they supply 81% of candidates yet 0.8% of final picks. A 27-entry decision log records the measured deltas that overturned intuition, and a local-embedding bake-off shipped only after passing a statistical significance gate — behind a --retriever flag, not a blind swap.
  • cctts — a Claude Code TTS plugin. Deleted an entire storage layer once it proved unused (three rejected alternatives logged), and self-audited the shipped README against reality — logging the drift rather than hiding it.
  • claude-skills — a published agent-skill suite. Skill routing is measured, not asserted: an LLM trigger-eval harness scores each skill against TRIGGER/IGNORE cases, and nothing publishes until it survives a test on a real codebase that isn't its own. Built spec-first, with review gates that caught 10 design issues before any code was written.
  • autoresearch-vision — porting an autonomous research loop to a new domain (private). Adapted Karpathy's autoresearch — an autonomous, single-GPU experiment loop built for LLM pretraining — to a different class of models and a different need: self-driving computer-vision research for clinical organoid / microscopy imaging. The signal is in the translation, not the fork — bits-per-byte → MAE / Dice, BPE tokeniser → image preprocessing, GPT blocks → EfficientNet + task heads — plus one deeper rethink: Karpathy treats the agent as a flat optimiser, so I made the loop stratified, steering it to invent architectures and mine domain literature (the part hyperparameter sweeps can't automate) over grid-searching, with a learnings ledger so dead ends aren't re-tried. The extra structure is logged in-repo as an unproven, empirically-testable bet. Private; walkthrough on request.
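The idea behind a verify harness that asserts response shape and exit codes can be shown with a short sketch. It is written in Python rather than Go and does not reproduce any MCLIP rule: the stand-in command, the expected keys (`ok`, `result`, `error`), and the `check_cli` helper are all hypothetical.

```python
import json
import subprocess
import sys


def check_cli(argv, expected_exit, required_keys):
    """Run a command and return a list of conformance failures (empty = pass)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    failures = []
    if proc.returncode != expected_exit:
        failures.append(f"exit code {proc.returncode}, expected {expected_exit}")
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        failures.append("stdout is not valid JSON")
        return failures
    if not isinstance(payload, dict):
        failures.append("top-level JSON value is not an object")
        return failures
    for key in required_keys:
        if key not in payload:
            failures.append(f"missing key: {key}")
    return failures


# Stand-in for a CLI under test: a Python one-liner that prints JSON and
# exits 0. A real harness would point at the wrapper binary instead.
fake_cli = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"ok": True, "result": []}))',
]

passing = check_cli(fake_cli, expected_exit=0, required_keys=["ok", "result"])
failing = check_cli(fake_cli, expected_exit=2, required_keys=["ok", "error"])
print(passing)
print(failing)
```

Returning a list of failures instead of raising on the first one lets a single run report every rule a wrapper breaks.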

How I work

Distilled from agent_os — my private research workshop on agent teams, loops, and memory — and refreshed as the system evolves. Walkthrough on request.

  • An agent team, not a chat window. Eight roles — orchestrator, planner, paired researchers, coder, tester, always-on critic, doc-maintainer — run in my daily harness. "Done" is gated by the tester and challenged by the critic, never declared by the maker.
  • Catalog ≠ payroll. ~20 specialist roles on file, 2 resident; specialists spawn fresh per job on the cheapest model that clears the bar, write their output to disk, and die. Cost scales with work, not headcount.
  • Maker / verifier split. The agent that builds never judges its own work — verification runs in a separate context, often on a different model.
  • Ratchets before autonomy. Self-improving loops pass a three-part gate — safety veto, minimum-improvement delta, re-baselining on the same eval subset — so compounding only points upward.
  • Artifacts, not prose. Handoffs are file paths and traces, never summaries; a summary without its source is context pollution.
  • Evals before automation. Skill routing is measured against trigger/ignore cases before anything runs unattended; skill files are trainable parameters, versioned and scored.
  • A constitution agents can't edit. Principles, kill conditions, and decision logs are human-owned; loops read them, propose changes, and never write them.
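Measuring skill routing against trigger/ignore cases can be sketched as follows. This is an illustration under assumptions, not the harness itself: `route` is a keyword stub standing in for an LLM router, and the "changelog" skill, the case list, and the `score_routing` helper are hypothetical.

```python
def route(prompt: str) -> bool:
    """Stub router: decides whether a hypothetical 'changelog' skill fires.

    A real harness would make a model call here; a keyword match keeps
    the sketch deterministic and runnable.
    """
    return "changelog" in prompt.lower()


# Each case pairs a prompt with the expected routing decision.
CASES = [
    ("Write a changelog for v2.1", True),           # TRIGGER
    ("Update the CHANGELOG with this fix", True),   # TRIGGER
    ("Summarise release notes for users", True),    # TRIGGER (the stub misses it)
    ("Rename this variable", False),                # IGNORE
    ("Why is the changelog test flaky?", False),    # IGNORE (the stub fires wrongly)
]


def score_routing(router, cases):
    """Return (precision, recall) of the router over labelled cases."""
    tp = sum(1 for prompt, want in cases if want and router(prompt))
    fp = sum(1 for prompt, want in cases if not want and router(prompt))
    fn = sum(1 for prompt, want in cases if want and not router(prompt))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = score_routing(route, CASES)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Scoring IGNORE cases alongside TRIGGER cases matters because a skill that fires on everything would otherwise look perfect on recall alone.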

Tech I reach for

AI / LLM: Claude (Fable / Opus / Sonnet / Haiku), Codex CLI, MCP servers (TypeScript + Go), Voyage + Qwen3 (local) embeddings, FAISS + BM25 hybrid retrieval, PyTorch, Whisper / Voxtral
Frontend: Next.js (App Router), SvelteKit, Astro, React Native + Expo, Phaser 3, Tailwind + shadcn
Backend: Supabase (Postgres, RLS, Realtime, Edge Functions), Firebase (Firestore, Auth, Cloud Functions, App Check), Neon + Drizzle, FastAPI / Flask
Data & pipelines: Python, pgvector, D3, yt-dlp, faster-whisper, Tavily, Firecrawl, GitHub Actions cron
Infra: Vercel, Firebase Hosting, Cloud Run, Docker, Playwright


Open to AI / applied-AI, agent & developer-tooling, and AI-infrastructure roles. Live: mclip.dev · 2ndbrain.website · hymncore.net · kanban.website · scosig.com · portfolio

Pinned

  1. cctts (Public)

    Claude Code Text-To-Speech for Windows - speaks assistant responses via edge-tts. macOS/Linux paths experimental.

    Python

  2. custom_claude_skills (Public)

    Custom skills (slash commands) for Claude Code

    JavaScript 2

  3. mclip (Public)

    MCLIP — MCP Command-Line Interface Profile. A CLI conformance profile over MCP that defines one canonical CLI surface for every MCP server.

    Go

  4. Quigleybits (Public)

    Config files for my GitHub profile.