From d0bc6244f70066998acbf087202c4ecd0e842ad8 Mon Sep 17 00:00:00 2001
From: Alfonso Domenech
Date: Tue, 21 Apr 2026 18:56:15 +0200
Subject: [PATCH] fix(ops-suite): enforce adapter pattern, fix queue-status auto-chain, add CLAUDE.md

- db-query/db-migrate: replace hardcoded kubectl port-forward with reference to
  port-forward adapter, so non-Kubernetes orchestrators work correctly
- port-forward/adapters/kubernetes.md: add separate section for pod env var
  retrieval (pod_env pattern) vs Kubernetes Secret retrieval
- queue-status: change DLQ auto-chain to queue-triage into a suggest (Next steps),
  preventing unintended Sonnet invocations on a simple status check
- queue-triage: remove NestJS-specific patterns from Step 6, replace with generic
  grep approach + reference to new consumer-patterns.md loaded on-demand
- queue-triage/references/consumer-patterns.md: new file with framework-specific
  consumer search patterns (NestJS, Spring AMQP, Celery)
- workflow-deploy: fix cross-skill adapter path from relative ../deploy/ to
  ${CLAUDE_PLUGIN_ROOT}/skills/deploy/ to avoid path fragility
- CLAUDE.md: add project-level guidance for future Claude Code instances

Co-Authored-By: Claude Sonnet 4.6
---
 CLAUDE.md                                     | 96 +++++++++++++++++++
 plugins/ops-suite/skills/db-migrate/SKILL.md  | 17 ++--
 plugins/ops-suite/skills/db-query/SKILL.md    | 10 +-
 .../port-forward/adapters/kubernetes.md       | 12 ++-
 .../ops-suite/skills/queue-status/SKILL.md    | 10 +-
 .../ops-suite/skills/queue-triage/SKILL.md    | 20 +++-
 .../references/consumer-patterns.md           | 93 ++++++++++++++++++
 .../ops-suite/skills/workflow-deploy/SKILL.md |  2 +-
 8 files changed, 234 insertions(+), 26 deletions(-)
 create mode 100644 CLAUDE.md
 create mode 100644 plugins/ops-suite/skills/queue-triage/references/consumer-patterns.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..6436779
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,96 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What this repo is
+
+Pure Markdown + YAML skill collection for Claude Code. No build, no tests, no linting — validation is manual (read and review skill files). The only commands you'll run are `git` operations.
+
+## Architecture: Config + Adapters
+
+Every skill follows the same pattern:
+
+```
+config.yaml → SKILL.md → adapters/.md
+```
+
+1. The user defines their stack once in `config.yaml` (`orchestrator: kubernetes`, `message_broker: rabbitmq`, etc.)
+2. Each `SKILL.md` reads config and loads the right adapter at runtime
+3. Adapters contain actual CLI commands with `{config.X.Y}` placeholders
+
+**To add support for a new technology**: create `adapters/.md` inside the skill folder — no changes to `SKILL.md` or config schema needed.
+
+## Plugin layout
+
+```
+plugins/
+  ops-suite/              # Infrastructure: status, logs, deploy, DB, queues
+    skills//
+      SKILL.md            # Frontmatter + step-by-step instructions
+      adapters/           # One file per supported technology
+      references/         # Deep docs loaded on-demand (keep SKILL.md < 500 lines)
+    commands/             # Command shims (.md) that invoke each skill
+    hooks/                # hooks.json + session-start.sh
+    runtime/              # Shared docs: chaining.md, safety.md, session-state.md
+    config.example.yaml
+    config.yaml           # User's actual config (gitignored)
+  refinery/               # Roadmap refinement: tickets, design, docs, sprints
+  creating-skills/        # Meta-skill for authoring new skills
+.claude-plugin/
+  marketplace.json        # Plugin registry (name, version, source paths)
+```
+
+## SKILL.md frontmatter
+
+Key fields when creating or editing skills:
+
+| Field | Notes |
+|-------|-------|
+| `name` | Kebab-case, becomes the slash-command |
+| `description` | **Most critical** — drives auto-invocation. Start with "Use when…" + natural-language triggers. Max 1024 chars. |
+| `disable-model-invocation: true` | Required for all destructive ops (deploy, migrate, reprocess). Prevents auto-chaining. |
+| `allowed-tools` | Restrict what the skill can call |
+| `model` | Override per-skill (e.g. `haiku` for cheap read-only checks) |
+| `argument-hint` | Autocomplete hint shown to user |
+
+## Session state (ops-suite)
+
+Skills share state through `/tmp/ops-suite-session/`:
+
+- `config.json` — parsed `config.yaml`, written by the `session-start.sh` hook and cached for the session
+- `env.json` — selected environment (planned)
+- `credentials.json` — DB/broker credentials (planned)
+- `port-forwards.json` — active port-forward PIDs (planned)
+
+Step 0 in every skill checks for `/tmp/ops-suite-session/config.json` before re-parsing config.
+
+## Skill chaining rules
+
+**Read-only skills** (no `disable-model-invocation`) can be auto-invoked mid-step:
+```
+Use ops-suite:service-status with arguments: {service} {env_name}.
+Use session state from /tmp/ops-suite-session/ — do not re-ask for environment.
+```
+
+**Destructive skills** (`disable-model-invocation: true`) must be suggested, never auto-invoked:
+```
+Next steps:
+  → Run `/ops-suite:db-migrate {env_name}` to apply pending migrations.
+```
+
+Chain depth is capped at 3.
+
+## Safety classification
+
+| Skill | Type |
+|-------|------|
+| service-status, service-logs, db-query, queue-status, port-forward | read-only — auto-chainable |
+| deploy, db-migrate, queue-reprocess | destructive — suggest only, always ask for explicit confirmation |
+
+## Adding a new skill
+
+1. Run `/creating-skills:creating-skills` — it guides the full authoring process
+2. Create `plugins//skills//SKILL.md` with the standard frontmatter
+3. Add adapters under `adapters/` for each supported technology
+4. Add a command shim at `plugins//commands/.md`
+5. Test: trigger test (natural language), functional test (`/skill-name`), performance test (SKILL.md < 500 lines)
diff --git a/plugins/ops-suite/skills/db-migrate/SKILL.md b/plugins/ops-suite/skills/db-migrate/SKILL.md
index d00936f..e9cfe55 100644
--- a/plugins/ops-suite/skills/db-migrate/SKILL.md
+++ b/plugins/ops-suite/skills/db-migrate/SKILL.md
@@ -51,18 +51,15 @@ Show the user:
 The migration tool needs a database connection. This typically requires:
 
 1. **Port-forward** to the database using the orchestrator:
-   ```
-   kubectl --context={env.context} port-forward svc/{env.services.database.name} {deploy.local_ports.{env_name}}:{env.services.database.port} -n {env.services.database.namespace || env.namespaces.infra} &
-   ```
-   Note: Use `env.services.database.namespace` if defined — the database/pgbouncer may live in a
-   different namespace than `env.namespaces.infra`.
+   Load `${CLAUDE_PLUGIN_ROOT}/skills/port-forward/adapters/{orchestrator}.md` and use its
+   "Port-forward a service (background)" command for `{env.services.database.name}`,
+   local port `{deploy.local_ports.{env_name}}`, remote port `{env.services.database.port}`.
+   Use `{env.services.database.namespace}` if defined, otherwise `{env.namespaces.infra}`.
 
 2. **Credentials**: First check if `env.services.database.credentials_from` is defined in config:
-   - If `pod_env:`: retrieve from a running app pod:
-     ```
-     kubectl --context={env.context} exec {any_app_pod} -n {env.namespaces.apps} -- printenv
-     ```
-   - Otherwise, use the adapter's "retrieve secret" command or ask the user.
+   - If `pod_env:`: load `${CLAUDE_PLUGIN_ROOT}/skills/port-forward/adapters/{orchestrator}.md`
+     and use its "retrieve secret" command to read the variable from a running app pod.
+   - Otherwise, use the migration adapter's credential retrieval command or ask the user.
    Never hardcode or display credentials in plain text.
 
 3. **Set environment variables** as required by the migration tool (from adapter).
diff --git a/plugins/ops-suite/skills/db-query/SKILL.md b/plugins/ops-suite/skills/db-query/SKILL.md
index 7535b52..077724e 100644
--- a/plugins/ops-suite/skills/db-query/SKILL.md
+++ b/plugins/ops-suite/skills/db-query/SKILL.md
@@ -38,11 +38,11 @@ If `$ARGUMENTS` contains an environment name, use it. Otherwise ask the user.
 Check if a port-forward is already active on the expected local port (`deploy.local_ports.{env_name}`).
 If not active:
 
-1. Start a port-forward using the orchestrator:
-   ```
-   kubectl --context={env.context} port-forward svc/{env.services.database.name} {deploy.local_ports.{env_name}}:{env.services.database.port} -n {env.services.database.namespace || env.namespaces.infra} &
-   ```
-2. Verify the connection is working
+1. Load `${CLAUDE_PLUGIN_ROOT}/skills/port-forward/adapters/{orchestrator}.md` and use its
+   "Port-forward a service (background)" command for `{env.services.database.name}`,
+   local port `{deploy.local_ports.{env_name}}`, remote port `{env.services.database.port}`,
+   namespace `{env.services.database.namespace}` if defined, otherwise `{env.namespaces.infra}`.
+2. Verify the connection using the adapter's connection check command.
 
 Retrieve or ask for credentials. Never hardcode credentials.
 
diff --git a/plugins/ops-suite/skills/port-forward/adapters/kubernetes.md b/plugins/ops-suite/skills/port-forward/adapters/kubernetes.md
index ecfabdd..a9680a6 100644
--- a/plugins/ops-suite/skills/port-forward/adapters/kubernetes.md
+++ b/plugins/ops-suite/skills/port-forward/adapters/kubernetes.md
@@ -56,12 +56,22 @@ nc -z localhost {local_port} 2>/dev/null && echo "Connection OK" || echo "Connec
 curl -s -o /dev/null -w '%{http_code}' http://localhost:{local_port}/health
 ```
 
-## Retrieve secret (generic pattern)
+## Retrieve secret (from Kubernetes Secret)
 
 ```bash
 kubectl --context={env.context} get secret {secret_name} -n {namespace} -o jsonpath='{.data.{key}}' | base64 -d
 ```
 
+## Retrieve environment variable from a running pod (pod_env pattern)
+
+Use when `credentials_from: pod_env:` is set in config — reads the variable from a
+running app pod instead of a Kubernetes Secret:
+
+```bash
+kubectl --context={env.context} get pods -n {env.namespaces.apps} -l app={service} -o name | head -1
+kubectl --context={env.context} exec {pod} -n {env.namespaces.apps} -- printenv {VAR_NAME}
+```
+
 ## List secrets in namespace
 
 ```bash
diff --git a/plugins/ops-suite/skills/queue-status/SKILL.md b/plugins/ops-suite/skills/queue-status/SKILL.md
index 9cc63c2..3805fdd 100644
--- a/plugins/ops-suite/skills/queue-status/SKILL.md
+++ b/plugins/ops-suite/skills/queue-status/SKILL.md
@@ -55,7 +55,7 @@ Flag the following conditions:
 
 | Condition | Severity | Action |
 |-----------|----------|--------|
-| DLQ with messages > 0 | Warning | Auto-chain to `queue-triage` (see below) |
+| DLQ with messages > 0 | Warning | Suggest queue-triage (see Step 6) |
 | Queue with 0 consumers | Warning | Check if consumer service is running |
 | Queue with growing message count | Warning | Consumer may be slow or stuck |
 | Queue in "down" or "stopped" state | Critical | Investigate immediately |
@@ -83,7 +83,9 @@ Summary:
   Queues with 0 consumers: {count}
 ```
 
-If any DLQs have messages, automatically triage the first one:
+If any DLQs have messages, suggest next steps:
 
-Use ops-suite:queue-triage with arguments: {dlq_name} {env_name}.
-Use session state from /tmp/ops-suite-session/ — do not re-ask for environment.
+```
+Next steps:
+  → Run `/ops-suite:queue-triage {dlq_name} {env_name}` to diagnose why messages are failing.
+```
diff --git a/plugins/ops-suite/skills/queue-triage/SKILL.md b/plugins/ops-suite/skills/queue-triage/SKILL.md
index 3f73b7d..8e97845 100644
--- a/plugins/ops-suite/skills/queue-triage/SKILL.md
+++ b/plugins/ops-suite/skills/queue-triage/SKILL.md
@@ -98,11 +98,21 @@ python3 scripts/analyze_messages.py {messages_file}
 
 If the codebase is available:
 
-1. **Find the subscription config**: Search for the queue name in config files (e.g. `grep -r "queue_name" src/config/`). Identify the subscription name mapped to this queue.
-2. **Verify subscribe() call exists**: Search for `subscribe('{subscription_name}'` in the codebase. Compare all subscriptions declared in config vs actual `subscribe()` calls — any mismatch means orphaned config.
-3. **Find the consumer handler**: Locate the subscriber class (typically in `application/amqp/`) and read its `onApplicationBootstrap()` method to see which subscriptions are actually registered.
-4. **Check error handling patterns**: Look for try/catch, reject, or nack logic in the handler.
-5. **Check if the error matches a known code path**: Cross-reference with the failure mode from Step 5.
+1. **Find the subscription config**: Search for the queue name in config and source files:
+   ```bash
+   grep -r "{queue_name}" src/ --include="*.ts" --include="*.yaml" --include="*.json" -l
+   ```
+   Identify the subscription name or handler mapped to this queue.
+2. **Locate the consumer handler**: Search for the handler that processes this subscription:
+   ```bash
+   grep -r "{subscription_name}" src/ --include="*.ts" -l
+   ```
+3. **Check error handling**: Look for try/catch, reject, or nack logic in the handler.
+4. **Verify the handler is registered**: Check that the subscription is actually active at runtime (not only declared in config). Any mismatch between config and actual subscriptions means orphaned config.
+5. **Cross-reference with the failure mode from Step 5**.
+
+If the codebase uses a specific framework, load `references/consumer-patterns.md` for
+framework-specific search patterns (NestJS + RabbitMQ, Spring AMQP, Celery).
 
 ## Step 6b — Check git history for removed handlers
 
diff --git a/plugins/ops-suite/skills/queue-triage/references/consumer-patterns.md b/plugins/ops-suite/skills/queue-triage/references/consumer-patterns.md
new file mode 100644
index 0000000..669fa18
--- /dev/null
+++ b/plugins/ops-suite/skills/queue-triage/references/consumer-patterns.md
@@ -0,0 +1,93 @@
+# Consumer Patterns — Framework-Specific Search Guide
+
+Load this reference in queue-triage Step 6 when you know the consumer framework.
+
+---
+
+## NestJS + @golevelup/nestjs-rabbitmq
+
+### Find subscription config
+
+Subscriptions are typically declared in a module config file:
+
+```bash
+grep -r "subscriptions" src/config/ --include="*.ts" -A 20
+grep -r "RabbitMQModule.forRoot" src/ --include="*.ts" -A 30
+```
+
+Look for entries like:
+```typescript
+subscriptions: {
+  mySubscriptionName: { routingKey: 'some.routing.key', queue: 'queue_name' }
+}
+```
+
+### Verify subscribe() call exists
+
+```bash
+grep -r "subscribe('" src/ --include="*.ts"
+```
+
+Compare all subscription names in config vs all `subscribe('...')` calls. Any name in config
+without a matching `subscribe()` call = orphaned config (messages go to DLQ with no consumer).
+
+### Find the subscriber class
+
+Consumer classes typically live in `src/**/application/amqp/` or `src/**/infrastructure/amqp/`:
+
+```bash
+find src/ -path "*/amqp/*.ts" -o -path "*/subscribers/*.ts" | head -20
+```
+
+### Check registered subscriptions at bootstrap
+
+Look for `onApplicationBootstrap()` in subscriber classes — this is where `subscribe()` calls are made:
+
+```bash
+grep -r "onApplicationBootstrap" src/ --include="*.ts" -l
+```
+
+Read the method body to see which subscriptions are actually registered at runtime.
+
+### Common failure modes in NestJS
+
+| Pattern | What to search for | Root cause |
+|---------|-------------------|------------|
+| `subscribe()` call missing | `grep -r "subscribe("` returns no match for that name | Handler was removed or renamed |
+| Subscription in config, no handler | Config has entry, no `onApplicationBootstrap` with it | Orphaned config |
+| Handler throws uncaught error | `grep -r "nack\|reject" src/` — handler may not handle errors | Missing try/catch in handler |
+| DTO validation fails | Look for `class-validator` decorators in the DTO | Producer changed payload shape |
+
+---
+
+## Spring AMQP (Java/Kotlin)
+
+### Find the listener
+
+```bash
+grep -r "@RabbitListener" src/ --include="*.java" --include="*.kt" -l
+grep -r "queues = " src/ --include="*.java" --include="*.kt"
+```
+
+### Check the binding
+
+```bash
+grep -r "@Bean" src/ --include="*.java" --include="*.kt" -A 5 | grep -A 5 "Queue\|Binding"
+```
+
+---
+
+## Celery (Python)
+
+### Find the task handler
+
+```bash
+grep -r "@app.task\|@celery.task\|@shared_task" . --include="*.py" -l
+grep -r "queue=" . --include="*.py" | grep "{queue_name}"
+```
+
+### Check task routing
+
+```bash
+grep -r "task_routes\|CELERY_ROUTES" . --include="*.py"
+```
diff --git a/plugins/ops-suite/skills/workflow-deploy/SKILL.md b/plugins/ops-suite/skills/workflow-deploy/SKILL.md
index 4d3f29b..f54598a 100644
--- a/plugins/ops-suite/skills/workflow-deploy/SKILL.md
+++ b/plugins/ops-suite/skills/workflow-deploy/SKILL.md
@@ -32,7 +32,7 @@ Store answers as: `{ref}`, `{env_name}`, `{migrations}`, `{rollback}`.
 
 ## Phase B — Pre-flight (read-only, no confirmation needed)
 
-Load the CI adapter file at `../deploy/adapters/{deploy.ci_provider}.md` and extract:
+Load the CI adapter file at `${CLAUDE_PLUGIN_ROOT}/skills/deploy/adapters/{deploy.ci_provider}.md` and extract:
 - The commands to verify and trigger a deployment
 - The rollback command (`{ci_rollback_command}`) — used if a rollback plan is requested