B02 Phase 2: ship verifier as its own container in production by pulkitpareek18 · Pull Request #35 · zeroauth-dev/ZeroAuth

pulkitpareek18 · 2026-05-15T07:19:49Z

Task 2 of today's plan. Flips production from inline-snarkjs to the dedicated verifier container that's been shipped-but-unused since PR #29 yesterday.

What changes

Dockerfile

New `verifier-build` stage — npm-ci against the root lockfile (reproducible build), compiles `verifier/src/` → `verifier/dist/`.
New `verifier-production` stage — slim alpine, non-root uid 1001, flat `npm install --omit=dev` against `verifier/package.json` (4 prod deps: express, snarkjs, winston, uuid). Copies the compiled JS + the production vkey. Binds `0.0.0.0:3001` inside the container (Docker network only — no host binding).

docker-compose.yml

New `zeroauth-verifier` service in `['dev', 'prod']` profiles. `expose: 3001` with no `ports:` mapping, so the verifier is reachable only on the Compose network and is never published on the host. Healthcheck wired.
`zeroauth-prod` gains:
- `VERIFIER_URL=http://zeroauth-verifier:3001` in `environment:` (not in .env, which eliminates drift)
- `VERIFIER_TIMEOUT_MS=2000`
- `depends_on: zeroauth-verifier: condition: service_healthy` (deploy fails loud if verifier can't load vkey or bind port)
`zeroauth-dev` gets the same wiring so local dev hits the service path by default. Override with `VERIFIER_URL=` (empty) in your `.env` to keep the inline-fallback for fast iteration.
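
The wiring above implies a small decision in the API at startup: a non-empty `VERIFIER_URL` selects the HTTP service path, and an empty or missing one keeps the inline fallback. A minimal sketch of that decision, assuming a hypothetical `resolveVerifierConfig` helper (the real logic lives in `src/services/zkp.ts` and may differ). The 2000 ms default mirrors `VERIFIER_TIMEOUT_MS=2000` from the compose file:

```typescript
// Hypothetical helper for illustration; not taken from the ZeroAuth codebase.
type VerifierConfig =
  | { mode: "service"; url: string; timeoutMs: number }
  | { mode: "inline" };

function resolveVerifierConfig(
  env: Record<string, string | undefined>,
): VerifierConfig {
  const url = (env.VERIFIER_URL ?? "").trim();
  // Empty or unset VERIFIER_URL keeps the inline-snarkjs fallback.
  if (url === "") return { mode: "inline" };

  // Use 2000 ms when the timeout is missing or not a positive number.
  const parsed = Number(env.VERIFIER_TIMEOUT_MS);
  const timeoutMs = Number.isFinite(parsed) && parsed > 0 ? parsed : 2000;
  return { mode: "service", url, timeoutMs };
}

// Example: the values this PR wires into zeroauth-prod.
console.log(
  resolveVerifierConfig({
    VERIFIER_URL: "http://zeroauth-verifier:3001",
    VERIFIER_TIMEOUT_MS: "2000",
  }),
);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the sketch easy to unit-test.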

Local validation

```bash
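docker build --target verifier-production -t zeroauth-verifier:test .  # builds clean
docker run -d --rm -p 3099:3001 zeroauth-verifier:test                 # container port 3001 published on host port 3099
curl http://127.0.0.1:3099/health  # {"status":"ok","vkeyAvailable":true,"version":"0.1.0","uptimeSeconds":5}
# junk-proof.json is a placeholder for a structurally valid junk proof (body not shown in the original)
curl -X POST -H 'Content-Type: application/json' -d @junk-proof.json http://127.0.0.1:3099/verify
# rejects junk proof with structuralFallback:false (real Groth16 verify ran)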
```
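
For scripted smoke tests, the `/health` response shown above can be checked structurally instead of by eye. A minimal sketch, assuming the field names from that response; the `VerifierHealth` type and `isVerifierReady` helper are hypothetical names, not part of the verifier package:

```typescript
// Field names come from the /health response above; type and helper names are hypothetical.
interface VerifierHealth {
  status: string;
  vkeyAvailable: boolean;
  version: string;
  uptimeSeconds: number;
}

// Ready means: the service reports "ok" AND the verification key was loaded.
function isVerifierReady(body: unknown): body is VerifierHealth {
  if (typeof body !== "object" || body === null) return false;
  const h = body as Record<string, unknown>;
  return (
    h.status === "ok" &&
    h.vkeyAvailable === true &&
    typeof h.version === "string" &&
    typeof h.uptimeSeconds === "number"
  );
}

const sample: unknown = JSON.parse(
  '{"status":"ok","vkeyAvailable":true,"version":"0.1.0","uptimeSeconds":5}',
);
console.log(isVerifierReady(sample)); // true
```

Treating `vkeyAvailable: false` as not ready matches the intent stated in the PR that a verifier which cannot load its vkey should fail the deploy.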

Post-deploy verification

After this merges + the deploy workflow runs:

`scripts/deploy-remote.sh` does `docker compose --profile prod up -d --build --remove-orphans` — auto-picks-up the new service
API container's `src/services/zkp.ts` switches from inline → HTTP because `VERIFIER_URL` is now in its environment
I'll smoke a real `/v1/auth/zkp/verify` call afterward and confirm the API logs show `"ZKP: verifier service: PASS/FAIL"` with a `verifierAuditId` (the service path's log shape) instead of `"ZKP: inline Groth16: …"` (the legacy path)
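
The log check in the last step can be automated. A minimal sketch, assuming only the two log prefixes quoted above; everything after the prefix (including how `verifierAuditId` is rendered) is not specified here, so the hypothetical helpers below match on the prefix alone:

```typescript
type ZkpPath = "service" | "inline" | "unknown";

// Prefixes quoted in the PR description; the rest of each log line is not assumed.
const SERVICE_PREFIX = "ZKP: verifier service:";
const INLINE_PREFIX = "ZKP: inline Groth16:";

// Hypothetical helper: decide which verification path produced a log line.
function classifyZkpLogLine(line: string): ZkpPath {
  if (line.includes(SERVICE_PREFIX)) return "service";
  if (line.includes(INLINE_PREFIX)) return "inline";
  return "unknown";
}

// True only if at least one line came from the service path and none from the inline path.
function cutoverLooksComplete(lines: string[]): boolean {
  const paths = lines.map(classifyZkpLogLine);
  return paths.includes("service") && !paths.includes("inline");
}

console.log(cutoverLooksComplete(["ZKP: verifier service: PASS"])); // true
```

Feeding it the API container's recent log lines after the smoke request gives a yes/no answer on whether the cutover took effect.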

Test plan

`npx tsc --noEmit` clean
`npm test` — 228 passing (no change)
`docker build --target verifier-production` succeeds
Standalone smoke of the built image — `/health` ok, `/verify` runs real Groth16
CI green on this PR
After merge: deploy completes, both containers healthy
After deploy: `/v1/auth/zkp/verify` smoke shows the API hits the verifier service path

Out of scope (separate PRs today)

SQLite audit log + hash chain in the verifier (task 3 — next)
ADR-0008 (task 4)
Promote governance/docs/threat-model/verifier.md from stub (task 5)
Inline-fallback retirement in zkp.ts (next week — keeping the safety net while the verifier soaks)

🤖 Generated with Claude Code

Plan B (TS workspace) chosen yesterday. Yesterday's PR #29 landed the verifier package in the repo. Today's PR Phase 2 flips production to actually use it instead of the inline-snarkjs fallback that's been serving since v0.

Dockerfile:
- Adds a `verifier-build` stage that npm-ci's the verifier workspace against the root lockfile (reproducible), compiles src/ → dist/.
- Adds a `verifier-production` stage: slim alpine image, non-root user uid 1001, flat `npm install --omit=dev` (verifier has 4 prod deps: express, snarkjs, winston, uuid; workspace-aware ci complicates a per-package prod install; trade-off accepted per ADR-0005). Copies the compiled JS + the production vkey. Healthcheck on /health. Binds 0.0.0.0:3001 inside the container so docker network reaches it; no host port binding so it stays loopback-only at the boundary.

docker-compose.yml:
- New `zeroauth-verifier` service in BOTH the `dev` and `prod` profiles. `expose: 3001` (no `ports:`, so no host binding). Healthcheck wired.
- `zeroauth-prod` gains VERIFIER_URL=http://zeroauth-verifier:3001 and VERIFIER_TIMEOUT_MS=2000 in its `environment:` block (not via .env; wired directly so a hand-edited prod .env can't drift from the compose intent).
- `zeroauth-prod.depends_on` now requires `zeroauth-verifier` to be service_healthy before starting. Deploy fails loud if the verifier can't load its vkey or bind its port.
- `zeroauth-dev` gets the same wiring so local dev exercises the service path by default. Developers who want the inline-snarkjs fallback can override with `VERIFIER_URL=` in their .env.

Local validation:
- `docker build --target verifier-production` builds clean
- `docker run` + `curl /health` returns {"status":"ok","version":"0.1.0","vkeyAvailable":true,"uptimeSeconds":5}
- `POST /verify` with a structurally-valid junk proof exercises the real Groth16 verifier against the real vkey (structuralFallback:false) and rejects it correctly (verified:false, 543ms; first call cost includes snarkjs init, subsequent calls are <50ms in the same image).

Post-deploy verification plan (after this PR merges):
- The deploy workflow runs scripts/deploy-remote.sh which does `docker compose --profile prod up -d --build --remove-orphans`; that auto-picks-up the new zeroauth-verifier service.
- After healthchecks pass, the API container's src/services/zkp.ts switches from the inline path to the HTTP path because VERIFIER_URL is now set in its environment.
- I'll smoke-test by POSTing a /v1/auth/zkp/verify and confirming the API logs show "ZKP: verifier service: FAIL/PASS" with a verifierAuditId (the service path's signature in zkp.ts:212) instead of "ZKP: inline Groth16: …" (the legacy path's signature).

Out of scope (separate follow-ups today):
- SQLite append-only audit log + hash chain in the verifier (task 3)
- ADR-0008 capturing the TS-vs-Rust decision formally (task 4)
- Promotion of governance/docs/threat-model/verifier.md from stub → full with A-V01 through A-V05 entries (task 5)
- Retirement of the inline-fallback code path in zkp.ts (next week)

Tests: 228 passing (no change). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The deploy after PR #35 succeeded in building both containers but the verifier never became 'healthy' from Docker's perspective:

    Connecting to localhost:3001 ([::1]:3001)
    wget: can't connect to remote host: Connection refused

Root cause: alpine ships busybox wget. Busybox wget resolves `localhost` to ::1 (IPv6) first and does NOT fall back to 127.0.0.1 (IPv4) on refusal. The verifier binds 0.0.0.0 (IPv4-only). Connection refused on every healthcheck, container marked unhealthy after 3 retries, zeroauth-prod (which depends on it via depends_on: service_healthy) never started.

Result: prod was 502 for ~3 minutes between 07:21 UTC and 07:25 UTC until I manually started zeroauth-prod with --no-deps via SSH. That restored service. The verifier was running and responding to requests fine the whole time; only the healthcheck command was wrong.

Fix: use the literal 127.0.0.1 in both the Dockerfile HEALTHCHECK and the compose-level healthcheck. The two are redundant by design: compose-level wins for `docker compose` orchestration; Dockerfile HEALTHCHECK wins for `docker run` outside compose. Both need to be correct. Comment added in both places explaining why localhost is wrong, so the next operator doesn't revert.

Production state right now: zeroauth-prod is up + healthy via the manual --no-deps recovery. The verifier is up + responding but marked unhealthy by Docker (cosmetic: it doesn't block anything since prod is now running without the dependency wait). After this hotfix deploys, both will be healthy and the dependency edge reactivates on next restart.

Verified locally: docker exec zeroauth-verifier wget -qO- http://127.0.0.1:3001/health → {"status":"ok","version":"0.1.0","vkeyAvailable":true,...}

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
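
The address-family mismatch described in this hotfix can also be avoided by probing from Node with the IPv4 literal pinned explicitly. This is a hypothetical alternative for illustration, not the fix that shipped (the hotfix keeps busybox wget and only changes the host to 127.0.0.1). The 2000 ms probe timeout is an assumption:

```typescript
import * as http from "node:http";

// Hypothetical Node-based probe; the shipped healthcheck uses busybox wget.
function healthcheckOptions(port: number): http.RequestOptions {
  return {
    host: "127.0.0.1", // IPv4 literal: `localhost` may resolve to ::1, where nothing listens
    family: 4,         // force IPv4 to match the verifier's 0.0.0.0 bind
    port,
    path: "/health",
    timeout: 2000,     // assumed probe timeout in milliseconds
  };
}

// Resolves true only when /health answers with HTTP 200.
function probe(port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const req = http.get(healthcheckOptions(port), (res) => {
      res.resume(); // drain the body so the socket is released
      resolve(res.statusCode === 200);
    });
    req.on("timeout", () => {
      req.destroy();
      resolve(false);
    });
    req.on("error", () => resolve(false));
  });
}
```

A healthcheck entry point would call `probe(3001)` and exit with status 0 or 1 accordingly, since Docker treats a non-zero exit code as a failed check.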

Task 4 of today. Formally records the decision Pulkit made yesterday when he picked Plan B over Plan A. Captures the three reasons single-engineer velocity beat the brainstorm's Rust spec, what we gave up (reproducible-build provenance, smaller transitive surface, unsafe-discipline) and what we kept (cross-repo HTTP shape stays Rust-compatible if we ever swap).

Also pins the inline-fallback retirement plan:
- 2026-05-15: verifier shipped, inline path unused but compiled-in
- 2026-05-16 → 2026-06-06: 3-week soak in prod
- 2026-06-08: PR to delete verifyInline + snarkjs from root deps + refuse-to-start when VERIFIER_URL is unset
- 2026-06-09: prod runs verifier-only

References the three shipping PRs (#35 cutover, #36 healthcheck hotfix, #37 SQLite audit log) + the plan-mode design doc + the B02 build prompt that we rejected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
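
The "refuse-to-start when VERIFIER_URL is unset" step planned for 2026-06-08 could look roughly like the guard below. This is a sketch under assumptions: `requireVerifierUrl` is a hypothetical name, and the eventual PR may implement the check differently.

```typescript
// Hypothetical startup guard for the post-soak, verifier-only state.
function requireVerifierUrl(env: Record<string, string | undefined>): URL {
  const raw = (env.VERIFIER_URL ?? "").trim();
  if (raw === "") {
    // No inline fallback remains at this point, so fail fast instead of serving without a verifier.
    throw new Error("VERIFIER_URL is unset: refusing to start without a verifier");
  }
  return new URL(raw); // throws TypeError if the value is not a valid URL
}

const verifierUrl = requireVerifierUrl({
  VERIFIER_URL: "http://zeroauth-verifier:3001",
});
console.log(verifierUrl.hostname, verifierUrl.port); // zeroauth-verifier 3001
```

Parsing into a `URL` at startup also surfaces a malformed value at boot time rather than on the first verification request.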

Delivers the A35-W3-Mon outline + A35-W4-Mon full script combined into a single 898-line operator runbook for the 22-minute Anchor Bank demo defined in docs/plan/bfsi-v1/02-bank-demo.md.

Twelve sections cover the entire room-time:
1. Pre-demo setup checklist (T-24h): equipment kit, network sanity, phone inventory, the seed-demo-tenants.ts live-key handling, dashboard and Basescan tab prep, dry-run, sleep.
2. Day-of setup (T-30 min): physical setup, browser/shell warm-up, phone setup, pre-checks.
3. Opening 30-second pitch (verbatim from 02-bank-demo.md operator script).
4-9. Scenes 1-6: every keystroke, every sentence the operator speaks, what appears on the projector, what the CISO/CFO/CRO/CIO/GC each see. Scene 3 includes the substitution-attack demonstration. Scene 4 includes the `\d users` + `SELECT * FROM users` + DPDP 2(t) reading moment. Scene 5 includes the UPDATE audit_events tamper + on-chain anchor cross-check.
10. Q&A bank: 13 questions sourced from 02-bank-demo.md with prepared 2-3 sentence operator answers.
11. Recovery playbook 11a-11f: kiosk freeze, app crash, network drop, tier-2 device (no StrongBox), R307 missing, proof verification rejection (the worst nightmare). Each has a calm-recovery script.
12. Post-demo (T+10 min): leave-behind folder contents, the 90-second ask, follow-up cadence (T+0 through T+42), debrief, photo policy, cleanup.

Two appendices: operator wallet-card contact list + timing reference.

References docs/plan/bfsi-v1/02-bank-demo.md as the canonical demo spec, docs/plan/bfsi-v1/01-pain-points.md for the P1-P10 cross-references, and scripts/seed-demo-tenants.ts for the exact tenant + API-key format.

Owner: Agent #35 (writer-compliance) + Agent #45 (solutions architect). [no-test] markdown-only.

pulkitpareek18 and others added 2 commits May 15, 2026 12:43

QA log — 2026-05-15 (HOLD, surrogate green, email NOW delivering)

9bf0f99

Copilot AI review requested due to automatic review settings May 15, 2026 07:19

Copilot started reviewing on behalf of pulkitpareek18 May 15, 2026 07:19 View session

pulkitpareek18 merged commit 2b7f3bf into main May 15, 2026
2 of 3 checks passed

pulkitpareek18 deleted the dev branch May 15, 2026 07:20

pulkitpareek18 mentioned this pull request May 15, 2026

HOTFIX: 127.0.0.1 not localhost in verifier healthcheck (B02 Phase 2 follow-up) #36

Merged

4 tasks

pulkitpareek18 review requested due to automatic review settings May 15, 2026 07:40
