From e836a8ec605219b664c43909aa06c9f2bdc9672e Mon Sep 17 00:00:00 2001
From: Pulkit Pareek <pulkitpareek18@gmail.com>
Date: Fri, 15 May 2026 12:55:53 +0530
Subject: [PATCH] B02 Phase 2 hotfix: 127.0.0.1 not localhost in verifier
 healthcheck
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The deploy after PR #35 succeeded in building both containers but the
verifier never became 'healthy' from Docker's perspective:

  Connecting to localhost:3001 ([::1]:3001)
  wget: can't connect to remote host: Connection refused

Root cause: alpine ships busybox wget. Busybox wget resolves `localhost`
to ::1 (IPv6) first and does NOT fall back to 127.0.0.1 (IPv4) on
refusal. The verifier binds 0.0.0.0 (IPv4-only), so every healthcheck
was refused, the container was marked unhealthy after 3 retries, and
zeroauth-prod (which depends on it via depends_on: service_healthy)
never started.
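
The failure mode can be reproduced without Docker or wget. The sketch
below is illustrative only and is not part of this patch: it opens a
listener bound to 0.0.0.0 on an ephemeral port (standing in for the
verifier) and then attempts a TCP connection to 127.0.0.1 and to ::1.
The helper names `can_connect` and `demo` are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    # Pick the address family from the literal: anything with ':' is IPv6.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect((host, port))
        return True
    except OSError:
        # Covers ConnectionRefusedError as well as hosts without IPv6.
        return False


def demo() -> dict:
    """Bind an IPv4-only listener, then probe it over IPv4 and IPv6."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("0.0.0.0", 0))  # IPv4 wildcard, ephemeral port
        srv.listen(1)
        port = srv.getsockname()[1]
        return {
            "127.0.0.1": can_connect("127.0.0.1", port),
            "::1": can_connect("::1", port),
        }


if __name__ == "__main__":
    print(demo())
```

A client that tries ::1 and stops at the first refusal, as busybox wget
did here, sees only the failing half of that result.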

Result: prod was 502 for ~3 minutes between 07:21 UTC and 07:25 UTC
until I manually started zeroauth-prod with --no-deps via SSH. That
restored service. The verifier was running and responding to requests
fine the whole time — only the healthcheck command was wrong.

Fix: use the literal 127.0.0.1 in both the Dockerfile HEALTHCHECK
and the compose-level healthcheck. The two are redundant by design:
compose-level wins for `docker compose` orchestration; Dockerfile
HEALTHCHECK wins for `docker run` outside compose. Both need to be
correct.

Comments added in both places explain why localhost is wrong, so
the next operator doesn't revert the change.

Production state right now: zeroauth-prod is up + healthy via the
manual --no-deps recovery. The verifier is up + responding but
marked unhealthy by Docker (cosmetic — it doesn't block anything
since prod is now running without the dependency wait). After this
hotfix deploys, both will be healthy and the dependency edge
reactivates on next restart.

Verified locally:
  docker exec zeroauth-verifier wget -qO- http://127.0.0.1:3001/health
  → {"status":"ok","version":"0.1.0","vkeyAvailable":true,...}

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 Dockerfile         | 6 +++++-
 docker-compose.yml | 5 ++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 72d8a79..1911a43 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -107,8 +107,12 @@ ENV VERIFIER_PORT=3001
 
 EXPOSE 3001
 
+# NOTE: 127.0.0.1 not localhost. Alpine's busybox wget resolves
+# `localhost` to IPv6 (::1) first; the verifier binds IPv4 0.0.0.0,
+# so the IPv6 connection is refused and busybox bails without falling
+# back to IPv4. Using the literal IPv4 address sidesteps the resolver.
 HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
-  CMD wget --no-verbose --tries=1 --spider http://localhost:3001/health || exit 1
+  CMD wget --no-verbose --tries=1 --spider http://127.0.0.1:3001/health || exit 1
 
 CMD ["node", "dist/server.js"]
 
diff --git a/docker-compose.yml b/docker-compose.yml
index f8cdac4..5e13b59 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -50,8 +50,11 @@ services:
       - VERIFIER_CIRCUIT_VERSION=v1
       - LOG_LEVEL=info
     restart: unless-stopped
+    # 127.0.0.1 (not localhost) because alpine busybox wget hits IPv6 first
+    # and the verifier binds 0.0.0.0 (IPv4 only). The Dockerfile carries
+    # the same fix; this is belt-and-braces for compose-level overrides.
     healthcheck:
-      test: ['CMD', 'wget', '--no-verbose', '--tries=1', '--spider', 'http://localhost:3001/health']
+      test: ['CMD', 'wget', '--no-verbose', '--tries=1', '--spider', 'http://127.0.0.1:3001/health']
       interval: 30s
       timeout: 10s
       retries: 3