test(celestia-node-fiber): docker-compose 4-val + bridge showcase scaffold #3288

Open
walldiss wants to merge 6 commits into evstack:julien/fiber from walldiss:feat/fibre-docker-showcase

@walldiss

Summary

Adds `tools/celestia-node-fiber/testing/docker/`, a self-contained docker-compose stack and Go test driver that exercises the `celestia-node-fiber` adapter end-to-end against a real 4-validator + 1-bridge Fibre network.

The existing in-process `testing/showcase_test.go` (single-validator) proves the adapter wires correctly but doesn't exercise:

  • real consensus 2/3 quorum collection,
  • inter-validator P2P,
  • multiple Fibre servers contributing partial signatures,
  • the `dns:///` host registry resolution path,
  • the bridge syncing real headers off a network it doesn't itself drive.

This stack does. The Go test driver (build-tagged `fibre_docker`) reuses the same Upload → Listen → Download flow as the in-process showcase, just pointed at the running compose endpoints.

Layout

```
tools/celestia-node-fiber/testing/docker/
├── Dockerfile.app          celestia-appd + fibre binaries (-tags fibre,ledger)
├── Dockerfile.bridge       celestia-node bridge (-tags fibre)
├── compose.yaml            bootstrap → val0..val3 → register → bridge
├── scripts/
│   ├── init-genesis.sh     4-val genesis bootstrap
│   ├── start-validator.sh  per-val entrypoint (appd + fibre server)
│   ├── register-fsps.sh    MsgSetFibreProviderInfo (with dns:/// prefix)
│   │                       + MsgDepositToEscrow for the test client
│   └── start-bridge.sh     bridge init + JWT export to shared volume
├── docker_test.go          TestDockerShowcase: host-side Go driver
└── README.md               operator instructions + known rough edges
```

Run

```bash
cd tools/celestia-node-fiber/testing/docker
docker compose up -d --build

# wait until `docker compose logs register` shows that /shared/setup.done has been written

cd ../..
go test -tags 'fibre fibre_docker' -count=1 -timeout 5m ./testing/docker/...
```

Override host endpoints via the `FIBRE_BRIDGE_ADDR` / `FIBRE_CONSENSUS_ADDR` env vars if the defaults (`127.0.0.1:26658` / `127.0.0.1:9090`) collide with ports already in use on the host.

Build args

| arg | Dockerfile | default |
| --- | --- | --- |
| `CELESTIA_APP_REF` | `Dockerfile.app` | `main` |
| `CELESTIA_NODE_REF` | `Dockerfile.bridge` | `feature/fibre` |

Pin to a specific commit via `docker compose build --build-arg CELESTIA_NODE_REF=`.

Honest scope: this is a scaffold

The README lists the iteration points clearly. Each is a known rough edge that the first end-to-end run on your machine will likely surface:

  1. `fibre` binary CLI flags in `start-validator.sh` are illustrative — real flags from `celestia-app/cmd/fibre --help` may differ.
  2. `config.toml`/`app.toml` overrides use `sed` against expected default lines; defaults can drift.
  3. No proper healthchecks on validators (`register` polls `celestia-appd status` instead).
  4. No `make docker-test` wrapper yet — manual `docker compose up` + `go test`.
  5. No build cache for `/go/pkg/mod` / `/root/.cache/go-build` — every `--build` re-clones.

These are documented up front rather than papered over. The value of the scaffold is in unblocking the next iteration step, not in pretending it works flawlessly.

Verified

  • `go build -tags 'fibre fibre_docker' ./...` clean
  • `go vet -tags 'fibre fibre_docker' ./...` clean
  • Existing `TestShowcase` + `TestShowcaseResume` still pass with `-tags fibre` (no regression — new directory is build-tag isolated)
  • Full end-to-end run against the docker stack — not yet validated; iteration items above will likely need fixing on first run

🤖 Generated with Claude Code

@coderabbitai
Contributor

coderabbitai Bot commented Apr 26, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89a639fc-4cce-48a6-8339-c85c931f92e2


walldiss and others added 6 commits April 27, 2026 14:38
…ffold

Adds tools/celestia-node-fiber/testing/docker/, a self-contained
docker-compose stack that brings up four celestia-app validators (each
running an in-process Fibre server), a celestia-node bridge, and a
one-shot init container that registers FSP hosts via valaddr and
funds an escrow. A Go test driver (build tag fibre_docker) connects
to the running stack and exercises the celestia-node-fiber adapter
end-to-end against real 2/3-quorum Fibre.

Why a docker showcase: the in-process testing/showcase_test.go single-
validator setup proves the adapter wires correctly but doesn't
exercise real consensus quorum, inter-validator P2P, multiple Fibre
servers contributing partial signatures, or dns:/// host registry
resolution. The 4-validator docker stack exercises all of those.

Layout:
- Dockerfile.app: celestia-appd + fibre binaries with -tags fibre,ledger
- Dockerfile.bridge: celestia-node bridge with -tags fibre
- compose.yaml: bootstrap → val0..val3 → register → bridge dependency chain
- scripts/init-genesis.sh: 4-validator genesis bootstrap
- scripts/start-validator.sh: per-validator entrypoint (appd + fibre)
- scripts/register-fsps.sh: MsgSetFibreProviderInfo (with dns:/// prefix)
                             + escrow funding for the test client
- scripts/start-bridge.sh: bridge init + JWT export to shared volume
- docker_test.go: TestDockerShowcase — host-side Go driver
- README.md: operator instructions + the known rough edges

Build tag fibre_docker keeps the test out of the default go test
runs since it requires the external docker stack to be up.

The scaffold is documented honestly: it lays out the architecture,
build args, and the iteration points (fibre CLI flag confirmation,
config.toml override robustness, healthchecks, build-cache speedups).
The point of landing it now is to unblock the next iteration step
rather than to claim flawless first-run behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

First execution surfaced a handful of issues the in-process showcase
masked:

- Dockerfile.app: build path is `./fibre/cmd`, not `./cmd/fibre`.
- start-validator.sh: `fibre start --server-listen-address …
  --signer-grpc-address …` (matches real `cmd/fibre` flags).
- start-validator.sh: replace `nc -z` with bash /dev/tcp; pass
  `--force-no-bbr` to celestia-appd (Linux kernel inside Docker
  Desktop on macOS lacks BBR); poll for first block before launching
  fibre so it can detect chain ID.
- start-validator.sh: set `priv_validator_grpc_laddr` so fibre's
  signer client has something to dial.
- init-genesis.sh: drop `network_min_gas_price` to 0 before
  collecting gentxs (gentxs carry no fee); make script idempotent
  via `peers.txt` flag so re-runs don't crash.
- register-fsps.sh: pass `--node tcp://val0:26657` for `status`
  (default localhost:26657 not reachable in the register container);
  register host-reachable `dns:///127.0.0.1:798X` per validator so
  the test driver on the docker host can dial each fibre server.
- start-bridge.sh: `--core.port` not `--core.grpc.port`; export
  `CELESTIA_CUSTOM=$NETWORK` so celestia-node accepts a private
  network ID; grep the JWT line out of the warning-polluted output.
- compose.yaml: expose val1/val2/val3 fibre ports on host
  7981/7982/7983 (val0 already exposes 7980).

TestDockerShowcase now passes end-to-end:
Upload → BlobID returned, Listen → BlobEvent at height N, Download
→ original payload bytes recovered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Refactors the in-process ev-node + Fibre test into a reusable helper
package and adds a docker-stack counterpart that drives the same flow
against the 4-validator + bridge cluster.

Two ev-node roles are wired explicitly:

- NewFiberAggregator: 200ms-block-time aggregator that signs blocks,
  writes them via the Fibre DA adapter, and exposes its genesis so
  full nodes can join the same chain.
- NewFiberFullNode: passive full node sharing the aggregator's genesis
  and consuming blob events from the same Fibre namespace, no P2P link.

The shared driver RunEvNodeFibreTwoNodeFlow exercises:

  1. observer.Listen on the header namespace before either node starts
  2. start full node first (its DA retriever is listening from the
     captured bridge tip when the aggregator begins posting)
  3. start aggregator, inject a tx, wait for it to land in a block
  4. drain ≥1 Fibre BlobEvent on observer + Download to confirm the
     aggregator's submission round-tripped through Fibre

A separate observer adapter is required because celestia-node's
go-jsonrpc multiplexes blob.Subscribe over a single websocket per
module; cancelling one subscription tears the connection down. The
aggregator, full node, and observer therefore each get their own
api/client.Client.

Adapter.Head was added so tests can pin DAStartHeight to the bridge's
current local head before either ev-node instance starts. Without that,
the full node's DA retriever scans from celestia height 0, never
finds the (later-submitted) Fibre blobs, and stalls.

Docker register-fsps.sh now funds a 500B-utia escrow (was 50M) so
ev-node's high-frequency DA submission cadence doesn't drain the
test client mid-run.

Caveat documented inline: ev-node's full-node syncer creates a fresh
blob.Subscribe per Retrieve call and cancels it on each batch
boundary, which crashes the shared websocket. Until that retriever
holds one persistent Subscribe, the full node side asserts only that
construction + startup succeed, not that the entire chain is
replayed; the round-trip evidence comes from the observer adapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reproduces the two production upload failures observed when an operator
registers a Fibre provider with a host string that isn't in
canonical form:

  fibre-client.error="rpc error: code = Unavailable desc = invalid
  target address http://10.0.37.242:7980, error info: address
  http://10.0.37.242:7980:443: too many colons in address"

  fibre-client.error="got invalid host 18.202.253.174:7980: parse
  \"18.202.253.174:7980\": first path segment in URL cannot contain
  colon"

Root cause: x/valaddr `MsgSetFibreProviderInfo.ValidateBasic` only
checks that the host is non-empty and ≤100 chars. Anything passes —
including `http://...`, bare `host:port`, or arbitrary garbage. At
read time the fibre client's `HostRegistry.GetHost` runs `url.Parse`
on the registered host: a bare numeric `host:port` fails parsing (no
valid scheme is found, so the string is read as a path whose first
segment contains `:`),
while `http://host:port` parses fine and breaks downstream because
`grpc.NewClient` doesn't recognise `http` as a resolver scheme and
appends a default `:443`, yielding the "too many colons" error.

The repro reuses the existing 4-validator + bridge docker stack:

  1. Re-register every validator with a bad host (one bad form per
     subtest), confirming the chain accepts the registration.
  2. Construct a fresh adapter so PullAll picks up the new state.
  3. Attempt Upload — verifies it fails because no validator host
     can be dialed.
  4. Restore the canonical `dns:///127.0.0.1:798X` registrations on
     cleanup so sibling tests on the shared stack remain runnable.

Subtests cover both production failure modes:
  - http_scheme_prefix → `http://127.0.0.1:7980` triggers exactly
    "too many colons in address"
  - bare_host_port → `127.0.0.1:7980` triggers exactly "first path
    segment in URL cannot contain colon"

Both subtests pass against the current chain (asserting the bug
exists). Once `ValidateBasic` is tightened to require strict
`host:port` form, this test's assertions need to flip to expect
`setValHost` itself to fail.
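
For orientation, a strict `host:port` check could look roughly like the following. This is a sketch under stated assumptions, not the chain's `ValidateBasic`: `validateHostPort` is a hypothetical function, the error text is borrowed from the message quoted later in this PR, and the exact rules the chain applies may differ. The 100-character limit comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// errHostPortForm reuses the wording quoted in this PR; the real error value
// on the chain side is not shown here.
var errHostPortForm = errors.New("host must be in host:port form")

// validateHostPort is a hypothetical stand-in for a stricter registration
// check. It rejects schemes and paths and requires a numeric port.
func validateHostPort(s string) error {
	if s == "" || len(s) > 100 {
		return errHostPortForm
	}
	if strings.Contains(s, "/") { // catches http://..., dns:///..., and paths
		return errHostPortForm
	}
	host, port, err := net.SplitHostPort(s)
	if err != nil || host == "" {
		return errHostPortForm
	}
	p, err := strconv.Atoi(port)
	if err != nil || p < 1 || p > 65535 {
		return errHostPortForm
	}
	return nil
}

func main() {
	for _, h := range []string{"127.0.0.1:7980", "http://127.0.0.1:7980", "dns:///127.0.0.1:7980"} {
		fmt.Printf("%-24s %v\n", h, validateHostPort(h))
	}
}
```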

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t matrix

Renames TestFibreClient_BadHostRegistration →
TestFibreClient_HostRegistrationFormats and adds a third subtest that
re-registers all four validators with `dns:///host:port` and asserts
Upload SUCCEEDS. Together with the two reject cases this empirically
demonstrates that today's chain accepts any host string but only the
`dns:///host:port` form survives end-to-end:

  http://host:port    → "too many colons in address"
  host:port           → "first path segment in URL cannot contain colon"
  dns:///host:port    → upload ok

The first two reproduce the operator-reported production warnings
verbatim. The third is the positive control showing why talis and
register-fsps.sh both prepend `dns:///` — that prefix is the only
form gRPC's resolver registry recognises among URL-parseable inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The chain (built from celestia-app feat/fibre-payments via the bumped
Dockerfile.app default ref) now enforces strict host:port form on
MsgSetFibreProviderInfo.ValidateBasic. Adjusts the docker stack
accordingly:

- Dockerfile.app default CELESTIA_APP_REF: main → feat/fibre-payments
- Dockerfile.bridge default CELESTIA_NODE_REF: feature/fibre →
  feature/fibre-experimental (which carries the matching app bump)
- register-fsps.sh registers plain `127.0.0.1:798X` (was `dns:///...`)
- docker_test.go celestia-app v8 → v9 imports (cascades from the
  julien/fiber bump merged in evstack#3289)

bad_host_repro_test.go matrix flips:
  - host_port    → set-host succeeds, Upload succeeds (positive)
  - http_prefix  → set-host fails at ValidateBasic
  - dns_prefix   → set-host fails at ValidateBasic
…with assertion against `host must be in host:port form` from the
chain's response. Cleanup uses --output json so the success-code
check parses reliably; sleep widened to 4s between consecutive
set-host calls so the validator account's mempool nonce settles.
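
A success-code check on `--output json` output might be sketched as below. The field names (`code`, `raw_log`, `txhash`) assume the usual Cosmos SDK transaction response shape, and `txResult` / `checkTx` are hypothetical names; the actual cleanup code in the test file is not shown in this PR text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// txResult holds the response fields this sketch inspects. The JSON field
// names are an assumption about the CLI's output shape.
type txResult struct {
	Code   uint32 `json:"code"`
	RawLog string `json:"raw_log"`
	TxHash string `json:"txhash"`
}

// checkTx decodes a JSON tx response and treats any non-zero code as failure.
func checkTx(out []byte) error {
	var r txResult
	if err := json.Unmarshal(out, &r); err != nil {
		return fmt.Errorf("decode tx response: %w", err)
	}
	if r.Code != 0 {
		return fmt.Errorf("tx %s failed with code %d: %s", r.TxHash, r.Code, r.RawLog)
	}
	return nil
}

func main() {
	// Sample payloads are made up for illustration.
	ok := []byte(`{"code":0,"txhash":"ABC123"}`)
	bad := []byte(`{"code":1,"raw_log":"host must be in host:port form","txhash":"DEF456"}`)
	fmt.Println(checkTx(ok))
	fmt.Println(checkTx(bad))
}
```

If the command output carries warnings before the JSON object, decoding fails outright, which is at least a loud failure rather than a silently misread success code.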

Verified locally on docker-arm64:

  TestDockerShowcase                       PASS  5.20s
  TestEvNode_FiberDA_Docker                PASS  8.00s
  TestFibreClient_HostRegistrationFormats  PASS  80.01s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@walldiss force-pushed the feat/fibre-docker-showcase branch from dc48e5a to 5d7c17b, April 27, 2026 12:55

// - Injects a transaction and waits for block production
// - Confirms the DA submitter pushed blobs to Fiber by receiving events
// on the subscription and round-tripping each through Download
func TestEvNode_FiberDA_Posting(t *testing.T) {
Member

Why was this deleted?
