test(celestia-node-fiber): docker-compose 4-val + bridge showcase scaffold #3288
Open
walldiss wants to merge 6 commits into evstack:julien/fiber
…ffold
Adds tools/celestia-node-fiber/testing/docker/, a self-contained
docker-compose stack that brings up four celestia-app validators (each
running an in-process Fibre server), a celestia-node bridge, and a
one-shot init container that registers FSP hosts via valaddr and
funds an escrow. A Go test driver (build tag fibre_docker) connects
to the running stack and exercises the celestia-node-fiber adapter
end-to-end against real 2/3-quorum Fibre.
Why a docker showcase: the in-process testing/showcase_test.go single-
validator setup proves the adapter wires correctly but doesn't
exercise real consensus quorum, inter-validator P2P, multiple Fibre
servers contributing partial signatures, or dns:/// host registry
resolution. The 4-validator docker stack exercises all of those.
Layout:
- Dockerfile.app: celestia-appd + fibre binaries with -tags fibre,ledger
- Dockerfile.bridge: celestia-node bridge with -tags fibre
- compose.yaml: bootstrap → val0..val3 → register → bridge dependency chain
- scripts/init-genesis.sh: 4-validator genesis bootstrap
- scripts/start-validator.sh: per-validator entrypoint (appd + fibre)
- scripts/register-fsps.sh: MsgSetFibreProviderInfo (with dns:/// prefix)
+ escrow funding for the test client
- scripts/start-bridge.sh: bridge init + JWT export to shared volume
- docker_test.go: TestDockerShowcase — host-side Go driver
- README.md: operator instructions + the known-rough edges
Build tag fibre_docker keeps the test out of the default go test
runs since it requires the external docker stack to be up.
The scaffold is documented honestly: it lays out the architecture,
build args, and the iteration points (fibre CLI flag confirmation,
config.toml override robustness, healthchecks, build-cache speedups).
The point of landing it now is to unblock the next iteration step
rather than to claim flawless first-run behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First execution surfaced a handful of issues the in-process showcase masked:
- Dockerfile.app: build path is `./fibre/cmd`, not `./cmd/fibre`.
- start-validator.sh: `fibre start --server-listen-address … --signer-grpc-address …` (matches the real `cmd/fibre` flags).
- start-validator.sh: replace `nc -z` with bash /dev/tcp; pass `--force-no-bbr` to celestia-appd (the Linux kernel inside Docker Desktop on macOS lacks BBR); poll for the first block before launching fibre so it can detect the chain ID.
- start-validator.sh: set `priv_validator_grpc_laddr` so fibre's signer client has something to dial.
- init-genesis.sh: drop `network_min_gas_price` to 0 before collecting gentxs (gentxs carry no fee); make the script idempotent via a `peers.txt` flag so re-runs don't crash.
- register-fsps.sh: pass `--node tcp://val0:26657` for `status` (the default localhost:26657 is not reachable in the register container); register host-reachable `dns:///127.0.0.1:798X` per validator so the test driver on the docker host can dial each fibre server.
- start-bridge.sh: `--core.port`, not `--core.grpc.port`; export `CELESTIA_CUSTOM=$NETWORK` so celestia-node accepts a private network ID; grep the JWT line out of the warning-polluted output.
- compose.yaml: expose val1/val2/val3 fibre ports on host 7981/7982/7983 (val0 already exposes 7980).

TestDockerShowcase now passes end-to-end: Upload → BlobID returned, Listen → BlobEvent at height N, Download → original payload bytes recovered.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
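The JWT step in start-bridge.sh is a grep over warning-polluted CLI output. For illustration, the same filter is sketched in Go below; `extractJWT` is a hypothetical helper, not part of the scripts. It relies on a JWT being three base64url segments joined by `.`, where the first segment (an encoded JSON header beginning with `{"`) typically starts with `eyJ`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// extractJWT (hypothetical) returns the first line of output that looks like
// a JWT: three non-empty base64url segments, the first starting with "eyJ".
func extractJWT(output string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "eyJ") {
			continue // skip warnings and other log noise
		}
		parts := strings.Split(line, ".")
		if len(parts) == 3 && isBase64URL(parts[0]) && isBase64URL(parts[1]) && isBase64URL(parts[2]) {
			return line, nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("no JWT found in output")
}

// isBase64URL reports whether s is non-empty and uses only the unpadded
// base64url alphabet (A-Z, a-z, 0-9, '-', '_').
func isBase64URL(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		switch {
		case r >= 'A' && r <= 'Z', r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '-', r == '_':
		default:
			return false
		}
	}
	return true
}

func main() {
	out := "WARNING: some deprecation notice\n" +
		"INFO: unrelated log line\n" +
		"eyJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJhZG1pbiJdfQ.c2lnbmF0dXJl\n"
	tok, err := extractJWT(out)
	fmt.Println(tok, err)
}
```

Matching on token shape rather than line position keeps the filter stable if the CLI adds or reorders warnings.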
Refactors the in-process ev-node + Fibre test into a reusable helper
package and adds a docker-stack counterpart that drives the same flow
against the 4-validator + bridge cluster.
Two ev-node roles are wired explicitly:
- NewFiberAggregator: 200ms-block-time aggregator that signs blocks,
writes them via the Fibre DA adapter, and exposes its genesis so
full nodes can join the same chain.
- NewFiberFullNode: passive full node sharing the aggregator's genesis
and consuming blob events from the same Fibre namespace, no P2P link.
The shared driver RunEvNodeFibreTwoNodeFlow exercises:
1. observer.Listen on the header namespace before either node starts
2. start full node first (its DA retriever is listening from the
captured bridge tip when the aggregator begins posting)
3. start aggregator, inject a tx, wait for it to land in a block
4. drain ≥1 Fibre BlobEvent on observer + Download to confirm the
aggregator's submission round-tripped through Fibre
A separate observer adapter is required because celestia-node's
go-jsonrpc multiplexes blob.Subscribe over a single websocket per
module; cancelling one subscription tears the connection down. The
aggregator, full node, and observer therefore each get their own
api/client.Client.
Adapter.Head was added so tests can pin DAStartHeight to the bridge's
current local head before either ev-node instance starts. Without that,
the full node's DA retriever scans from celestia height 0, never
finds the (later-submitted) Fibre blobs, and stalls.
Docker register-fsps.sh now funds a 500B-utia escrow (was 50M) so
ev-node's high-frequency DA submission cadence doesn't drain the
test client mid-run.
Caveat documented inline: ev-node's full-node syncer creates a fresh
blob.Subscribe per Retrieve call and cancels it on each batch
boundary, which crashes the shared websocket. Until that retriever
holds one persistent Subscribe, the full node side asserts only that
construction + startup succeed, not that the entire chain is
replayed; the round-trip evidence comes from the observer adapter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reproduces the two production upload failures observed when an operator registers a Fibre provider with a host string that isn't in canonical form:

fibre-client.error="rpc error: code = Unavailable desc = invalid target address http://10.0.37.242:7980, error info: address http://10.0.37.242:7980:443: too many colons in address"
fibre-client.error="got invalid host 18.202.253.174:7980: parse \"18.202.253.174:7980\": first path segment in URL cannot contain colon"

Root cause: x/valaddr `MsgSetFibreProviderInfo.ValidateBasic` only checks that the host is non-empty and ≤100 chars. Anything passes, including `http://...`, bare `host:port`, or arbitrary garbage. At read time the fibre client's `HostRegistry.GetHost` runs `url.Parse` on the registered host:
- bare `host:port` with an IP-literal host fails parsing: a URL scheme cannot start with a digit, so the parser sees a scheme-less relative path whose first segment contains `:`;
- `http://host:port` parses fine but breaks downstream, because `grpc.NewClient` doesn't recognise `http` as a resolver scheme and appends a default `:443`, yielding the "too many colons" error.

The repro reuses the existing 4-validator + bridge docker stack:
1. Re-register every validator with a bad host (one bad form per subtest), confirming the chain accepts the registration.
2. Construct a fresh adapter so PullAll picks up the new state.
3. Attempt Upload and verify it fails because no validator host can be dialed.
4. Restore the canonical `dns:///127.0.0.1:798X` registrations on cleanup so sibling tests on the shared stack remain runnable.

Subtests cover both production failure modes:
- http_scheme_prefix → `http://127.0.0.1:7980` triggers exactly "too many colons in address"
- bare_host_port → `127.0.0.1:7980` triggers exactly "first path segment in URL cannot contain colon"

Both subtests pass against the current chain (asserting the bug exists). Once `ValidateBasic` is tightened to require strict `host:port` form, this test's assertions need to flip to expect `setValHost` itself to fail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t matrix

Renames TestFibreClient_BadHostRegistration → TestFibreClient_HostRegistrationFormats and adds a third subtest that re-registers all four validators with `dns:///host:port` and asserts Upload SUCCEEDS. Together with the two reject cases, this empirically demonstrates that today's chain accepts any host string but only the `dns:///host:port` form survives end-to-end:

  http://host:port  → "too many colons in address"
  host:port         → "first path segment in URL cannot contain colon"
  dns:///host:port  → upload ok

The first two reproduce the operator-reported production warnings verbatim. The third is the positive control showing why talis and register-fsps.sh both prepend `dns:///`: that prefix is the only form gRPC's resolver registry recognises among URL-parseable inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chain (built from celestia-app feat/fibre-payments via the bumped Dockerfile.app default ref) now enforces strict host:port form on MsgSetFibreProviderInfo.ValidateBasic. Adjusts the docker stack accordingly:
- Dockerfile.app default CELESTIA_APP_REF: main → feat/fibre-payments
- Dockerfile.bridge default CELESTIA_NODE_REF: feature/fibre → feature/fibre-experimental (which carries the matching app bump)
- register-fsps.sh registers plain `127.0.0.1:798X` (was `dns:///...`)
- docker_test.go celestia-app v8 → v9 imports (cascades from the julien/fiber bump merged in evstack#3289)

bad_host_repro_test.go matrix flips:
- host_port → set-host succeeds, Upload succeeds (positive)
- http_prefix → set-host fails at ValidateBasic
- dns_prefix → set-host fails at ValidateBasic

…with assertion against `host must be in host:port form` from the chain's response. Cleanup uses --output json so the success-code check parses reliably; the sleep between consecutive set-host calls is widened to 4s so the validator account's mempool nonce settles.

Verified locally on docker-arm64:
  TestDockerShowcase                       PASS   5.20s
  TestEvNode_FiberDA_Docker                PASS   8.00s
  TestFibreClient_HostRegistrationFormats  PASS  80.01s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
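The chain-side check itself lives in celestia-app and is not shown in this PR. Below is a hypothetical approximation built on `net.SplitHostPort` that reproduces the three matrix outcomes. Plain `127.0.0.1:7980` passes. Both prefixed forms are rejected because the text before the last `:` still contains a colon. Only the error prefix `host must be in host:port form` comes from the description. The length limit mirrors the earlier ≤100-char rule, and the port-range check is an added assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// validateHostPort is a hypothetical approximation of the strict
// ValidateBasic check; the real celestia-app implementation may differ.
func validateHostPort(host string) error {
	if host == "" || len(host) > 100 {
		return errors.New("host must be non-empty and at most 100 characters")
	}
	h, p, err := net.SplitHostPort(host)
	if err != nil {
		// "http://1.2.3.4:7980" and "dns:///1.2.3.4:7980" fail here with
		// "too many colons in address".
		return fmt.Errorf("host must be in host:port form: %w", err)
	}
	if h == "" {
		return errors.New("host must be in host:port form: empty host")
	}
	port, err := strconv.Atoi(p)
	if err != nil || port < 1 || port > 65535 {
		return fmt.Errorf("host must be in host:port form: invalid port %q", p)
	}
	return nil
}

func main() {
	for _, h := range []string{"127.0.0.1:7980", "http://127.0.0.1:7980", "dns:///127.0.0.1:7980"} {
		fmt.Printf("%-22s -> %v\n", h, validateHostPort(h))
	}
}
```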
Force-pushed from dc48e5a to 5d7c17b.
julienrbrt reviewed Apr 27, 2026
```go
// - Injects a transaction and waits for block production
// - Confirms the DA submitter pushed blobs to Fiber by receiving events
//   on the subscription and round-tripping each through Download
func TestEvNode_FiberDA_Posting(t *testing.T) {
```
Summary
Adds `tools/celestia-node-fiber/testing/docker/`, a self-contained docker-compose stack and Go test driver that exercises the `celestia-node-fiber` adapter end-to-end against a real 4-validator + 1-bridge Fibre network.
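The existing in-process `testing/showcase_test.go` (single-validator) proves the adapter wires correctly but doesn't exercise:
- real consensus quorum
- inter-validator P2P
- multiple Fibre servers contributing partial signatures
- `dns:///` host registry resolution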
This stack does. The Go test driver (build-tagged `fibre_docker`) reuses the same Upload → Listen → Download flow as the in-process showcase, just pointed at the running compose endpoints.
Layout
```
tools/celestia-node-fiber/testing/docker/
├── Dockerfile.app celestia-appd + fibre binaries (-tags fibre,ledger)
├── Dockerfile.bridge celestia-node bridge (-tags fibre)
├── compose.yaml bootstrap → val0..val3 → register → bridge
├── scripts/
│ ├── init-genesis.sh 4-val genesis bootstrap
│ ├── start-validator.sh per-val entrypoint (appd + fibre server)
│ ├── register-fsps.sh MsgSetFibreProviderInfo (with dns:/// prefix)
│ │ + MsgDepositToEscrow for the test client
│ └── start-bridge.sh bridge init + JWT export to shared volume
├── docker_test.go TestDockerShowcase — host-side Go driver
└── README.md operator instructions + known rough edges
```
Run
```bash
cd tools/celestia-node-fiber/testing/docker
docker compose up -d --build
# wait until `docker compose logs register` shows /shared/setup.done has been written
cd ../..
go test -tags 'fibre fibre_docker' -count=1 -timeout 5m ./testing/docker/...
```
Override the host endpoints via the `FIBRE_BRIDGE_ADDR` / `FIBRE_CONSENSUS_ADDR` env vars if the defaults (`127.0.0.1:26658` / `127.0.0.1:9090`) collide with other services on the host.
Build args
Pin to a specific commit via `docker compose build --build-arg CELESTIA_NODE_REF=`.
Honest scope: this is a scaffold
The README lists the iteration points. Each is a known rough edge that the first end-to-end run on your machine will likely surface:
- fibre CLI flag confirmation
- config.toml override robustness
- healthchecks
- build-cache speedups

These are documented up front rather than glossed over. The value of the scaffold is in unblocking the next iteration step, not in claiming it works flawlessly.
Verified
🤖 Generated with Claude Code