feat(tools/talis): vendor talis deployment tool + Fibre experiment runner#3301

Merged
julienrbrt merged 1 commit into evstack:julien/fiber from walldiss:feat/talis-deploy
Apr 29, 2026

Conversation

@walldiss

Summary

Brings the celestia-app talis multi-cloud deploy tool into ev-node, plus the wiring needed to deploy a working Fibre DA aggregator end-to-end on top of it. Verified via a fresh AWS run from this branch — talis up → genesis → deploy → setup-fibre → start-fibre → fibre-bootstrap-evnode reaches 24.57 MB/s @ 99.7 % ok rate on a 60 s sustained loadgen (3 × c6in.4xlarge validators + c6in.2xlarge bridge + c6in.8xlarge ev-node + c6in.2xlarge load-gen, all us-east-1).

What's added

  • tools/talis/ — vendored from celestia-app's feat/fibre-payments. Provisions AWS / DO / GCP boxes for one or more validators + bridge + ev-node + load-gen, deploys binaries + init scripts, drives the Fibre setup-fibre + start-fibre flow, and ships a fibre-bootstrap-evnode step that scp's the bridge JWT and Fibre payment keyring onto each ev-node before its init script starts the daemon.
  • tools/celestia-node-fiber/cmd/evnode-fibre/ — long-lived aggregator runner that wires block.NewFiberDAClient on top of the celestia-node-fiber adapter. Compiled by talis genesis and shipped to evnode-* hosts.
  • tools/talis/cmd/evnode-txsim/ — small Go load-gen that pumps the runner's HTTP /tx ingress for a fixed duration; deployed to load-gen boxes and prints a single TXSIM: line on completion.
  • tools/talis/Makefile — cross-compiles celestia-appd, the fibre server + load tool, the bridge/light celestia binary, and both runner binaries to linux/amd64 for talis genesis -b.
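
As an illustration of the load-gen idea described above (pump an HTTP /tx endpoint for a fixed duration, then print one summary line), here is a minimal sketch. It is not the evnode-txsim source: the `pump`, `summary`, and `demo` helpers, the payload, the 202 response, and the exact `TXSIM:` line format are all assumptions made for the example, and a local `httptest` server stands in for the runner.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pump POSTs payload to url until d elapses and counts 2xx vs. other outcomes.
// Hypothetical helper; the real tool's flags and concurrency model may differ.
func pump(client *http.Client, url string, payload []byte, d time.Duration) (ok, failed int) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	for ctx.Err() == nil {
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
		if err != nil {
			return ok, failed + 1 // malformed URL: retrying cannot help
		}
		resp, err := client.Do(req)
		if err != nil {
			if ctx.Err() != nil {
				break // the run ended mid-request; do not count it as a failure
			}
			failed++
			continue
		}
		resp.Body.Close()
		if resp.StatusCode/100 == 2 {
			ok++
		} else {
			failed++
		}
	}
	return ok, failed
}

// summary renders the single result line (assumed format, for illustration only).
func summary(ok, failed int) string {
	rate := 0.0
	if total := ok + failed; total > 0 {
		rate = 100 * float64(ok) / float64(total)
	}
	return fmt.Sprintf("TXSIM: ok=%d failed=%d ok_rate=%.1f%%", ok, failed, rate)
}

// demo runs pump against a throwaway local server standing in for the runner.
func demo() (ok, failed int) {
	mux := http.NewServeMux()
	mux.HandleFunc("/tx", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()
	return pump(srv.Client(), srv.URL+"/tx", []byte("tx-bytes"), 200*time.Millisecond)
}

func main() {
	fmt.Println(summary(demo()))
}
```

Requests cut off by the deadline are deliberately left out of both counters, so the ok rate reflects only requests that actually completed.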

ev-node-side foundation work that the runner needs

These changes were what turned a "wired but slow" Fibre DA path into something that actually overlaps uploads and survives sustained load:

  • pkg/config/config.go: ApplyFiberDefaults() profile (adaptive batching, 1 s DA.BlockTime, 50-deep pending-cache window).
  • block/public.go: SetMaxBlobSize to lift the 5 MiB Celestia default to Fibre's 120 MiB headroom; NewFiberDAClient + Fibre type re-exports.
  • block/internal/da/{fiber_client.go, fiber/types.go, fibremock/} — Fibre adapter wired through the DA client interface, with a matching mock used by the testing tree.
  • block/internal/submitting/da_submitter.go — per-stream upload workers, splitByBlobSize chunking, parallel signing pool, and an oversized-blob safety net that advances the cache instead of looping forever (which OOM'd the daemon under sustained Fibre stalls in the AWS run).
  • core/sequencer/sequencing.go + pkg/sequencers/solo/sequencer.go: ErrQueueFull sentinel + bounded mempool queue so the reaper backs off when the executor is paused on the pending-cache cap, instead of feeding an unbounded queue and getting one giant block on resume.
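
The bounded-queue behaviour in the last bullet can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: only the `ErrQueueFull` name comes from the description, while `boundedQueue`, `Push`, `Drain`, and `submit` are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrQueueFull mirrors the sentinel named in the PR; the real definition
// lives in core/sequencer and may carry a different message.
var ErrQueueFull = errors.New("sequencer queue full")

// boundedQueue is a hypothetical stand-in for the bounded mempool queue.
type boundedQueue struct {
	mu    sync.Mutex
	items [][]byte
	max   int
}

// Push rejects new transactions once max items are pending.
func (q *boundedQueue) Push(tx []byte) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) >= q.max {
		return ErrQueueFull
	}
	q.items = append(q.items, tx)
	return nil
}

// Drain hands all pending transactions to the caller and empties the queue.
func (q *boundedQueue) Drain() [][]byte {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := q.items
	q.items = nil
	return out
}

// submit plays the reaper's role: it pushes until the queue reports full,
// then tells the caller to back off instead of piling up more work.
func submit(q *boundedQueue, txs [][]byte) (accepted int, backOff bool) {
	for _, tx := range txs {
		if err := q.Push(tx); err != nil {
			return accepted, errors.Is(err, ErrQueueFull)
		}
		accepted++
	}
	return accepted, false
}

func main() {
	q := &boundedQueue{max: 2}
	n, backOff := submit(q, [][]byte{{1}, {2}, {3}})
	fmt.Println(n, backOff, len(q.Drain()))
}
```

Because the producer sees an explicit error rather than unbounded growth, the first block produced after a pause holds at most `max` queued transactions.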

setup-fibre fixes uncovered during the verified run

  • The bash script for set-host now retries until the validator's host appears in query valaddr providers. The previous one-shot call relied on --yes returning the txhash before block inclusion; if the chain wasn't ready yet, the tx silently bounced and the validator never registered. The Fibre client cached the partial set on startup and uploads cascaded to host not found → voting power: collected 0.
  • The talis CLI side polls query valaddr providers after the per-validator scripts finish and refuses to return until all validators are registered (5-minute deadline, then errors so the operator can re-run).
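
The poll-until-registered pattern in these two fixes might look like the sketch below. It assumes a hypothetical `queryFn` that stands in for running `query valaddr providers` and parsing its output; the host names, the polling interval, and the `simulate` helper exist only for the example, and the real talis code may be structured differently.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// queryFn is a hypothetical stand-in for querying the registered providers.
type queryFn func(ctx context.Context) (map[string]bool, error)

// waitForProviders polls until every wanted host is registered or ctx expires.
// Query errors are treated as "not ready yet" and retried on the next tick.
func waitForProviders(ctx context.Context, query queryFn, want []string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		missing := want
		if got, err := query(ctx); err == nil {
			missing = nil
			for _, host := range want {
				if !got[host] {
					missing = append(missing, host)
				}
			}
			if len(missing) == 0 {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("providers still missing %v: %w", missing, ctx.Err())
		case <-ticker.C:
		}
	}
}

// simulate models a chain on which all hosts show up at query number readyAfter.
func simulate(readyAfter, timeoutMs int) (calls int, err error) {
	want := []string{"val-0:9091", "val-1:9091", "val-2:9091"} // made-up hosts
	query := func(context.Context) (map[string]bool, error) {
		calls++
		got := map[string]bool{}
		if calls >= readyAfter {
			for _, h := range want {
				got[h] = true
			}
		}
		return got, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	err = waitForProviders(ctx, query, want, 5*time.Millisecond)
	return calls, err
}

func main() {
	fmt.Println(simulate(3, 2000))
}
```

In the flow described above the context deadline would be the 5-minute limit, and the returned error is what prompts the operator to re-run.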

External dependency

Documented in tools/talis/fibre.md: a sibling clone of celestia-app on a branch with feat/fibre-payments + the sysrex/fibre_url_fix cherry-pick. Without the URL-parse fix, the Fibre client rejects every host:port registration with "first path segment in URL cannot contain colon", and uploads cascade to "voting power: collected 0". (This is a celestia-app fix that needs to land separately.)
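
The quoted error text matches the message Go's `net/url` package returns when a bare IP:port string is parsed without a scheme. The sketch below reproduces that failure and shows one possible normalisation (prefixing `//` so the string is read as an authority). The helper names and the example address are assumptions; the actual sysrex/fibre_url_fix change may take a different approach.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// failsWithColonError reports whether url.Parse rejects addr with the
// "first path segment" error quoted in the PR description.
func failsWithColonError(addr string) bool {
	_, err := url.Parse(addr)
	return err != nil && strings.Contains(err.Error(), "first path segment in URL cannot contain colon")
}

// hostFromAddr is a hypothetical normalisation: an address without a scheme
// gets a "//" prefix so net/url treats it as host:port rather than a path.
func hostFromAddr(addr string) (string, error) {
	if !strings.Contains(addr, "://") {
		addr = "//" + addr
	}
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	return u.Host, nil
}

func main() {
	addr := "10.0.0.5:9091" // made-up validator address
	fmt.Println(failsWithColonError(addr))
	fmt.Println(hostFromAddr(addr))
}
```

Note that a hostname such as `validator-0:9091` does not hit this error: `net/url` reads the part before the colon as a scheme, which parses without error but yields an empty host, so normalising before parsing helps in that case too.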

Test plan

  • go test ./block/... ./pkg/... ./types/... — all green
  • go build ./... — clean
  • make build-bins — all 6 binaries cross-compile cleanly to linux/amd64
  • End-to-end AWS run from this branch — 24.57 MB/s, 99.7 % ok on a 60 s sustained loadgen

🤖 Generated with Claude Code

@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 00427446-eabd-4764-ac5d-db634c43c298


@walldiss walldiss force-pushed the feat/talis-deploy branch from dde3881 to 5e0cf38 Compare April 29, 2026 13:15
@walldiss walldiss changed the base branch from julien/speedup-submitter to julien/fiber April 29, 2026 13:25
@walldiss walldiss force-pushed the feat/talis-deploy branch from 5e0cf38 to 4dafc89 Compare April 29, 2026 14:02
…nner

Brings the celestia-app talis multi-cloud deploy tool into ev-node,
plus a long-lived ev-node aggregator runner that wires the existing
celestia-node-fiber adapter behind ev-node's DA client interface.
Verified end-to-end on AWS — talis up → genesis → deploy →
setup-fibre → start-fibre → fibre-bootstrap-evnode reaches
24.57 MB/s @ 99.7 % ok on a 60 s sustained loadgen
(3 × c6in.4xlarge validators + c6in.2xlarge bridge +
c6in.8xlarge ev-node + c6in.2xlarge load-gen, us-east-1).

What this adds:

  • tools/talis/                — vendored from celestia-app's
    feat/fibre-payments. Provisions AWS / DO / GCP boxes for
    validators + bridge + ev-node + load-gen, deploys binaries +
    init scripts, drives the Fibre setup-fibre + start-fibre flow,
    and ships a fibre-bootstrap-evnode step that scp's the bridge
    JWT and Fibre payment keyring onto each ev-node before its
    init script starts the daemon.
  • tools/celestia-node-fiber/cmd/evnode-fibre/  — the long-lived
    aggregator runner. Wires block.NewFiberDAClient on top of the
    celestia-node-fiber adapter that julien/fiber already ships,
    plus the in-memory executor + HTTP /tx ingress used by
    evnode-txsim. Distinct from the existing fiber-bench cmd.
  • tools/talis/cmd/evnode-txsim/ — small Go load-gen that pumps
    the runner's HTTP /tx ingress for a fixed duration; deployed
    to load-gen boxes and prints a single TXSIM: line on completion.

Two small ev-node-side helpers the runner calls:

  • block/public.go: SetMaxBlobSize(n) — overrides the per-blob
    byte cap so the runner can lift Celestia's 5 MiB default to
    Fibre's 120 MiB headroom.
  • pkg/config/config.go: Config.ApplyFiberDefaults() — flips the
    DA config to Fibre-friendly settings (adaptive batching, 1 s
    DA.BlockTime, 50-deep pending-cache window) when the Fiber
    profile is enabled, so a runner can opt in with one call.

setup-fibre robustness fixes uncovered during the verified run:

  • bash script for set-host now retries until the validator's
    host appears in `query valaddr providers`. The previous one-
    shot call relied on `--yes` returning the txhash before block
    inclusion; if the chain wasn't ready, the tx silently bounced.
    The Fibre client cached the partial set on startup and uploads
    cascaded to "host not found" → "voting power: collected 0".
  • talis-CLI side polls `query valaddr providers` after the per-
    validator scripts finish and refuses to return until all
    validators are registered (5-minute deadline).

External dependency (documented in tools/talis/fibre.md):

  • Sibling clone of celestia-app on a branch with feat/fibre-payments
    + sysrex/fibre_url_fix cherry-picked. Without the URL-parse fix
    the Fibre client rejects every host:port registration.

Tested:
  - go build ./... — clean
  - go test ./block/internal/submitting ./pkg/config (the two
    pre-existing test failures on julien/fiber — TestAddFlags
    and TestFiberClient_Submit_BlobTooLarge — are not introduced
    by this PR and reproduce on raw julien/fiber)
  - End-to-end AWS deploy from this branch — 24.57 MB/s, 99.7 % ok
@walldiss walldiss force-pushed the feat/talis-deploy branch from 4dafc89 to d77175d Compare April 29, 2026 14:13
@julienrbrt julienrbrt merged commit ddd77ae into evstack:julien/fiber Apr 29, 2026
14 of 23 checks passed
2 participants