[NET-514] [Alert VgSGZS] uniblock_hyperliquid-mainnet_Hotblocks_Critical_Lag #477

Open
elina-chertova wants to merge 1 commit into open-beta from alert-fix/vgsgzs-uniblock-hyperliquid-mainnet-hotblocks-critical-lag-squid-sdk

Automated fix proposal for alert VgSGZS.

  • Alert: uniblock_hyperliquid-mainnet_Hotblocks_Critical_Lag
  • Base branch: open-beta
  • Investigation: /root/alert/incident-agent/agent-system/data/investigations/VgSGZS
  • Report: /root/alert/incident-agent/agent-system/data/investigations/VgSGZS/report.html

Reviewer quick view

  • Scope: 1 file(s) in evm

  • Root cause (agent): not explicitly captured

  • Summary:

    Root cause confirmed. Plan's stop condition is met — no further data
    collection needed.

    What's happening: The evm-uniblock-hyperliquid-mainnet-hotblocks-service pod
    crashes in-process (0 pod restarts) on every block where a receipt log
    references a Hyperliquid hidden system tx that isn't in block.transactions.
    assertNotNull(txByHash.get(log.transactionHash)) throws at
    chain-utils.js:80. The service restarts the ingestion loop and immediately
    re-hits the same block → persistent ~130–156 s blockAge lag.

    This is incident 8LVGbY repeating — the SDK fix was never merged into image
    9a7fea7.

    Two deliverables ready in fixes/proposed/:

    1. SDK fix (evm/evm-rpc/src/chain-utils.ts): replace
      assertNotNull(txByHash.get(...)) with a null-safe skip in the Hyperliquid
      calculateLogsBloom branch.
    2. Temporary mitigation (uniblock.yaml): verify_logs_bloom: false —
      unblocks the service immediately; must be reverted after SDK fix ships.
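
The difference between the current lookup and the proposed null-safe skip can be sketched as follows. This is a minimal illustration, not the actual code in chain-utils.ts: the `Block`, `Tx`, and `Log` shapes, the local `assertNotNull` stand-in, and both function names are hypothetical, and the real `calculateLogsBloom` branch may use the matched transaction differently.

```typescript
// Hypothetical minimal shapes; the real SDK types carry more fields.
interface Tx { hash: string }
interface Log { transactionHash: string; address: string }
interface Block { number: number; transactions: Tx[]; logs: Log[] }

// Local stand-in for the SDK helper named in the stack trace.
function assertNotNull<T>(value: T | null | undefined, msg = 'value is null'): T {
  if (value == null) throw new Error(msg)
  return value
}

// Strict pairing: throws as soon as a log points at a tx that is
// missing from block.transactions (the failure described above).
function pairLogsStrict(block: Block): Array<[Log, Tx]> {
  const txByHash = new Map(block.transactions.map(tx => [tx.hash, tx] as const))
  return block.logs.map(
    log => [log, assertNotNull(txByHash.get(log.transactionHash))] as [Log, Tx]
  )
}

// Null-safe pairing: logs whose tx is absent are skipped instead of
// aborting the whole block.
function pairLogsSkipMissing(block: Block): Array<[Log, Tx]> {
  const txByHash = new Map(block.transactions.map(tx => [tx.hash, tx] as const))
  const pairs: Array<[Log, Tx]> = []
  for (const log of block.logs) {
    const tx = txByHash.get(log.transactionHash)
    if (tx == null) continue // e.g. a hidden system tx not listed in the block
    pairs.push([log, tx])
  }
  return pairs
}
```

With a block that contains one ordinary log and one log referencing an unlisted transaction, `pairLogsStrict` throws while `pairLogsSkipMissing` returns only the ordinary pair. Whether skipped logs should still contribute to the bloom is a design question for the real fix and is not settled by this sketch.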

Fix metadata

  • Fix class: rca_fix (SDK) + mitigation (infra config)
  • Confidence: high
  • Evidence basis: logs, code
  • Falsification: if crash does NOT occur at chain-utils.js:80 or offending log's
  • Follow-up: re-enable verify_logs_bloom: true after SDK fix deploys
    (Generated by the terminal-debate agent — values reflect the agent's self-assessment, not a verified verdict. Use them as a starting point for review.)

Risk & rollout

  • Suggested rollout: canary / one-network-first, then broader rollout after signal is stable.
  • Rollback: revert this PR (or restore previous config values/files) if the incident signal worsens.

Reproduction status

Incident behavior was reproduced or corroborated strongly enough for a non-hypothesis fix proposal.

Validation checklist

  • Verify the original incident signal improves (logs/metrics/alerts) after deploy.
  • Verify no regression on sibling networks/providers/services touched by this change.
  • Confirm queue / delivery pipeline status returns to expected steady state.
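
For the first checklist item, one way to confirm the blockAge signal has recovered is to sample block ages after deploy and compare them with the alert threshold. The helper below is a rough sketch under stated assumptions: the function names are hypothetical, timestamps are taken to be Unix seconds, and the threshold is a parameter because the alert's actual threshold is not stated in this PR.

```typescript
// Seconds between a block's timestamp and "now" (both in Unix seconds).
function blockAgeSeconds(
  blockTimestampSec: number,
  nowSec: number = Math.floor(Date.now() / 1000)
): number {
  return Math.max(0, nowSec - blockTimestampSec)
}

// True when there is at least one sample and every sampled age is
// below the caller-supplied threshold.
function lagRecovered(agesSec: number[], thresholdSec: number): boolean {
  return agesSec.length > 0 && agesSec.every(age => age < thresholdSec)
}
```

Requiring every sample to be under the threshold, rather than the average, avoids declaring recovery while the service is still intermittently stuck on a block.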

Changed files

  • evm/evm-rpc/src/chain-utils.ts

Notify

cc @tmcgroul (automation opened this PR)
