feat(node-observer): wait for topograph health in-process #370
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #370 +/- ##
==========================================
+ Coverage 72.24% 72.45% +0.20%
==========================================
Files 86 88 +2
Lines 5441 5510 +69
==========================================
+ Hits 3931 3992 +61
- Misses 1317 1321 +4
- Partials 193 197 +4
Greptile Summary
Moves the topograph readiness wait from a Helm init container into the
Confidence Score: 5/5
Safe to merge; the change is self-contained and well-tested. The new health-poll loop is straightforward: idiomatic stopped-timer pattern, correct
No files require special attention.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Main as node-observer main
participant C as Controller.Start()
participant W as waitForTopograph()
participant H as topograph /healthz
participant S as StatusInformer
Main->>C: Start()
C->>W: waitForTopograph(ctx, healthURL, 2s, 1m)
loop Until 2xx or timeout
W->>H: GET /healthz
alt HTTP 2xx
H-->>W: 200 OK
W-->>C: nil
else non-2xx / network error
H-->>W: 503 / error
W->>W: wait 2s (timer)
end
end
alt timeout / context cancelled
W-->>C: error (DeadlineExceeded / Canceled)
C-->>Main: error → pod restarts
else ready
C->>S: StatusInformer.Start()
S-->>C: running
end
Reviews (2): Last reviewed commit: "feat(node-observer): wait for topograph ..."
Move the topograph readiness wait out of the chart's wait init container and into the node-observer binary.
The controller polls /healthz (derived from generateTopologyUrl) every 2s and gives up after 1m, reporting the actual elapsed time on timeout.
Remove the wait init container, waitImage value, and node-observer.waitImage helper.
Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Signed-off-by: Dmitry Shmulevich <dshmulevich@nvidia.com>
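One plausible way to derive the health endpoint from generateTopologyUrl is sketched below. The description does not say how the derivation is done, so this assumes /healthz is served on the same scheme and host as the generate endpoint; `healthURLFrom` and the example URL are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// healthURLFrom keeps the scheme and host of the generate endpoint and
// replaces the path with /healthz. (Hypothetical helper; assumes the health
// endpoint lives on the same host and port as the generate endpoint.)
func healthURLFrom(generateURL string) (string, error) {
	u, err := url.Parse(generateURL)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", generateURL, err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("url %q has no scheme or host", generateURL)
	}
	// Drop everything after the host so only the health path remains.
	u.Path = "/healthz"
	u.RawPath = ""
	u.RawQuery = ""
	u.Fragment = ""
	return u.String(), nil
}
```

Rejecting URLs without a scheme or host up front means a misconfigured value fails immediately rather than after the full one-minute wait.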