feat(tamanu): TAM-6782: add reload subcommand for safe rolling restart #313
Open
dannash100 wants to merge 11 commits into
Restarts running tamanu services one at a time with a wait between each,
mirroring the approach in the ops repo's tamanu-single-upgrade playbook.
On Linux, drives systemd units `tamanu-{kind}-*`, `tamanu-frontend@*` and
`tamanu-patientportal`, reloading caddy + flushing systemd-resolved
between restarts so caddy picks up the new container IP. On Windows,
restarts every online pm2 process. Optional HTTP probe via --check-url.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
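The "one at a time with a wait between each" approach described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `rolling_restart` and its `restart` callback are hypothetical names, and stopping at the first failure is an assumption made here so that a bad restart does not cascade through the remaining services.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Restart each unit in order, sleeping `cooldown` between consecutive restarts.
/// `restart` is a hypothetical callback (e.g. something that shells out to the
/// service manager). Returns how many units were restarted, or the first error.
pub fn rolling_restart<F>(
    units: &[String],
    cooldown: Duration,
    mut restart: F,
) -> Result<usize, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut done = 0;
    for (i, unit) in units.iter().enumerate() {
        // No wait before the first unit, only between restarts.
        if i > 0 {
            sleep(cooldown);
        }
        // Assumption: abort the roll on the first failure.
        restart(unit).map_err(|e| format!("{unit}: {e}"))?;
        done += 1;
    }
    Ok(done)
}
```

A caller would pass the list of discovered units and a closure that performs the actual restart plus any readiness check, so the cooldown only starts once the unit is considered up.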
`systemctl is-active` only proves podman started the container, not that the Node app inside is serving. After is-active, look up the container by its `PODMAN_SYSTEMD_UNIT` label, get its netavark IP, and probe `http://<ip>:3000/` until it responds (any non-network-error counts). Skip for known workers (`*-tasks`, `*-fhir-*`) that don't listen on a port. Opt-out via --no-strict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
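As a rough illustration of the readiness gate in this commit message, the sketch below treats any bytes returned over the socket as "serving" (so a 404 still counts, matching "any non-network-error counts") and skips units matching the worker patterns. It uses only the standard library; `is_worker`, `probe_once` and `wait_until_serving` are hypothetical names, and the container IP lookup via podman is left out entirely.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

/// Workers (`*-tasks`, `*-fhir-*`) don't listen on a port, so they skip the probe.
pub fn is_worker(unit: &str) -> bool {
    let name = unit.strip_suffix(".service").unwrap_or(unit);
    name.ends_with("-tasks") || name.contains("-fhir-")
}

/// One attempt: connect, send a bare GET, and succeed if any bytes come back.
/// The HTTP status is deliberately ignored; only network errors count as failure.
fn probe_once(addr: SocketAddr, timeout: Duration) -> bool {
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, timeout) else {
        return false;
    };
    let _ = stream.set_read_timeout(Some(timeout));
    let _ = stream.set_write_timeout(Some(timeout));
    let req = format!("GET / HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(req.as_bytes()).is_err() {
        return false;
    }
    let mut buf = [0u8; 16];
    matches!(stream.read(&mut buf), Ok(n) if n > 0)
}

/// Poll until the address answers or `deadline` has elapsed.
/// `interval` (assumed non-zero) is both the per-attempt timeout and the retry delay.
pub fn wait_until_serving(addr: SocketAddr, deadline: Duration, interval: Duration) -> bool {
    let start = Instant::now();
    loop {
        if probe_once(addr, interval) {
            return true;
        }
        if start.elapsed() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```

In a real tool the address would come from the container's network settings and port 3000 as described above; here the caller supplies it directly.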
Trip the typos CI check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ed45587 to eb9dfcf
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
macos-15 runners may ship without rustup pre-installed; without this, `rustup toolchain install --no-self-update` fails and downstream `cargo auditable build` ends up routed to `rustup-init`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
USAGE.md is the Linux-canonical form; update-usage.sh on macOS leaks the macOS data dir string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pm2 path now rolls one pm_id at a time (so scaled apps like tamanu-api roll instance-by-instance instead of all at once) and the strict probe hits `http://127.0.0.1:<PORT>/` per process, using the resolved PORT env var from pm2's increment_var. Workers without a PORT skip the probe, matching the Linux worker carve-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
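The per-process roll described in this commit could be planned along these lines. This is only a sketch: `Pm2Proc`, `Step` and `plan_roll` are hypothetical types and names, the fields are a simplified stand-in for whatever pm2 actually reports, and ordering by `pm_id` is an assumption made here for determinism.

```rust
/// Hypothetical, minimal view of one pm2 process entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Pm2Proc {
    pub pm_id: u32,
    pub name: String,
    pub online: bool,
    /// Resolved PORT env var, if the process has one (workers do not).
    pub port: Option<u16>,
}

/// One step of the roll: which pm_id to restart and where to probe afterwards.
#[derive(Debug, PartialEq)]
pub struct Step {
    pub pm_id: u32,
    /// None means "skip the HTTP probe" (no PORT, i.e. a worker).
    pub probe_url: Option<String>,
}

/// Build one step per online process, so scaled apps roll instance by instance.
pub fn plan_roll(procs: &[Pm2Proc]) -> Vec<Step> {
    let mut online: Vec<&Pm2Proc> = procs.iter().filter(|p| p.online).collect();
    // Assumption: restart in ascending pm_id order.
    online.sort_by_key(|p| p.pm_id);
    online
        .into_iter()
        .map(|p| Step {
            pm_id: p.pm_id,
            probe_url: p.port.map(|port| format!("http://127.0.0.1:{port}/")),
        })
        .collect()
}
```

Keying the plan on `pm_id` rather than the app name is what keeps two `tamanu-api` instances from being restarted together.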
passcod
requested changes
May 17, 2026
- Deserialize systemd `active` as bool instead of String
- Collapse detect_backend into a single if/else chain
- Rename --no-strict to --no-probe-http for clarity

Co-authored-by: Félix Saparelli <felix@passcod.name>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
passcod
approved these changes
May 18, 2026
Member
LGTM but please test builds on actual non-prod before we merge :)
Member
Yeah, that's not working right.
Also needs to call sudo.
And I don't see it check that the frontends are up before moving on?
Author
@passcod on it!
If invoked without root on a systemd host, re-execs the same args via sudo so users don't have to remember the prefix. Promotes the per-unit active/HTTP probe logs from debug to info so each restart visibly shows the readiness gate before the cooldown.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
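The sudo re-exec mentioned in this commit might look something like the following. It is a sketch under assumptions: `sudo_argv` is a hypothetical helper, and the root check is passed in as a flag because the standard library has no portable way to read the effective UID (a real tool would obtain it some other way).

```rust
use std::ffi::OsString;
use std::path::Path;

/// Build the argv for re-running the same command under sudo.
/// Returns None when already root, in which case no re-exec is needed.
/// `exe` is the current executable and `args` are the original arguments
/// without the program name.
pub fn sudo_argv(is_root: bool, exe: &Path, args: &[OsString]) -> Option<Vec<OsString>> {
    if is_root {
        return None;
    }
    let mut argv = Vec::with_capacity(args.len() + 2);
    argv.push(OsString::from("sudo"));
    argv.push(exe.as_os_str().to_os_string());
    // Forward the user's arguments unchanged so the re-exec behaves identically.
    argv.extend(args.iter().cloned());
    Some(argv)
}
```

The caller would typically take `exe` from `std::env::current_exe()` and `args` from `std::env::args_os().skip(1)`, then hand the result to `std::process::Command`, for example via `std::os::unix::process::CommandExt::exec` on Unix so the sudo process replaces the current one.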
Member
Ah, something that was missing but I didn't say explicitly there is that this was just restarting the frontends and not doing the apis/workers/tasks/etc. Maybe it's just matching on
…efixed
The deployment uses unprefixed unit names (tamanu-api, tamanu-tasks,
tamanu-sync, tamanu-fhir-*), same as on pm2. The kind-prefix filter
('tamanu-central-' / 'tamanu-facility-') matched nothing, so reload
was only catching the explicit tamanu-frontend@* and tamanu-patientportal
units. Broaden to all active 'tamanu-*' units, excluding templates,
'tamanu-meta*', and 'tamanu-alertd'. Switch systemd kind detection to
the same tamanu-sync heuristic the pm2 path uses.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
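The broadened filter from this commit (all active `tamanu-*` units, minus templates, `tamanu-meta*` and `tamanu-alertd`) can be sketched as below. `Unit` and `should_reload` are hypothetical names, and treating a "template" as a unit whose name ends in `@` before the `.service` suffix is an assumption of this sketch. Because it matches on the bare `tamanu-` prefix, it accepts both prefixed names such as `tamanu-central-api@1` and unprefixed ones such as `tamanu-api`.

```rust
/// Hypothetical row from a unit listing: the unit name plus its active state.
pub struct Unit<'a> {
    pub name: &'a str,
    pub active: bool,
}

/// Decide whether a unit takes part in the rolling reload.
pub fn should_reload(unit: &Unit) -> bool {
    let name = unit.name.strip_suffix(".service").unwrap_or(unit.name);
    unit.active
        && name.starts_with("tamanu-")
        // Assumption: a bare template looks like `tamanu-frontend@.service`;
        // instances such as `tamanu-frontend@a.service` are kept.
        && !name.ends_with('@')
        && !name.starts_with("tamanu-meta")
        && name != "tamanu-alertd"
}
```

Filtering a listing is then a matter of `units.iter().filter(|u| should_reload(u))`, with the worker carve-out for the HTTP probe applied separately.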
Member
While the fix probably works, the reasoning is bonkers lmao. Wonder what it thinks of:

    $ sudo systemctl list-units tamanu-*
    UNIT                                LOAD   ACTIVE SUB     DESCRIPTION
    tamanu-central-api@1.service        loaded active running Tamanu Central server API
    tamanu-central-api@2.service        loaded active running Tamanu Central server API
    tamanu-central-fhir-refresh.service loaded active running Tamanu Central FHIR worker (refresh)
    tamanu-central-fhir-resolve.service loaded active running Tamanu Central FHIR worker (resolve)
    tamanu-central-tasks.service        loaded active running Tamanu Central server scheduled tasks
    tamanu-frontend@a.service           loaded active running Tamanu frontend
    tamanu-frontend@b.service           loaded active running Tamanu frontend
    tamanu-patientportal.service        loaded active running Tamanu patient portal