OCPBUGS-88333: Fix bootupd workaround in old nodes#6199

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
pablintino:ocpbugs-88333
Jun 27, 2026

Conversation

@pablintino

@pablintino pablintino commented Jun 17, 2026

Contributor

Closes: #OCPBUGS-88333

- What I did

The shim version check (shimIsSafe) looked at the RPM database in /usr, which reflects the system image. For nodes born from older releases, the shim in /usr is up to date but the ESP still contains old bootloader binaries, causing the workaround to be incorrectly skipped.

Replace the shim version check with a node-level bootupctl update on RHEL 9.6+ before falling back to the container-based approach.
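
The ordering described above (node-level attempt first, container-based approach as fallback) can be sketched as a generic "try each step until one succeeds" helper. This is a minimal illustration only: `updateStep`, `runWithFallback`, and the step names are hypothetical and are not the identifiers used in the MCO code.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated stands in for a failed node-level update in the demo below.
var errSimulated = errors.New("simulated node-level failure")

// updateStep is one way of updating the bootloader. The type and the step
// names are hypothetical; they are not the identifiers used in the MCO.
type updateStep struct {
	name string
	run  func() error
}

// runWithFallback tries each step in order and returns the name of the first
// one that succeeds. If every step fails, all errors are joined and returned.
func runWithFallback(steps []updateStep) (string, error) {
	if len(steps) == 0 {
		return "", errors.New("no update steps configured")
	}
	var errs []error
	for _, s := range steps {
		if err := s.run(); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
			continue
		}
		return s.name, nil
	}
	return "", errors.Join(errs...)
}

func main() {
	steps := []updateStep{
		{name: "node-level bootupctl update", run: func() error { return errSimulated }},
		{name: "container-based update", run: func() error { return nil }},
	}
	used, err := runWithFallback(steps)
	if err != nil {
		fmt.Println("bootloader update failed:", err)
		return
	}
	fmt.Println("bootloader updated via:", used)
}
```

Keeping the error from each failed step makes it possible to tell from the logs why a later path was taken.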

- How to verify it

  1. Set up a Secure Boot enabled 4.15 cluster and update it to 4.20, or set up a 4.20 cluster and manually roll back the shim and GRUB binaries in the ESP to the ones from 4.15 or earlier.
  2. Trigger the update to 4.22.

- Description for the changelog

Fix bootloader update being skipped on nodes with stale bootloader binaries

Summary by CodeRabbit

Release Notes

  • New Features

    • Bootloader updates now use smarter decision logic based on CoreOS variant, supported CPU architecture, and Secure Boot status.
    • For RHEL 9.6+, the updater attempts a node-level update first, then falls back to the container-based approach (and ultimately to a direct EFI copy if needed).
  • Bug Fixes

    • Improved OS major-version handling when selecting the appropriate update binary.
    • Enhanced OS version parsing by providing numeric major/minor components.
  • Tests

    • Expanded OS release parsing tests to validate major/minor extraction.
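
As a rough illustration of the gating mentioned in the summary, the check can be thought of as a single predicate over the node's variant, architecture, and Secure Boot state. The thread does not say which variants or architectures qualify, nor that Secure Boot must be enabled for the update to run, so every value and name below is an assumption chosen only to make the sketch runnable.

```go
package main

import "fmt"

// node describes the facts the gate looks at. All field names are hypothetical.
type node struct {
	variant    string // CoreOS variant, e.g. "rhcos", "scos", "fcos"
	arch       string // GOARCH-style name, e.g. "amd64"
	secureBoot bool
}

// Placeholder policy: the PR thread does not list which variants or
// architectures qualify, so these sets are assumptions for illustration.
var (
	gatedVariants = map[string]bool{"rhcos": true}
	gatedArches   = map[string]bool{"amd64": true, "arm64": true}
)

// needsBootloaderUpdate reports whether the update path should run at all.
// Requiring Secure Boot to be enabled is also an assumption of this sketch.
func needsBootloaderUpdate(n node) bool {
	return n.secureBoot && gatedVariants[n.variant] && gatedArches[n.arch]
}

func main() {
	candidates := []node{
		{variant: "rhcos", arch: "amd64", secureBoot: true},
		{variant: "rhcos", arch: "amd64", secureBoot: false},
		{variant: "fcos", arch: "amd64", secureBoot: true},
	}
	for _, n := range candidates {
		fmt.Printf("%+v -> update: %v\n", n, needsBootloaderUpdate(n))
	}
}
```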

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 17, 2026
@openshift-ci-robot

Contributor

@pablintino: This pull request references Jira Issue OCPBUGS-88333, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

OS base version parsing now returns numeric major and minor components. Bootloader updates use a new gated update path with a direct node-level attempt for RHEL 9.6+ and container fallback. RHEL target-root binary selection now compares integer major versions.

Changes

Bootloader and version handling

| Layer / File(s) | Summary |
|---|---|
| OS version parsing: pkg/daemon/osrelease/osrelease.go, pkg/daemon/osrelease/osrelease_test.go | BaseVersionMajor now returns an int, BaseVersionMinor is added, and the table-driven OS release tests assert both parsed values across RHCOS, Fedora, SCOS, and FCOS cases. |
| Target-root binary selection: pkg/daemon/daemon.go | ReexecuteForTargetRoot now matches RHEL major versions as integers and sets sourceBinarySuffix explicitly for the supported transitions. |
| Bootloader update flow: pkg/daemon/bootupd.go, pkg/daemon/update.go | runBootloaderUpdate replaces the previous shim/version gating helpers, removes the container-path early returns, attempts bootupctl update for RHEL 9.6+, falls back to runBootupdViaContainer, and updateLayeredOS calls the new entry point. |
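
To illustrate why numeric major/minor components help with a check such as "RHEL 9.6+", here is a minimal sketch. `parseMajorMinor` and `atLeast` are hypothetical helpers, not the `BaseVersionMajor`/`BaseVersionMinor` methods from the PR. Comparing integers avoids the string-comparison trap where "9.10" sorts before "9.6".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the numeric major and minor components from a
// version string such as "9.6" or "9.6.20260130-0".
func parseMajorMinor(version string) (major, minor int, err error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("version %q has no minor component", version)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("parsing major of %q: %w", version, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("parsing minor of %q: %w", version, err)
	}
	return major, minor, nil
}

// atLeast reports whether major.minor is greater than or equal to
// wantMajor.wantMinor, comparing the components numerically.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	for _, v := range []string{"8.6", "9.6.20260130-0", "9.10", "10.0"} {
		major, minor, err := parseMajorMinor(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> major=%d minor=%d, 9.6+: %v\n", v, major, minor, atLeast(major, minor, 9, 6))
	}
}
```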

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

lgtm, qe-approved, verified

Suggested reviewers

  • proietfb
  • dkhater-redhat
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: updating the bootupd workaround for older nodes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | No Ginkgo titles were added; the touched test names are static table cases like "RHCOS 8.6" and "FCOS" with no dynamic values. |
| Test Structure And Quality | ✅ Passed | The only changed test is a plain testify unit test with no Ginkgo, cluster resources, or waits; the custom Ginkgo-specific quality rules don’t apply. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the only test change is a plain Go unit test in pkg/daemon/osrelease/osrelease_test.go with no MicroShift-sensitive APIs. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo/e2e tests were added; the PR changes daemon logic and unit tests only, so there are no SNO-specific multi-node assumptions to flag. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PASS: The PR only changes daemon bootloader/version parsing logic; no deployments/controllers, pod specs, nodeSelectors, anti-affinity, spread constraints, or PDBs were added. |
| Ote Binary Stdout Contract | ✅ Passed | Touched files add no main/init/TestMain/suite setup stdout writes; no new fmt.Print-style process-level output was introduced. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the only test change is a standard testify unit test, with no IPv4-only assumptions or public-network dependencies. |
| No-Weak-Crypto | ✅ Passed | No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB or secret-comparison code appears in the touched diff; only SHA256 checksum logging is used. |
| Container-Privileges | ✅ Passed | The PR only changes Go logic; no container/K8s manifests or securityContext fields were added or modified. |
| No-Sensitive-Data-In-Logs | ✅ Passed | New logs are operational only; they print statuses, device paths, or checksums, with no passwords/tokens/PII/session IDs/internal hostnames added. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/daemon/bootupd.go`:
- Around line 360-370: The direct call to runCmdSync("bootupctl", "update") does
not properly execute in a systemd context and lacks the INVOCATION_ID
environment variable that bootupctl requires to pass systemd checks. Modify the
bootupctl command invocation to run through systemd-run by changing the command
arguments passed to runCmdSync so that bootupctl executes within a systemd
context, similar to how the container-based fallback path handles INVOCATION_ID
injection. This ensures the node-level bootupctl update attempt will succeed on
RHEL 9.6+ nodes instead of prematurely falling back to the container path.
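
One way to read the suggestion above is to prefix the command with `systemd-run`, so that systemd starts it as a transient unit and sets `INVOCATION_ID` for it. The sketch below only builds the argument list and does not execute anything. `wrapInSystemdRun` is a hypothetical helper, and the choice of `--wait` and `--collect` is an assumption rather than what the PR ended up doing.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapInSystemdRun prefixes argv so that it runs as a transient systemd unit.
// --wait blocks until the unit finishes and propagates its exit status;
// --collect unloads the unit afterwards even if it failed. The "--" marks the
// end of systemd-run's own options.
func wrapInSystemdRun(argv []string) []string {
	wrapped := []string{"systemd-run", "--wait", "--collect", "--"}
	return append(wrapped, argv...)
}

func main() {
	cmd := wrapInSystemdRun([]string{"bootupctl", "update"})
	// Print the command line instead of running it; executing it requires
	// root privileges on a host managed by systemd.
	fmt.Println(strings.Join(cmd, " "))
}
```

The resulting slice can be passed to whatever command runner is in use, for example as the name and arguments of `exec.Command`.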

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2938802f-abaf-4248-8fb0-bacb0c6e05d4

📥 Commits

Reviewing files that changed from the base of the PR and between d0c3a5e and bfffcd4.

📒 Files selected for processing (5)
  • pkg/daemon/bootupd.go
  • pkg/daemon/daemon.go
  • pkg/daemon/osrelease/osrelease.go
  • pkg/daemon/osrelease/osrelease_test.go
  • pkg/daemon/update.go

Comment thread pkg/daemon/bootupd.go Outdated
@djoshy

djoshy commented Jun 18, 2026

Contributor

Generally makes sense to me from the MCO POV, I'll leave tagging to coreos folks cc @travier

@djoshy

djoshy commented Jun 18, 2026

Contributor

I think the fastest way to test this would be:

  1. Create a secureboot cluster on 4.22/5.0 without any of the bootupd fixes from https://redhat.atlassian.net/browse/MCO-2227. 4.22.0-rc.2 is the latest release without them.
  2. Disable bootimage updates for the cluster.
  3. Backdate the bootimage to 4.15 and scale up a new node.
  4. Upgrade to a release with this fix.
  5. During the upgrade in step 4, the bootloader update should take place via the new path on the newly created node.

@isabella-janssen

Member

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 18, 2026
@openshift-ci-robot

Contributor

@isabella-janssen: This pull request references Jira Issue OCPBUGS-88333, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

@travier travier left a comment

Member

I have not tested this but the logic/code LGTM. Thanks!

@travier

travier commented Jun 18, 2026

Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2026
@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op-ocl-part1
/test e2e-gcp-op-ocl-part2
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2
/test e2e-gcp-op-single-node
/test e2e-hypershift

@djoshy

djoshy commented Jun 22, 2026

Contributor

/retest-required

@ptalgulk01

Contributor

Pre-merge verification:

Setup:
To get a Secure Boot cluster, use a Prow job like the one here: https://github.com/openshift/release/pull/80899/changes#diff-9b9ad4fb616269370824c342d66f9a9a04496a5e0e59fc530b8b3416c69d4678R1174
Platform: GCP

Steps:

  • Get a 4.22 Secure Boot (SB) cluster and confirm that Secure Boot is enabled
$ oc debug node/ci-op-4zhxpp00-56d51-vbzsw-worker-c-9vg86 -- chroot /host mokutil --sb-state
Starting pod/ci-op-4zhxpp00-56d51-vbzsw-worker-c-9vg86-debug-wp7fd ...
To use host binaries, run `chroot /host`
SecureBoot enabled

Removing debug pod ...
  • Disable boot image updates
$ oc patch machineconfiguration cluster --type=merge -p  '{"spec":{"managedBootImages":{"machineManagers":[{"resource":"machinesets","apiGroup":"machine.openshift.io","selection":{"mode":"None"}}]}}}'

$   oc get machineconfiguration cluster -o jsonpath='{.spec.managedBootImages}'
{"machineManagers":[{"apiGroup":"machine.openshift.io","resource":"machinesets","selection":{"mode":"None"}}]}

  • Duplicate an existing MachineSet (MS) and patch the copy to use the 4.15 RHCOS image in GCP
....
      disks:
      - autoDelete: false
        boot: false
        image: projects/rhcos-cloud/global/images/rhcos-415-92-202402201450-0-gcp-x86-64
        labels: null
        sizeGb: 0
        type: pd-standard
  • Scale up the duplicate MS to create a node
$ oc scale --replicas 1  machinesets.machine.openshift.io -n openshift-machine-api ci-op-4zhxpp00-56d51-vbzsw-worker-415test
machineset.machine.openshift.io/ci-op-4zhxpp00-56d51-vbzsw-worker-415test scaled
  • Run the following checks on the new node
  oc debug node/ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25 -- chroot /host sh -c "
  echo '======================================'
  echo 'BOOTLOADER DEBUG INFO'
  echo '======================================'
  echo ''
  echo '=== 1. Bootupctl Status ==='
  bootupctl status
  echo ''
  echo '=== 2. Installed Packages ==='
  rpm -q shim-x64 grub2-efi-x64 bootupd
  echo ''
  echo '=== 3. OS Version ==='
  rpm-ostree status | head -10
  echo ''
  echo '=== 4. EFI Mount ==='
  mount | grep -i efi
  echo ''
  echo '=== 5. EFI Files ==='
  find /boot/efi -name '*.efi' 2>/dev/null | head -10
  echo ''
  echo '=== 6. Bootupctl Validate ==='
  bootupctl validate 2>&1
  echo ''
  echo '=== 7. EFI Boot Manager ==='
  efibootmgr -v 2>&1 | head -20
  "
Starting pod/ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25-debug-ddjgp ...
To use host binaries, run `chroot /host`
======================================
BOOTLOADER DEBUG INFO
======================================

=== 1. Bootupctl Status ===
Available components: BIOS EFI

=== 2. Installed Packages ===
shim-x64-15.8-4.el9_3.x86_64
grub2-efi-x64-2.06-105.el9_6.x86_64
bootupd-0.2.27-5.el9_6.x86_64

=== 3. OS Version ===
State: idle
Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:aa141f406b68f5476102928f20d1fa562944e1003d1fb8cf9ad583b6dd3a9a37
                   Digest: sha256:aa141f406b68f5476102928f20d1fa562944e1003d1fb8cf9ad583b6dd3a9a37
                  Version: 9.6.20260130-0 (2026-02-02T14:01:31Z)

=== 4. EFI Mount ===
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

=== 5. EFI Files ===

=== 6. Bootupctl Validate ===
Failed to start transient service unit: Connection reset by peer

=== 7. EFI Boot Manager ===
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0001,0000
Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI nvme_card-pd	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)N.....YM....R,Y.
Boot0002* redhat	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(2,GPT,ad03a38f-5285-4a37-a073-178bd70f2695,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)

Removing debug pod ...
  • Upgrade the cluster to the fix
$ oc adm upgrade --to-image=registry.build10.ci.openshift.org/ci-ln-72s61vb/release:latest --allow-explicit-upgrade  --force
  • After the upgrade completes, re-run the above command on the node
  oc debug node/ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25 -- chroot /host sh -c "
  echo '======================================'
  echo 'BOOTLOADER DEBUG INFO'
  echo '======================================'
  echo ''
  echo '=== 1. Bootupctl Status ==='
  bootupctl status
  echo ''
  echo '=== 2. Installed Packages ==='
  rpm -q shim-x64 grub2-efi-x64 bootupd
  echo ''
  echo '=== 3. OS Version ==='
  rpm-ostree status | head -10
  echo ''
  echo '=== 4. EFI Mount ==='
  mount | grep -i efi
  echo ''
  echo '=== 5. EFI Files ==='
  find /boot/efi -name '*.efi' 2>/dev/null | head -10
  echo ''
  echo '=== 6. Bootupctl Validate ==='
  bootupctl validate 2>&1
  echo ''
  echo '=== 7. EFI Boot Manager ==='
  efibootmgr -v 2>&1 | head -20
  "
Starting pod/ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25-debug-c4qqb ...
To use host binaries, run `chroot /host`
======================================
BOOTLOADER DEBUG INFO
======================================

=== 1. Bootupctl Status ===
Available components: BIOS EFI

=== 2. Installed Packages ===
shim-x64-16.1-7.el9.x86_64
grub2-efi-x64-2.06-126.el9_8.x86_64
bootupd-0.2.31-1.el9.x86_64

=== 3. OS Version ===
State: idle
Deployments:
* ostree-unverified-image:containers-storage:registry.build10.ci.openshift.org/ci-ln-72s61vb/stable@sha256:dd4a77b00748fd136da5673b4f2fff4770bd4f9c1a012bc7592c7f02080b6786
                   Digest: sha256:dd4a77b00748fd136da5673b4f2fff4770bd4f9c1a012bc7592c7f02080b6786
                  Version: 9.8.20260617-0 (2026-06-17T21:22:31Z)

=== 4. EFI Mount ===
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

=== 5. EFI Files ===

=== 6. Bootupctl Validate ===
Failed to start transient service unit: Connection reset by peer

=== 7. EFI Boot Manager ===
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0001,0000
Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI nvme_card-pd	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)N.....YM....R,Y.
Boot0002* redhat	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(2,GPT,ad03a38f-5285-4a37-a073-178bd70f2695,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)

Removing debug pod ...
  • Check the bootupctl logs
$  oc debug node/ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25 -- chroot /host journalctl --since "2 hours ago" | grep -i "bootupctl"
Starting pod/ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25-debug-7jztr ...
To use host binaries, run `chroot /host`
Jun 24 13:48:11 ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25 root[32429]: machine-config-daemon[29204]: "runBootloaderUpdate: RHEL 9.6+ detected, attempting node-level bootupctl update"
Jun 24 13:48:11 ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25 root[32432]: machine-config-daemon[29204]: "runBootloaderUpdate: node-level bootupctl update failed, falling back to container-based update"
Jun 24 13:49:21 ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25 root[33874]: machine-config-daemon[29204]: "bootupctl: running update from container image"
Jun 24 13:49:21 ci-op-4zhxpp00-56d51-vbzsw-worker-415test-grj25 systemd[1]: Started /usr/bin/podman run --env INVOCATION_ID=fe42e94b-8023-49ea-9cbe-7db5526f73d6 --privileged --pid=host --net=host --rm -v /boot:/boot:rslave -v /dev:/dev registry.build10.ci.openshift.org/ci-ln-72s61vb/stable@sha256:dd4a77b00748fd136da5673b4f2fff4770bd4f9c1a012bc7592c7f02080b6786 bootupctl update.

Removing debug pod ...
  • Confirm that the ESP was updated
 System Packages:
  shim-x64-16.1-7.el9.x86_64
  grub2-efi-x64-2.06-126.el9_8.x86_64

  ESP Installed (from bootupd):
  grub2-efi-x64-1:2.06-126.el9_8.x86_64,shim-x64-16.1-7.el9.x86_64

  ESP Update Time:
  2026-03-09T22:46:10Z
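
The comparison between the system packages and the ESP contents shown above can be automated. The sketch below assumes the package strings have already been collected in the formats shown in this comment. `stripEpoch` and `espMatchesSystem` are hypothetical helpers, and the epoch handling is an assumption based on the `1:` prefix visible in the ESP string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// epochRe matches the "-<epoch>:" that separates name and version in strings
// like "grub2-efi-x64-1:2.06-126.el9_8.x86_64".
var epochRe = regexp.MustCompile(`-\d+:`)

// stripEpoch rewrites "name-1:version" as "name-version" so that strings with
// and without an epoch compare equal.
func stripEpoch(pkg string) string {
	return epochRe.ReplaceAllString(strings.TrimSpace(pkg), "-")
}

// espMatchesSystem reports whether the comma-separated package list recorded
// for the ESP names the same packages as systemPkgs, ignoring epochs and
// order. It assumes neither list contains duplicates.
func espMatchesSystem(espList string, systemPkgs []string) bool {
	espPkgs := strings.Split(espList, ",")
	if len(espPkgs) != len(systemPkgs) {
		return false
	}
	want := make(map[string]bool, len(systemPkgs))
	for _, p := range systemPkgs {
		want[stripEpoch(p)] = true
	}
	for _, p := range espPkgs {
		if !want[stripEpoch(p)] {
			return false
		}
	}
	return true
}

func main() {
	system := []string{"shim-x64-16.1-7.el9.x86_64", "grub2-efi-x64-2.06-126.el9_8.x86_64"}
	esp := "grub2-efi-x64-1:2.06-126.el9_8.x86_64,shim-x64-16.1-7.el9.x86_64"
	fmt.Println("ESP matches system packages:", espMatchesSystem(esp, system))
}
```

A mismatch is the situation this PR targets: the RPM database in /usr is current while the ESP still holds older binaries.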

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026
@openshift-ci-robot

Contributor

@pablintino: This pull request references Jira Issue OCPBUGS-88333, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

@ptalgulk01

ptalgulk01 commented Jun 26, 2026

Contributor

Pre-merge re-verification:

Setup:
To get a Secure Boot cluster, use a Prow job like the one here: https://github.com/openshift/release/pull/80899/changes#diff-9b9ad4fb616269370824c342d66f9a9a04496a5e0e59fc530b8b3416c69d4678R1174
Platform: GCP

Steps:

  • Get a 4.22 Secure Boot (SB) cluster and confirm that Secure Boot is enabled
$ oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw  -- chroot /host mokutil --sb-state
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-lgzns ...
To use host binaries, run `chroot /host`
SecureBoot enabled

Removing debug pod ...
  • Disable boot image updates
$ oc patch machineconfiguration cluster --type=merge -p  '{"spec":{"managedBootImages":{"machineManagers":[{"resource":"machinesets","apiGroup":"machine.openshift.io","selection":{"mode":"None"}}]}}}'

$   oc get machineconfiguration cluster -o jsonpath='{.spec.managedBootImages}'
{"machineManagers":[{"apiGroup":"machine.openshift.io","resource":"machinesets","selection":{"mode":"None"}}]}

  • Duplicate an existing MachineSet (MS) and patch the copy to use the 4.15 RHCOS image in GCP
....
      disks:
      - autoDelete: false
        boot: false
        image: projects/rhcos-cloud/global/images/rhcos-415-92-202402201450-0-gcp-x86-64
        labels: null
        sizeGb: 0
        type: pd-standard
  • Scale up the duplicate MS to create a node
$ oc scale --replicas 1  machinesets.machine.openshift.io -n openshift-machine-api ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test
machineset.machine.openshift.io/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test scaled
  • Run the following checks on the new node
  $ oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw  -- chroot /host sh -c "
  echo '======================================'
  echo 'BOOTLOADER DEBUG INFO'
  echo '======================================'
  echo ''
  echo '=== 1. Bootupctl Status ==='
  bootupctl status
  echo ''
  echo '=== 2. Installed Packages ==='
  rpm -q shim-x64 grub2-efi-x64 bootupd
  echo ''
  echo '=== 3. OS Version ==='
  rpm-ostree status | head -10
  echo ''
  echo '=== 4. EFI Mount ==='
  mount | grep -i efi
  echo ''
  echo '=== 5. EFI Files ==='
  find /boot/efi -name '*.efi' 2>/dev/null | head -10
  echo ''
  echo '=== 6. Bootupctl Validate ==='
  bootupctl validate 2>&1
  echo ''
  echo '=== 7. EFI Boot Manager ==='
  efibootmgr -v 2>&1 | head -20
  "
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-q2dfh ...
To use host binaries, run `chroot /host`
======================================
BOOTLOADER DEBUG INFO
======================================

=== 1. Bootupctl Status ===
Available components: BIOS EFI

=== 2. Installed Packages ===
shim-x64-15.8-4.el9_3.x86_64
grub2-efi-x64-2.06-105.el9_6.x86_64
bootupd-0.2.27-5.el9_6.x86_64

=== 3. OS Version ===
State: idle
Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:aa141f406b68f5476102928f20d1fa562944e1003d1fb8cf9ad583b6dd3a9a37
                   Digest: sha256:aa141f406b68f5476102928f20d1fa562944e1003d1fb8cf9ad583b6dd3a9a37
                  Version: 9.6.20260130-0 (2026-02-02T14:01:31Z)

=== 4. EFI Mount ===
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

=== 5. EFI Files ===

=== 6. Bootupctl Validate ===
Failed to start transient service unit: Connection reset by peer

=== 7. EFI Boot Manager ===
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0001,0000
Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI nvme_card-pd	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)N.....YM....R,Y.
Boot0002* redhat	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(2,GPT,ad03a38f-5285-4a37-a073-178bd70f2695,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)

Removing debug pod ...
  • Upgrade the cluster to the fix
$ oc adm upgrade --to-image=registry.build10.ci.openshift.org/ci-ln-72s61vb/release:latest --allow-explicit-upgrade  --force
  • After the upgrade completes, re-run the above command on the node
$ oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw  -- chroot /host sh -c "
  echo '======================================'
  echo 'BOOTLOADER DEBUG INFO'
  echo '======================================'
  echo ''
  echo '=== 1. Bootupctl Status ==='
  bootupctl status
  echo ''
  echo '=== 2. Installed Packages ==='
  rpm -q shim-x64 grub2-efi-x64 bootupd
  echo ''
  echo '=== 3. OS Version ==='
  rpm-ostree status | head -10
  echo ''
  echo '=== 4. EFI Mount ==='
  mount | grep -i efi
  echo ''
  echo '=== 5. EFI Files ==='
  find /boot/efi -name '*.efi' 2>/dev/null | head -10
  echo ''
  echo '=== 6. Bootupctl Validate ==='
  bootupctl validate 2>&1
  echo ''
  echo '=== 7. EFI Boot Manager ==='
  efibootmgr -v 2>&1 | head -20
  "
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-wd7rp ...
To use host binaries, run `chroot /host`
======================================
BOOTLOADER DEBUG INFO
======================================

=== 1. Bootupctl Status ===
Available components: BIOS EFI

=== 2. Installed Packages ===
shim-x64-16.1-7.el9.x86_64
grub2-efi-x64-2.06-126.el9_8.x86_64
bootupd-0.2.31-1.el9.x86_64

=== 3. OS Version ===
State: idle
Deployments:
* ostree-unverified-registry:registry.build10.ci.openshift.org/ci-ln-ilvggkk/stable@sha256:a39c18562a72635a9a4fcebd0c2df385426fda2edb0e822abacdb7e224aba730
                   Digest: sha256:a39c18562a72635a9a4fcebd0c2df385426fda2edb0e822abacdb7e224aba730
                  Version: 9.8.20260623-0 (2026-06-25T09:38:58Z)

=== 4. EFI Mount ===
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

=== 5. EFI Files ===

=== 6. Bootupctl Validate ===
Failed to start transient service unit: Connection reset by peer

=== 7. EFI Boot Manager ===
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0001,0000
Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI nvme_card-pd	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)N.....YM....R,Y.
Boot0002* redhat	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(2,GPT,ad03a38f-5285-4a37-a073-178bd70f2695,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)

Removing debug pod ...
  • Check the bootupctl logs
$  oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw -- chroot /host journalctl --since "2 hours ago" | grep -i "bootupctl"
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-tb9qt ...
To use host binaries, run `chroot /host`
Jun 26 07:28:35 ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw bootupctl[45881]: Previous EFI: grub2-efi-x64-1:2.06-61.el9_2.2.x86_64,shim-x64-15.6-1.el9.x86_64
Jun 26 07:28:35 ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw bootupctl[45881]: Updated EFI: grub2-efi-x64-1:2.06-105.el9_6.x86_64,shim-x64-15.8-4.el9_3.x86_64
Jun 26 07:28:37 ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw bootupctl[45881]: Adopted and updated: BIOS: grub2-tools-1:2.06-105.el9_6.x86_64

Removing debug pod ...

$ oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw -- chroot /host journalctl --since "2 hours ago" | grep -i "runBootloaderUpdate"
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-s6284 ...
To use host binaries, run `chroot /host`
Jun 26 07:28:34 ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw root[45879]: machine-config-daemon[43926]: "runBootloaderUpdate: RHEL 9.6+ detected, attempting node-level bootloader update via bootloader-update.service"
Jun 26 07:28:37 ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw root[45960]: machine-config-daemon[43926]: "runBootloaderUpdate: node-level bootloader-update.service succeeded"

Removing debug pod ...
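
The "Previous EFI" and "Updated EFI" journal lines above are enough to confirm that the ESP contents actually changed. A minimal sketch for pulling both values out of captured journal text follows. `efiTransition` is a hypothetical helper, and the line format is assumed to match what is shown in this comment.

```go
package main

import (
	"fmt"
	"strings"
)

// efiTransition scans journal text for the bootupctl lines that report the
// EFI package set before and after an update. ok is true only when both
// lines were found.
func efiTransition(journal string) (previous, updated string, ok bool) {
	for _, line := range strings.Split(journal, "\n") {
		if _, rest, found := strings.Cut(line, "Previous EFI: "); found {
			previous = strings.TrimSpace(rest)
		}
		if _, rest, found := strings.Cut(line, "Updated EFI: "); found {
			updated = strings.TrimSpace(rest)
		}
	}
	return previous, updated, previous != "" && updated != ""
}

func main() {
	journal := `bootupctl[45881]: Previous EFI: grub2-efi-x64-1:2.06-61.el9_2.2.x86_64,shim-x64-15.6-1.el9.x86_64
bootupctl[45881]: Updated EFI: grub2-efi-x64-1:2.06-105.el9_6.x86_64,shim-x64-15.8-4.el9_3.x86_64`
	previous, updated, ok := efiTransition(journal)
	if !ok {
		fmt.Println("no EFI update recorded in the journal")
		return
	}
	fmt.Println("previous:", previous)
	fmt.Println("updated: ", updated)
	fmt.Println("changed: ", previous != updated)
}
```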

/label qe-approved
/verified by @ptalgulk01

@openshift-ci openshift-ci Bot added the qe-approved Signifies that QE has signed off on this PR label Jun 26, 2026
@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 26, 2026
@openshift-ci-robot

Contributor

@ptalgulk01: This PR has been marked as verified by @ptalgulk01.

Details

In response to this:

Pre-merge re-verification-

Setup:
To get the secure boot cluster we need to use prow job like here https://github.com/openshift/release/pull/80899/changes#diff-9b9ad4fb616269370824c342d66f9a9a04496a5e0e59fc530b8b3416c69d4678R1174
Platform: GCP

Steps:

  • Get the 4.22 Secure Boot (SB) cluster and confirm that Secure Boot is enabled
$ oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw  -- chroot /host mokutil --sb-state
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-lgzns ...
To use host binaries, run `chroot /host`
SecureBoot enabled

Removing debug pod ...
  • Disable managed boot image updates for MachineSets
$ oc patch machineconfiguration cluster --type=merge -p  '{"spec":{"managedBootImages":{"machineManagers":[{"resource":"machinesets","apiGroup":"machine.openshift.io","selection":{"mode":"None"}}]}}}'

$   oc get machineconfiguration cluster -o jsonpath='{.spec.managedBootImages}'
{"machineManagers":[{"apiGroup":"machine.openshift.io","resource":"machinesets","selection":{"mode":"None"}}]}

  • Duplicate the existing MachineSet (MS) and patch the copy to use the 4.15 RHCOS image in the GCP cluster
....
     disks:
     - autoDelete: false
       boot: false
       image: projects/rhcos-cloud/global/images/rhcos-415-92-202402201450-0-gcp-x86-64
       labels: null
       sizeGb: 0
       type: pd-standard
  • Scale up the duplicated MachineSet to create a node
$ oc scale --replicas 1  machinesets.machine.openshift.io -n openshift-machine-api ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test
machineset.machine.openshift.io/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test scaled
  • Run the following checks on the new node
 $ oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw  -- chroot /host sh -c "
 echo '======================================'
 echo 'BOOTLOADER DEBUG INFO'
 echo '======================================'
 echo ''
 echo '=== 1. Bootupctl Status ==='
 bootupctl status
 echo ''
 echo '=== 2. Installed Packages ==='
 rpm -q shim-x64 grub2-efi-x64 bootupd
 echo ''
 echo '=== 3. OS Version ==='
 rpm-ostree status | head -10
 echo ''
 echo '=== 4. EFI Mount ==='
 mount | grep -i efi
 echo ''
 echo '=== 5. EFI Files ==='
 find /boot/efi -name '*.efi' 2>/dev/null | head -10
 echo ''
 echo '=== 6. Bootupctl Validate ==='
 bootupctl validate 2>&1
 echo ''
 echo '=== 7. EFI Boot Manager ==='
 efibootmgr -v 2>&1 | head -20
 "
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-q2dfh ...
To use host binaries, run `chroot /host`
======================================
BOOTLOADER DEBUG INFO
======================================

=== 1. Bootupctl Status ===
Available components: BIOS EFI

=== 2. Installed Packages ===
shim-x64-15.8-4.el9_3.x86_64
grub2-efi-x64-2.06-105.el9_6.x86_64
bootupd-0.2.27-5.el9_6.x86_64

=== 3. OS Version ===
State: idle
Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:aa141f406b68f5476102928f20d1fa562944e1003d1fb8cf9ad583b6dd3a9a37
                  Digest: sha256:aa141f406b68f5476102928f20d1fa562944e1003d1fb8cf9ad583b6dd3a9a37
                 Version: 9.6.20260130-0 (2026-02-02T14:01:31Z)

=== 4. EFI Mount ===
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

=== 5. EFI Files ===

=== 6. Bootupctl Validate ===
Failed to start transient service unit: Connection reset by peer

=== 7. EFI Boot Manager ===
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0001,0000
Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI nvme_card-pd	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)N.....YM....R,Y.
Boot0002* redhat	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(2,GPT,ad03a38f-5285-4a37-a073-178bd70f2695,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)

Removing debug pod ...
  • Upgrade the cluster to a build that contains the fix
$ oc adm upgrade --to-image=registry.build10.ci.openshift.org/ci-ln-72s61vb/release:latest --allow-explicit-upgrade  --force
  • After the upgrade completes, re-run the same command on the node
$ oc debug node/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw  -- chroot /host sh -c "
 echo '======================================'
 echo 'BOOTLOADER DEBUG INFO'
 echo '======================================'
 echo ''
 echo '=== 1. Bootupctl Status ==='
 bootupctl status
 echo ''
 echo '=== 2. Installed Packages ==='
 rpm -q shim-x64 grub2-efi-x64 bootupd
 echo ''
 echo '=== 3. OS Version ==='
 rpm-ostree status | head -10
 echo ''
 echo '=== 4. EFI Mount ==='
 mount | grep -i efi
 echo ''
 echo '=== 5. EFI Files ==='
 find /boot/efi -name '*.efi' 2>/dev/null | head -10
 echo ''
 echo '=== 6. Bootupctl Validate ==='
 bootupctl validate 2>&1
 echo ''
 echo '=== 7. EFI Boot Manager ==='
 efibootmgr -v 2>&1 | head -20
 "
Starting pod/ci-op-7tdfsm8l-56d51-hlmkh-worker-a-test-9g4kw-debug-wd7rp ...
To use host binaries, run `chroot /host`
======================================
BOOTLOADER DEBUG INFO
======================================

=== 1. Bootupctl Status ===
Available components: BIOS EFI

=== 2. Installed Packages ===
shim-x64-16.1-7.el9.x86_64
grub2-efi-x64-2.06-126.el9_8.x86_64
bootupd-0.2.31-1.el9.x86_64

=== 3. OS Version ===
State: idle
Deployments:
* ostree-unverified-registry:registry.build10.ci.openshift.org/ci-ln-ilvggkk/stable@sha256:a39c18562a72635a9a4fcebd0c2df385426fda2edb0e822abacdb7e224aba730
                  Digest: sha256:a39c18562a72635a9a4fcebd0c2df385426fda2edb0e822abacdb7e224aba730
                 Version: 9.8.20260623-0 (2026-06-25T09:38:58Z)

=== 4. EFI Mount ===
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

=== 5. EFI Files ===

=== 6. Bootupctl Validate ===
Failed to start transient service unit: Connection reset by peer

=== 7. EFI Boot Manager ===
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0001,0000
Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI nvme_card-pd	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)N.....YM....R,Y.
Boot0002* redhat	PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(2,GPT,ad03a38f-5285-4a37-a073-178bd70f2695,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)

Removing debug pod ...


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Comment thread pkg/daemon/bootupd.go
if major > 9 || (major == 9 && minor >= 6) {
	logSystem("runBootloaderUpdate: RHEL 9.6+ detected, attempting node-level bootloader update via bootloader-update.service")
	var err error
	if err = runCmdSync("systemctl", "start", "bootloader-update.service"); err == nil {

I think this will essentially never fail: the return code here reflects whether systemd successfully launched the service, not the exit status of the process it launched.

@travier travier Jun 26, 2026

Instead, I think you need to start the unit like you do here, wait until it completes (or maybe use the --wait option?), and then check its status again. If anything failed, it should show up in the status.

@djoshy djoshy Jun 26, 2026

I initially tried with --wait, but because the service sets RemainAfterExit=yes, it blocks forever. I thought that, since it is a oneshot unit, the start command would return the correct error code?

@travier travier Jun 26, 2026

$ cat ~/.config/systemd/user/test.service
[Service]
Type=oneshot
ExecStart=bash -c "sleep 2; exit 1"
RemainAfterExit=yes
# Keep this stuff in sync with SYSTEMD_ARGS_BOOTUPD in general
PrivateNetwork=yes
ProtectHome=yes
KillMode=mixed
MountFlags=slave
$ systemctl --user daemon-reload
$ systemctl --user start test.service
$ echo $?
0

Hum, hold on.

OK, never mind, it does fail as expected. I had a previous remaining instance so systemd was not starting it again.

So maybe this should be restart instead, and then we can add --wait back. But I'm not sure that would matter in this case: if we have to restart it, then it has already run, so there should be nothing to do. So maybe we leave this as is.

OK, let's leave this as is.

@travier

travier commented Jun 26, 2026

echo '=== 1. Bootupctl Status ==='
bootupctl status

=== 1. Bootupctl Status ===
Available components: BIOS EFI

That looks odd: there should be more output here, such as the component versions.

echo '=== 4. EFI Mount ==='
mount | grep -i efi
echo ''
echo '=== 5. EFI Files ==='
find /boot/efi -name '*.efi' 2>/dev/null | head -10

=== 4. EFI Mount ===
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

=== 5. EFI Files ===

The ESP is not mounted by default in RHCOS. The following should do it without changing the mount points in the main namespace:

unshare -m
mount /dev/... /boot/efi

echo '=== 6. Bootupctl Validate ==='
bootupctl validate 2>&1

=== 6. Bootupctl Validate ===
Failed to start transient service unit: Connection reset by peer

This is weird.

echo '=== 7. EFI Boot Manager ==='
efibootmgr -v 2>&1 | head -20

Should not matter here.

The shim version check (shimIsSafe) looked at the RPM database in /usr,
which reflects the system image. For nodes born from older releases, the
shim in /usr is up to date but the ESP still contains old bootloader
binaries, causing the workaround to be incorrectly skipped.

Replace the shim version check with a node-level bootupctl update on
RHEL 9.6+ before falling back to the container-based approach.

Signed-off-by: Pablo Rodriguez Nava <git@amail.pablintino.eu>
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jun 26, 2026
@travier

travier commented Jun 26, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2026
@openshift-merge-bot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op-ocl-part1
/test e2e-gcp-op-ocl-part2
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2
/test e2e-gcp-op-single-node
/test e2e-hypershift

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pablintino, travier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@djoshy

djoshy commented Jun 26, 2026

/retest-required

/verified by @ptalgulk01

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 26, 2026
@openshift-ci-robot

@djoshy: This PR has been marked as verified by @ptalgulk01.


@djoshy

djoshy commented Jun 26, 2026

/retest-required

@openshift-merge-bot

/retest-required

Remaining retests: 0 against base HEAD c3ce5f2 and 2 for PR HEAD 3a9d76e in total

@djoshy

djoshy commented Jun 27, 2026

/retest-required

@openshift-ci

openshift-ci Bot commented Jun 27, 2026

@pablintino: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 0e0f89b into openshift:main Jun 27, 2026
17 checks passed
@openshift-ci-robot

@pablintino: Jira Issue Verification Checks: Jira Issue OCPBUGS-88333
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-88333 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@djoshy

djoshy commented Jun 27, 2026

/cherry-pick release-4.22

@openshift-cherrypick-robot

@djoshy: new pull request created: #6239


@openshift-merge-robot

Fix included in release 5.0.0-0.nightly-2026-06-28-044905


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR verified Signifies that the PR passed pre-merge verification criteria
