feat: add Rocky Linux 10 GRUB-EFI support and RHEL-family improvements #90

Closed
Raboo wants to merge 20 commits into linka-cloud:main from Raboo:feat/rhel10-grub-support

@Raboo Raboo commented Mar 31, 2026

Summary

  • Fix grub2-install for RHEL-family distros: Extended newGrubCommon() to use grub2 commands for all RHEL-family distros: CentOS, Rocky, AlmaLinux, and RHEL. Previously only CentOS was handled.
  • Enable grub and grub-efi bootloaders for RHEL-family: Removed the blocks that prevented Rocky, AlmaLinux, and CentOS from using these bootloaders.
  • Add --force flag for EFI install: Required for grub2-install on RHEL-family EFI platforms when running in a chroot/offline environment without EFI variables.
  • Adjust GRUB config template: Set GRUB_TIMEOUT=1 to allow editing boot entries, cleared GRUB_CMDLINE_LINUX (root passed via cmdline args).
  • Add filesystem labels: Root filesystem labeled rootfs, boot partition labeled boot (or name boot for FAT32) for portable GRUB search commands.
  • Add SELinux autorelabel trigger: Creates /.autorelabel during image build so the first boot triggers a full filesystem relabel, required because SELinux contexts from Docker builds don't match the policy loaded at boot time.
  • Kernel cmdline: Config.Cmdline() now uses isRhelFamily() to choose the correct format — RHEL-family distros omit the ro initrd=... prefix.
  • Split-boot loader entries: fixLoaderEntries() strips /boot/ prefix from paths in /boot/loader/entries/ files for split-boot setups.
  • E2E tests: Enabled EFI testing for RHEL-family distros (CentOS, Rocky, AlmaLinux) and added more Rocky Linux images.
  • Example: Added examples/rocky.Dockerfile for Rocky Linux 10.
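
The distro detection described in the summary could look roughly like the sketch below. The `OSRelease` struct, its `ID` field, and the `grubCommands` helper are assumptions made for illustration; the PR's actual `isRhelFamily()` and `newGrubCommon()` may be shaped differently.

```go
package main

import (
	"fmt"
	"strings"
)

// OSRelease is a hypothetical stand-in for the parsed /etc/os-release data;
// the real d2vm type may have different fields.
type OSRelease struct {
	ID string // e.g. "rocky", "almalinux", "centos", "rhel", "debian"
}

// isRhelFamily reports whether the distro ID is one of the RHEL-family
// distros named in the summary (CentOS, Rocky, AlmaLinux, RHEL).
func isRhelFamily(r OSRelease) bool {
	switch strings.ToLower(r.ID) {
	case "centos", "rocky", "almalinux", "rhel":
		return true
	}
	return false
}

// grubCommands picks the install and mkconfig binaries: RHEL-family distros
// ship them with a "grub2-" prefix, other distros use "grub-".
func grubCommands(r OSRelease) (install, mkconfig string) {
	if isRhelFamily(r) {
		return "grub2-install", "grub2-mkconfig"
	}
	return "grub-install", "grub-mkconfig"
}

func main() {
	for _, id := range []string{"rocky", "debian"} {
		install, mkconfig := grubCommands(OSRelease{ID: id})
		fmt.Println(id, install, mkconfig)
	}
}
```

Keeping the family check in one helper means the bootloader code and `Config.Cmdline()` can share the same decision instead of repeating the distro list.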

Testing

Successfully converted a Rocky Linux 10.1 Docker image to qcow2 with --bootloader=grub-efi --split-boot --boot-fs=fat32:

$ sudo ./d2vm convert rocky10-test:latest -o /tmp/rocky10-grub-efi.qcow2 --bootloader=grub-efi --split-boot --boot-fs=fat32 --keep-cache
...
Setting up grub-efi bootloader
Converting to qcow2

Output: a qcow2 image that boots successfully.

Files Changed

File                       Change
grub_common.go             Add isRhelFamily() helper, extend grub2 support, add SELinux autorelabel, adjust grubCfg
grub.go                    Add --force flag for EFI install, remove RHEL-family block
grub_efi.go                Add --force flag for EFI install, remove RHEL-family block
builder.go                 Add filesystem labels, pass OSRelease to Cmdline(), add fixLoaderEntries()
config.go                  Pass OSRelease to Cmdline() for RHEL-family kernel cmdline format
e2e/e2e_test.go            Enable EFI testing for RHEL-family, add more Rocky Linux images
examples/rocky.Dockerfile  New example for Rocky Linux 10

This PR was coded with OpenCode using model Qwen3.6-35-A3B.

@Raboo Raboo mentioned this pull request Mar 31, 2026
@Raboo Raboo marked this pull request as ready for review April 7, 2026 14:08
Author

Raboo commented Apr 7, 2026

I have a working image that boots Rocky 10.1 using UEFI. I haven't tested Secure Boot, but it ought to work.
Fixes #74

Raboo added 10 commits April 7, 2026 16:24
…dling

- Add isRhelFamily() helper to centralize RHEL-family distro detection
- Use grub2-install/grub2-mkconfig for all RHEL-family distros (CentOS, Rocky, AlmaLinux, RHEL)
- Enable grub and grub-efi bootloaders for Rocky, AlmaLinux, and CentOS
- Copy distro-specific EFI binary to removable boot path for RHEL-family EFI support
- Consolidate SupportsLUKS() switch cases for RHEL-family distros

grub2-install on RHEL-family EFI platforms requires --force when
running in a chroot/offline environment without EFI variables.
This is standard for VM image building scenarios.

- Add templates/rocky.Dockerfile for Rocky/AlmaLinux image builds
- Add examples/rocky.Dockerfile following existing example patterns
- Route Rocky and AlmaLinux to use the new rocky template
- Keep CentOS using its own centos.Dockerfile template

network-scripts were removed in RHEL 9+. Rocky Linux 10 uses
NetworkManager exclusively, which handles DHCP on eth0 by default.

The --removable flag passed to grub2-install already places the EFI
binary at the standard removable boot path. The copyEfiBinary function
was redundant and always logged a warning about not finding the source
file. Remove it along with unused imports.

- Add filesystem labels (rootfs, boot) to mkfs commands in builder
- Add RHEL-specific grub config template with GRUB_ENABLE_BLSCFG=false
  and GRUB_DISABLE_LINUX_UUID=true to prevent duplicate root/ro/initrd
- Write /etc/fstab with LABEL-based entries so grub2-mkconfig correctly
  detects separate /boot partition and generates relative paths
- Set root=LABEL=rootfs in GRUB_CMDLINE_LINUX for portable root device
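
As a rough illustration of the labeling step, the sketch below builds mkfs argument lists that carry a label. The `mkfsArgs` helper and the loop-device paths are hypothetical and not taken from builder.go; the only tool options used are `-L` for mkfs.ext4 and `-F`/`-n` for mkfs.vfat.

```go
package main

import "fmt"

// mkfsArgs returns the command line for formatting device with the given
// filesystem and label. Hypothetical helper, not d2vm's actual builder code.
func mkfsArgs(fs, label, device string) ([]string, error) {
	switch fs {
	case "ext4":
		// ext filesystems take the label via -L.
		return []string{"mkfs.ext4", "-L", label, device}, nil
	case "fat32":
		// FAT has a volume name rather than a label, set via -n.
		return []string{"mkfs.vfat", "-F", "32", "-n", label, device}, nil
	}
	return nil, fmt.Errorf("unsupported filesystem %q", fs)
}

func main() {
	parts := []struct{ fs, label, dev string }{
		{"ext4", "rootfs", "/dev/loop0p2"}, // device paths are placeholders
		{"fat32", "boot", "/dev/loop0p1"},
	}
	for _, p := range parts {
		args, err := mkfsArgs(p.fs, p.label, p.dev)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(args)
	}
}
```

With labels in place, both /etc/fstab and GRUB can refer to LABEL=rootfs and the boot label instead of device paths that only exist on the build host.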
…kconfig

grub2-mkconfig in a chroot environment leaks the host's /proc/cmdline
(loop device paths) and produces duplicate root=, ro, and initrd= entries.
Replace it with a custom grub.cfg generator for RHEL-family that:
- Uses label-based boot partition lookup (search --label boot)
- Uses label-based root device (root=LABEL=rootfs)
- Generates correct relative paths for kernel/initrd (no /boot prefix)
- Produces a single clean menuentry with no duplicates

Also add SplitBoot and BootFS fields to Config struct so grubCommon
can access them for template generation.

Since we now generate grub.cfg directly for RHEL-family instead of
using grub2-mkconfig, the RHEL-specific /etc/default/grub template
is no longer needed. Simplify prepare() to use the standard template
for all distros.

- Set timeout=5 so users can interrupt boot to edit grub entries
- Remove load_video which fails with 'can't find command' error in
  minimal grub environment and is unnecessary for serial console VMs

Create /.autorelabel during image build so the first boot triggers
a full filesystem relabel. This is required because SELinux contexts
from the Docker build don't match the policy loaded at boot time.
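
The autorelabel trigger amounts to creating an empty marker file at the top of the image's root filesystem. A minimal sketch follows, assuming the rootfs is available as a directory during the build; `touchAutorelabel` and `run` are hypothetical names, not the PR's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// touchAutorelabel creates an empty .autorelabel marker under rootfs.
// On an SELinux-enabled system its presence at / requests a full relabel
// on the next boot. Hypothetical helper for illustration.
func touchAutorelabel(rootfs string) error {
	return os.WriteFile(filepath.Join(rootfs, ".autorelabel"), nil, 0o644)
}

// run exercises touchAutorelabel against a throwaway directory that stands
// in for the mounted image root.
func run() error {
	rootfs, err := os.MkdirTemp("", "rootfs-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(rootfs)

	if err := touchAutorelabel(rootfs); err != nil {
		return err
	}
	info, err := os.Stat(filepath.Join(rootfs, ".autorelabel"))
	if err != nil {
		return err
	}
	if info.Size() != 0 {
		return fmt.Errorf("expected an empty marker, got %d bytes", info.Size())
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("marker created")
}
```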
@Raboo Raboo force-pushed the feat/rhel10-grub-support branch from 0c9be91 to f27c011 Compare April 7, 2026 14:25
Author

Raboo commented Apr 22, 2026

@Adphi Hi, when do you think you might have some time to review my PR?

@Raboo Raboo changed the title from "feat: add Rocky Linux 10 GRUB support and RHEL-family improvements" to "feat: add Rocky Linux 10 GRUB-EFI support and RHEL-family improvements" Apr 22, 2026
Member

@Adphi Adphi left a comment

Thanks for the Pull Request 😀!

The e2e tests need to cover these changes.
Maybe remove:

d2vm/e2e/e2e_test.go

Lines 123 to 125 in 95633d3

if (strings.Contains(img.name, "centos") || strings.Contains(img.name, "almalinux") || strings.Contains(img.name, "rocky")) && tt.efi {
	t.Skip("efi not supported for CentOS")
}

Comment thread config.go Outdated
Comment thread builder.go Outdated
Comment thread templates/rocky.Dockerfile Outdated
Comment thread dockerfile.go Outdated
Comment thread dockerfile.go Outdated
Comment thread dockerfile.go Outdated
Comment thread grub_common.go Outdated
Comment thread os_release.go
Raboo added 9 commits April 24, 2026 15:13
These fields were assigned but never read. The builder struct has its
own splitBoot and bootFS fields that are used throughout the codebase.

…elease.go

This change was out of scope for this PR.

Remove the skip for centos/almalinux/rocky EFI tests now that
grub2-install with --force works in chroot environments.

Remove the dedicated rocky.Dockerfile template and instead use the
CentOS template for Rocky Linux and AlmaLinux releases. This reduces
template duplication as the base package installations and configurations
are now unified across these RHEL-family distributions.

Instead of adding a field to Config, Cmdline now takes OSRelease and
calls isRhelFamily() to choose the correct kernel cmdline format.
RHEL-family distros omit 'ro initrd=...' prefix; others keep it.

Include Docker Inc's official images and Rocky Linux Project official
images for versions 9 and 10.

- Remove custom mkconfigRhel, fall back to grub2-mkconfig
- Set GRUB_TIMEOUT=1 to allow editing boot entries
- Remove root=LABEL=rootfs from GRUB_CMDLINE_LINUX (passed via cmdline)

For split-boot setups, strip /boot/ prefix from paths in
/boot/loader/entries/ files since the boot partition is mounted at /
at runtime.
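
The path rewrite described in this commit can be sketched as a pure string transformation over one loader entry. `stripBootPrefix` is a hypothetical helper, not the PR's `fixLoaderEntries()`, and the kernel file names in the sample entry are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// stripBootPrefix rewrites the "linux" and "initrd" keys of one loader entry
// so their paths are relative to the boot partition ("/boot/x" becomes "/x").
// Other keys are left untouched. Rewritten lines are re-joined with single
// spaces, so their internal whitespace is normalized.
func stripBootPrefix(entry string) string {
	lines := strings.Split(entry, "\n")
	for i, line := range lines {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		if fields[0] != "linux" && fields[0] != "initrd" {
			continue
		}
		for j := 1; j < len(fields); j++ {
			if strings.HasPrefix(fields[j], "/boot/") {
				// Trim "/boot" only, keeping the leading "/".
				fields[j] = strings.TrimPrefix(fields[j], "/boot")
			}
		}
		lines[i] = strings.Join(fields, " ")
	}
	return strings.Join(lines, "\n")
}

func main() {
	entry := "title Rocky Linux\n" +
		"linux /boot/vmlinuz-test\n" +
		"initrd /boot/initramfs-test.img\n" +
		"options root=LABEL=rootfs ro"
	fmt.Println(stripBootPrefix(entry))
}
```

A real implementation would read each file under /boot/loader/entries/, apply the rewrite, and write the result back, and it would only do so when split-boot is enabled.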
Author

Raboo commented Apr 28, 2026

OK, I have resolved all comments. Please have another look now.
There are a lot of commits, so you should probably squash and merge when it's time to merge.

Member

Adphi commented Apr 29, 2026

I'm going to close that one in favor of #91.

Thanks @Raboo for your work on this pull request!

@Adphi Adphi closed this Apr 29, 2026
Author

Raboo commented May 6, 2026

@Adphi OK, fair enough. But your PR doesn't fix some of the "quality" issues. I tried the latest master and it still produces duplicate initrd settings (even if the initrd on the kernel line is ignored, it shouldn't be there, and it also refers to a file that doesn't exist), duplicate "ro" entries, and duplicate "root" disk entries. It works and the image is bootable, but I feel the quality could be better.
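
To make the duplication concrete, here is a minimal sketch of collapsing repeated kernel arguments. `dedupeCmdline` is a hypothetical helper that exists in neither PR. Keeping the first occurrence is an assumption made for the example, and only the listed keys are deduplicated because some kernel parameters (for example console=) are legitimately repeated.

```go
package main

import (
	"fmt"
	"strings"
)

// dedupeCmdline drops repeated occurrences of the given keys from a kernel
// command line, keeping the first occurrence of each. A token's key is the
// part before "=", or the whole token for bare flags such as "ro".
func dedupeCmdline(cmdline string, keys ...string) string {
	only := make(map[string]bool, len(keys))
	for _, k := range keys {
		only[k] = true
	}
	seen := make(map[string]bool)
	var out []string
	for _, tok := range strings.Fields(cmdline) {
		key, _, _ := strings.Cut(tok, "=")
		if only[key] {
			if seen[key] {
				continue // duplicate of a key we were asked to collapse
			}
			seen[key] = true
		}
		out = append(out, tok)
	}
	return strings.Join(out, " ")
}

func main() {
	in := "ro root=LABEL=rootfs console=ttyS0 ro root=/dev/sda1 console=tty0"
	fmt.Println(dedupeCmdline(in, "ro", "root", "initrd"))
}
```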

Regarding the GRUB defaults, I assume that disabling the OS prober fixed the wrong path issue (/boot/), and that disabling BLSCFG solves GRUB_DEFAULT=0 always pointing to the latest image. It works, but it also changes the OS defaults more than my approach does.
Perhaps the OS prober should be disabled during the image build but reset after the GRUB config has been built? I don't know about the non-RHEL distros, but the default for RHEL is "GRUB_DEFAULT=saved".

I also have opinions about hiding the GRUB menu and giving it no timeout. That makes it hard to fix boot issues that might occur or to boot into recovery mode. Same here: the RHEL default (pretty sure Ubuntu's as well) is to show the menu and have a timeout.

If it were my choice, I would keep as much of the OS default behavior as possible.

Do you have plans to address these issues? Do you want me to submit smaller PRs to fix them? Or do we leave it as it is?

Member

Adphi commented May 6, 2026

@Raboo I don't plan to address those.

For d2vm-generated images, we don't really care about preserving RHEL defaults for boot-related behavior. These images are intended to be minimal immutable artifacts, not general-purpose installed systems that users maintain through GRUB, recover through the GRUB menu, or expect to behave exactly like a normal RHEL installation.

The priority here is that the generated VM boots reliably and as fast as possible. So keeping the GRUB menu visible, preserving GRUB_DEFAULT=saved, or matching the distro's default timeout behavior are not goals for d2vm images. In particular, GRUB_TIMEOUT=0 is intentional: GRUB boots the default entry immediately.

GRUB_DISABLE_OS_PROBER=true is not what fixes the /boot/ path issue. OS prober is about scanning other disks/partitions for installed operating systems and generating menu entries for them. In the d2vm build context that is not only unnecessary, it is actively undesirable: it can see block devices exposed from the host/build environment (through the /dev mount) and potentially leak host kernels or other boot entries into the generated image's GRUB config.

So disabling OS prober is a build isolation measure. It prevents unrelated host/build-environment OS entries from being discovered. It does not rewrite kernel or initrd paths.

The relevant change in #91 is GRUB_ENABLE_BLSCFG=false. On RHEL-family systems, BLS causes GRUB to generate boot menu entries dynamically from BLS snippets at boot. Those snippets live under the boot partition's loader/entries layout. With split /boot, that is where path assumptions can get confusing.

Your PR tried to address that by rewriting /boot/loader/entries contents and changing path references from /boot/... to /.... That is a BLS-specific workaround. #91 instead avoids relying on BLS entries for this case. It sets GRUB_ENABLE_BLSCFG=false, so grub2-mkconfig generates classic grub.cfg menu entries, and the template change makes the kernel/initramfs available in the classic /boot/vmlinuz-* and /boot/initramfs-*.img layout. So the merged fix is not "OS prober fixes /boot paths"; it is "do not use BLS for this generated image path, and generate a simple working classic GRUB config instead."
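
For reference, the three /etc/default/grub settings named in this comment can be rendered as in the sketch below. The `grubDefaults` type and its `render` method are hypothetical; the actual template in #91 is not shown in this thread and may contain other settings.

```go
package main

import (
	"fmt"
	"strings"
)

// grubDefaults holds only the settings discussed in this thread.
// Hypothetical type for illustration, not d2vm's template.
type grubDefaults struct {
	Timeout         int  // GRUB_TIMEOUT: 0 boots the default entry immediately
	DisableOSProber bool // GRUB_DISABLE_OS_PROBER: skip scanning other disks
	EnableBLSCfg    bool // GRUB_ENABLE_BLSCFG: false yields classic menu entries
}

// render returns the settings in /etc/default/grub's KEY=value form.
func (g grubDefaults) render() string {
	var b strings.Builder
	fmt.Fprintf(&b, "GRUB_TIMEOUT=%d\n", g.Timeout)
	fmt.Fprintf(&b, "GRUB_DISABLE_OS_PROBER=%t\n", g.DisableOSProber)
	fmt.Fprintf(&b, "GRUB_ENABLE_BLSCFG=%t\n", g.EnableBLSCfg)
	return b.String()
}

func main() {
	cfg := grubDefaults{Timeout: 0, DisableOSProber: true, EnableBLSCfg: false}
	fmt.Print(cfg.render())
}
```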

Regarding the duplicate initrd, ro, and root entries: I agree it is not pretty. But if the image boots as-is, I don't consider that a blocker or a quality issue worth complicating the builder for right now.

So I would rather keep the current behavior unless there is a concrete boot failure caused by it.
