OCPEDGE-2498: Add migration support by eggfoobar · Pull Request #83 · openshift-eng/two-node-toolbox

eggfoobar · 2026-06-26T06:20:33Z

Summary by CodeRabbit

New Features
- Added support for transitioning a single-node cluster to a 3-node high-availability setup.
- Added new command options for SNO deployment and cleanup in the help output.
- Added a guided cleanup flow for reverting mutable topology changes.
Bug Fixes
- Improved retry handling during project setup to better tolerate temporary startup delays.
- Expanded topology validation and cluster readiness checks for SNO-based workflows.
- Added DNS, node, and storage cleanup steps to keep cluster state consistent after changes.

Add a new Ansible role and make target to perform a one-shot migration of a platform:None SNO cluster to a 3-node HA cluster. The role creates two new control-plane VMs, boots them with RHCOS via coreos-installer ISO ignition embed, waits for nodes to join and etcd to scale, then transitions the topology to HighlyAvailable. Usage: make sno-to-3node Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: ehila <ehila@redhat.com>

openshift-ci-robot · 2026-06-26T06:20:36Z

@eggfoobar: This pull request references OCPEDGE-2498 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-06-26T06:20:42Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eggfoobar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [eggfoobar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-26T06:20:46Z

Walkthrough

Adds SNO deployment and SNO-to-3node transition entrypoints, install-dev SNO support, role defaults and ignition generation, preflight checks, VM provisioning and boot tasks, topology and rollout waits, DNS refresh, verification, and cleanup.

Changes

SNO-to-3node workflow

Layer / File(s)	Summary
Command entrypoints and playbooks `deploy/Makefile`, `deploy/openshift-clusters/scripts/.sh`, `deploy/openshift-clusters/.yml`	Adds Make targets, shell wrappers, and playbooks for the SNO transition and cleanup flows.
Install-dev SNO support `deploy/openshift-clusters/roles/dev-scripts/install-dev/files/*`, `deploy/openshift-clusters/roles/dev-scripts/install-dev/handlers/main.yml`, `deploy/openshift-clusters/roles/dev-scripts/install-dev/tasks/config.yml`, `deploy/openshift-clusters/roles/dev-scripts/install-dev/vars/main.yml`	Adds SNO install config examples, ignores `config_sno.sh`, retries `oc project`, and extends topology validation.
Role variables and ignition assets `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/defaults/main.yml`, `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml`, `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/templates/*.j2`, `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/ignition.yml`	Defines SNO defaults, role variables, Ignition templates, and the MCS CA fetch task.
Preflight checks `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/preflight.yml`	Queries cluster state, node readiness, etcd, cluster operators, and api-int DNS before the transition.
VM creation `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/create-vms.yml`	Removes stale libvirt state, creates VM disks, defines new VMs, and adds DHCP and DNS reservations.
VM boot and install `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/boot-nodes.yml`	Discovers the RHCOS live ISO, embeds ignition into per-node ISOs, boots the VMs, waits for install completion, and returns them to disk boot.
Topology transition and rollout waits `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/topology.yml`, `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml`, `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml`, `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/mco-rollout.yml`	Scales CVO down, patches the infrastructure topology, and waits for node, etcd, and MCO status changes.
DNS refresh and verification `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml`, `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/verify.yml`	Rewrites dnsmasq, updates `/etc/hosts`, and captures post-transition node, topology, etcd, CVO, and MCP state.
Cleanup tasks `deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/clean.yml`	Destroys the new VMs and removes their disks, NVRAM, ISOs, DHCP reservations, DNS entries, and leases.

Sequence Diagram(s)

sequenceDiagram
  participant Makefile
  participant sno_to_3node_sh as "sno-to-3node.sh"
  participant sno_to_3node_yml as "sno-to-3node.yml"
  participant mutable_topology_role as "mutable-topology/sno-to-3node"
  participant oc
  participant virsh

  Makefile->>sno_to_3node_sh: runs sno-to-3node target
  sno_to_3node_sh->>sno_to_3node_yml: ansible-playbook sno-to-3node.yml
  sno_to_3node_yml->>mutable_topology_role: import_tasks phases
  mutable_topology_role->>oc: preflight, topology patch, CSR approval, and verification
  mutable_topology_role->>virsh: create VMs, boot disks, and update libvirt DNS
  mutable_topology_role->>oc: wait for nodes, etcd, MCO rollout, and final cluster checks

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Suggested reviewers

slintes
fonta-rh
jaypoulz

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

Ignore

❌ Failed checks (1 error)

Check name	Status	Explanation	Resolution
No-Sensitive-Data-In-Logs	❌ Error	New Ansible debug/echo output prints cluster domains, node IPs, hostnames, and api-int DNS targets without no_log protection.	Remove or redact those messages, or gate them behind no_log/verbose-only output; avoid logging cluster domains, IPs, CSR names, and internal DNS records.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is concise and matches the main change: adding migration support for the SNO-to-3-node flow.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto	✅ Passed	No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB or secret/token comparisons were found in the changed files.
Container-Privileges	✅ Passed	No changed container/K8s manifests contain privileged, hostPID/Network/IPC, allowPrivilegeEscalation, SYS_ADMIN, or root-run settings.
No-Hardcoded-Secrets	✅ Passed	No hardcoded secrets found in touched files; scans found no literal tokens/passwords/private keys or embedded-credential URLs, and only a CI_TOKEN comment reference.
No-Injection-Vectors	✅ Passed	No flagged injection APIs or patterns appear in the changed files; the new shell blocks only interpolate fixed role-owned vars and hardcoded data.
Ai-Attribution	✅ Passed	No AI tools are mentioned in the commit/PR text, and the commit only has a Signed-off-by trailer—no Co-Authored-By, Assisted-by, or Generated-by found.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

deploy/Makefile (1)
122-130: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Document the new sno-ipi and sno-agent targets in make help.

Lines 79-83 add both commands, but this help block only advertises sno-to-3node. make help will miss the new entrypoints entirely.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/Makefile` around lines 122 - 130, Update the Makefile help output so
the new `sno-ipi` and `sno-agent` targets are listed alongside the existing
deployment entries in `make help`. Add matching `@echo` lines in the help block
near the `sno-to-3node` entry, using the same wording/style as the other target
descriptions so the new entrypoints are discoverable.
deploy/openshift-clusters/scripts/deploy-cluster.sh (1)
62-69: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reject the unsupported sno + kcli combination explicitly.

This validation now allows --topology sno for every method, so a direct deploy-cluster.sh --topology sno --method kcli falls through to kcli-install.yml. The PR only adds SNO support through the install-dev/setup flow, so this opens an unsupported path instead of failing fast.
Suggested fix
 if [[ "${METHOD}" != "ipi" && "${METHOD}" != "agent" && "${METHOD}" != "kcli" ]]; then
     echo "Error: Invalid method '${METHOD}'. Must be 'ipi', 'agent', or 'kcli'."
     exit 1
 fi
+
+if [[ "${TOPOLOGY}" == "sno" && "${METHOD}" == "kcli" ]]; then
+    echo "Error: Topology 'sno' only supports 'ipi' or 'agent'."
+    exit 1
+fi
Also applies to: 87-104
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/deploy-cluster.sh` around lines 62 - 69,
The validation in deploy-cluster.sh currently permits every METHOD with
TOPOLOGY=sno, which allows an unsupported sno + kcli path to continue into the
kcli install flow. Update the argument checks around the TOPOLOGY/METHOD
validation blocks to explicitly reject the unsupported combination before any
install logic runs, and keep the failure consistent with the existing validation
messages. Reference the validation section and the kcli-install path so the
guard is placed where the topology and method are first checked.
deploy/openshift-clusters/roles/dev-scripts/install-dev/tasks/config.yml (1)
26-32: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Include the SNO scenario in the missing-variable error.

Line 31 still only tells agent users to add TNF_IPV4 or TNA_IPV4. Now that sno is a supported prefix on Line 43, a missing AGENT_E2E_TEST_SCENARIO for SNO will produce the wrong remediation.
Proposed fix
-      AGENT_E2E_TEST_SCENARIO="TNF_IPV4" (fencing) or "TNA_IPV4" (arbiter) to
-      the config file.
+      AGENT_E2E_TEST_SCENARIO="TNF_IPV4" (fencing), "TNA_IPV4" (arbiter), or
+      "SNO_IPV4" (sno) to the config file.
Also applies to: 43-43
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/roles/dev-scripts/install-dev/tasks/config.yml`
around lines 26 - 32, The missing-AGENT_E2E_TEST_SCENARIO failure message in the
config validation task should be updated to include the SNO option now supported
by the method-prefix logic. Adjust the fail message in the config validation
flow so it lists the correct remediation for all supported prefixes, including
the new sno case alongside the existing agent scenarios, and make sure the
guidance matches the prefix handling used in the task that selects by method.

🧹 Nitpick comments (2)

deploy/openshift-clusters/scripts/sno-to-3node.sh (1)
1-2: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use the repository-standard Bash shebang.

This new wrapper still uses #!/bin/bash, but the repo shell-script rule requires #!/usr/bin/bash. As per coding guidelines, "Shell scripts should use #!/usr/bin/bash shebang".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/sno-to-3node.sh` around lines 1 - 2, The
wrapper script currently uses the non-standard Bash shebang, so update the
script header to match the repository convention by changing the shebang at the
top of sno-to-3node.sh to the repo-standard Bash path. Keep the rest of the
script unchanged, and verify the file still starts with the expected shebang
followed by set -euo pipefail.
Source: Coding guidelines
deploy/openshift-clusters/scripts/clean-mutable-topology.sh (1)
1-2: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use the repository-standard Bash shebang.

This new wrapper still uses #!/bin/bash, but the repo shell-script rule requires #!/usr/bin/bash. As per coding guidelines, "Shell scripts should use #!/usr/bin/bash shebang".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/clean-mutable-topology.sh` around lines 1 -
2, The clean-mutable-topology.sh wrapper uses the wrong Bash shebang, so update
the script header to match the repository standard. Replace the current shebang
at the top of the file with the repo-required Bash interpreter line, keeping the
rest of the script unchanged; this is the only fix needed in the script’s
startup header.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/openshift-clusters/clean-mutable-topology.yml`:
- Around line 35-39: The cleanup playbook currently imports
mutable-topology/sno-to-3node tasks/clean.yml unconditionally, which can destroy
a healthy HA control plane. Add a topology check in clean-mutable-topology.yml
before the import_role so it only runs when the cluster is not already HA, or
require an explicit force_cleanup=true override to proceed. Use the existing
import_role task as the entry point and gate it with a condition or precheck
that prevents destructive cleanup unless the override is set.

In
`@deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_sno_example.sh`:
- Around line 35-40: This example hardcodes an x86_64 release image and leaves
out the ARM64-specific Metal3 overrides, so update the config example to
explicitly document how to switch to aarch64/Graviton. Use
OPENSHIFT_RELEASE_IMAGE as the payload architecture selector and add the
required Metal3 container override guidance (IRONIC_IMAGE, VBMC_IMAGE,
SUSHY_TOOLS_IMAGE) in the example so the install path is correct for ARM64
hypervisors. Refer to the config_sno_example.sh example block and keep the
existing installer/image-policy comments consistent with the new ARM64 notes.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/clean.yml`:
- Around line 68-72: The cleanup task in clean.yml is deleting the original SNO
node’s api-int DNS entry, which can break the surviving cluster after cleanup.
Update the [clean] Remove api-int from libvirt network DNS task to target only
mutable-topology-added resources associated with sno_new_nodes, and avoid
removing the api-int record for sno_master0_ip in the libvirt network update
logic.
- Around line 37-43: The embedded Python in the cleanup task scripts is
indented, which causes the inline `python3 -c` snippets used in `clean.yml` to
fail before the DHCP cleanup logic runs. Update the scripts in the task blocks
that use `EXISTING_MAC` and the lease cleanup command so the Python content is
left-aligned or otherwise dedented, and keep the logic under the same task names
so `|| true` / `failed_when: false` no longer hides a parse failure.
- Around line 26-30: The per-node ISO cleanup in the [clean] task is using a
hostname-based filename that can collide across clusters on the same hypervisor.
Update the ISO naming scheme in the related ISO creation and cleanup flow to use
a cluster-scoped identifier, such as item.name or a sno_cluster_name-prefixed
name, and make the file path in the cleanup task match that same unique pattern.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/create-vms.yml`:
- Around line 27-32: The DNS cleanup in the `create-vms.yml` task is still
targeting `{{ item.ip }}`, so stale `dns-host` records for `master-1`/`master-2`
with old IPs are not removed before `virsh net-update add dns-host` runs. Update
the cleanup logic around the `virsh net-update` calls to delete entries by
hostname instead of by IP, and then tighten the `dig` verification so it checks
that the resolved answers actually contain `{{ item.ip }}` rather than only a
non-empty response.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/ignition.yml`:
- Around line 19-24: The MCS reachability check in the ignition task is
currently allowed to fail and continue, which defers a required configuration
error until boot. Update the existing "[ignition] Verify MCS reachability" step
so it aborts when the curl probe to the MCS health endpoint fails, and only keep
a non-failing path if you explicitly add a break-glass override. Use the
existing mcs_health probe logic as the place to enforce fail-fast behavior.
- Around line 13-18: The ignition generation step writes /tmp/master.ign but
never populates sno_master_ign_b64, so auto-install.ign.j2 cannot render
reliably. Update the relevant task flow in ignition.yml to read back the
generated master.ign and set sno_master_ign_b64 before the template that
consumes it, using the existing ignition generation and auto-install rendering
steps as the anchor points. Ensure the variable is produced within the role
rather than assuming an external caller supplies it.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/mco-rollout.yml`:
- Around line 68-81: The MCO rollout polling in the mco-rollout task currently
suppresses timeout failures by using failed_when: false, which allows the play
to continue even if mcp_poll never reports completion. Update the
polling/verification logic in the mcp_poll task so a timeout is fatal by
default, or make the non-fatal continuation explicitly opt-in via a best-effort
mode. Keep the success check based on mcp_poll.stdout and ensure the final MCO
rollout result step reflects failure when the rollout does not complete.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/preflight.yml`:
- Around line 80-120: The preflight checks in the etcd probe and the
cluster-operator health check are only logging warnings instead of stopping the
topology transition, which makes this phase non-gating. Update the preflight
logic so the tasks for the etcd pod check and the degraded CO check fail the
play by default, removing the unconditional `|| true` and the `failed_when:
false` behavior; keep any bypass only behind an explicit break-glass override.
Use the existing task names like "[preflight] Check etcd pods" and "[preflight]
Check for degraded cluster operators" to locate the affected steps.
- Around line 129-154: The preflight repair flow for api-int only deletes
entries that already match sno_master0_ip and the verification step only checks
for any DNS answer, so stale mappings can remain undetected. Update the shell
logic in the api-int repair task to remove existing api-int entries by
hostname/FQDN regardless of IP, then make the dns_apiint_verify check confirm
the resolved result includes sno_master0_ip rather than just being non-empty.
Use the existing preflight tasks and variables like sno_libvirt_network,
sno_master0_ip, and sno_cluster_domain to locate and adjust the repair and
verification steps.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml`:
- Around line 24-31: The dnsmasq config in the update-dns task is incorrectly
pinning the wildcard apps record to sno_all_node_ips[0], which can vary because
node ordering is not stable. Update the logic in the sno_dnsmasq_lines
generation to use a stable ingress target such as a VIP/load balancer, or
otherwise emit all intended ingress node IPs instead of selecting the first node
IP. Make the change in the set_fact block that builds the DNS entries so the
*.apps record does not depend on array order.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml`:
- Around line 28-48: The wait-etcd task only checks for 3 Ready pods, while the
etcd membership and endpoint health probes in the same flow still suppress
failures via failed_when: false and fallback echo text. Update the wait logic in
wait-etcd.yml so the shell checks using oc exec and etcdctl member list /
endpoint health are allowed to fail the task, or fold their success into the
retry condition, so the role does not proceed to topology.yml until etcd quorum
and health are actually good.

In
`@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml`:
- Around line 4-21: The CSR approval logic in the wait-nodes task is too broad
because it approves every pending request returned by oc get csr. Update the
pending-CSR collection and approval loop in the shell block so it only selects
kubelet client/server CSRs associated with sno_new_nodes, and then approve only
those entries before calling oc adm certificate approve. Use the existing
PENDING/CSR handling in wait-nodes.yml as the place to narrow the filter.

In `@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml`:
- Around line 2-5: The kubeconfig path in sno_kubeconfig_resolved is incorrectly
derived from the default sno_cluster_name, so it can point to the wrong cluster
before discovery happens. Update the mutable-topology SNO-to-3node vars so
sno_cluster_name is not assumed here; instead require sno_cluster_name or
sno_kubeconfig to be provided up front, or change the resolution logic to use a
source that does not depend on the default value. Keep the fix localized to the
sno_kubeconfig_resolved expression and related variable handling.

---

Outside diff comments:
In `@deploy/Makefile`:
- Around line 122-130: Update the Makefile help output so the new `sno-ipi` and
`sno-agent` targets are listed alongside the existing deployment entries in
`make help`. Add matching `@echo` lines in the help block near the
`sno-to-3node` entry, using the same wording/style as the other target
descriptions so the new entrypoints are discoverable.

In `@deploy/openshift-clusters/roles/dev-scripts/install-dev/tasks/config.yml`:
- Around line 26-32: The missing-AGENT_E2E_TEST_SCENARIO failure message in the
config validation task should be updated to include the SNO option now supported
by the method-prefix logic. Adjust the fail message in the config validation
flow so it lists the correct remediation for all supported prefixes, including
the new sno case alongside the existing agent scenarios, and make sure the
guidance matches the prefix handling used in the task that selects by method.

In `@deploy/openshift-clusters/scripts/deploy-cluster.sh`:
- Around line 62-69: The validation in deploy-cluster.sh currently permits every
METHOD with TOPOLOGY=sno, which allows an unsupported sno + kcli path to
continue into the kcli install flow. Update the argument checks around the
TOPOLOGY/METHOD validation blocks to explicitly reject the unsupported
combination before any install logic runs, and keep the failure consistent with
the existing validation messages. Reference the validation section and the
kcli-install path so the guard is placed where the topology and method are first
checked.

---

Nitpick comments:
In `@deploy/openshift-clusters/scripts/clean-mutable-topology.sh`:
- Around line 1-2: The clean-mutable-topology.sh wrapper uses the wrong Bash
shebang, so update the script header to match the repository standard. Replace
the current shebang at the top of the file with the repo-required Bash
interpreter line, keeping the rest of the script unchanged; this is the only fix
needed in the script’s startup header.

In `@deploy/openshift-clusters/scripts/sno-to-3node.sh`:
- Around line 1-2: The wrapper script currently uses the non-standard Bash
shebang, so update the script header to match the repository convention by
changing the shebang at the top of sno-to-3node.sh to the repo-standard Bash
path. Keep the rest of the script unchanged, and verify the file still starts
with the expected shebang followed by set -euo pipefail.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d1bb47f1-0e94-4c52-bf4c-ea4c9efe1020

📥 Commits

Reviewing files that changed from the base of the PR and between 903e071 and 497dfcc.

📒 Files selected for processing (27)

deploy/Makefile
deploy/openshift-clusters/clean-mutable-topology.yml
deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_sno_example.sh
deploy/openshift-clusters/roles/dev-scripts/install-dev/handlers/main.yml
deploy/openshift-clusters/roles/dev-scripts/install-dev/tasks/config.yml
deploy/openshift-clusters/roles/dev-scripts/install-dev/vars/main.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/defaults/main.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/boot-nodes.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/clean.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/create-vms.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/ignition.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/main.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/mco-rollout.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/preflight.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/topology.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/verify.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/templates/auto-install.ign.j2
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/templates/master.ign.j2
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml
deploy/openshift-clusters/scripts/clean-mutable-topology.sh
deploy/openshift-clusters/scripts/deploy-cluster.sh
deploy/openshift-clusters/scripts/sno-to-3node.sh
deploy/openshift-clusters/sno-to-3node.yml

coderabbitai · 2026-06-26T06:38:04Z

+  tasks:
+    - name: Run mutable topology cleanup
+      import_role:
+        name: mutable-topology/sno-to-3node
+        tasks_from: clean.yml


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Refuse cleanup once the cluster is already HA unless the caller forces it.

This imports tasks/clean.yml unconditionally, and that role deletes master-1/master-2 plus their disks and DHCP state. On a successful 3-node cluster, removing two control-plane nodes here can drop etcd from 3 members to 1 and take the cluster down. Please gate this on the current control-plane topology, or require an explicit force_cleanup=true override before running the destructive role.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/openshift-clusters/clean-mutable-topology.yml` around lines 35 - 39, The cleanup playbook currently imports mutable-topology/sno-to-3node tasks/clean.yml unconditionally, which can destroy a healthy HA control plane. Add a topology check in clean-mutable-topology.yml before the import_role so it only runs when the cluster is not already HA, or require an explicit force_cleanup=true override to proceed. Use the existing import_role task as the entry point and gate it with a condition or precheck that prevents destructive cleanup unless the override is set.

coderabbitai · 2026-06-26T06:38:05Z

+- name: "[preflight] Add api-int to libvirt network DNS"
+  # Using virsh net-update (not lineinfile) so that api-int survives addnhosts
+  # regeneration when create-vms.yml calls virsh net-update for master-1/2.
+  # libvirt regenerates ostestbm.addnhosts from the network XML on every net-update.
+  shell: |
+    # Remove any stale api-int entry (idempotent, ignore errors)
+    sudo virsh net-update {{ sno_libvirt_network }} delete dns-host \
+      "<host ip='{{ sno_master0_ip }}'><hostname>api-int.{{ sno_cluster_domain }}</hostname></host>" \
+      --live --config 2>/dev/null || true
+    # Add the correct entry - triggers addnhosts regeneration with api-int included
+    sudo virsh net-update {{ sno_libvirt_network }} add dns-host \
+      "<host ip='{{ sno_master0_ip }}'><hostname>api-int.{{ sno_cluster_domain }}</hostname></host>" \
+      --live --config
+  when: dns_apiint.stdout | trim == "" or sno_master0_ip not in dns_apiint.stdout
+  changed_when: true
+
+- name: "[preflight] Verify api-int DNS after fix"
+  shell: |
+    dig +short api-int.{{ sno_cluster_domain }} @192.168.111.1
+  register: dns_apiint_verify
+  changed_when: false
+  failed_when: dns_apiint_verify.stdout | trim == ""
+  retries: 3
+  delay: 2
+  until: dns_apiint_verify.stdout | trim != ""
+  when: dns_apiint.stdout | trim == "" or sno_master0_ip not in dns_apiint.stdout


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

The api-int repair path does not remove or verify stale mappings correctly.

The delete call matches api-int on {{ sno_master0_ip }}, so it only removes a record that already points at the desired IP. If api-int is stale and points somewhere else, the old XML entry survives, and the follow-up dig still passes because it only checks for any answer, not for {{ sno_master0_ip }}. Remove existing api-int entries by hostname/FQDN and verify the resolved answers include the expected IP.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/preflight.yml` around lines 129 - 154, The preflight repair flow for api-int only deletes entries that already match sno_master0_ip and the verification step only checks for any DNS answer, so stale mappings can remain undetected. Update the shell logic in the api-int repair task to remove existing api-int entries by hostname/FQDN regardless of IP, then make the dns_apiint_verify check confirm the resolved result includes sno_master0_ip rather than just being non-empty. Use the existing preflight tasks and variables like sno_libvirt_network, sno_master0_ip, and sno_cluster_domain to locate and adjust the repair and verification steps.

coderabbitai · 2026-06-26T06:38:05Z

+- name: "[dns] Build NM dnsmasq config"
+  set_fact:
+    sno_dnsmasq_lines: |
+      {% for ip in sno_all_node_ips %}
+      address=/api.{{ sno_cluster_domain }}/{{ ip }}
+      address=/api-int.{{ sno_cluster_domain }}/{{ ip }}
+      {% endfor %}
+      address=/.apps.{{ sno_cluster_domain }}/{{ sno_all_node_ips[0] }}


🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Map the target file and nearby role files. git ls-files 'deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/**' \ 'deploy/openshift-clusters/roles/**/tasks/**' \ 'deploy/openshift-clusters/roles/**/files/**' \ 'deploy/openshift-clusters/roles/**/templates/**' | sed -n '1,250p' echo echo '--- outline ---' ast-grep outline deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml --view expanded || true echo echo '--- contents ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 9700

🏁 Script executed:

#!/bin/bash set -euo pipefail # Map the target file and nearby role files. git ls-files 'deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/**' \ 'deploy/openshift-clusters/roles/**/tasks/**' \ 'deploy/openshift-clusters/roles/**/files/**' \ 'deploy/openshift-clusters/roles/**/templates/**' | sed -n '1,250p' echo echo '--- outline ---' ast-grep outline deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml --view expanded || true echo echo '--- contents ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 9700

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect neighboring topology and DNS logic for the intended apps target. rg -n "sno_all_node_ips|apps\\.|dnsmasq|ingress|router|api-int|api\\." \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/roles/common \ deploy/openshift-clusters/roles/dev-scripts/install-dev \ deploy/openshift-clusters/roles/kcli/kcli-install \ deploy/openshift-clusters/roles/assisted/assisted-spoke \ deploy/openshift-clusters/roles/assisted/acm-install echo echo '--- vars/main.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml echo echo '--- topology.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/topology.yml echo echo '--- verify.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/verify.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 16571

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect neighboring topology and DNS logic for the intended apps target. rg -n "sno_all_node_ips|apps\\.|dnsmasq|ingress|router|api-int|api\\." \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/roles/common \ deploy/openshift-clusters/roles/dev-scripts/install-dev \ deploy/openshift-clusters/roles/kcli/kcli-install \ deploy/openshift-clusters/roles/assisted/assisted-spoke \ deploy/openshift-clusters/roles/assisted/acm-install echo echo '--- vars/main.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml echo echo '--- topology.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/topology.yml echo echo '--- verify.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/verify.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 16571

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect neighboring topology and DNS logic for the intended apps target. rg -n "sno_all_node_ips|apps\\.|dnsmasq|ingress|router|api-int|api\\." \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/roles/common \ deploy/openshift-clusters/roles/dev-scripts/install-dev \ deploy/openshift-clusters/roles/kcli/kcli-install \ deploy/openshift-clusters/roles/assisted/assisted-spoke \ deploy/openshift-clusters/roles/assisted/acm-install echo echo '--- vars/main.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml echo echo '--- topology.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/topology.yml echo echo '--- verify.yml ---' cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/verify.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 16571

🏁 Script executed:

#!/bin/bash set -euo pipefail # Check how ingress/router is handled for the SNO-to-3node transition. rg -n "ingress|router|ingresscontroller|router pod|default ingress|apps\\." \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/docs \ deploy/openshift-clusters/README.md \ docs \ README.md echo echo '--- file list around mutable-topology docs ---' git ls-files 'deploy/openshift-clusters/**README*' 'deploy/openshift-clusters/**/*.md' | rg 'mutable-topology|sno-to-3node|openshift-clusters' echo echo '--- update-dns task with context already known ---' sed -n '1,120p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 1012

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "IngressController|ingresscontroller|router-default|worker|master.*router|taint|schedul" \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/roles echo echo '--- mutable-topology role files ---' git ls-files deploy/openshift-clusters/roles/mutable-topology/sno-to-3node echo echo '--- README candidates ---' fd -a 'README.md' deploy/openshift-clusters/roles/mutable-topology deploy/openshift-clusters | sed -n '1,200p'

Repository: openshift-eng/two-node-toolbox

Length of output: 4257

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "IngressController|ingresscontroller|router-default|worker|master.*router|taint|schedul" \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/roles echo echo '--- mutable-topology role files ---' git ls-files deploy/openshift-clusters/roles/mutable-topology/sno-to-3node echo echo '--- README candidates ---' fd -a 'README.md' deploy/openshift-clusters/roles/mutable-topology deploy/openshift-clusters | sed -n '1,200p'

Repository: openshift-eng/two-node-toolbox

Length of output: 4257

🏁 Script executed:

#!/bin/bash set -euo pipefail # Gather the SNO-to-3node ingress/DNS intent from role docs and nearby logic. rg -n "apps|ingress|router|vip|load balancer|load-balancer|dnsmasq" \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/README.md \ deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh echo echo '--- resolve_vm_ip.sh ---' cat -n deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh echo echo '--- clean.yml excerpt ---' sed -n '60,90p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/clean.yml echo echo '--- preflight.yml excerpt ---' sed -n '118,155p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/preflight.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 11379

🏁 Script executed:

#!/bin/bash set -euo pipefail # Gather the SNO-to-3node ingress/DNS intent from role docs and nearby logic. rg -n "apps|ingress|router|vip|load balancer|load-balancer|dnsmasq" \ deploy/openshift-clusters/roles/mutable-topology/sno-to-3node \ deploy/openshift-clusters/README.md \ deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh echo echo '--- resolve_vm_ip.sh ---' cat -n deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh echo echo '--- clean.yml excerpt ---' sed -n '60,90p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/clean.yml echo echo '--- preflight.yml excerpt ---' sed -n '118,155p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/preflight.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 11379

Don't pin *.apps to sno_all_node_ips[0].
oc get nodes ordering is not guaranteed, so wildcard routes can jump between control-plane IPs and land on a node that isn't serving ingress. Use a stable ingress VIP/load balancer, or publish all ingress node IPs if that's the intended design.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/update-dns.yml` around lines 24 - 31, The dnsmasq config in the update-dns task is incorrectly pinning the wildcard apps record to sno_all_node_ips[0], which can vary because node ordering is not stable. Update the logic in the sno_dnsmasq_lines generation to use a stable ingress target such as a VIP/load balancer, or otherwise emit all intended ingress node IPs instead of selecting the first node IP. Make the change in the set_fact block that builds the DNS entries so the *.apps record does not depend on array order.

coderabbitai · 2026-06-26T06:38:05Z

+- name: "[etcd] Display etcd member list"
+  shell: |
+    ETCD_POD=$(oc get pods -n openshift-etcd -l app=etcd -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+    oc exec -n openshift-etcd "$ETCD_POD" -c etcdctl -- etcdctl member list -w table 2>/dev/null || echo "Could not query etcd members"
+  environment: "{{ sno_oc_env }}"
+  register: etcd_members
+  changed_when: false
+  failed_when: false
+
+- name: "[etcd] Members"
+  debug:
+    msg: "{{ etcd_members.stdout }}"
+
+- name: "[etcd] Display etcd endpoint health"
+  shell: |
+    ETCD_POD=$(oc get pods -n openshift-etcd -l app=etcd -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+    oc exec -n openshift-etcd "$ETCD_POD" -c etcdctl -- etcdctl endpoint health --cluster -w table 2>/dev/null || echo "Could not query endpoint health"
+  environment: "{{ sno_oc_env }}"
+  register: etcd_health
+  changed_when: false
+  failed_when: false


🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail printf '--- wait-etcd.yml ---\n' nl -ba deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml | sed -n '1,220p' printf '\n--- role task files ---\n' fd -a -t f 'tasks' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node | sed -n '1,120p' printf '\n--- references to topology.yml and wait-etcd ---\n' rg -n 'topology\.yml|wait-etcd|etcdctl endpoint health|member list|Display etcd' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node -S

Repository: openshift-eng/two-node-toolbox

Length of output: 259

🏁 Script executed:

#!/bin/bash set -euo pipefail printf '--- wait-etcd.yml ---\n' nl -ba deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml | sed -n '1,220p' printf '\n--- role task files ---\n' fd -a -t f 'tasks' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node | sed -n '1,120p' printf '\n--- references to topology.yml and wait-etcd ---\n' rg -n 'topology\.yml|wait-etcd|etcdctl endpoint health|member list|Display etcd' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node -S

Repository: openshift-eng/two-node-toolbox

Length of output: 259

🏁 Script executed:

#!/bin/bash set -euo pipefail printf '--- wait-etcd.yml ---\n' nl -ba deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml | sed -n '1,220p' printf '\n--- role task files ---\n' fd -a -t f 'tasks' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node | sed -n '1,120p' printf '\n--- references to topology.yml and wait-etcd ---\n' rg -n 'topology\.yml|wait-etcd|etcdctl endpoint health|member list|Display etcd' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node -S

Repository: openshift-eng/two-node-toolbox

Length of output: 259

🏁 Script executed:

#!/bin/bash set -euo pipefail echo '--- wait-etcd.yml ---' nl -ba deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml | sed -n '1,220p' echo echo '--- task files under sno-to-3node ---' fd -a -t f tasks deploy/openshift-clusters/roles/mutable-topology/sno-to-3node echo echo '--- references to topology.yml and etcd checks ---' rg -n 'topology\.yml|wait-etcd|etcdctl endpoint health|member list|Display etcd' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node -S

Repository: openshift-eng/two-node-toolbox

Length of output: 230

🏁 Script executed:

#!/bin/bash set -euo pipefail echo '--- wait-etcd.yml ---' sed -n '1,220p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml echo echo '--- task files under sno-to-3node ---' find deploy/openshift-clusters/roles/mutable-topology/sno-to-3node -maxdepth 3 -type f | sort echo echo '--- references to topology.yml and etcd checks ---' grep -RInE 'topology\.yml|wait-etcd|etcdctl endpoint health|member list|Display etcd' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node

Repository: openshift-eng/two-node-toolbox

Length of output: 4602

Gate the transition on etcd health. The loop only waits for 3 Ready pods; the member list and endpoint health steps still ignore failures (failed_when: false / || echo ...). Since topology.yml runs immediately after this task, the role can proceed while etcd membership or quorum is still unhealthy. Make those checks fail the task or include them in the retry condition.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-etcd.yml` around lines 28 - 48, The wait-etcd task only checks for 3 Ready pods, while the etcd membership and endpoint health probes in the same flow still suppress failures via failed_when: false and fallback echo text. Update the wait logic in wait-etcd.yml so the shell checks using oc exec and etcdctl member list / endpoint health are allowed to fail the task, or fold their success into the retry condition, so the role does not proceed to topology.yml until etcd quorum and health are actually good.

coderabbitai · 2026-06-26T06:38:05Z

+    PENDING=$(oc get csr -o json 2>/dev/null | python3 -c "
+    import json, sys
+    data = json.load(sys.stdin)
+    pending = []
+    for csr in data.get('items', []):
+        status = csr.get('status', {})
+        if not status.get('conditions'):
+            name = csr['metadata']['name']
+            pending.append(name)
+    for name in pending:
+        print(name)
+    " 2>/dev/null || true)
+
+    if [ -n "$PENDING" ]; then
+      for CSR in $PENDING; do
+        oc adm certificate approve "$CSR" 2>/dev/null || true
+        echo "Approved CSR: $CSR"
+      done


🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the target file and nearby context. git ls-files 'deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml' wc -l deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml # Find related uses of CSR approval / sno_new_nodes in this role and repo. rg -n --hidden --no-ignore-vcs 'oc adm certificate approve|pending CSR|sno_new_nodes|csr' deploy/openshift-clusters/roles/mutable-topology -S

Repository: openshift-eng/two-node-toolbox

Length of output: 6346

🏁 Script executed:

#!/bin/bash set -euo pipefail # Map the role directory to locate adjacent tasks/files. find deploy/openshift-clusters/roles/mutable-topology/sno-to-3node -maxdepth 3 -type f | sort # Show the full task file with line numbers if small enough. stat -c '%n %s bytes %n' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml sed -n '1,220p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 1520

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the variables that define the new nodes and nearby role flow. cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml printf '\n---\n' sed -n '1,220p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/main.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 1943

🏁 Script executed:

#!/bin/bash set -euo pipefail cat -n deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml printf '\n---\n' sed -n '1,220p' deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/main.yml

Repository: openshift-eng/two-node-toolbox

Length of output: 1943

Restrict CSR approval to the new masters only
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml approves every pending CSR returned by oc get csr. During the migration window, this can approve unrelated or stale requests. Limit approval to the kubelet client/server CSRs for sno_new_nodes before calling oc adm certificate approve.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/wait-nodes.yml` around lines 4 - 21, The CSR approval logic in the wait-nodes task is too broad because it approves every pending request returned by oc get csr. Update the pending-CSR collection and approval loop in the shell block so it only selects kubelet client/server CSRs associated with sno_new_nodes, and then approve only those entries before calling oc adm certificate approve. Use the existing PENDING/CSR handling in wait-nodes.yml as the place to narrow the filter.

coderabbitai · 2026-06-26T06:38:05Z

+sno_dev_scripts_path: "{{ dev_scripts_path | default('openshift-metal3/dev-scripts') }}"
+sno_kubeconfig_resolved: >-
+  {{ sno_kubeconfig if sno_kubeconfig
+     else sno_dev_scripts_path ~ '/ocp/' ~ sno_cluster_name ~ '/auth/kubeconfig' }}


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

sno_cluster_name is not actually auto-detectable here.

deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/defaults/main.yml sets sno_cluster_name to ostest, and this file immediately bakes that into sno_kubeconfig_resolved. On any non-ostest deployment, the first oc call will use the wrong kubeconfig path and fail before the role can discover the real cluster identity. Require sno_cluster_name/sno_kubeconfig up front, or resolve kubeconfig from a source that does not assume the cluster name.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/vars/main.yml` around lines 2 - 5, The kubeconfig path in sno_kubeconfig_resolved is incorrectly derived from the default sno_cluster_name, so it can point to the wrong cluster before discovery happens. Update the mutable-topology SNO-to-3node vars so sno_cluster_name is not assumed here; instead require sno_cluster_name or sno_kubeconfig to be provided up front, or change the resolution logic to use a source that does not depend on the default value. Keep the fix localized to the sno_kubeconfig_resolved expression and related variable handling.

eggfoobar and others added 3 commits June 25, 2026 07:01

feat: add sno deployment config

33a331b

Signed-off-by: ehila <ehila@redhat.com>

upkeep: rename and validate flow for sno to 3node

497dfcc

Signed-off-by: ehila <ehila@redhat.com>

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 26, 2026

openshift-ci Bot requested review from jerpeter1 and qJkee June 26, 2026 06:20

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2026

coderabbitai Bot requested changes Jun 26, 2026

View reviewed changes

Uh oh!

Conversation

eggfoobar commented Jun 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci-robot commented Jun 26, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci Bot commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Pre-merge checks failed

❌ Failed checks (1 error)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eggfoobar commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented Jun 26, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading