OCPEDGE-2498: Add migration support #83
Open
eggfoobar wants to merge 3 commits into openshift-eng:main from eggfoobar:add-migration-support
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| --- | ||
| - hosts: metal_machine | ||
| gather_facts: yes | ||
|
|
||
| pre_tasks: | ||
| - name: Confirm mutable topology cleanup | ||
| ansible.builtin.pause: | ||
| prompt: >- | ||
| This will destroy the master-1 and master-2 VMs, their disks, DHCP reservations, | ||
| and DNS entries. master-0 (if still present) will be unaffected. | ||
| Press Enter to proceed or Ctrl+C to abort. | ||
| delegate_to: localhost | ||
| run_once: true | ||
| when: interactive_mode | default(true) | bool | ||
|
|
||
| - name: Detect cluster domain from kubeconfig (best-effort) | ||
| shell: | | ||
| KUBECONFIG={{ dev_scripts_path | default('openshift-metal3/dev-scripts') }}/ocp/{{ sno_cluster_name | default('ostest') }}/auth/kubeconfig \ | ||
| oc get infrastructure cluster -o jsonpath='{.status.apiServerInternalURI}' 2>/dev/null \ | ||
| | sed 's|^https://api-int\.||; s|:6443$||' | ||
| register: detected_domain | ||
| changed_when: false | ||
| failed_when: false | ||
| ignore_errors: true | ||
|
|
||
| - name: Set cluster domain (use detected, variable override, or default) | ||
| set_fact: | ||
| sno_cluster_domain: >- | ||
| {{ sno_cluster_domain | ||
| if (sno_cluster_domain is defined and sno_cluster_domain) | ||
| else (detected_domain.stdout | trim | ||
| if (detected_domain.stdout is defined and detected_domain.stdout | trim) | ||
| else 'ostest.test.metalkube.org') }} | ||
|
|
||
| tasks: | ||
| - name: Run mutable topology cleanup | ||
| import_role: | ||
| name: mutable-topology/sno-to-3node | ||
| tasks_from: clean.yml | ||
|
|
||
| - name: Cleanup complete | ||
| ansible.builtin.debug: | ||
| msg: "Mutable topology cleanup complete. master-1 and master-2 VMs have been removed." | ||
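
The two domain pre_tasks above resolve the cluster domain in a fixed order: an explicit `sno_cluster_domain`, then the value parsed from `apiServerInternalURI`, then the `ostest.test.metalkube.org` default. The following is a minimal shell sketch of that precedence, not code from the PR; the function name `resolve_cluster_domain` and its two positional arguments are hypothetical, and it skips the whitespace trimming the playbook applies.

```shell
#!/bin/bash
# Hypothetical helper mirroring the playbook's precedence:
#   1. explicit override, 2. domain parsed from the API URI, 3. built-in default.
#   $1  override value (may be empty)
#   $2  internal API URI, e.g. https://api-int.<domain>:6443 (may be empty)
resolve_cluster_domain() {
  local override="$1"
  local api_int_uri="$2"
  local default_domain="ostest.test.metalkube.org"

  # An explicitly set domain always wins.
  if [ -n "$override" ]; then
    echo "$override"
    return 0
  fi

  # Same sed expression as the playbook: strip the "https://api-int." prefix
  # and the ":6443" suffix, leaving only the cluster domain.
  local detected
  detected=$(printf '%s\n' "$api_int_uri" | sed 's|^https://api-int\.||; s|:6443$||')

  if [ -n "$detected" ]; then
    echo "$detected"
  else
    echo "$default_domain"
  fi
}
```

Calling `resolve_cluster_domain "" "https://api-int.lab.example.com:6443"` would yield `lab.example.com`, while two empty arguments fall through to the default.
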
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -3,3 +3,4 @@ ci_token | ||
| clusterbot-ci_token | ||
| config_arbiter.sh | ||
| config_fencing.sh | ||
| config_sno.sh | ||
40 changes: 40 additions & 0 deletions
deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_sno_example.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| #!/bin/bash | ||
|
|
||
|
|
||
| # Please copy one of the config values below for IPI or Agent based installs into your | ||
| # config. | ||
| # BEGIN IPI Specific Install Config Variables | ||
| export IP_STACK="v4" | ||
| export NUM_WORKERS=0 | ||
| export MASTER_MEMORY=32768 | ||
| export MASTER_DISK=100 | ||
| export MASTER_VCPU=4 | ||
| export NUM_MASTERS=1 | ||
| ## END IPI Specific Install Config Variables | ||
|
|
||
| ## BEGIN Agent Specific Install Config Variables | ||
| export AGENT_E2E_TEST_SCENARIO="SNO_IPV4" | ||
| # Sets the install-config.yaml's platform type. | ||
| # The default is 'baremetal'. | ||
| # See https://github.com/openshift-metal3/dev-scripts/blob/master/config_example.sh for more details on this variable and its effects. | ||
| #export AGENT_PLATFORM_TYPE=none | ||
| ## END Agent Specific Install Config Variables | ||
| #### | ||
|
|
||
| # TechPreview FeatureSet not needed for 4.20 and above OCP | ||
| # export FEATURE_SET="TechPreviewNoUpgrade" | ||
| export OPENSHIFT_CI="true" | ||
|
|
||
| # If you want to avoid using the CI_TOKEN, uncomment this variable, but it has side effects. | ||
| # You can read more on this here: https://github.com/openshift-metal3/dev-scripts/blob/3f070cfd36977381a186cadfb44887856d652bed/config_example.sh#L21 | ||
| # export OPENSHIFT_CI="true" | ||
|
|
||
| # You can find the latest public images in https://quay.io/repository/openshift-release-dev/ocp-release?tab=tags | ||
| # and select your preferred version. Public sources can be found at https://mirror.openshift.com/pub/openshift-v4/ | ||
|
|
||
| export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.21.0-x86_64 | ||
| # Unless you need to override the installer image, this is not needed | ||
| # export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="" | ||
|
|
||
| # Disable sigstore image verification during installation | ||
| export OPENSHIFT_INSTALL_EXPERIMENTAL_DISABLE_IMAGE_POLICY=true | ||
|
eggfoobar marked this conversation as resolved.
|
||
38 changes: 38 additions & 0 deletions
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/defaults/main.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| --- | ||
| # Cluster identity (auto-detected from cluster if not set) | ||
| sno_cluster_name: ostest | ||
| sno_cluster_domain: "" | ||
| sno_infra_id: "" | ||
|
|
||
| # Existing master-0 (auto-detected from cluster) | ||
| sno_master0_ip: "" | ||
|
|
||
| # New node IPs (static assignments within the dev-scripts DHCP range) | ||
| sno_master1_ip: "192.168.111.21" | ||
| sno_master2_ip: "192.168.111.22" | ||
|
|
||
| # VM specs | ||
| sno_vm_vcpus: 6 | ||
| sno_vm_ram_mb: 16384 | ||
| sno_vm_disk_gb: 50 | ||
|
|
||
| # Libvirt network (dev-scripts baremetal network) | ||
| sno_libvirt_network: ostestbm | ||
| sno_libvirt_bridge: ostestbm | ||
|
|
||
| # RHCOS live ISO path on hypervisor (auto-detected from release image if empty) | ||
| sno_rhcos_live_iso: "" | ||
|
|
||
| # Timeouts | ||
| sno_mco_timeout_minutes: 45 | ||
| sno_node_join_timeout_minutes: 20 | ||
| sno_etcd_timeout_minutes: 15 | ||
|
|
||
| # Auto-fix MCO drain deadlock during topology transition | ||
| sno_auto_fix_drain: true | ||
|
|
||
| # Paths (override if dev-scripts is in a non-standard location) | ||
| sno_kubeconfig: "" | ||
|
|
||
| # VM image directory | ||
| sno_vm_image_dir: "/var/lib/libvirt/images" |
131 changes: 131 additions & 0 deletions
deploy/openshift-clusters/roles/mutable-topology/sno-to-3node/tasks/boot-nodes.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,131 @@ | ||
| --- | ||
| - name: "[boot] Check if RHCOS live ISO exists on hypervisor" | ||
| stat: | ||
| path: "/var/lib/libvirt/images/rhcos-live.iso" | ||
| register: iso_stat | ||
|
|
||
| - name: "[boot] Find RHCOS live ISO from dev-scripts cache" | ||
| shell: | | ||
| DEVSCRIPTS_ISO=$(find /var/lib/libvirt/images -name 'rhcos-*-live*.iso' 2>/dev/null | head -1) | ||
| if [ -n "$DEVSCRIPTS_ISO" ]; then | ||
| echo "Using existing ISO: $DEVSCRIPTS_ISO" | ||
| sudo ln -sf "$DEVSCRIPTS_ISO" /var/lib/libvirt/images/rhcos-live.iso | ||
| exit 0 | ||
| fi | ||
|
|
||
| CACHE_ISO=$(find {{ sno_dev_scripts_path }}/ -name 'rhcos-*-live*.iso' 2>/dev/null | head -1) | ||
| if [ -n "$CACHE_ISO" ]; then | ||
| echo "Using dev-scripts cached ISO: $CACHE_ISO" | ||
| sudo ln -sf "$CACHE_ISO" /var/lib/libvirt/images/rhcos-live.iso | ||
| exit 0 | ||
| fi | ||
|
|
||
| echo "ERROR: No RHCOS live ISO found. Set sno_rhcos_live_iso variable." | ||
| exit 1 | ||
| when: not iso_stat.stat.exists and sno_rhcos_live_iso == "" | ||
|
|
||
| - name: "[boot] Set ISO path" | ||
| set_fact: | ||
| sno_iso_path: "{{ sno_rhcos_live_iso if sno_rhcos_live_iso else '/var/lib/libvirt/images/rhcos-live.iso' }}" | ||
|
|
||
| - name: "[boot] Read master.ign content" | ||
| slurp: | ||
| src: /tmp/master.ign | ||
| register: master_ign_content | ||
|
|
||
| - name: "[boot] Set base64-encoded master.ign" | ||
| set_fact: | ||
| sno_master_ign_b64: "{{ master_ign_content.content }}" | ||
|
|
||
| - name: "[boot] Generate auto-install ignition" | ||
| template: | ||
| src: auto-install.ign.j2 | ||
| dest: /tmp/auto-install.ign | ||
| mode: '0644' | ||
|
|
||
| - name: "[boot] Create per-node ISO with embedded ignition" | ||
| shell: | | ||
| sudo cp {{ sno_iso_path }} /var/lib/libvirt/images/rhcos-{{ item.hostname }}.iso | ||
| sudo coreos-installer iso ignition embed -i /tmp/auto-install.ign /var/lib/libvirt/images/rhcos-{{ item.hostname }}.iso -f | ||
| loop: "{{ sno_new_nodes }}" | ||
|
|
||
| - name: "[boot] Boot each VM with ignition-embedded ISO" | ||
| shell: | | ||
| VM_NAME="{{ item.name }}" | ||
| MAC="{{ sno_node_macs[item.hostname] }}" | ||
|
|
||
| sudo virsh destroy "$VM_NAME" 2>/dev/null || true | ||
| sudo virsh undefine "$VM_NAME" 2>/dev/null || true | ||
|
|
||
| sudo virt-install \ | ||
| --name "$VM_NAME" \ | ||
| --ram {{ sno_vm_ram_mb }} \ | ||
| --vcpus {{ sno_vm_vcpus }} \ | ||
| --disk {{ sno_vm_image_dir }}/${VM_NAME}.qcow2,bus=virtio \ | ||
| --network network={{ sno_libvirt_network }},model=virtio,mac=${MAC} \ | ||
| --cdrom /var/lib/libvirt/images/rhcos-{{ item.hostname }}.iso \ | ||
| --os-variant rhel9.0 \ | ||
| --graphics none \ | ||
| --noautoconsole \ | ||
| --boot loader=/usr/share/edk2/ovmf/OVMF_CODE.fd,loader_ro=yes,loader_type=pflash,nvram_template=/usr/share/edk2/ovmf/OVMF_VARS.fd,loader_secure=no \ | ||
| --boot hd,cdrom \ | ||
| --tpm none | ||
| loop: "{{ sno_new_nodes }}" | ||
|
|
||
| - name: "[boot] Verify VMs are running (initial install boot)" | ||
| shell: | | ||
| sudo virsh domstate {{ item.name }} | ||
| register: vm_state | ||
| loop: "{{ sno_new_nodes }}" | ||
| changed_when: false | ||
| failed_when: "'running' not in vm_state.stdout" | ||
|
|
||
| - name: "[boot] Wait for coreos-installer to complete (VM will power off)" | ||
| # coreos-installer runs ExecStartPost=systemctl reboot, but RHCOS live issues | ||
| # an ACPI poweroff rather than a reset. libvirt fires on_poweroff=destroy so | ||
| # the VM shuts off. We poll until shut off, then boot from disk below. | ||
| shell: | | ||
| for i in $(seq 50); do | ||
| STATE=$(sudo virsh domstate {{ item.name }} 2>/dev/null || echo "unknown") | ||
| if echo "$STATE" | grep -q "shut off"; then | ||
| echo "{{ item.name }} shut off after $((i * 20))s - install complete" | ||
| exit 0 | ||
| fi | ||
| sleep 20 | ||
| done | ||
| echo "Timeout: {{ item.name }} did not shut off within 1000s" | ||
| exit 1 | ||
| loop: "{{ sno_new_nodes }}" | ||
| changed_when: false | ||
|
|
||
| - name: "[boot] Remove CDROM from boot order after install" | ||
| # Prevent coreos-installer loop: strip the cdrom boot entry so UEFI only | ||
| # tries the hard disk on subsequent boots. | ||
| shell: | | ||
| TMPXML=$(mktemp /tmp/vm-XXXXXX.xml) | ||
| sudo virsh dumpxml {{ item.name }} > "$TMPXML" | ||
| sudo sed -i "/<boot dev='cdrom'\/>/d" "$TMPXML" | ||
| sudo virsh define "$TMPXML" | ||
| sudo rm -f "$TMPXML" | ||
| loop: "{{ sno_new_nodes }}" | ||
| changed_when: true | ||
|
|
||
| - name: "[boot] Start VMs to boot from installed RHCOS" | ||
| shell: | | ||
| sudo virsh start {{ item.name }} | ||
| loop: "{{ sno_new_nodes }}" | ||
| changed_when: true | ||
|
|
||
| - name: "[boot] Verify VMs are running from installed disk" | ||
| shell: | | ||
| sudo virsh domstate {{ item.name }} | ||
| register: vm_state_disk | ||
| loop: "{{ sno_new_nodes }}" | ||
| changed_when: false | ||
| failed_when: "'running' not in vm_state_disk.stdout" | ||
|
|
||
| - name: "[boot] VMs booted with RHCOS" | ||
| debug: | ||
| msg: >- | ||
| {{ sno_new_nodes | length }} VMs installed and started from disk. | ||
| Waiting for nodes to join the cluster... |
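
The "Wait for coreos-installer to complete" task above polls `virsh domstate` up to 50 times at 20-second intervals. The same pattern can be sketched as a reusable shell function. This is an illustration under assumptions, not part of the PR: `wait_for_state` and its argument order are hypothetical, and the state command is passed in so the loop can be exercised without libvirt. The sketch reports elapsed time as `(i - 1) * interval`, since no sleep has happened before the first check; the task in the diff reports `i * 20`, which appears to overstate the wait by one interval.

```shell
#!/bin/bash
# Hypothetical helper: poll a state command until its output contains a target string.
#   $1    target substring (e.g. "shut off")
#   $2    maximum number of attempts
#   $3    seconds to sleep between attempts
#   $4..  command that prints the current state (e.g. sudo virsh domstate <vm>)
wait_for_state() {
  local target="$1" attempts="$2" interval="$3"
  shift 3
  local i state
  for i in $(seq "$attempts"); do
    # Treat a failing state command as "unknown" rather than aborting the loop.
    state=$("$@" 2>/dev/null || echo "unknown")
    if printf '%s\n' "$state" | grep -q "$target"; then
      # (i - 1) sleeps have elapsed before the i-th check.
      echo "reached '$target' after $(( (i - 1) * interval ))s"
      return 0
    fi
    sleep "$interval"
  done
  echo "timeout: '$target' not reached within $(( attempts * interval ))s" >&2
  return 1
}
```

With the values from the diff, a call might look like `wait_for_state "shut off" 50 20 sudo virsh domstate "$VM_NAME"`, which gives the same 1000-second ceiling.
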
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Refuse cleanup once the cluster is already HA unless the caller forces it.
This imports `tasks/clean.yml` unconditionally, and that role deletes `master-1`/`master-2` plus their disks and DHCP state. On a successful 3-node cluster, removing two control-plane nodes here can drop etcd from 3 members to 1 and take the cluster down. Please gate this on the current control-plane topology, or require an explicit `force_cleanup=true` override before running the destructive role.
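
One possible shape for the gate requested in this comment, sketched in shell under stated assumptions. The function name `cleanup_allowed` is hypothetical, the force flag mirrors the `force_cleanup` variable named in the review, and the threshold of 3 control-plane nodes is an assumption that covers only the fully HA case the comment describes; a partially migrated 2-node cluster may need stricter handling.

```shell
#!/bin/bash
# Hypothetical guard: decide whether the destructive cleanup may run.
#   $1  number of control-plane nodes currently in the cluster
#   $2  force flag; "true" overrides the guard (mirrors force_cleanup)
# Returns 0 when cleanup may proceed, 1 when it should be refused.
cleanup_allowed() {
  local control_plane_count="$1"
  local force="${2:-false}"

  # An explicit override always wins.
  if [ "$force" = "true" ]; then
    return 0
  fi

  # Fail safe: refuse when the node count is missing or not a number.
  case "$control_plane_count" in
    ''|*[!0-9]*)
      echo "Refusing cleanup: could not determine control-plane node count." >&2
      return 1
      ;;
  esac

  # Assumption: fewer than 3 control-plane nodes means the cluster is not yet HA.
  if [ "$control_plane_count" -lt 3 ]; then
    return 0
  fi

  echo "Refusing cleanup: ${control_plane_count} control-plane nodes present; set force_cleanup=true to override." >&2
  return 1
}
```

The count could come from something like `oc get nodes -l node-role.kubernetes.io/master --no-headers | wc -l`, run as a pre_task before the role import; that wiring is not part of the PR as shown.
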