OCPBUGS-59958: move cleanUpDuplicatedMC to avoid double reboot on first updated Master node by proietfb · Pull Request #6201 · openshift/machine-config-operator

proietfb · 2026-06-17T13:55:21Z

What I did

Moved cleanUpDuplicatedMC to after the loop that creates/updates MachineConfigs.

How to verify it

1. Deploy a stock cluster (without this fix).
2. Apply a KubeletConfig with autoSizingReserved: true targeting the master pool and wait for the pool to reach UPDATED: True.
3. Corrupt the GeneratedByControllerVersionAnnotationKey annotation on 97-master-generated-kubelet with an arbitrary value to simulate the pre-upgrade state, then restart the machine-config-controller pod.
4. Without the fix: DELETED followed by ADDED events appear on 97-master-generated-kubelet, and a new rendered-master-* is created.
5. Apply the MCC image containing this fix and repeat steps 3–4.
6. With the fix: only MODIFIED events appear on 97-master-generated-kubelet, with no new rendered-master-*.

Note: corrupting the annotation manually is necessary to simulate the version mismatch that occurs naturally during an MCO upgrade, when the new MCC binary carries a different GeneratedByControllerVersionAnnotationKey value than the one stored in the existing MC annotation.
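As a rough illustration of how the two outcomes in the verification steps differ, the sketch below classifies a sequence of watch event types seen on a single MachineConfig. The `classify` function and the `outcome` type are hypothetical helpers, not part of the MCO code base, and the sketch assumes the slice holds only the events observed after the machine-config-controller pod restarts.

```go
package main

import "fmt"

// outcome is a hypothetical label for what a sequence of watch events means.
type outcome string

const (
	recreated outcome = "recreated" // DELETED followed by ADDED: behaviour without the fix
	updated   outcome = "updated"   // only MODIFIED events: behaviour with the fix
	unknown   outcome = "unknown"   // anything else; inspect manually
)

// classify inspects event types ("ADDED", "MODIFIED", "DELETED") in the
// order they were observed on one MachineConfig.
func classify(events []string) outcome {
	sawDeleted := false
	onlyModified := len(events) > 0
	for _, e := range events {
		if e != "MODIFIED" {
			onlyModified = false
		}
		if e == "DELETED" {
			sawDeleted = true
		}
		if e == "ADDED" && sawDeleted {
			return recreated
		}
	}
	if onlyModified {
		return updated
	}
	return unknown
}

func main() {
	fmt.Println(classify([]string{"DELETED", "ADDED"}))     // recreated
	fmt.Println(classify([]string{"MODIFIED", "MODIFIED"})) // updated
}
```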

Description for the changelog

Running the loop first guarantees that all existing MCs have their GeneratedByControllerVersionAnnotationKey annotation updated before cleanUpDuplicatedMC runs, preventing it from wrongly removing them.

cleanUpDuplicatedMC will only act on MCs not associated with any existing MachineConfigPool.

cleanUpDuplicatedMC's git history shows that its original position was after the create/update MCs loop, and that it was moved to avoid a corner case related to an early exit when no cgroup v2 was present. After cgroup v2 became the default, that corner case was removed.
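The effect of the ordering can be sketched with a toy in-memory model. This is not the MCO implementation: the `cluster` type, `versionKey`, and both methods are hypothetical, and the cleanup rule (delete every MC whose version annotation differs from the running controller version) is an assumption based on the description above.

```go
package main

import "fmt"

// versionKey stands in for GeneratedByControllerVersionAnnotationKey.
const versionKey = "generated-by-controller-version"

// cluster is a toy store: MachineConfig name -> annotations, plus an event log.
type cluster struct {
	mcs    map[string]map[string]string
	events []string
}

// createOrUpdate models the per-pool loop: it stamps the MC with the
// running controller version, creating the MC if it is missing.
func (c *cluster) createOrUpdate(name, version string) {
	if _, ok := c.mcs[name]; !ok {
		c.mcs[name] = map[string]string{versionKey: version}
		c.events = append(c.events, "ADDED "+name)
		return
	}
	c.mcs[name][versionKey] = version
	c.events = append(c.events, "MODIFIED "+name)
}

// cleanUp models the cleanup pass under the stated assumption: any MC
// whose version annotation does not match the running version is deleted.
func (c *cluster) cleanUp(version string) {
	for name, ann := range c.mcs {
		if ann[versionKey] != version {
			delete(c.mcs, name)
			c.events = append(c.events, "DELETED "+name)
		}
	}
}

// reconcile simulates the first sync after an upgrade from "old" to "new"
// and returns the events recorded for the chosen ordering.
func reconcile(cleanupFirst bool) []string {
	const name = "97-master-generated-kubelet"
	c := &cluster{mcs: map[string]map[string]string{
		name: {versionKey: "old"},
	}}
	if cleanupFirst {
		c.cleanUp("new")
		c.createOrUpdate(name, "new")
	} else {
		c.createOrUpdate(name, "new")
		c.cleanUp("new")
	}
	return c.events
}

func main() {
	fmt.Println(reconcile(true))  // [DELETED 97-master-generated-kubelet ADDED 97-master-generated-kubelet]
	fmt.Println(reconcile(false)) // [MODIFIED 97-master-generated-kubelet]
}
```

With the cleanup first, the stale annotation causes a delete and a re-create; with the loop first, the annotation is refreshed in place and the cleanup finds nothing to remove for that pool.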

Summary by CodeRabbit

Bug Fixes
- Improved kubelet configuration reconciliation by adjusting when duplicate machine configuration cleanup occurs, resulting in more predictable kubelet configuration updates across cluster pools.
Tests
- Updated node configuration tests to reflect the updated machine configuration fetch behavior during reconciliation.


openshift-merge-bot · 2026-06-17T13:55:24Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci-robot · 2026-06-17T13:55:28Z

@proietfb: This pull request references Jira Issue OCPBUGS-59958, which is invalid:

expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-06-17T13:55:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 830309d0-8c63-45aa-8b49-9745f20ea808

📥 Commits

Reviewing files that changed from the base of the PR and between 51e1b0d and 754cec0.

📒 Files selected for processing (1)

pkg/controller/kubelet-config/kubelet_config_nodes_test.go

Walkthrough

In syncNodeConfigHandler, the cleanUpDuplicatedMC(managedNodeConfigKeyPrefix) call is relocated from before the controller config and pool processing block to after the pool reconciliation loop finishes and immediately before the existing kubeletconfigs are synced. Test expectations are updated to verify the new call sequence.

Changes

Cleanup call reorder in syncNodeConfigHandler

Layer / File(s)	Summary
Reorder `cleanUpDuplicatedMC` call and update test expectations `pkg/controller/kubelet-config/kubelet_config_nodes.go`, `pkg/controller/kubelet-config/kubelet_config_nodes_test.go`	`cleanUpDuplicatedMC(managedNodeConfigKeyPrefix)` is moved from before pool processing to after all pool reconciliation completes and before kubeletconfig syncing begins. Test expectations are updated to include an additional GET call for the worker machine config.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (14 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and specifically references the bug fix (OCPBUGS-59958) and clearly summarizes the main change: relocating cleanUpDuplicatedMC to prevent double reboot on Master nodes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR modifies kubelet_config_nodes_test.go which uses standard Go testing, not Ginkgo. No Ginkgo test definitions (It, Describe, Context, etc.) exist in the file, so the check is not applicable.
Test Structure And Quality	✅ Passed	TestNodeConfigDefault follows Ginkgo test quality requirements: uses Go's standard t.Run for test isolation, employs fake clients for API interactions, assertions include meaningful context message...
Microshift Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added in this PR. Changes are limited to refactoring existing controller code and updating unit test expectations in the kubelet_config_nodes_test.go file.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR only modifies unit tests in pkg/controller/kubelet-config/ using Go's standard testing framework, not new Ginkgo e2e tests. The custom check for SNO compatibility applies only to new Ginkgo e2e...
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies only controller reconciliation logic for MachineConfig ordering; no deployment manifests, pod scheduling constraints, affinity rules, or topology-aware code changes are introduced.
Ote Binary Stdout Contract	✅ Passed	Modified files are controller package code (not binary entry points) and test assertions, with no process-level stdout writes. Changes relocate a function call and update test expectations, introdu...
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No new Ginkgo e2e tests are added in this PR. Only unit test mock expectations are updated in kubelet_config_nodes_test.go; check does not apply.
No-Weak-Crypto	✅ Passed	PR contains no cryptographic operations. Changes are purely orchestration logic that reorders function calls in a Kubernetes controller, with no weak crypto, custom crypto, or secret comparison iss...
Container-Privileges	✅ Passed	No container/Kubernetes manifests modified in this PR; check only applies to manifest files defining privileged container settings.
No-Sensitive-Data-In-Logs	✅ Passed	PR only moves cleanUpDuplicatedMC call between lines; no new logging statements were introduced that could expose passwords, tokens, API keys, PII, or sensitive data.


yuqi-zhang

I think this will probably fix the problem, but we did explicitly move this to the current location in #3563

With the commit message:

2. Execute the clean up of duplicate MCs in the kubeletCfg_NodeCfg controller before returning due to empty nodes.config spec

So I'm wondering if that's still relevant, i.e. if you do set a node.config spec, then remove it, is this properly handling it?

I was debating whether or not we should actually add an explicit check and delete instead of relying on the cleanUpDuplicatedMC code, since I'm not sure if it's doing it properly in current version clusters.

isabella-janssen · 2026-06-18T16:01:50Z

/jira refresh

openshift-ci-robot · 2026-06-18T16:01:58Z

@isabella-janssen: This pull request references Jira Issue OCPBUGS-59958, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @asahay19


proietfb · 2026-06-19T17:02:52Z

I think this will probably fix the problem, but we did explicitly move this to the current location in #3563

From what I understand, this was true until MCO defaulted to cgroupv2 (#3972). Before that, if nodeCfg.Spec was empty, the sync function would return without errors, so if cleanUpDuplicatedMC had been placed after the for loop, it would never have run in that case. Now that #3972 has removed that path from the sync function, the execution of cleanUpDuplicatedMC should be guaranteed.
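A minimal sketch of that reasoning, using a toy model rather than the real handler: `runSync` and `syncResult` are hypothetical names, and the `earlyReturn` flag stands for the early exit on an empty spec that the comment above says existed before #3972.

```go
package main

import "fmt"

// syncResult records which steps of the toy sync function ran.
type syncResult struct {
	loopRan    bool
	cleanupRan bool
}

// runSync models the handler's control flow.
//   - specEmpty: the nodes config spec is empty
//   - earlyReturn: an empty spec causes an early return without error
//   - cleanupAfterLoop: the cleanup call sits after the pool loop
func runSync(specEmpty, earlyReturn, cleanupAfterLoop bool) syncResult {
	var r syncResult
	if !cleanupAfterLoop {
		r.cleanupRan = true // cleanup placed before the early return
	}
	if specEmpty && earlyReturn {
		return r // nothing after this point runs
	}
	r.loopRan = true
	if cleanupAfterLoop {
		r.cleanupRan = true
	}
	return r
}

func main() {
	// Early return present, cleanup after the loop: cleanup is skipped.
	fmt.Printf("%+v\n", runSync(true, true, true)) // {loopRan:false cleanupRan:false}
	// Early return present, cleanup before the loop: cleanup still runs.
	fmt.Printf("%+v\n", runSync(true, true, false)) // {loopRan:false cleanupRan:true}
	// Early return removed, cleanup after the loop: both run.
	fmt.Printf("%+v\n", runSync(true, false, true)) // {loopRan:true cleanupRan:true}
}
```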

So I'm wondering if that's still relevant, i.e. if you do set a node.config spec, then remove it, is this properly handling it?

Since there are no longer any early returns without errors, I think the kubelet config nodes worker covers this scenario, but this is not fully true for the controller worker. Unlike the others, for kubelet controller configs I'm seeing 2 more scenarios:

If a config has been added and then deleted, that config should not be orphaned due to a deletion or timeout.
If a config is part of a custom pool and that custom pool has been deleted, the related MCs (if not deleted by something else) will remain orphaned until the MCO is updated. In that case, those MCs will not have the updated GeneratedByControllerVersionAnnotationKey annotation with the latest MCO version, and the orphaned MCs should be deleted because of the mismatched annotation.

So, in theory, if what I've said is true, the last scenario was already present even before #3972 and #3563. In that case, we can investigate further by opening a bug for that case.
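For the custom-pool scenario, an explicit check keyed off pool existence might look roughly like the sketch below. It is purely illustrative: `orphanedMCs` is a hypothetical helper, and the `97-<pool>-generated-kubelet` naming pattern is an assumption extrapolated from the 97-master-generated-kubelet name used in this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Assumed naming pattern for generated MCs: 97-<pool>-generated-kubelet.
const (
	mcPrefix = "97-"
	mcSuffix = "-generated-kubelet"
)

// orphanedMCs returns the generated MCs whose pool is not in existingPools.
// MCs that do not match the assumed naming pattern are ignored.
func orphanedMCs(mcNames []string, existingPools map[string]bool) []string {
	var orphans []string
	for _, name := range mcNames {
		if !strings.HasPrefix(name, mcPrefix) || !strings.HasSuffix(name, mcSuffix) {
			continue
		}
		pool := strings.TrimSuffix(strings.TrimPrefix(name, mcPrefix), mcSuffix)
		if !existingPools[pool] {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans) // stable output regardless of input order
	return orphans
}

func main() {
	mcs := []string{
		"97-master-generated-kubelet",
		"97-infra-generated-kubelet", // "infra" is a made-up custom pool that was deleted
		"99-worker-ssh",
	}
	pools := map[string]bool{"master": true, "worker": true}
	fmt.Println(orphanedMCs(mcs, pools)) // [97-infra-generated-kubelet]
}
```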

@yuqi-zhang what do you think?

yuqi-zhang · 2026-06-19T22:53:25Z

From what I understand, this was true until MCO defaulted to cgroupv2 (#3972). Before that, if nodeCfg.Spec was empty, the sync function would return without errors, so if cleanUpDuplicatedMC had been placed after the for loop, it would never have run in that case. Now that #3972 has removed that path from the sync function, the execution of cleanUpDuplicatedMC should be guaranteed.

Ack, that makes sense, thanks for checking into it. (side note, it's not really just cleaning up duplicated MCs, so the name is kinda misleading, but not a big deal)

If a config is part of a custom pool and that custom pool has been deleted, the related MCs (if not deleted by something else) will remain orphaned until the MCO is updated. In that case, those MCs will not have the updated GeneratedByControllerVersionAnnotationKey annotation with the latest MCO version, and the orphaned MCs should be deleted because of the mismatched annotation.

If I understand correctly, you're saying that the controller running here to delete the additional MC doesn't key off of pool changes, so until something actually triggers a re-render, we have a bunch of stale MCs. I think this is probably fine since if the pool is gone, nothing would be using these MCs anyways. Since they eventually get cleaned up, it's up to you on whether you want to make this a followup issue or not.

Also, the unit test failures are legitimate (albeit not because you're doing anything wrong I think, just that the expected behaviour has changed?). You probably need to update https://github.com/openshift/machine-config-operator/blob/main/pkg/controller/kubelet-config/kubelet_config_nodes_test.go#L72-L75 based on an initial look. You may have one additional get() for some reason - may be best to check if that's expected first before adding it to the test.
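One way to see which call is new before adjusting the test is to diff the recorded client actions against the expected ones. The sketch below uses a hypothetical, simplified `action` record and made-up action lists; it does not reproduce the real expectations in kubelet_config_nodes_test.go or the client-go testing types.

```go
package main

import "fmt"

// action is a hypothetical, simplified record of one fake-client call.
type action struct {
	verb string
	name string
}

// extraActions returns the recorded actions that have no counterpart in
// expected (a multiset difference), preserving the recorded order.
func extraActions(expected, recorded []action) []action {
	remaining := map[action]int{}
	for _, a := range expected {
		remaining[a]++
	}
	var extra []action
	for _, a := range recorded {
		if remaining[a] > 0 {
			remaining[a]--
			continue
		}
		extra = append(extra, a)
	}
	return extra
}

func main() {
	// Illustrative lists only; the MC names follow the assumed
	// 97-<pool>-generated-kubelet pattern.
	expected := []action{
		{"get", "97-master-generated-kubelet"},
		{"create", "97-master-generated-kubelet"},
	}
	recorded := []action{
		{"get", "97-master-generated-kubelet"},
		{"create", "97-master-generated-kubelet"},
		{"get", "97-worker-generated-kubelet"},
	}
	fmt.Println(extraActions(expected, recorded)) // [{get 97-worker-generated-kubelet}]
}
```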

openshift-ci-robot · 2026-06-22T06:46:49Z

@proietfb: This pull request references Jira Issue OCPBUGS-59958, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @asahay19


proietfb · 2026-06-22T07:07:29Z

If I understand correctly, you're saying that the controller running here to delete the additional MC doesn't key off of pool changes, so until something actually triggers a re-render, we have a bunch of stale MCs. I think this is probably fine since if the pool is gone, nothing would be using these MCs anyways.

Yes exactly

Also, the unit test failures are legitimate (albeit not because you're doing anything wrong I think, just that the expected behaviour has changed?). You probably need to update https://github.com/openshift/machine-config-operator/blob/main/pkg/controller/kubelet-config/kubelet_config_nodes_test.go#L72-L75 based on an initial look. You may have one additional get() for some reason - may be best to check if that's expected first before adding it to the test.

Thank you. Yes, moving the cleanup function after the loop requires an extra get() in the unit tests.

yuqi-zhang

/lgtm

Based on the conversation I think this is a fine backportable fix to get rid of the immediate problem. Long term I'd like us to consider something like https://redhat.atlassian.net/browse/MCO-2340 for this as well (way beyond the scope of this PR, just linking for context)

openshift-merge-bot · 2026-06-22T18:47:57Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op-ocl-part1
/test e2e-gcp-op-ocl-part2
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2
/test e2e-gcp-op-single-node
/test e2e-hypershift

openshift-ci · 2026-06-22T18:48:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: proietfb, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [proietfb,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

yuqi-zhang · 2026-06-22T18:49:32Z

/cherry-pick release-4.22

openshift-cherrypick-robot · 2026-06-22T18:49:35Z

@yuqi-zhang: once the present PR merges, I will cherry-pick it on top of release-4.22 in a new PR and assign it to you.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

proietfb · 2026-06-25T11:10:26Z

/test e2e-gcp-op-part2
/test e2e-aws-ovn-upgrade
/test e2e-hypershift

proietfb · 2026-06-26T10:20:14Z

/cherry-pick release-4.21

openshift-cherrypick-robot · 2026-06-26T10:20:17Z

@proietfb: once the present PR merges, I will cherry-pick it on top of release-4.21 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-4.21 release-4.20 release-4.19 release-4.18


proietfb · 2026-06-26T10:22:12Z

/cherry-pick release-4.20
/cherry-pick release-4.19
/cherry-pick release-4.18

openshift-cherrypick-robot · 2026-06-26T10:22:15Z

@proietfb: once the present PR merges, I will cherry-pick it on top of release-4.18, release-4.19, release-4.20 in new PRs and assign them to you.


sergiordlr · 2026-06-29T11:20:35Z

Verified using IPI on AWS.

Reproduce the issue

In a 5.0 cluster without the fix we executed the following steps

Create a kubeletconfig with autoSizingReserved: true in the master pool

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: autosizing-reserved
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""

Manually edit the machineconfiguration.openshift.io/generated-by-controller-version annotation in 97-master-generated-kubelet machineconfig.
Remove the MCC pod and let MCO recreate it
Check that the 97-master-generated-kubelet machineconfig is deleted and recreated

Verify the fix

In a 5.0 cluster with the fix we executed the following steps

Create a kubeletconfig with autoSizingReserved: true in the master pool

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: autosizing-reserved
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""

Manually edit the machineconfiguration.openshift.io/generated-by-controller-version annotation in 97-master-generated-kubelet machineconfig.
Remove the MCC pod and let MCO recreate it
Check that the 97-master-generated-kubelet machineconfig was NOT deleted, but updated.

Verify the fix with an actual upgrade

We executed the following steps

Create a 4.22 cluster
Create a kubeletconfig with autoSizingReserved: true in the master pool

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: autosizing-reserved
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""

Upgrade the cluster to a 5.0+fix version
Check that the nodes were only rebooted once

The upgrade window is 10:01:45Z – 11:06:45Z.

Each last reboot line shows when a boot started. A reboot happened during the upgrade if the boot start time falls within that window.

Masters:

┌────────────────┬────────────────────────────┬──────────────────────────┐
│      Node      │         Boot times         │ Boots within 10:01–11:06 │
├────────────────┼────────────────────────────┼──────────────────────────┤
│ ip-10-0-19-104 │ 08:39, 08:41, 09:26, 10:51 │ 1                        │
├────────────────┼────────────────────────────┼──────────────────────────┤
│ ip-10-0-41-162 │ 08:39, 08:41, 09:32, 11:05 │ 1                        │
├────────────────┼────────────────────────────┼──────────────────────────┤
│ ip-10-0-88-60  │ 08:39, 08:41, 09:37, 10:58 │ 1                        │
└────────────────┴────────────────────────────┴──────────────────────────┘

Workers:

┌────────────────┬─────────────────────┬──────────────────────────┐
│      Node      │     Boot times      │ Boots within 10:01–11:06 │
├────────────────┼─────────────────────┼──────────────────────────┤
│ ip-10-0-22-154 │ 08:47, 08:48, 10:47 │ 1                        │
├────────────────┼─────────────────────┼──────────────────────────┤
│ ip-10-0-48-220 │ 08:47, 08:49, 10:50 │ 1                        │
├────────────────┼─────────────────────┼──────────────────────────┤
│ ip-10-0-72-255 │ 08:47, 08:50, 10:53 │ 1                        │
└────────────────┴─────────────────────┴──────────────────────────┘

Every node rebooted exactly 1 time during the upgrade.
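The per-node count can be reproduced with a small sketch like the one below. `bootsWithin` is a hypothetical helper; it assumes all times are on the same day and uses the minute-precision window from the table headers (10:01 to 11:06) rather than the second-precision upgrade window.

```go
package main

import (
	"fmt"
	"time"
)

// bootsWithin counts how many boot times (HH:MM, same day) fall inside
// the inclusive window [start, end].
func bootsWithin(boots []string, start, end string) (int, error) {
	const layout = "15:04"
	s, err := time.Parse(layout, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(layout, end)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, b := range boots {
		t, err := time.Parse(layout, b)
		if err != nil {
			return 0, err
		}
		if !t.Before(s) && !t.After(e) {
			n++
		}
	}
	return n, nil
}

func main() {
	// Boot times for master ip-10-0-19-104, taken from the table above.
	n, err := bootsWithin([]string{"08:39", "08:41", "09:26", "10:51"}, "10:01", "11:06")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 1
}
```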

/verified by @sergiordlr

sergiordlr · 2026-06-29T11:21:42Z

/verified by @sergiordlr

openshift-ci-robot · 2026-06-29T11:22:00Z

@sergiordlr: This PR has been marked as verified by @sergiordlr.


openshift-ci · 2026-06-29T11:42:16Z

@proietfb: all tests passed!

Full PR test history. Your PR dashboard.


openshift-ci-robot · 2026-06-29T11:45:22Z

@proietfb: Jira Issue OCPBUGS-59958: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

openshift/machine-config-operator#5681 is open

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-59958 has not been moved to the MODIFIED state.

This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.


openshift-cherrypick-robot · 2026-06-29T11:46:10Z

@yuqi-zhang: new pull request created: #6240


openshift-cherrypick-robot · 2026-06-29T11:46:54Z

@proietfb: new pull request created: #6241

Details

In response to this:

/cherry-pick release-4.21

openshift-cherrypick-robot · 2026-06-29T11:47:37Z

@proietfb: new pull request created: #6242

Details

In response to this:

/cherry-pick release-4.20
/cherry-pick release-4.19
/cherry-pick release-4.18

openshift-cherrypick-robot · 2026-06-29T11:48:20Z

@proietfb: new pull request created: #6243

Details

In response to this:

/cherry-pick release-4.20
/cherry-pick release-4.19
/cherry-pick release-4.18

openshift-cherrypick-robot · 2026-06-29T11:49:02Z

@proietfb: #6201 failed to apply on top of branch "release-4.18":

Applying: OCPBUGS-59958: move cleanUpDuplicatedMC to after pool loop in syncNodeConfigHandler
Using index info to reconstruct a base tree...
M	pkg/controller/kubelet-config/kubelet_config_nodes.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/kubelet-config/kubelet_config_nodes.go
CONFLICT (content): Merge conflict in pkg/controller/kubelet-config/kubelet_config_nodes.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 OCPBUGS-59958: move cleanUpDuplicatedMC to after pool loop in syncNodeConfigHandler

Details

In response to this:

/cherry-pick release-4.20
/cherry-pick release-4.19
/cherry-pick release-4.18

OCPBUGS-59958: move cleanUpDuplicatedMC to after pool loop in syncNodeConfigHandler

51e1b0d

openshift-ci Bot requested review from umohnani8 and yuqi-zhang June 17, 2026 13:56

openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jun 17, 2026

yuqi-zhang reviewed Jun 18, 2026

openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) Jun 18, 2026

openshift-ci Bot requested a review from asahay19 June 18, 2026 16:02

OCPBUGS-59958: Updated Unit Test, added extra getconfig check

754cec0

yuqi-zhang approved these changes Jun 22, 2026

openshift-ci Bot assigned yuqi-zhang Jun 22, 2026

openshift-ci Bot added the lgtm label (indicates that a PR is ready to be merged) Jun 22, 2026

openshift-ci-robot added the verified label (signifies that the PR passed pre-merge verification criteria) Jun 29, 2026

openshift-merge-bot Bot merged commit 6396296 into openshift:main Jun 29, 2026
17 checks passed

openshift-cherrypick-robot mentioned this pull request Jun 29, 2026

[release-4.22] OCPBUGS-93804: move cleanUpDuplicatedMC to avoid double reboot on first updated Master node #6240

Open

openshift-cherrypick-robot mentioned this pull request Jun 29, 2026

[release-4.21] OCPBUGS-93806: move cleanUpDuplicatedMC to avoid double reboot on first updated Master node #6241

Open

openshift-cherrypick-robot mentioned this pull request Jun 29, 2026

[release-4.20] OCPBUGS-93807: move cleanUpDuplicatedMC to avoid double reboot on first updated Master node #6242

Open

openshift-cherrypick-robot mentioned this pull request Jun 29, 2026

[release-4.19] OCPBUGS-93808: move cleanUpDuplicatedMC to avoid double reboot on first updated Master node #6243

Open

proietfb mentioned this pull request Jun 29, 2026

OCPBUGS-59958: Remove cleanUpDuplicatedMC call in node.config handling #5681

Open

Conversation

proietfb commented Jun 17, 2026 • edited by coderabbitai Bot

openshift-merge-bot Bot commented Jun 17, 2026

openshift-ci-robot commented Jun 17, 2026

coderabbitai Bot commented Jun 17, 2026 • edited

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

yuqi-zhang left a comment

isabella-janssen commented Jun 18, 2026

openshift-ci-robot commented Jun 18, 2026

proietfb commented Jun 19, 2026

yuqi-zhang commented Jun 19, 2026

openshift-ci-robot commented Jun 22, 2026

proietfb commented Jun 22, 2026

yuqi-zhang left a comment

openshift-merge-bot Bot commented Jun 22, 2026

openshift-ci Bot commented Jun 22, 2026

yuqi-zhang commented Jun 22, 2026

openshift-cherrypick-robot commented Jun 22, 2026

proietfb commented Jun 25, 2026

proietfb commented Jun 26, 2026 • edited

openshift-cherrypick-robot commented Jun 26, 2026

proietfb commented Jun 26, 2026

openshift-cherrypick-robot commented Jun 26, 2026

sergiordlr commented Jun 29, 2026 • edited

Reproduce the issue

Verify the fix

Verify the fix with an actual upgrade

sergiordlr commented Jun 29, 2026

openshift-ci-robot commented Jun 29, 2026

openshift-ci Bot commented Jun 29, 2026

openshift-ci-robot commented Jun 29, 2026