OCPBUGS-59958: Remove cleanUpDuplicatedMC call in node.config handling#5681
OCPBUGS-59958: Remove cleanUpDuplicatedMC call in node.config handling#5681yuqi-zhang wants to merge 1 commit into
Conversation
This function was originally intended to remove duplicate MCs that somehow was no longer tracked by the controller. I don't think it should be used in general, and this removal here was causing the controller to be confused and deleting valid 97-POOL-generated-kubeletconfig machineconfig objects, since the new controller hasn't yet had a chance to sync and stamp its controller version, causing multiple reboots. Looks like we did this change in openshift#3563 (comment), and the behaviour seems to have changed since then, so I would expect this to be a safe change as the cgroups config is always being generated now. I would lean towards removing cleanUpDuplicatedMC entirely, but based on the comment, it looks like it is catching some unmanaged MC scenarios (e.g. deletion of a custom MCP), so will not do that for a backporting fix until we can sort that out. Note that 97-generated only applies for master/worker/arbiter pools, so I think that shouldn't be a problem.
|
@yuqi-zhang: This pull request references Jira Issue OCPBUGS-59958, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sairameshv, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold Will fix up the tests |
|
@yuqi-zhang: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/jira refresh The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity. |
|
@openshift-bot: This pull request references Jira Issue OCPBUGS-59958, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@yuqi-zhang given that #6201 has been merged I'd close this PR. What do you think about? |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This function was originally intended to remove duplicate MCs that somehow was no longer tracked by the controller. I don't think it should be used in general, and this removal here was causing the controller to be confused and deleting valid 97-POOL-generated-kubeletconfig machineconfig objects, since the new controller hasn't yet had a chance to sync and stamp its controller version, causing multiple reboots.
Looks like we did this change in
#3563 (comment), and the behaviour seems to have changed since then, so I would expect this to be a safe change as the cgroups config is always being generated now. I would lean towards removing cleanUpDuplicatedMC entirely, but based on the comment, it looks like it is catching some unmanaged MC scenarios (e.g. deletion of a custom MCP), so will not do that for a backporting fix until we can sort that out. Note that 97-generated only applies for master/worker/arbiter pools, so I think that shouldn't be a problem.
cc @sairameshv in case I am missing something from the original context