crio: conditionally enable min_injected_gomaxprocs for high-CPU MCPs#6225
crio: conditionally enable min_injected_gomaxprocs for high-CPU MCPs#6225haircommander wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
WalkthroughThe PR extends the container-runtime-config controller to automatically emit a CRI-O drop-in configuration that enables ChangesCPU-aware CRI-O drop-in generation
Sequence Diagram(s)sequenceDiagram
participant Controller
participant getMaxCPUCountFromPool
participant KubeClient
Controller->>getMaxCPUCountFromPool: pool.nodeSelector
getMaxCPUCountFromPool->>KubeClient: list nodes with label selector
KubeClient-->>getMaxCPUCountFromPool: nodes with CPU capacity
getMaxCPUCountFromPool-->>Controller: max CPU value
Controller->>Controller: maxCPU >= 128?
alt CPU threshold met
Controller->>Controller: generate TOML drop-in<br/>min_injected_gomaxprocs=1
Controller->>Controller: append to configFileList
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 13 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (13 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: haircommander The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/controller/container-runtime-config/helpers_test.go (1)
2974-3190: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd an explicit API list-error test case for this helper.
Current cases validate selector and empty-match behavior, but not the node-list failure branch. A fake client reactor that forces
nodes/listto fail would lock in the expected reconcile behavior and prevent regressions.🧪 Example test addition
+{ + name: "node list API error", + pool: &mcfgv1.MachineConfigPool{ + ObjectMeta: metav1.ObjectMeta{Name: "worker"}, + Spec: mcfgv1.MachineConfigPoolSpec{ + NodeSelector: &metav1.LabelSelector{ + MatchLabels: map[string]string{"node-role.kubernetes.io/worker": ""}, + }, + }, + }, + nodes: []corev1.Node{}, + expectedCPU: 0, + setupClient: func(c *k8sfake.Clientset) { + c.PrependReactor("list", "nodes", func(action k8stesting.Action) (bool, runtime.Object, error) { + return true, nil, fmt.Errorf("simulated list error") + }) + }, +},🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/container-runtime-config/helpers_test.go` around lines 2974 - 3190, The TestGetMaxCPUCountFromPool test function is missing a test case that covers the failure path when the Kubernetes API call to list nodes fails. Add a new test case in the tests slice that configures the fake client to return an error when listing nodes, then verify the function handles this gracefully by either logging an error or returning an appropriate value. Use the fake client's reactor mechanism to inject a failure on the nodes/list operation, and update the test loop to check for expectedErrLog when present to validate error logging behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 825-828: The error handling for the addTOMLgeneratedConfigFile
call when creating the min_injected_gomaxprocs config currently logs the error
and continues with the else block, allowing reconciliation to proceed even when
drop-in file generation fails. Instead of logging and continuing, return an
error from the reconciliation function when addTOMLgeneratedConfigFile returns
an error for crioDropInFilePathMinInjectedGomaxprocs, ensuring that
reconciliation fails explicitly and the pool configuration is not left in an
incomplete state.
---
Nitpick comments:
In `@pkg/controller/container-runtime-config/helpers_test.go`:
- Around line 2974-3190: The TestGetMaxCPUCountFromPool test function is missing
a test case that covers the failure path when the Kubernetes API call to list
nodes fails. Add a new test case in the tests slice that configures the fake
client to return an error when listing nodes, then verify the function handles
this gracefully by either logging an error or returning an appropriate value.
Use the fake client's reactor mechanism to inject a failure on the nodes/list
operation, and update the test loop to check for expectedErrLog when present to
validate error logging behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 05775bb9-e077-432b-a8dd-69035fc07c07
📒 Files selected for processing (3)
pkg/controller/container-runtime-config/container_runtime_config_controller.gopkg/controller/container-runtime-config/helpers.gopkg/controller/container-runtime-config/helpers_test.go
Go processes on nodes with many CPUs suffer from excessive runtime threads when GOMAXPROCS isn't set. The go runtime creates threads per CPU it can see, but the kubernetes scheduler binpacks based on CPU requests, leading to scheduling and GC latency issues. CRI-O's min_injected_gomaxprocs feature injects the GOMAXPROCS environment variable into all pods based on their CPU request (set to 2*requested CPUs to allow bursting while mitigating latency). This change automatically enables min_injected_gomaxprocs for any pool (master, worker, arbiter, or custom) where any node has >= 128 CPUs. The controller: - Queries nodes in each pool to find max CPU count - Generates a CRIO drop-in config file when CPU >= 128 - Sets min_injected_gomaxprocs = 1 to enable the feature This approach is pool-aware and only affects high-CPU nodes where the GOMAXPROCS issue is most impactful. Signed-off-by: Peter Hunt <pehunt@redhat.com>
b40459b to
6e5fdb6
Compare
|
@harche PTAL |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/controller/container-runtime-config/helpers_test.go (2)
2974-3191: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winVerify or remove the unused
expectedErrLogfield.The test table includes an
expectedErrLogfield (e.g., line 3170 sets it totruefor the "invalid node selector" case), but the test body never asserts that the error was logged. This makes the field misleading and adds confusion to the test.Either:
- Remove the
expectedErrLogfield if error logging verification is not needed, or- Capture and verify klog output to ensure errors are logged as expected
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/container-runtime-config/helpers_test.go` around lines 2974 - 3191, The TestGetMaxCPUCountFromPool test defines an expectedErrLog field in the test struct (set to true for the "invalid node selector" case) but never verifies or asserts this field in the test body. Either remove the expectedErrLog field from the test struct across all test cases since error logging is not being validated, or add logic to the test body to capture and verify klog output when expectedErrLog is true to ensure errors are properly logged for invalid selectors.
2974-3191: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winConsider adding a test case for nodes with missing CPU capacity.
The current test cases assume all nodes have
Status.Capacity[corev1.ResourceCPU]populated. Consider adding a test case where a node is missing this field to verify thatcpuQuantity.Value()handles it gracefully (likely returns 0).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/container-runtime-config/helpers_test.go` around lines 2974 - 3191, Add a new test case to the tests slice in the TestGetMaxCPUCountFromPool function that covers a scenario where a node has an empty or missing CPU capacity field. Create a test case similar to the existing ones (with name, pool, nodes, and expectedCPU fields) where at least one node either has an empty Status.Capacity ResourceList or lacks the corev1.ResourceCPU entry. Set the expectedCPU to 0 or an appropriate value to verify that the getMaxCPUCountFromPool function gracefully handles nodes without CPU capacity information without crashing or returning unexpected values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/controller/container-runtime-config/helpers_test.go`:
- Around line 2974-3191: The TestGetMaxCPUCountFromPool test defines an
expectedErrLog field in the test struct (set to true for the "invalid node
selector" case) but never verifies or asserts this field in the test body.
Either remove the expectedErrLog field from the test struct across all test
cases since error logging is not being validated, or add logic to the test body
to capture and verify klog output when expectedErrLog is true to ensure errors
are properly logged for invalid selectors.
- Around line 2974-3191: Add a new test case to the tests slice in the
TestGetMaxCPUCountFromPool function that covers a scenario where a node has an
empty or missing CPU capacity field. Create a test case similar to the existing
ones (with name, pool, nodes, and expectedCPU fields) where at least one node
either has an empty Status.Capacity ResourceList or lacks the corev1.ResourceCPU
entry. Set the expectedCPU to 0 or an appropriate value to verify that the
getMaxCPUCountFromPool function gracefully handles nodes without CPU capacity
information without crashing or returning unexpected values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7b6f6075-c50f-4163-aa99-bffc5b45eb48
📒 Files selected for processing (3)
pkg/controller/container-runtime-config/container_runtime_config_controller.gopkg/controller/container-runtime-config/helpers.gopkg/controller/container-runtime-config/helpers_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/controller/container-runtime-config/helpers.go
- pkg/controller/container-runtime-config/container_runtime_config_controller.go
|
@haircommander: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| }) | ||
| if err != nil { | ||
| klog.V(2).Infof("error listing nodes for pool %s: %v", pool.Name, err) | ||
| return 0 |
There was a problem hiding this comment.
This collapses every failure into 0, which is the same value as "no big nodes," and 0 means the drop-in gets dropped from the rendered MachineConfig. So a transient node List error during a sync removes the config and reboots the whole pool, then the next successful sync adds it back and reboots again. A blip on the API shouldn't be able to roll a pool. Can we propagate the error and degrade through syncStatusOnly the same way the TOML path below does, instead of swallowing it?
| } | ||
|
|
||
| // List nodes matching the pool's node selector | ||
| nodes, err := kubeClient.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{ |
There was a problem hiding this comment.
This does a full node List against the API server on every sync, once per pool. The controller already runs informers/listers for its other watched types, but there's no node informer wired in today, so this call is uncached. Wiring a node lister and reading from cache here would cut the per-sync API load and also make the transient failure above much less likely to flap the config. Worth weighing against the cost of adding a node watch to this controller.
|
|
||
| // For pools with >= 128 CPUs, enable min_injected_gomaxprocs | ||
| maxCPU := getMaxCPUCountFromPool(ctrl.kubeClient, pool) | ||
| if maxCPU >= 128 { |
There was a problem hiding this comment.
Quick design question on this block. Since it reads the live node CPU count on every sync and the answer lands in a MachineConfig, the on/off state can change from things that are not config edits, and each change reboots the pool. For example, if an autoscaler takes a worker pool from 128 CPU nodes down to 64 and later back up, the file gets removed and then added back, so the pool reboots on each change. A fresh pool whose nodes have not registered yet also reads as 0 first and flips on once they join.
Would it be simpler to keep MCO out of the CPU counting and just turn the feature on here, then let CRI-O decide per node at container creation? CRI-O already runs on the node and sits in the injection path (gomaxprocs_hooks_linux.go), so it could skip injection on smaller nodes itself. Then the MachineConfig never changes based on cluster state, the reboot churn goes away, and getMaxCPUCountFromPool plus the kubeClient plumbing would not be needed. Is there a reason the 128 threshold needs to live in MCO, or would moving that decision into CRI-O work for your case?
| pool *mcfgv1.MachineConfigPool | ||
| nodes []corev1.Node | ||
| expectedCPU int64 | ||
| expectedErrLog bool |
There was a problem hiding this comment.
expectedErrLog is set but never asserted on, so it's dead right now. Either assert against captured log output or drop the field. Also, no test covers the new branch in syncContainerRuntimeConfig itself (drop-in appended at >=128, and the degrade path on TOML failure), which is the riskier wiring.
|
/hold approach may change actually openshift/enhancements#2047 |
- What I did
Go processes run on nodes with many CPUs suffer a problem where the go runtime creates runtime threads per CPUs it can see, assuming there isn't a CPU limit on the process. However, the kubernetes scheduler is binpacking pods based on CPU request, which means go processes actually have access to less CPU time than the go runtime is expecting. This causes scheduling and GC latency.
min_injected_gomaxprocs is a feature in CRI-O that injects the GOMAXPROCS environment variable into all pods based on their CPU request. if the pod has a limit, GOMAXPROCS isn't set. Otherwise, it is set to 2requested CPUs. The 2 figure was found from perf testing, allowing the go runtime to burst up to capacity, while mitigating the latency.
Add code to inject min_injected_gomaxprocs if any node in the pool has more than 128 CPUs
alternative to #6177
- How to verify it
- Description for the changelog
Summary by CodeRabbit
Release Notes
New Features
min_injected_gomaxprocswhen pools have 128+ CPUs. If TOML drop-in generation fails, the update stops to avoid partial configuration.Tests