Use runtime.GOMAXPROCS(0) instead of runtime.NumCPU() to set --max-concurrent-reconciles default #222

@rh-iwalker

Bug Report

What did you do?

The Ansible operator plugin should consider using runtime.GOMAXPROCS(0) instead of runtime.NumCPU() to set the default for --max-concurrent-reconciles in https://github.com/operator-framework/ansible-operator-plugins/blob/main/internal/ansible/flags/flag.go#L109.

What did you expect to see?

--max-concurrent-reconciles should take into account the pod's CPU resource limits so as to not start too many simultaneous playbooks.

What did you see instead? Under which circumstances?

runtime.NumCPU() returns the number of logical CPUs usable by the current process, which does not seem to be cgroup-aware. As a result, the default for --max-concurrent-reconciles can be very large on OpenShift nodes with many CPUs. The Ansible operator can then start too many playbooks simultaneously, consuming memory until the pod is OOM killed.
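To compare the two values in a given container, a small diagnostic program like the one below could be run inside the pod. This is only a sketch: `cpuCounts` is a hypothetical helper name, and the numbers printed depend on the Go toolchain version and the environment.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuCounts (hypothetical helper) returns the two values being compared.
// Calling runtime.GOMAXPROCS with 0 only queries the current setting;
// it does not change it.
func cpuCounts() (numCPU, maxProcs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	numCPU, maxProcs := cpuCounts()
	fmt.Printf("runtime.NumCPU():      %d\n", numCPU)
	fmt.Printf("runtime.GOMAXPROCS(0): %d\n", maxProcs)
}
```

If both lines print the node's CPU count despite a CPU limit on the pod, the default for --max-concurrent-reconciles is not being bounded by that limit.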

On an OpenShift compute node with 128 CPUs, the operator set the default --max-concurrent-reconciles to 128. This particular environment had over 30 AnsibleAutomationPlatformBackup resources, which meant the operator tried to start over 30 simultaneous playbooks. This used up quite a bit of memory, causing the pod to exceed its 4000Mi memory limit, which led to it being OOM killed.

The pod that was OOM killed did have CPU resource limits set, but the operator did not seem to take those into account when determining the maximum number of concurrent reconciles.

resources:
  limits:
    cpu: "2"
    memory: 4000Mi
  requests:
    cpu: 10m
    memory: 256Mi
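For context, a CPU limit like the one above is enforced through the cgroup CPU quota rather than by restricting which CPUs the process can see. The sketch below assumes cgroup v2 and the default 100 ms period, in which case a limit of cpu: "2" would typically appear in the container's `cpu.max` file as `200000 100000`. `parseCPUMax` is a hypothetical helper, not part of the operator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax (hypothetical helper) parses the contents of a cgroup v2
// cpu.max file, which has the form "<quota> <period>" in microseconds,
// and returns the limit in CPU cores. ok is false when no limit is set
// (quota is "max") or when the contents are malformed.
func parseCPUMax(contents string) (cores float64, ok bool) {
	fields := strings.Fields(contents)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil || period <= 0 {
		return 0, false
	}
	return quota / period, true
}

func main() {
	// Assumed cpu.max contents for cpu: "2" with the default 100 ms period.
	cores, ok := parseCPUMax("200000 100000")
	fmt.Println(cores, ok)
}
```

The quota divided by the period gives the number of cores the container may use on average, which is the figure a cgroup-aware default would need to respect.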

Environment

Kubernetes cluster type:

OpenShift

This behavior was observed with Ansible Automation Platform operator 2.5 using Go 1.21.13.

ansible-operator version: "v1.31.0-ocp", commit: "731dca792e1343af155b82bc3c34a5800ee863af", kubernetes version: "v1.26.0", go version: "go1.21.13 (Red Hat 1.21.13-3.module+el8.10.0+22345+acdd8d0e) X:strictfipsruntime", GOOS: "linux", GOARCH: "amd64"

Possible Solution

Switch to runtime.GOMAXPROCS(0), which seems to be cgroup-aware since Go 1.25.

For Go versions earlier than 1.25, runtime.GOMAXPROCS(0) seems to return the same value as runtime.NumCPU(), so the change would only help once the operator is built with Go 1.25 or later.
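A minimal sketch of the proposed change is shown below. It uses the standard library `flag` package for self-containment and is not the plugin's actual flag code; `defaultMaxConcurrentReconciles` and `newFlagSet` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
)

// defaultMaxConcurrentReconciles (hypothetical helper) returns the current
// GOMAXPROCS setting. Passing 0 queries the value without changing it.
func defaultMaxConcurrentReconciles() int {
	return runtime.GOMAXPROCS(0)
}

// newFlagSet (hypothetical helper) registers --max-concurrent-reconciles
// with the GOMAXPROCS-based default instead of runtime.NumCPU().
func newFlagSet() (*flag.FlagSet, *int) {
	fs := flag.NewFlagSet("operator-sketch", flag.ContinueOnError)
	n := fs.Int(
		"max-concurrent-reconciles",
		defaultMaxConcurrentReconciles(),
		"Maximum number of concurrent reconciles.",
	)
	return fs, n
}

func main() {
	fs, n := newFlagSet()
	if err := fs.Parse(os.Args[1:]); err != nil {
		// The flag package has already reported the problem on stderr.
		return
	}
	fmt.Printf("max-concurrent-reconciles = %d\n", *n)
}
```

An explicit --max-concurrent-reconciles value still overrides the default, so existing deployments that set the flag would be unaffected.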
