
fix(shard): set numeric runAsUser defaults for pool containers #467

Open
luizfelmach wants to merge 1 commit into multigres:main from luizfelmach:fix/pod-sec-context



Summary

  • Fixes pool pod startup failures that occur when runAsNonRoot=true and the images run as non-numeric users (postgres, nobody), because kubelet cannot verify that a non-numeric user is non-root.
  • Keeps runAsNonRoot=true and, when fsGroup is not set, now sets numeric default runAsUser/runAsGroup values for the pool containers (postgres, postgres-exporter, multipooler).
  • Preserves existing behavior: when fsGroup is provided in PoolSpec, it still takes precedence for runAsUser/runAsGroup.

Problem

Applying config/samples/minimal.yaml could leave pool pods unable to start, failing with:

  • container has runAsNonRoot and image has non-numeric user ... cannot verify user is non-root

After the initial fix, a shared-volume permission issue appeared:

  • failed to write pgbackrest-server.conf: permission denied

Root Cause

  • The operator set runAsNonRoot=true without numeric UID/GID when fsGroup was nil.
  • With runAsNonRoot=true, kubelet requires a numeric user to verify non-root.
  • Containers sharing the /var/lib/pooler volume ran with different UIDs, which caused write permission conflicts (the pgbackrest-server.conf error above).
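The first root cause can be illustrated with a simplified Go model of the kubelet's non-root check. This is a sketch for illustration, not kubelet code: the `securityContext` and `imageUser` types and the `verifyRunAsNonRoot` function are hypothetical simplifications, and the error messages are approximations (the non-numeric-user one is modeled on the error quoted in the Problem section). It shows why an explicit numeric runAsUser makes the image's non-numeric USER irrelevant.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// securityContext is a hypothetical, trimmed-down stand-in for the
// container securityContext fields that matter for the non-root check.
type securityContext struct {
	RunAsNonRoot *bool
	RunAsUser    *int64
}

// imageUser is the USER declared by the image, e.g. "postgres", "nobody",
// or "999". Hypothetical simplification of the image metadata.
type imageUser string

// verifyRunAsNonRoot approximates the kubelet's decision: with
// runAsNonRoot=true it must be able to prove the process UID is not 0.
func verifyRunAsNonRoot(sc securityContext, user imageUser) error {
	if sc.RunAsNonRoot == nil || !*sc.RunAsNonRoot {
		return nil // nothing to verify
	}
	if sc.RunAsUser != nil {
		if *sc.RunAsUser == 0 {
			return errors.New("runAsUser 0 breaks non-root policy")
		}
		return nil // an explicit numeric UID settles the question
	}
	// No runAsUser: fall back to the image's USER.
	if user == "" {
		return errors.New("container has runAsNonRoot and image will run as root")
	}
	uid, err := strconv.ParseInt(string(user), 10, 64)
	if err != nil {
		// A name like "postgres" cannot be checked without resolving it.
		return fmt.Errorf("container has runAsNonRoot and image has non-numeric user (%s), cannot verify user is non-root", user)
	}
	if uid == 0 {
		return errors.New("container has runAsNonRoot and image will run as root")
	}
	return nil
}

func main() {
	nonRoot := true
	uid := int64(999)
	// Before the fix: runAsNonRoot without a numeric UID, image USER "postgres".
	fmt.Println(verifyRunAsNonRoot(securityContext{RunAsNonRoot: &nonRoot}, "postgres"))
	// After the fix: a numeric runAsUser is set, so the check passes.
	fmt.Println(verifyRunAsNonRoot(securityContext{RunAsNonRoot: &nonRoot, RunAsUser: &uid}, "postgres"))
}
```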

Changes

  • Updated buildContainerSecurityContext to accept a fallback UID and always set numeric runAsUser/runAsGroup when fsGroup is unset.
  • Added per-container fallback IDs for pool pods:
    • postgres: 999
    • postgres-exporter: 65534
    • multipooler: 999 (aligned with postgres to avoid shared-volume permission issues)
  • Updated shard controller tests to reflect the new behavior.
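A minimal sketch of the fallback logic described in the Changes list, assuming a simplified signature. The real buildContainerSecurityContext works with Kubernetes API types and its exact parameters are not shown in this PR; the `securityContext` struct, the `fallbackUIDs` map, and the plain `int64` parameters below are hypothetical stand-ins. The sketch also assumes runAsGroup is set to the same value as runAsUser, since the description sets both from a single fallback UID or from fsGroup.

```go
package main

import "fmt"

// securityContext is a hypothetical stand-in for the subset of the
// Kubernetes container SecurityContext that this change touches.
type securityContext struct {
	RunAsNonRoot bool
	RunAsUser    int64
	RunAsGroup   int64
}

// fallbackUIDs lists the per-container defaults from this PR. multipooler
// shares postgres's UID so both can write to the shared /var/lib/pooler volume.
var fallbackUIDs = map[string]int64{
	"postgres":          999,
	"postgres-exporter": 65534,
	"multipooler":       999,
}

// buildContainerSecurityContext (simplified, hypothetical signature) keeps
// runAsNonRoot=true and always sets a numeric UID/GID: fsGroup wins when
// the PoolSpec provides one, otherwise the container's fallback is used.
func buildContainerSecurityContext(fsGroup *int64, fallbackUID int64) securityContext {
	id := fallbackUID
	if fsGroup != nil {
		id = *fsGroup
	}
	return securityContext{RunAsNonRoot: true, RunAsUser: id, RunAsGroup: id}
}

func main() {
	// No fsGroup in the PoolSpec: each container gets its numeric fallback.
	for _, name := range []string{"postgres", "postgres-exporter", "multipooler"} {
		sc := buildContainerSecurityContext(nil, fallbackUIDs[name])
		fmt.Printf("%s: %+v\n", name, sc)
	}
	// fsGroup set (2000 is an arbitrary example value): it overrides the fallback.
	fsGroup := int64(2000)
	fmt.Printf("with fsGroup: %+v\n", buildContainerSecurityContext(&fsGroup, fallbackUIDs["postgres"]))
}
```

Because the fallback is passed per container, multipooler and postgres can share a UID while postgres-exporter keeps the conventional nobody UID (65534).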

Validation

  • go test ./pkg/resource-handler/controller/shard
  • go test ./...
  • make check (lint passes; any remaining failures were related to the local toolchain/environment)

Impact

  • minimal.yaml works out of the box without requiring fsGroup edits.
  • Security posture is preserved (runAsNonRoot=true remains default).
  • Reduces shared-volume permission errors in pool pods.

Backward Compatibility

  • No API/CRD changes.
  • Existing configs that set fsGroup continue to work and still override fallback IDs.

Issue

Closes #466


Development

Successfully merging this pull request may close these issues.

CreateContainerConfigError on Kind: non-numeric user (postgres) with runAsNonRoot
