[pull] master from kubernetes:master by pull[bot] · Pull Request #1924 · next-stack/kops

pull · 2021-10-24T06:58:07Z

See Commits and Changes for more details.

nodeup: skip protokube/channels assets on workers

nodeup's seedRNG called kms:GenerateRandom on every AWS node at boot and wrote the result into /dev/urandom, to guard against early-boot entropy starvation. Remove seedRNG along with the kms:GenerateRandom grant on the node, apiserver and control-plane IAM roles. On AWS this is already solved below nodeup: the kernel seeds the CRNG before nodeup runs, from CPU RDRAND/RDSEED (random.trust_cpu) and, on Nitro instances, a virtio-rng hardware device. Go's crypto/rand is backed by getrandom(2), which blocks until the CRNG is initialized, so nodeup's bootstrap key generation already gets well-seeded randomness. The old code could not have helped regardless: reaching KMS needs a TLS handshake that itself draws from crypto/rand, and a plain write to /dev/urandom mixes bytes in without crediting entropy. Removing it also drops a fatal boot-time dependency on KMS reachability and one permission from every instance role.
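A minimal sketch of what relying on crypto/rand alone looks like after this change. The helper name `newBootstrapSecret` is hypothetical and is not taken from nodeup; it only illustrates that key material can be drawn straight from crypto/rand with no prior seeding step.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newBootstrapSecret is a hypothetical helper, not kops code.
// It reads n bytes from crypto/rand and returns them hex-encoded.
// No explicit seeding happens here: the kernel CRNG is the only source.
func newBootstrapSecret(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}
	return hex.EncodeToString(buf), nil
}
```

Calling `newBootstrapSecret(32)` yields a 64-character hex string; any failure surfaces as an error rather than as weak output.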

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

aws: remove KMS-based RNG seeding in nodeup

Bump CoreDNS memory on large clusters in scalability scenario

addNodeupPermissions granted both actions to every instance role, a leftover from the in-tree AWS cloud provider. The only consumers left are etcd-manager and protokube. Grant them via addEtcdManagerPermissions, and re-add ec2:DescribeInstances to the node role only where protokube runs (legacy-gossip clusters without kops-controller).

vfs: reject GCS paths without buckets

A "*" key in RegistryMirrors is now emitted as /etc/containerd/certs.d/_default/hosts.toml, the catch-all namespace containerd's hosts.md defines. Previously the literal "*" was used as the directory name and containerd silently ignored it.
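The mapping described above can be sketched as follows. The function name `hostsTOMLPath` and the constant `certsDir` are assumptions made for illustration; the actual kops code may be structured differently.

```go
package main

import "path"

// certsDir is the containerd registry host configuration directory
// named in the commit message.
const certsDir = "/etc/containerd/certs.d"

// hostsTOMLPath is a hypothetical helper. It returns the hosts.toml
// path for a RegistryMirrors key, mapping the catch-all "*" key to
// the "_default" directory instead of a literal "*" directory.
func hostsTOMLPath(mirrorKey string) string {
	dir := mirrorKey
	if mirrorKey == "*" {
		dir = "_default"
	}
	return path.Join(certsDir, dir, "hosts.toml")
}
```

With this sketch, `hostsTOMLPath("*")` gives `/etc/containerd/certs.d/_default/hosts.toml`, while a named registry such as `docker.io` keeps its own directory.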

aws: scope ec2:DescribeInstances/DescribeRegions to roles that use them

Support containerd v3 config schema

When --interval is set, re-applies the channel at that interval until SIGTERM, logging per-iteration errors instead of exiting. The default of 0 preserves the existing one-shot behavior for current callers (protokube, CI). This enables running channels as a long-lived workload (e.g. a static pod) without an external loop.

Relocates the control-plane node labeler from protokube to a new channels/pkg/nodelabeler package and renames it to BootstrapControlPlaneNodeLabels. Protokube still drives the call for now via the new import path. This is preparation for running channels as a static pod that owns both addon application and the labels addons target. The labeler's tainter.go scratch types are removed; the new package inlines only the patch struct it needs.

apply channel now takes one or more channel URLs and applies them sequentially per invocation. With --interval the loop iterates over the URLs each tick, mirroring protokube's old syncOnce ordering. Per-channel errors are collected via multierr so one bad channel does not stop the rest. Single-URL callers continue to work unchanged. Adds --node-name: when set, each iteration patches the named node with the mandatory control-plane labels via channels/pkg/nodelabeler. Empty --node-name skips labeling, which is the right default for one-shot CLI use from a developer's laptop. The kops-channels static pod supplies --node-name via the downward API. Together these let a single channels process own both addon application and control-plane labeling for the entire channel set, replacing protokube's per-channel subprocess fan-out and its separate labeler step.
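A rough sketch of the "one bad channel does not stop the rest" behavior. The commit uses multierr; this sketch substitutes the standard library's `errors.Join` (Go 1.20+) so that it stays self-contained. `applyFunc`, `applyAll`, and `errNotReachable` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotReachable is a hypothetical sentinel used only to exercise the sketch.
var errNotReachable = errors.New("channel not reachable")

// applyFunc is a hypothetical stand-in for applying one channel URL.
type applyFunc func(url string) error

// applyAll applies every URL in order. A failure is recorded and the
// loop moves on to the next URL; all recorded errors are returned
// joined together, or nil if every channel applied cleanly.
func applyAll(urls []string, apply applyFunc) error {
	var errs []error
	for _, u := range urls {
		if err := apply(u); err != nil {
			errs = append(errs, fmt.Errorf("applying channel %q: %w", u, err))
		}
	}
	return errors.Join(errs...)
}
```

A single-URL caller passes a one-element slice and sees the same behavior as before: one attempt, one possible error.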

Adds the ko-kops-channels-export Makefile target set (build, export, version-dist, dev-upload, push) cloning the kops-controller pattern, and wires kops-channels-push into cloudbuild.yaml so the staging push step pushes the new image alongside the others. Needed so channels can run as a static pod under kubelet instead of as a host binary invoked by protokube.

Adds a ChannelsBuilder that emits /etc/kubernetes/manifests/kops-channels.manifest. The pod runs one container per channel URL on a 60s interval; the bootstrap-channel container additionally patches the local node with control-plane labels via --bootstrap-node-labels and the downward API. The pod is system-node-critical because it owns the labels addons target for scheduling, and uses hostNetwork so VFS can reach the cloud metadata service before CNI is up. At this commit the static pod and protokube both apply channels in parallel; that is safe because apply is idempotent via manifest-hash annotations. The protokube side is removed in the next commit.

Now that the kops-channels static pod owns both responsibilities, drop the protokube-side reconciliation: the channels exec wrapper, the --channels and --node-name flags, the labeler call, and the host-side install of /opt/kops/bin/channels in the nodeup builder. The KubeBoot struct sheds Channels and NodeName; the sync loop is now an idle keep-alive for the gossip goroutines and will be removed alongside the legacy gossip code path.

The first apply fails while a control-plane node's apiserver is still starting; retry every 5s until it succeeds rather than waiting a full interval, which delays cluster bootstrap. Also reuse a cached kube client per iteration.
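The retry behavior might look roughly like the sketch below. It assumes the short retry applies only until the first successful apply and that later failures wait the full interval; the commit message does not spell this out. `nextWait` and `runLoop` are hypothetical names.

```go
package main

import (
	"context"
	"time"
)

// nextWait picks the delay before the next apply: the short retry
// delay until an apply has succeeded once, the full interval afterwards.
func nextWait(succeededOnce bool, retry, interval time.Duration) time.Duration {
	if !succeededOnce {
		return retry
	}
	return interval
}

// runLoop calls apply repeatedly until ctx is cancelled.
// Errors are not fatal; they only affect how long the loop waits.
func runLoop(ctx context.Context, apply func() error, retry, interval time.Duration) {
	succeededOnce := false
	for {
		if err := apply(); err == nil {
			succeededOnce = true
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(nextWait(succeededOnce, retry, interval)):
		}
	}
}
```

With `retry` set to 5 seconds and `interval` to 60 seconds, a control-plane node whose apiserver is still starting is retried every 5 seconds, then settles into the 60-second cadence.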

The kubelet maxPods calculation runs for AmazonVPC and Cilium-ENI networking and falls back to DefaultMachineType when the IMDS instance-type lookup fails. NewConfig only set DefaultMachineType for AmazonVPC, so a Cilium-ENI node would dereference a nil pointer if IMDS was unavailable.
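To illustrate the failure mode, here is a sketch of a fallback that checks the pointer before dereferencing it. This is a defensive variant for illustration only: the commit message implies the fix is to set DefaultMachineType for Cilium-ENI as well. The `config` type, `machineType`, and `errIMDSUnavailable` are hypothetical.

```go
package main

import "errors"

// errIMDSUnavailable is a hypothetical sentinel for a failed IMDS lookup.
var errIMDSUnavailable = errors.New("IMDS unavailable")

// config is a hypothetical, cut-down stand-in for the nodeup config.
type config struct {
	DefaultMachineType *string
}

// machineType returns the instance type from lookup, falling back to
// cfg.DefaultMachineType. An unset default yields an error instead of
// a nil-pointer dereference.
func machineType(lookup func() (string, error), cfg *config) (string, error) {
	if t, err := lookup(); err == nil {
		return t, nil
	}
	if cfg == nil || cfg.DefaultMachineType == nil {
		return "", errors.New("instance type lookup failed and no default machine type is set")
	}
	return *cfg.DefaultMachineType, nil
}
```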

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

e2e: Skip ImageVolume tests on COS 121

Register Karpenter nodes with karpenter.sh/unregistered taint

Add missing EC2 read permissions to Karpenter IAM policy

Two related fixes for Calico on the ForceNftables() distros (RHEL10+, Rocky10+, etc.). Load the ip_set kernel module alongside nf_tables and nf_conntrack. Calico's Felix unconditionally starts an ipsetsManager that shells out to "ipset list -name" during dataplane resync, even when NFTablesMode is Enabled. On RHEL10-family kernels ip_set is not auto-loaded, so the ipset call returns EINVAL and Felix panics in a tight loop, crashing calico-node and blocking cluster Up on every arm64 grid cell. Disable and mask firewalld via a new disableFirewalld step on FirewallBuilder, gated on Distribution.ForceNftables(). firewalld's default-reject filter_INPUT/filter_FORWARD policies and periodic-reload behavior conflict with the iptables/nftables rules CNIs install for pod and service traffic; Calico's own requirements doc and RKE2 both document that firewalld must be disabled on hosts running these CNIs. The disable/mask sequence is idempotent and a no-op where firewalld is not installed, so this is net-neutral on the cloud images that already strip firewalld (AWS RHEL/Rocky AMIs, Rocky GenericCloud) and net-positive on the GCE-optimized Rocky 10 image where firewalld ships active and breaks Calico BGP keepalives in BPF mode.
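The gating described above could be sketched as a pure function that decides what host preparation a node needs. The type `hostPrep` and the function `calicoHostPrep` are hypothetical, and the exact systemctl invocations are an assumption. A real implementation would also need to tolerate a missing firewalld unit so that the step stays a no-op there.

```go
package main

// hostPrep lists the host preparation steps for a node.
type hostPrep struct {
	Modules  []string   // kernel modules to load
	Commands [][]string // commands to run, one argv per entry
}

// calicoHostPrep is a hypothetical helper. For distros that force
// nftables it returns the modules to load (including ip_set) and the
// commands that disable and mask firewalld; otherwise it returns nothing.
func calicoHostPrep(forceNftables bool) hostPrep {
	if !forceNftables {
		return hostPrep{}
	}
	return hostPrep{
		Modules: []string{"nf_tables", "nf_conntrack", "ip_set"},
		Commands: [][]string{
			{"systemctl", "disable", "--now", "firewalld"},
			{"systemctl", "mask", "firewalld"},
		},
	}
}
```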

…est2 scaletest: report experiment variant after kubetest2 runs

nodeup: load ip_set module and disable firewalld on RHEL10

Added the --api-server-size flag for consistency with the other machine-type flags. Added documentation for the flag, reflecting my testing. Added a GCE test for the APIServer-only option. Fixed a comment from the previous PR: the comment on the APIServer-only DNS check for AWS is now correct. Removed the k8s version flag from the doc. Ran make gen-cli-docs.

Api only cli

Initially the LB sent traffic to both the APIServer and the Control Plane nodes. DNS None is a new case, and in it we now send traffic only to the APIServer nodes. This frees the Control Plane nodes to do core controller work. Removed the separate tests. Regenerated docs.

Fixing LB behavior when you have both APIServer and Control Plane.

Remove namespace from DO ClusterRole

…1.29

chore(channels): promote to stable, bump node images, update recommended kOps versions

Bumps [actions/checkout](https://github.com/actions/checkout) from 6.0.3 to 7.0.0. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@df4cb1c...9c091bb) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: 7.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>

…ctions/checkout-7.0.0 build(deps): bump actions/checkout from 6.0.3 to 7.0.0

In e2e, `kops create cluster --channel=alpha` reads the channel from the kops master branch, so a PR's edits to channels/alpha or channels/stable are never exercised by its own e2e jobs. When kops is built from the PR checkout, the deployer now rewrites --channel to a file:// path into that checkout's channels/ directory (defaulting to alpha when --channel is unset), so the build uses the PR's channels. Downloaded release/marker binaries don't match the checkout and keep using master's channels.
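The rewrite rule can be sketched as below. The function name `channelArg` and its parameters are hypothetical, and the sketch assumes --channel carries a bare channel name such as alpha or stable.

```go
package main

import "path"

// channelArg is a hypothetical helper. It returns the value to pass
// as --channel. When kops was built from the PR checkout, the channel
// name is rewritten to a file:// path inside that checkout's
// channels/ directory, defaulting to alpha when unset. Otherwise the
// value is passed through unchanged.
func channelArg(channel, checkoutDir string, builtFromCheckout bool) string {
	if !builtFromCheckout {
		return channel
	}
	if channel == "" {
		channel = "alpha"
	}
	return "file://" + path.Join(checkoutDir, "channels", channel)
}
```

For a checkout at `/src/kops`, an unset channel becomes `file:///src/kops/channels/alpha`, while a downloaded release binary keeps whatever value was passed in.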

scaletest: bind etcd metrics to all interfaces

e2e: test the PR's own channels, not master's

pull Bot added the ⤵️ pull label Oct 24, 2021

k8s-ci-robot and others added 29 commits May 17, 2026 17:53

Merge pull request #18358 from hakman/skip-protokube-channels

0b02af1

nodeup: skip protokube/channels assets on workers

./hack/update-expected.sh

8081140

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

Support containerd config toml v3

b15ba45

Clarify containerd config file version behavior

1566c36

Update containerd config tests

1f46cef

Set the v3 runtime class path in tests

ad0977b

Preserve ConfigOverride bypass and mirror priority

a685a43

./hack/verify-gomod.sh

8b6bc2b

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

fix: reject GCS VFS paths without buckets

afba0cd

bump coredns memory on large clusters

3c791f7

Merge pull request #18359 from hakman/aws-remove-kms

377e92c

aws: remove KMS-based RNG seeding in nodeup

Merge pull request #18361 from Jefftree/coredns-memory-5k

9e751ea

Bump CoreDNS memory on large clusters in scalability scenario

./hack/update-expected.sh

39a0493

Merge pull request #18360 from immanuwell/fix-gcs-empty-bucket-path

ea72aec

vfs: reject GCS paths without buckets

Merge pull request #18362 from hakman/aws-scope-ec2-describe

afb5758

aws: scope ec2:DescribeInstances/DescribeRegions to roles that use them

Merge pull request #18291 from rifelpet/containerd-config-v3

6f6187e

Support containerd v3 config schema

nodeup: use shared system-component env vars for kops-channels

ee3e924

Use kms:ViaService condition on KMS data actions

749245b

hakman and others added 30 commits June 18, 2026 07:03

Register Karpenter nodes with karpenter.sh/unregistered taint

00e89e5

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

hack/update-expected.sh

a7a6ad3

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

Add missing EC2 read permissions to Karpenter IAM policy

cf56b45

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

Merge pull request #18483 from rifelpet/cos121-imagevolume

e4eb865

e2e: Skip ImageVolume tests on COS 121

Merge pull request #18486 from hakman/karpenter-taint

0e6067f

Register Karpenter nodes with karpenter.sh/unregistered taint

Merge pull request #18487 from hakman/karpenter-iam

a4f0fa2

Add missing EC2 read permissions to Karpenter IAM policy

scaletest: report experiment variant after kubetest2 runs

70848b4

Merge pull request #18485 from Jefftree/scaletest-variant-after-kubet…

f54e1d7

…est2 scaletest: report experiment variant after kubetest2 runs

Merge pull request #18478 from rifelpet/disable-firewalld-rhel10

340689d

nodeup: load ip_set module and disable firewalld on RHEL10

Merge pull request #18482 from cheftako/apiOnlyCli

1fcef80

Api only cli

Adding flag to enable machine type for APIServer only.

f865f25

Adding GCE test for APIServer only option.

617e2e2

Merge pull request #18496 from cheftako/apiOnlyLB

a84a7ac

Fixing LB behavior when you have both APIServer and Control Plane.

Remove namespace from DO ClusterRole

fca71a8

Merge pull request #18498 from rifelpet/do-ns

c9b93c3

Remove namespace from DO ClusterRole

chore(channels): promote alpha to stable

d446aeb

chore(channels): bump node images

101e9d0

chore(channels): recommend kOps 1.35.1 for k8s 1.30-1.35, 1.34.3 for …

7ace323

…1.29

chore(channels): add arm64 GCE and Azure noble node images

e00f4b8

./hack/update-expected.sh

d96b240

Merge pull request #18501 from hakman/channels-chores

9488c60

chore(channels): promote to stable, bump node images, update recommended kOps versions

scale-test: bind etcd metrics to all interfaces

6486f1f

Merge pull request #18505 from kubernetes/dependabot/github_actions/a…

595d5e8

…ctions/checkout-7.0.0 build(deps): bump actions/checkout from 6.0.3 to 7.0.0

Merge pull request #18503 from Jefftree/scale-etcd-metrics-listen-all

c1047de

scaletest: bind etcd metrics to all interfaces

Merge pull request #18504 from hakman/channels-e2e

7a0fb7e

e2e: test the PR's own channels, not master's

pull[bot] wants to merge 8253 commits into next-stack:master from kubernetes:master

13 participants
