[pull] master from kubernetes:master#1924

Open
pull[bot] wants to merge 8253 commits into
next-stack:masterfrom
kubernetes:master

@pull pull Bot commented Oct 24, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull Bot added the ⤵️ pull label Oct 24, 2021
k8s-ci-robot and others added 29 commits May 17, 2026 17:53
nodeup: skip protokube/channels assets on workers
nodeup's seedRNG called kms:GenerateRandom on every AWS node at boot
and wrote the result into /dev/urandom, to guard against early-boot
entropy starvation. Remove seedRNG along with the kms:GenerateRandom
grant on the node, apiserver and control-plane IAM roles.
On AWS this is already solved below nodeup: the kernel seeds the CRNG
before nodeup runs, from CPU RDRAND/RDSEED (random.trust_cpu) and, on
Nitro instances, a virtio-rng hardware device. Go's crypto/rand is
backed by getrandom(2), which blocks until the CRNG is initialized,
so nodeup's bootstrap key generation already gets well-seeded
randomness. The old code could not have helped regardless: reaching
KMS needs a TLS handshake that itself draws from crypto/rand, and a
plain write to /dev/urandom mixes bytes in without crediting entropy.
Removing it also drops a fatal boot-time dependency on KMS
reachability and one permission from every instance role.
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
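
The reasoning above (crypto/rand is already backed by a seeded kernel CRNG) can be illustrated with a minimal sketch. `newBootstrapSecret` is a hypothetical helper for illustration, not nodeup's actual key-generation code; the point is only that no explicit seeding step precedes the read.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newBootstrapSecret is a hypothetical helper, not nodeup's real code.
// crypto/rand.Read draws from the kernel CRNG, so the caller does not
// seed anything before asking for random bytes.
func newBootstrapSecret(n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return nil, fmt.Errorf("reading random bytes: %w", err)
	}
	return buf, nil
}

func main() {
	secret, err := newBootstrapSecret(32)
	if err != nil {
		panic(err)
	}
	// The printed value differs on every run.
	fmt.Println(hex.EncodeToString(secret))
}
```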
aws: remove KMS-based RNG seeding in nodeup
Bump CoreDNS memory on large clusters in scalability scenario
addNodeupPermissions granted both actions to every instance role, a
leftover from the in-tree AWS cloud provider. The only consumers left
are etcd-manager and protokube. Grant them via addEtcdManagerPermissions,
and re-add ec2:DescribeInstances to the node role only where protokube
runs (legacy-gossip clusters without kops-controller).
A "*" key in RegistryMirrors is now emitted as
/etc/containerd/certs.d/_default/hosts.toml, the catch-all namespace
containerd's hosts.md defines. Previously the literal "*" was used as
the directory name and containerd silently ignored it.
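
A minimal sketch of the mapping this commit describes, assuming a helper named `hostsTomlPath` (hypothetical; the real builder code is not shown here). Only the `"*"` to `_default` translation and the `/etc/containerd/certs.d` root come from the commit text.

```go
package main

import (
	"fmt"
	"path"
)

// certsDir is the containerd registry host configuration root named in the commit.
const certsDir = "/etc/containerd/certs.d"

// hostsTomlPath returns the hosts.toml location for a RegistryMirrors key.
// The "*" key maps to the "_default" catch-all namespace instead of a
// directory literally named "*".
func hostsTomlPath(registry string) string {
	dir := registry
	if registry == "*" {
		dir = "_default"
	}
	return path.Join(certsDir, dir, "hosts.toml")
}

func main() {
	for _, r := range []string{"*", "docker.io"} {
		fmt.Println(hostsTomlPath(r))
	}
}
```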
aws: scope ec2:DescribeInstances/DescribeRegions to roles that use them
Support containerd v3 config schema
When set, re-applies the channel at the given interval until SIGTERM,
logging per-iteration errors instead of exiting. The default of 0 preserves
the existing one-shot behavior for current callers (protokube, CI). This
enables running channels as a long-lived workload (e.g. a static pod)
without an external loop.
Relocates the control-plane node labeler from protokube to a new
channels/pkg/nodelabeler package and renames it to
BootstrapControlPlaneNodeLabels. Protokube still drives the call for
now via the new import path. This is preparation for running channels
as a static pod that owns both addon application and the labels addons
target.

The labeler's tainter.go scratch types are removed; the new package
inlines only the patch struct it needs.
apply channel now takes one or more channel URLs and applies them
sequentially per invocation. With --interval the loop iterates over the
URLs each tick, mirroring protokube's old syncOnce ordering. Per-channel
errors are collected via multierr so one bad channel does not stop the
rest. Single-URL callers continue to work unchanged.

Adds --node-name: when set, each iteration patches the named node with
the mandatory control-plane labels via channels/pkg/nodelabeler. Empty
--node-name skips labeling, which is the right default for one-shot
CLI use from a developer's laptop. The kops-channels static pod supplies
--node-name via the downward API.
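
A sketch of what an inlined label patch might look like, assuming a JSON merge patch and a single well-known label. The struct names, `buildLabelPatch`, and the exact label set are assumptions; the commit only says that mandatory control-plane labels are patched when `--node-name` is non-empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodePatch carries only the fields this sketch needs for a merge patch.
type nodePatch struct {
	Metadata nodePatchMetadata `json:"metadata"`
}

type nodePatchMetadata struct {
	Labels map[string]string `json:"labels"`
}

// buildLabelPatch returns the patch body for the named node, or nil when
// nodeName is empty, which means labeling is skipped. The label below is an
// assumed example; the real label set is not listed in the commit message.
func buildLabelPatch(nodeName string) ([]byte, error) {
	if nodeName == "" {
		return nil, nil
	}
	return json.Marshal(nodePatch{Metadata: nodePatchMetadata{
		Labels: map[string]string{"node-role.kubernetes.io/control-plane": ""},
	}})
}

func main() {
	patch, err := buildLabelPatch("control-plane-1")
	if err != nil {
		panic(err)
	}
	// A real caller would send this body to the API server for that node.
	fmt.Println(string(patch))
}
```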

Together these let a single channels process own both addon application
and control-plane labeling for the entire channel set, replacing
protokube's per-channel subprocess fan-out and its separate labeler
step.
Adds the ko-kops-channels-export Makefile target set (build, export,
version-dist, dev-upload, push) cloning the kops-controller pattern,
and wires kops-channels-push into cloudbuild.yaml so the staging push
step pushes the new image alongside the others. Needed so channels
can run as a static pod under kubelet instead of as a host binary
invoked by protokube.
Adds a ChannelsBuilder that emits /etc/kubernetes/manifests/kops-channels.manifest.
The pod runs one container per channel URL on a 60s interval; the
bootstrap-channel container additionally patches the local node with
control-plane labels via --bootstrap-node-labels and the downward API.
The pod is system-node-critical because it owns the labels addons target
for scheduling, and uses hostNetwork so VFS can reach the cloud
metadata service before CNI is up.

At this commit the static pod and protokube both apply channels in
parallel; that is safe because apply is idempotent via manifest-hash
annotations. The protokube side is removed in the next commit.
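
The idempotency argument can be sketched as follows. The annotation key and the use of SHA-256 are assumptions for illustration; the commit only states that apply is idempotent via manifest-hash annotations.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashAnnotation is a hypothetical key; the real annotation name used by
// channels is not given in the commit message.
const hashAnnotation = "example.invalid/manifest-hash"

// manifestHash returns a hex digest of the manifest bytes.
func manifestHash(manifest []byte) string {
	sum := sha256.Sum256(manifest)
	return hex.EncodeToString(sum[:])
}

// needsApply reports whether the manifest differs from the hash recorded
// in the annotations by the last successful apply.
func needsApply(annotations map[string]string, manifest []byte) bool {
	return annotations[hashAnnotation] != manifestHash(manifest)
}

func main() {
	annotations := map[string]string{}
	manifest := []byte("kind: Deployment\n")

	// Two appliers (for example the static pod and protokube) run in turn.
	for _, who := range []string{"static pod", "protokube"} {
		if needsApply(annotations, manifest) {
			annotations[hashAnnotation] = manifestHash(manifest)
			fmt.Println(who, "applied the manifest")
		} else {
			fmt.Println(who, "skipped: hash already recorded")
		}
	}
}
```

Because the second applier sees the recorded hash and skips, running both side by side changes nothing beyond the first apply.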
Now that the kops-channels static pod owns both responsibilities, drop
the protokube-side reconciliation: the channels exec wrapper, the
--channels and --node-name flags, the labeler call, and the host-side
install of /opt/kops/bin/channels in the nodeup builder. The KubeBoot
struct sheds Channels and NodeName; the sync loop is now an idle
keep-alive for the gossip goroutines and will be removed alongside the
legacy gossip code path.
The first apply fails while a control-plane node's apiserver is still
starting. Retry every 5s until it succeeds instead of waiting a full
interval, because waiting delays cluster bootstrap. Also reuse a cached
kube client per iteration.
The kubelet maxPods calculation runs for AmazonVPC and Cilium-ENI
networking and falls back to DefaultMachineType when the IMDS
instance-type lookup fails. NewConfig only set DefaultMachineType for
AmazonVPC, so a Cilium-ENI node would dereference a nil pointer if IMDS
was unavailable.
hakman and others added 30 commits June 18, 2026 07:03
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
e2e: Skip ImageVolume tests on COS 121
Register Karpenter nodes with karpenter.sh/unregistered taint
Add missing EC2 read permissions to Karpenter IAM policy
Two related fixes for Calico on the ForceNftables() distros (RHEL10+,
Rocky10+, etc.).

Load the ip_set kernel module alongside nf_tables and nf_conntrack.
Calico's Felix unconditionally starts an ipsetsManager that shells out
to "ipset list -name" during dataplane resync, even when NFTablesMode
is Enabled. On RHEL10-family kernels ip_set is not auto-loaded, so the
ipset call returns EINVAL and Felix panics in a tight loop, crashing
calico-node and blocking cluster Up on every arm64 grid cell.

Disable and mask firewalld via a new disableFirewalld step on
FirewallBuilder, gated on Distribution.ForceNftables(). firewalld's
default-reject filter_INPUT/filter_FORWARD policies and periodic-reload
behavior conflict with the iptables/nftables rules CNIs install for
pod and service traffic; Calico's own requirements doc and RKE2 both
document that firewalld must be disabled on hosts running these CNIs.
The disable/mask sequence is idempotent and a no-op where firewalld is
not installed, so this is net-neutral on the cloud images that already
strip firewalld (AWS RHEL/Rocky AMIs, Rocky GenericCloud) and net-
positive on the GCE-optimized Rocky 10 image where firewalld ships
active and breaks Calico BGP keepalives in BPF mode.
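
The two fixes could be summarized as a gated boot plan, sketched below. `bootPlan` and `planFor` are hypothetical, the exact `systemctl` invocations are assumptions, and returning an empty plan for other distros is a simplification of this sketch rather than nodeup's behavior.

```go
package main

import "fmt"

// bootPlan lists what a node would do at boot in this sketch.
type bootPlan struct {
	Modules  []string   // kernel modules to load
	Commands [][]string // commands to run, in argv form
}

// planFor returns the extra steps for distros where ForceNftables() is true.
// The module names come from the commit text; the commands are assumed.
func planFor(forceNftables bool) bootPlan {
	if !forceNftables {
		return bootPlan{}
	}
	return bootPlan{
		Modules: []string{"nf_tables", "nf_conntrack", "ip_set"},
		Commands: [][]string{
			{"systemctl", "disable", "--now", "firewalld"},
			{"systemctl", "mask", "firewalld"},
		},
	}
}

func main() {
	plan := planFor(true)
	fmt.Println("modules:", plan.Modules)
	for _, c := range plan.Commands {
		fmt.Println("run:", c)
	}
}
```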
…est2

scaletest: report experiment variant after kubetest2 runs
nodeup: load ip_set module and disable firewalld on RHEL10
Added flag --api-server-size to be consistent with other machine type
flags.
Added doc on the flag reflecting my testing.
Adding GCE test for APIServer only option.
Fixed comment from previous PR.
apiserver only DNS check for AWS comment is now correct.
Removed k8s version flag from doc.
make gen-cli-docs
Initially the LB sent traffic to both APIServer and Control Plane nodes.
DNS None is a new case.
In this case we now send traffic only to the APIServer nodes.
This leaves the Control Plane nodes free to do core controller work.
Remove separate tests.
Regenerated docs.
Fixing LB behavior when you have both APIServer and Control Plane.
Remove namespace from DO ClusterRole
chore(channels): promote to stable, bump node images, update recommended kOps versions
Bumps [actions/checkout](https://github.com/actions/checkout) from 6.0.3 to 7.0.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@df4cb1c...9c091bb)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…ctions/checkout-7.0.0

build(deps): bump actions/checkout from 6.0.3 to 7.0.0
In e2e, `kops create cluster --channel=alpha` reads the channel from the
kops master branch, so a PR's edits to channels/alpha or channels/stable
are never exercised by its own e2e jobs.

When kops is built from the PR checkout, the deployer now rewrites
--channel to a file:// path into that checkout's channels/ directory
(defaulting to alpha when --channel is unset), so the build uses the PR's
channels. Downloaded release/marker binaries don't match the checkout and
keep using master's channels.
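
The rewrite rule above might be sketched like this. `effectiveChannel` and its parameters are hypothetical, and the sketch assumes `--channel` holds a bare channel name such as `alpha` or `stable`.

```go
package main

import (
	"fmt"
	"path"
)

// effectiveChannel returns the value to pass as --channel. When kops was built
// from the PR checkout, the channel name is rewritten to a file:// path inside
// that checkout's channels/ directory, defaulting to alpha when unset.
// Downloaded binaries keep the caller's value untouched.
func effectiveChannel(channel, checkoutDir string, builtFromCheckout bool) string {
	if !builtFromCheckout {
		return channel
	}
	if channel == "" {
		channel = "alpha"
	}
	return "file://" + path.Join(checkoutDir, "channels", channel)
}

func main() {
	// "/src/kops" is an example checkout location, for illustration only.
	fmt.Println(effectiveChannel("", "/src/kops", true))
	fmt.Println(effectiveChannel("stable", "/src/kops", true))
	fmt.Println(effectiveChannel("stable", "/src/kops", false))
}
```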
scaletest: bind etcd metrics to all interfaces
e2e: test the PR's own channels, not master's