[pull] master from kubernetes:master #1924
Open
pull[bot] wants to merge 8253 commits into
nodeup: skip protokube/channels assets on workers
nodeup's seedRNG called kms:GenerateRandom on every AWS node at boot and wrote the result into /dev/urandom, to guard against early-boot entropy starvation. Remove seedRNG along with the kms:GenerateRandom grant on the node, apiserver and control-plane IAM roles. On AWS this is already solved below nodeup: the kernel seeds the CRNG before nodeup runs, from CPU RDRAND/RDSEED (random.trust_cpu) and, on Nitro instances, a virtio-rng hardware device. Go's crypto/rand is backed by getrandom(2), which blocks until the CRNG is initialized, so nodeup's bootstrap key generation already gets well-seeded randomness. The old code could not have helped regardless: reaching KMS needs a TLS handshake that itself draws from crypto/rand, and a plain write to /dev/urandom mixes bytes in without crediting entropy. Removing it also drops a fatal boot-time dependency on KMS reachability and one permission from every instance role.
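As a rough illustration of why no manual seeding is needed, a bootstrap secret can be drawn straight from crypto/rand. This is a minimal sketch, not nodeup's actual code: the helper name newBootstrapSecret and the 32-byte length are assumptions made for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newBootstrapSecret is a hypothetical helper (not taken from nodeup).
// It reads n bytes from crypto/rand. On Linux that read is served by the
// kernel CRNG and waits until the CRNG is initialized, so the caller does
// not seed anything itself.
func newBootstrapSecret(n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return nil, fmt.Errorf("reading random bytes: %w", err)
	}
	return buf, nil
}

func main() {
	secret, err := newBootstrapSecret(32)
	if err != nil {
		panic(err)
	}
	// The printed value differs on every run.
	fmt.Println(hex.EncodeToString(secret))
}
```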
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
aws: remove KMS-based RNG seeding in nodeup
Bump CoreDNS memory on large clusters in scalability scenario
addNodeupPermissions granted both actions to every instance role, a leftover from the in-tree AWS cloud provider. The only consumers left are etcd-manager and protokube. Grant them via addEtcdManagerPermissions, and re-add ec2:DescribeInstances to the node role only where protokube runs (legacy-gossip clusters without kops-controller).
vfs: reject GCS paths without buckets
A "*" key in RegistryMirrors is now emitted as /etc/containerd/certs.d/_default/hosts.toml, the catch-all namespace containerd's hosts.md defines. Previously the literal "*" was used as the directory name and containerd silently ignored it.
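The key-to-directory mapping described above can be sketched as follows. The function name hostsTOMLPath and the certsDir constant are hypothetical names for this example; only the "*" to _default rule and the /etc/containerd/certs.d layout come from the commit message.

```go
package main

import (
	"fmt"
	"path"
)

// certsDir is the containerd registry host configuration directory
// named in the commit message.
const certsDir = "/etc/containerd/certs.d"

// hostsTOMLPath is a hypothetical helper. It maps a RegistryMirrors key
// to the hosts.toml path emitted for it. The catch-all key "*" is written
// under "_default"; every other key is used as the directory name as-is.
func hostsTOMLPath(mirrorKey string) string {
	dir := mirrorKey
	if mirrorKey == "*" {
		dir = "_default"
	}
	return path.Join(certsDir, dir, "hosts.toml")
}

func main() {
	for _, key := range []string{"*", "docker.io"} {
		fmt.Println(key, "->", hostsTOMLPath(key))
	}
}
```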
aws: scope ec2:DescribeInstances/DescribeRegions to roles that use them
Support containerd v3 config schema
When --interval is set, apply channel re-applies the channel at that interval until SIGTERM, logging per-iteration errors instead of exiting. The default of 0 preserves the existing one-shot behavior for current callers (protokube, CI). This enables running channels as a long-lived workload (e.g. a static pod) without an external loop.
Relocates the control-plane node labeler from protokube to a new channels/pkg/nodelabeler package and renames it to BootstrapControlPlaneNodeLabels. Protokube still drives the call for now via the new import path. This is preparation for running channels as a static pod that owns both addon application and the labels addons target. The labeler's tainter.go scratch types are removed; the new package inlines only the patch struct it needs.
apply channel now takes one or more channel URLs and applies them sequentially per invocation. With --interval the loop iterates over the URLs each tick, mirroring protokube's old syncOnce ordering. Per-channel errors are collected via multierr so one bad channel does not stop the rest. Single-URL callers continue to work unchanged.

Adds --node-name: when set, each iteration patches the named node with the mandatory control-plane labels via channels/pkg/nodelabeler. Empty --node-name skips labeling, which is the right default for one-shot CLI use from a developer's laptop. The kops-channels static pod supplies --node-name via the downward API.

Together these let a single channels process own both addon application and control-plane labeling for the entire channel set, replacing protokube's per-channel subprocess fan-out and its separate labeler step.
Adds the ko-kops-channels-export Makefile target set (build, export, version-dist, dev-upload, push) cloning the kops-controller pattern, and wires kops-channels-push into cloudbuild.yaml so the staging push step pushes the new image alongside the others. Needed so channels can run as a static pod under kubelet instead of as a host binary invoked by protokube.
Adds a ChannelsBuilder that emits /etc/kubernetes/manifests/kops-channels.manifest. The pod runs one container per channel URL on a 60s interval; the bootstrap-channel container additionally patches the local node with control-plane labels via --bootstrap-node-labels and the downward API. The pod is system-node-critical because it owns the labels addons target for scheduling, and uses hostNetwork so VFS can reach the cloud metadata service before CNI is up. At this commit the static pod and protokube both apply channels in parallel; that is safe because apply is idempotent via manifest-hash annotations. The protokube side is removed in the next commit.
Now that the kops-channels static pod owns both responsibilities, drop the protokube-side reconciliation: the channels exec wrapper, the --channels and --node-name flags, the labeler call, and the host-side install of /opt/kops/bin/channels in the nodeup builder. The KubeBoot struct sheds Channels and NodeName; the sync loop is now an idle keep-alive for the gossip goroutines and will be removed alongside the legacy gossip code path.
The first apply fails while a control-plane node's apiserver is still starting; retry every 5s until it succeeds rather than waiting a full interval, which delays cluster bootstrap. Also reuse a cached kube client per iteration.
The kubelet maxPods calculation runs for AmazonVPC and Cilium-ENI networking and falls back to DefaultMachineType when the IMDS instance-type lookup fails. NewConfig only set DefaultMachineType for AmazonVPC, so a Cilium-ENI node would dereference a nil pointer if IMDS was unavailable.
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
e2e: Skip ImageVolume tests on COS 121
Register Karpenter nodes with karpenter.sh/unregistered taint
Add missing EC2 read permissions to Karpenter IAM policy
Two related fixes for Calico on the ForceNftables() distros (RHEL10+, Rocky10+, etc.).

Load the ip_set kernel module alongside nf_tables and nf_conntrack. Calico's Felix unconditionally starts an ipsetsManager that shells out to "ipset list -name" during dataplane resync, even when NFTablesMode is Enabled. On RHEL10-family kernels ip_set is not auto-loaded, so the ipset call returns EINVAL and Felix panics in a tight loop, crashing calico-node and blocking cluster Up on every arm64 grid cell.

Disable and mask firewalld via a new disableFirewalld step on FirewallBuilder, gated on Distribution.ForceNftables(). firewalld's default-reject filter_INPUT/filter_FORWARD policies and periodic-reload behavior conflict with the iptables/nftables rules CNIs install for pod and service traffic; Calico's own requirements doc and RKE2 both document that firewalld must be disabled on hosts running these CNIs. The disable/mask sequence is idempotent and a no-op where firewalld is not installed, so this is net-neutral on the cloud images that already strip firewalld (AWS RHEL/Rocky AMIs, Rocky GenericCloud) and net-positive on the GCE-optimized Rocky 10 image where firewalld ships active and breaks Calico BGP keepalives in BPF mode.
…est2 scaletest: report experiment variant after kubetest2 runs
nodeup: load ip_set module and disable firewalld on RHEL10
Add the --api-server-size flag for consistency with the other machine type flags, and document it based on my testing. Add a GCE test for the APIServer-only option. Fix a comment from the previous PR: the comment on the APIServer-only DNS check for AWS is now correct. Remove the k8s version flag from the doc. Run make gen-cli-docs.
Api only cli
Initially the LB sent traffic to both APIServer and Control Plane nodes. DNS None is a new case: the LB now sends traffic only to the APIServer nodes, which leaves the Control Plane nodes free to do core controller work. Remove the separate tests. Regenerate docs.
Fixing LB behavior when you have both APIServer and Control Plane.
Remove namespace from DO ClusterRole
chore(channels): promote to stable, bump node images, update recommended kOps versions
Bumps [actions/checkout](https://github.com/actions/checkout) from 6.0.3 to 7.0.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@df4cb1c...9c091bb)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…ctions/checkout-7.0.0 build(deps): bump actions/checkout from 6.0.3 to 7.0.0
In e2e, `kops create cluster --channel=alpha` reads the channel from the kops master branch, so a PR's edits to channels/alpha or channels/stable are never exercised by its own e2e jobs. When kops is built from the PR checkout, the deployer now rewrites --channel to a file:// path into that checkout's channels/ directory (defaulting to alpha when --channel is unset), so the build uses the PR's channels. Downloaded release/marker binaries don't match the checkout and keep using master's channels.
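The rewrite rule can be sketched as a small pure function. The name rewriteChannel, its parameters, and the example checkout path are hypothetical; leaving values that already contain a URL scheme untouched is an assumption of this sketch, not something the commit message states.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rewriteChannel is a hypothetical helper. When kops was built from the
// PR checkout, it points --channel at the checkout's channels/ directory,
// defaulting to "alpha" when the flag is unset. For downloaded binaries
// the value is returned unchanged.
func rewriteChannel(channel, checkoutDir string, builtFromCheckout bool) string {
	if !builtFromCheckout {
		return channel
	}
	// Assumption: a value that is already a URL is left alone.
	if strings.Contains(channel, "://") {
		return channel
	}
	if channel == "" {
		channel = "alpha"
	}
	return "file://" + path.Join(checkoutDir, "channels", channel)
}

func main() {
	const checkout = "/home/ci/kops" // example path, not from the source
	fmt.Println(rewriteChannel("", checkout, true))
	fmt.Println(rewriteChannel("stable", checkout, true))
	fmt.Println(rewriteChannel("alpha", checkout, false))
}
```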
scaletest: bind etcd metrics to all interfaces
e2e: test the PR's own channels, not master's
See Commits and Changes for more details.
Created by
pull[bot]