KEP-6003: Add KEP for configurable HPA sync period #6004

Fedosin wants to merge 1 commit into kubernetes:master
Conversation
Fedosin commented on Apr 8, 2026
- One-line PR description: Introduce KEP proposing an optional syncPeriodSeconds field on HorizontalPodAutoscalerBehavior to allow per-HPA override of the global --horizontal-pod-autoscaler-sync-period.
- Issue link: Configurable sync period for HPA #6003
- Other comments: [WIP] Add per-HPA configurable sync period via syncPeriodSeconds field kubernetes#138222
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Fedosin. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing the appropriate command.
> per-item interval during reconciliation and cleans it up on HPA deletion.
>
> The informer event handlers are updated so that:
> - Newly created HPAs and spec changes (detected via `Generation` comparison)
The HPA's `Generation` field is currently not set.
I have kubernetes/kubernetes#138228 open to get this fixed.
Good catch, thank you! I've added a note in the Design Details section acknowledging the dependency on kubernetes/kubernetes#138228 for properly incrementing the HPA Generation field on spec changes.
> We propose to add a new field to the existing [`HorizontalPodAutoscalerBehavior`][] object:
>
> - `syncPeriodSeconds`: *(int32)* the period in seconds between each
>   reconciliation of this HPA. Must be greater than 0 and less than or equal
This will allow this HPA to query the metric source more frequently; that source is usually metrics-server, which caches its metrics. The end result is that in most situations reducing this sync period won't have much impact. Conceivably you could add a field that only targets custom/external metrics, although my suspicion is that this would result in a non-intuitive API.
Great point. I've added a new "Interaction with metrics sources" paragraph to the Proposal section addressing this. It explains that for resource metrics served by metrics-server (which caches values at ~60s intervals), syncing the HPA faster than the metrics collection interval won't yield fresher data. Reducing the sync period is most impactful when used with custom or external metrics providers that expose rapidly updating values. Users are advised to consider their metrics pipeline latency when choosing a syncPeriodSeconds value.
I considered scoping the field to custom/external metrics only, but as you noted, that would result in a non-intuitive API -- users would need to reason about which metric types their HPA uses before setting the field. A single field that applies uniformly is simpler, and the documentation clarifies when it's actually beneficial.
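For illustration, a sketch of how the proposed field might appear on an HPA manifest, per the KEP's placement of the field on `HorizontalPodAutoscalerBehavior` (`spec.behavior`). The workload name, metric name, and chosen values are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa            # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: External            # a fast-moving external metric is where a short period pays off
    external:
      metric:
        name: queue_depth     # hypothetical metric name
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    syncPeriodSeconds: 5      # proposed field: reconcile this HPA every 5s instead of the global default
```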
> periods (e.g. 1s) for many HPAs, increasing the rate of metrics queries
> and scale sub-resource calls. This is mitigated by:
> - Validation bounds: `syncPeriodSeconds` must be >= 1 and <= 3600.
> - Cluster administrators can use admission webhooks or policies to enforce
Wouldn't a meaningful webhook need to calculate the aggregate sync frequency across all HPAs in the cluster? For users, this looks difficult to reason about.
Do you see another way such a webhook could be implemented? Perhaps it could be part of this KEP, to ensure this feature is safe by default?
You're right that a webhook calculating aggregate frequency would be hard to reason about. I've reworked the Risks and Mitigations section to present a multi-layered safety story instead:
- Validation bounds (>= 1, <= 3600) with a note that we may raise the lower bound before Beta based on real-world usage data.
- Feature gate -- in Alpha, cluster admins have explicit opt-in control.
- Best-effort semantics -- the field is a target interval, not a hard guarantee. The controller won't queue additional work if reconciliation takes longer than the configured period, preventing workqueue saturation.
- Policy enforcement -- I've replaced the vague "admission webhooks" suggestion with a concrete `ValidatingAdmissionPolicy` example using a simple CEL rule that enforces a per-HPA floor (e.g. `syncPeriodSeconds >= 10`). This doesn't require reasoning about aggregate frequency -- it's a straightforward per-object check.
This makes the feature safe by default while still giving cluster admins a simple knob if they want tighter control.
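A minimal sketch of such a policy (the policy name and the floor value of 10 are illustrative, not taken from the KEP text):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: hpa-sync-period-floor          # hypothetical policy name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["autoscaling"]
      apiVersions: ["v2"]
      operations: ["CREATE", "UPDATE"]
      resources: ["horizontalpodautoscalers"]
  validations:
  # Accept HPAs that don't set the field at all; otherwise enforce the floor.
  - expression: >-
      !has(object.spec.behavior) ||
      !has(object.spec.behavior.syncPeriodSeconds) ||
      object.spec.behavior.syncPeriodSeconds >= 10
    message: "syncPeriodSeconds must be at least 10 seconds in this cluster"
```

A corresponding `ValidatingAdmissionPolicyBinding` would also be needed to put the policy into effect.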
> The per-HPA sync frequency is implemented via a new `PerItemIntervalRateLimiter`
> in the HPA controller's workqueue. This rate limiter supports per-key interval
Wouldn't metrics-server and the custom/external metrics API need to respond in less than 1s to ensure the workqueue doesn't saturate? This looks like a very short timeout. Do you see a way to prevent the workqueue from blocking? Perhaps this new field should only be a hint (i.e. a best-effort goal)?
Good concern. I've added a paragraph in Design Details clarifying that `syncPeriodSeconds` is a best-effort target interval, not a hard real-time guarantee. Specifically:
- If a reconciliation cycle (including metrics queries and scale sub-resource calls) takes longer than the configured period, the controller will start the next cycle immediately after the current one completes rather than queuing up additional work.
- This means the workqueue cannot saturate even if the metrics backend is slow or the configured period is shorter than the end-to-end reconciliation latency.
- The rate limiter only re-enqueues a key after the current reconciliation for that key finishes, so there is at most one pending item per HPA in the queue at any time.
So effectively, `syncPeriodSeconds` is already a hint/best-effort goal by design. I've also reflected this in the Risks and Mitigations section.
Introduce KEP proposing an optional syncPeriodSeconds field on HorizontalPodAutoscalerBehavior to allow per-HPA override of the global --horizontal-pod-autoscaler-sync-period.

Fedosin force-pushed from `68a5daa` to `df7747f`.
> 3. Look for warnings and errors which might point where the problem lies.
>
> ## Implementation History
might be worth referencing the old issue tracker before KEPs existed :)
kubernetes/kubernetes#110317