
[Bug]: Scale delay cooldowns don't reset when autoscaler intent changes during the delay period #3824

@megheaiulian

Description

Steps to reproduce

  1. Deploy a service with replicas: 0..1 and scaling: { metric: rps, target: 0.5, scale_down_delay: 900 }
  2. Send a request to trigger scale-up (0→1)
  3. Use the service actively for 14 minutes (RPS > 0, requests completing regularly)
  4. Pause for 1 minute (RPS drops to 0)
  5. Observe: service scales down at T=15min, even though it was actively used for 14 of those 15 minutes

Expected behaviour

scale_down_delay should protect the instance for the specified duration after the autoscaler first decides to scale down, and the delay should restart if traffic returns during that window. Instead, the cooldown is measured from last_scaled_at (the timestamp of the last applied scale event), not from the moment the autoscaler's intent changed.

The bug

In src/dstack/_internal/server/services/services/autoscalers.py, the RPSAutoscaler uses last_scaled_at to enforce cooldowns:

if (now - last_scaled_at).total_seconds() < self.scale_down_delay:
    # too early to scale down, wait for the delay
    return current_desired_count

last_scaled_at is only updated when a scale event is actually applied (the cooldown expires and the scale happens). It is not updated when the autoscaler's desired count changes (intent changes) during the cooldown period.

This means:

  • Active traffic during the cooldown does not reset last_scaled_at
  • For replicas: 0..1, last_scaled_at is set once at initial scale-up (0→1) and never updated, because desired_count == current_count (1 == 1) produces no scale event
  • Once the delay measured from last_scaled_at has elapsed, a scale-down is applied as soon as RPS drops, regardless of how recently the service was in use

Timeline demonstrating the bug

T=0:    Scale up 0→1, last_scaled_at = T0
T=1m:   RPS=0.02, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=5m:   RPS=0.03, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=10m:  RPS=0.01, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=14m:  User pauses, RPS drops to 0
T=15m:  (now - T0) >= scale_down_delay(900s), RPS=0 → scale down happens
        → User was active for 14 of 15 minutes, but got scaled down after a 1-minute pause
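
The timeline above can be replayed with a small simulation. This is a minimal sketch of the cooldown check as quoted in this issue, not dstack's actual code: apply_cooldown, the timeline list, and the start time are hypothetical names and values chosen for illustration, and the replica counts requested by the metric are taken directly from the timeline.

```python
from datetime import datetime, timedelta

SCALE_DOWN_DELAY = 900  # seconds, from the reproduction steps


def apply_cooldown(now, last_scaled_at, current, wanted):
    """Simplified stand-in for the last_scaled_at check (not dstack's code)."""
    if wanted < current and (now - last_scaled_at).total_seconds() < SCALE_DOWN_DELAY:
        return current  # too early to scale down
    return wanted


t0 = datetime(2024, 1, 1, 12, 0)  # arbitrary start time
# (minutes since the 0→1 scale-up, replica count the RPS metric asks for)
timeline = [(1, 1), (5, 1), (10, 1), (14, 0), (15, 0)]

current, last_scaled_at = 1, t0
scaled_down_at = None
for minute, wanted in timeline:
    now = t0 + timedelta(minutes=minute)
    new_count = apply_cooldown(now, last_scaled_at, current, wanted)
    if new_count != current:
        # Only an applied scale event moves last_scaled_at.
        current, last_scaled_at = new_count, now
        scaled_down_at = minute
    print(f"T={minute}m wanted={wanted} replicas={current}")
```

In this model the replica is held at T=14m (840 s since T0) and removed at T=15m (900 s since T0), even though traffic stopped only one minute earlier.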

Proposed fix

Replace last_scaled_at with last_count_change — track when the autoscaler's desired count changes from the current count, not when a scale event is applied:

  • When desired_count first rises above current_desired_count: update last_count_change (scale-up intent)
  • When desired_count first drops below current_desired_count: update last_count_change (scale-down intent)
  • Use last_count_change instead of last_scaled_at for cooldown calculations

This ensures:

  1. The cooldown starts from when the autoscaler first decided to scale down, not from the last applied scale event
  2. If traffic returns during the cooldown (desired_count goes back up), the cooldown resets because the autoscaler's intent changed
  3. This works symmetrically for both scale_up_delay and scale_down_delay

# Proposed logic
if new_desired_count > current_desired_count:
    if current_desired_count == 0:
        return new_desired_count  # immediate scale-up from zero
    if last_count_change is not None and (now - last_count_change).total_seconds() < self.scale_up_delay:
        return current_desired_count
    return new_desired_count
elif new_desired_count < current_desired_count:
    if last_count_change is not None and (now - last_count_change).total_seconds() < self.scale_down_delay:
        return current_desired_count
    return new_desired_count
return new_desired_count
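
The proposed logic does not show where last_count_change is updated. One possible reading, sketched below, restarts the clock whenever the direction of the autoscaler's intent (up, down, or hold) changes between evaluations. IntentCooldown, decide, and last_direction are hypothetical names introduced for this example, and the scale_up_delay value is arbitrary; none of this is taken from the dstack codebase.

```python
from datetime import datetime, timedelta
from typing import Optional


class IntentCooldown:
    """Hypothetical helper: times each delay from the last change of intent."""

    def __init__(self, scale_up_delay: int, scale_down_delay: int) -> None:
        self.scale_up_delay = scale_up_delay
        self.scale_down_delay = scale_down_delay
        self.last_direction = 0  # -1 = down, 0 = hold, +1 = up
        self.last_count_change: Optional[datetime] = None

    def decide(self, now: datetime, current: int, wanted: int) -> int:
        direction = (wanted > current) - (wanted < current)
        if direction != self.last_direction:
            # Assumption: any change of intent restarts the clock.
            self.last_direction = direction
            self.last_count_change = now
        if direction == 0:
            return current
        if direction > 0 and current == 0:
            return wanted  # immediate scale-up from zero
        delay = self.scale_up_delay if direction > 0 else self.scale_down_delay
        if (now - self.last_count_change).total_seconds() < delay:
            return current  # still inside the delay window
        return wanted


t0 = datetime(2024, 1, 1, 12, 0)  # arbitrary start time
# Traffic pauses at T=14m, returns at T=16m, and stops for good at T=17m.
timeline = [(1, 1), (5, 1), (10, 1), (14, 0), (15, 0), (16, 1), (17, 0), (31, 0), (32, 0)]

cooldown = IntentCooldown(scale_up_delay=300, scale_down_delay=900)
current = 1  # state right after the 0→1 scale-up
decisions = {}
for minute, wanted in timeline:
    current = cooldown.decide(t0 + timedelta(minutes=minute), current, wanted)
    decisions[minute] = current
    print(f"T={minute}m wanted={wanted} replicas={current}")
```

Under these assumptions the replica survives the one-minute pause at T=14m, and the scale-down is applied at T=32m, 900 s after traffic last stopped at T=17m.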

Impact

This bug makes RPS-based autoscaling unreliable for interactive workloads (LLM serving, chatbots, etc.) where traffic is bursty with pauses between active periods. The cooldown provides a false sense of protection: it does not guarantee N seconds of inactivity before scaling down.

dstack version

0.19.x (current master)
