Describe the bug
The reconciliation loop runs multiple times per second per TWD. This shows the count of "Running Reconcile loop" logs during the upgrade, rollback, and roll-forward.
I believe it is related to the non-deterministic map iteration in mapToStatus (temporal-worker-controller/internal/controller/state_mapper.go, line 57 in a2e3ccd):
    for buildID := range m.k8sState.Deployments {
My thinking is that iterating over this map causes the reconciler's final status write to emit the deprecatedVersions array in a non-deterministic order, which triggers reconciliation again even when no status actually changes (only the ordering does). The write is at temporal-worker-controller/internal/controller/worker_controller.go, line 399 in a2e3ccd:
    if err := r.Status().Update(ctx, &workerDeploy); err != nil {
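To illustrate the suspected mechanism, here is a minimal Go sketch of building the list in a stable order. It is not the controller's code: the `deprecatedVersion` type, the `sortedDeprecated` helper, and the map's value type are hypothetical stand-ins, and only the `for buildID := range ...` pattern comes from the line quoted above. Go does not guarantee map iteration order, so sorting the result by build ID is one way to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// deprecatedVersion is a hypothetical stand-in for one entry of
// status.deprecatedVersions; the real type has more fields.
type deprecatedVersion struct {
	BuildID string
}

// sortedDeprecated ranges over the map (iteration order is unspecified in Go)
// and then sorts the result by BuildID so repeated calls give the same slice.
func sortedDeprecated(deployments map[string]struct{}) []deprecatedVersion {
	out := make([]deprecatedVersion, 0, len(deployments))
	for buildID := range deployments {
		out = append(out, deprecatedVersion{BuildID: buildID})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].BuildID < out[j].BuildID })
	return out
}

func main() {
	deployments := map[string]struct{}{"v3": {}, "v1": {}, "v2": {}}
	fmt.Println(sortedDeprecated(deployments)) // prints [{v1} {v2} {v3}]
}
```

Without the `sort.Slice` call, two runs over the same map may return the entries in different orders, which would be enough to make two otherwise identical status objects differ.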
Here you can see the TWD resourceVersion climbing several times per second while the generation stays at 264:
gleasonm@symbiote-dev:~$ kubectl get twd $NAME -n $NS -w -o custom-columns='RV:.metadata.resourceVersion,GEN:.metadata.generation,TARGET:.status.targetVersion.status' | while IFS= read -r line; do printf '%s %s\n' "$(date '+%H:%M:%S.%3N')" "$line"; done
20:56:33.807 RV GEN TARGET
20:56:33.811 401336452 264 Current
20:56:33.830 401336457 264 Current
20:56:33.893 401336459 264 Current
20:56:33.950 401336461 264 Current
20:56:34.009 401336463 264 Current
20:56:34.077 401336466 264 Current
20:56:34.135 401336469 264 Current
20:56:34.188 401336472 264 Current
20:56:34.248 401336474 264 Current
20:56:34.310 401336479 264 Current
20:56:34.368 401336483 264 Current
20:56:34.431 401336486 264 Current
20:56:34.497 401336489 264 Current
20:56:34.550 401336492 264 Current
20:56:34.616 401336497 264 Current
20:56:34.671 401336499 264 Current
20:56:34.743 401336502 264 Current
20:56:34.826 401336506 264 Current
20:56:34.960 401336509 264 Current
20:56:35.016 401336512 264 Current
And here you can see there is no diff between two successive reads of the status once deprecatedVersions is sorted by buildID:
gleasonm@symbiote-dev:~$ kubectl get twd $NAME -n $NS -o json | jq '.status.deprecatedVersions |= sort_by(.buildID) | .status' > /tmp/s1.json
gleasonm@symbiote-dev:~$ kubectl get twd $NAME -n $NS -o json | jq '.status.deprecatedVersions |= sort_by(.buildID) | .status' > /tmp/s2.json
gleasonm@symbiote-dev:~$ diff /tmp/s1.json /tmp/s2.json
gleasonm@symbiote-dev:~$
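The same order-insensitive comparison can be sketched in Go. This is only an illustration of the check done with jq above, under assumptions: the `status` and `deprecatedVersion` types and the `normalized` and `statusChanged` helpers are hypothetical and do not exist in the controller as far as I know. A reconciler could use a check like this to skip the status write when only the ordering differs.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Hypothetical, simplified stand-ins for the TWD status types.
type deprecatedVersion struct {
	BuildID string
	Status  string
}

type status struct {
	DeprecatedVersions []deprecatedVersion
}

// normalized returns a copy of s with DeprecatedVersions sorted by BuildID,
// leaving the caller's slice untouched.
func normalized(s status) status {
	cp := make([]deprecatedVersion, len(s.DeprecatedVersions))
	copy(cp, s.DeprecatedVersions)
	sort.Slice(cp, func(i, j int) bool { return cp[i].BuildID < cp[j].BuildID })
	return status{DeprecatedVersions: cp}
}

// statusChanged reports whether prev and next differ in anything other than
// the order of DeprecatedVersions.
func statusChanged(prev, next status) bool {
	return !reflect.DeepEqual(normalized(prev), normalized(next))
}

func main() {
	a := status{DeprecatedVersions: []deprecatedVersion{{"v1", "Drained"}, {"v2", "Draining"}}}
	b := status{DeprecatedVersions: []deprecatedVersion{{"v2", "Draining"}, {"v1", "Drained"}}}
	c := status{DeprecatedVersions: []deprecatedVersion{{"v1", "Drained"}, {"v2", "Drained"}}}

	fmt.Println(statusChanged(a, b)) // false: same entries, different order
	fmt.Println(statusChanged(a, c)) // true: v2's status differs
}
```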
However, I checked v1.1.1 and this behavior looks unchanged, so I am not sure how to explain why I only observe this on v1.3.1
Minimal Reproduction
I have not tested a repro, but I have a TWD with 21 deprecatedVersions and am running TWC v1.3.1.
Environment/Versions
- OS: Linux
- Temporal Server Version: 1.30.2
- TWC version: 1.3.1
- Helm Chart version: 0.20.0
Additional context
I was running the TWC on v1.1.1 and started seeing this issue after upgrading to v1.3.1. I couldn't see anything obvious between the versions that would explain why I started seeing the behavior following the upgrade.