fix(listener): serialize membership orders per stack#205
Draft
Dav-14 wants to merge 1 commit into
Draft
Conversation
Two membership orders for the same stack could land on different worker
pool goroutines and race on the K8s API, so the slower handler's PATCH
silently overwrote the faster one — repeatedly observed when an
ExistingStack arrived in close succession to another and the second
update was undone by the first.
Route every stack-scoped order (ExistingStack / DeletedStack /
EnabledStack / DisabledStack) through a per-stack mailbox. At most one
order per stack is in flight at a time; pending orders for the same
stack run FIFO once the previous one finishes. Orders for distinct
stacks remain concurrent, still bounded by the same pond worker pool —
so we keep cross-stack throughput.
Two cancellation cases interrupt the in-flight handler when continuing
it would be wasteful or wrong:
- a fresher ExistingStack supersedes an in-flight ExistingStack (the
older snapshot is stale; replaying it overwrites the newer state),
- a DeletedStack supersedes any non-Deleted in-flight order (no point
creating resources we are about to tear down).
An in-flight DeletedStack is never cancelled. Cancellation only fires
when a successor is already queued, so the cluster is guaranteed to
reconcile to the latest desired state. syncExistingStack now checks
ctx between sync steps so a superseding order takes effect promptly
instead of waiting for all four sub-syncs to finish.
The dispatcher is in its own file with a handler-injection seam so the
queue mechanics are unit-tested with -race (no envtest dependency).
Coverage includes per-stack FIFO, cross-stack parallelism,
ExistingStack coalescing, Delete-cancels-Existing, Delete-never-
cancelled, parent-ctx propagation, and the full cancellation matrix.
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two membership orders for the same stack could land on different worker-pool goroutines and race on the K8s API, so the slower handler's PATCH silently overwrote the faster one. Repeatedly seen when an
ExistingStackarrived in close succession to another and the second update was undone by the first.This change routes every stack-scoped order (
ExistingStack/DeletedStack/EnabledStack/DisabledStack) through a per-stack mailbox: at most one order per stack is in flight at a time, pending orders for the same stack run FIFO, orders for distinct stacks remain concurrent on the existingpondpool.Cancellation matrix
ExistingStackDeletedStackEnabled/DisabledExistingStackDeletedStackEnabled/DisabledRules in words:
ExistingStackcancels an in-flightExistingStack— the older snapshot is stale and replaying it would overwrite the newer state.DeletedStackcancels any non-DeletedStackin flight — no point creating resources we are about to tear down.DeletedStackis never cancelled.syncExistingStacknow checksctx.Err()between its four sub-syncs so a superseding order takes effect promptly instead of waiting for all of them.Design notes
The dispatcher lives in
internal/stack_dispatcher.gowith a handler-injection seam (orderHandler func). The queue mechanics are unit-tested under-racewith no envtest dependency —stack_dispatcher_test.gocovers:ExistingStackcoalescing,Deletecancels in-flightExisting,Deleteis never cancelled,Cross-stack throughput is unchanged: the same
pond.New(5, 5)pool still bounds concurrent work, and the dispatcher submits one drain goroutine per active stack rather than one goroutine per order.Test plan
go test ./internal/... -run 'TestStackDispatcher|TestShouldCancelInFlight' -race -count=50— all passjust lint— cleango vet ./...— cleanExistingStacksequence and confirm the second snapshot's state survivesExistingStackthen immediately aDeletedStackand confirm the create is interrupted (no orphaned modules)Notes for reviewers
Start()is now a two-line read-from-channel +Dispatchloop; all the per-message handling moved intohandleOrder()on the listener.ctx = …reassigning the captured outer ctx variable across worker goroutines); the refactor naturally fixes it becausehandleOrderis a method wherectxis a parameter.