fix(analytics): thread-safe polling start + surface fetch errors (SDK-81, SDK-78)#155
fix(analytics): thread-safe polling start + surface fetch errors (SDK-81, SDK-78)#155tylerjroach wants to merge 3 commits into
Conversation
…utors startPollingForDefinitions had no lock around the pollingExecutor create-and-assign. Two concurrent callers would each allocate a fresh ScheduledExecutorService and the earlier one's worker thread + queue would leak (still alive, still scheduled to poll, no way to shut it down because the field had been overwritten). Guard the executor lifecycle with a private lock and bail early if a poller is already scheduled. Snapshot the executor under the lock in stop, then run the (potentially blocking) shutdown / awaitTermination outside the lock so a long-running shutdown can't block a concurrent start for 5s. Linear: SDK-85 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #155 +/- ##
==========================================
+ Coverage 96.64% 96.69% +0.04%
==========================================
Files 14 14
Lines 656 665 +9
==========================================
+ Hits 634 643 +9
Misses 22 22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
…p silently start_polling_for_definitions! caught StandardError and dispatched only to @error_handler (whose default ErrorHandler#handle is a no-op), so NoMethodError / JSON::ParserError / etc. from schema drift would loop forever undetected unless the user had configured a custom handler. Add log_polling_error which always warns to STDERR before dispatching to @error_handler — visibility no longer depends on configuration. Matches the convention in mixpanel-python, mixpanel-java, mixpanel-go, and mixpanel-node, all of which log unconditionally and continue polling. Linear: SDK-78 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… poller
stop_polling_for_definitions! flipped a shared @stop_polling boolean.
A concurrent start_polling_for_definitions! could then reset it to
false before the old thread woke from its sleep, leaving the old
poller running indefinitely and orphaned (no longer tracked by
@polling_thread, so no future stop could join it).
Each polling thread now captures its own [false] cell by closure. A
subsequent start allocates a fresh cell, so the previous thread still
observes true and exits on its next wake.
Also name the polling thread ('mixpanel-flags-poller') so the concurrent-
start regression test can count our threads by name instead of diffing
Thread.list.size, which is fragile under other background threads in
the same process.
|
Pushed P1 — concurrent stop+start race ( P2 — fragile All 48 specs pass locally. |
Bundles two small audit fixes in the polling loop — both
local_flags_provider.rb, both touching the samebegin/rescueblock.SDK-81 — thread-safe polling start
start_polling_for_definitions!had a check-then-act race:if @config[:enable_polling] && !@polling_threadran unsynchronized, so two concurrent callers could both see@polling_thread == nil, both assignThread.new { ... }, and the earlier thread would become orphaned but keep polling forever.Add a
@polling_mutexguarding the polling-thread lifecycle:start_polling_for_definitions!: under the mutex, bail early if@polling_threadalready exists, otherwise spawn the thread.stop_polling_for_definitions!: snapshot@polling_threadand@stop_pollingunder the mutex, thenjoinoutside the mutex so a long-running join can't block a concurrentstart!for a full poll interval.SDK-78 — surface polling-loop errors
The loop caught
StandardErrorand dispatched only to@error_handler, whose defaultMixpanel::ErrorHandler#handleis a no-op. Schema drift (NoMethodError,JSON::ParserError,TypeError, …) looped forever undetected unless the user had configured a custom handler.Add a
log_polling_errorhelper that alwayswarns to$stderrbefore dispatching to@error_handler. Visibility no longer depends on configuration. Matches the de facto convention across every other Mixpanel SDK polling loop (mixpanel-pythonlogger.exception, mixpanel-javalogger.log(WARNING), mixpanel-golog.Printf, mixpanel-nodethis.logger?.error) — all log unconditionally and keep polling. Crashing the polling thread on the first bad payload would freeze local evaluation at whatever variants were last cached — strictly worse than continuing.Context
Linear: SDK-81, SDK-78. Same audit-driven cross-SDK push; for SDK-81, Ruby is one of three affected (also Java #91, Node #278). SDK-78 was Ruby-only — the other SDKs already log unconditionally.
Test plan
"is idempotent under concurrent start calls and does not leak threads"(SDK-81) spins up 8 contender threads behind aQueuegate; asserts only 1 new background thread exists after they all return. Before the fix the count would be 8."warns to stderr when fetch raises and no error_handler is configured"(SDK-78) builds the provider withnilerror_handler + 500 response, asserts thewarnlands on stderr."surfaces unexpected errors (schema drift) instead of swallowing them silently"(SDK-78) injectsNoMethodErrorviaallow(...).to receive(:fetch_flag_definitions), asserts both the error_handler dispatch AND the stderr warning fire.🤖 Generated with Claude Code