fix: route telemetry HTTP through shared connection stack; fix close() flush race #362
Merged
samikshya-db merged 2 commits into main on Apr 21, 2026
Conversation
…) flush race

- DatabricksTelemetryExporter.sendRequest and FeatureFlagCache.fetchWithRetry now use connectionProvider.getRetryPolicy().invokeWithRetry(), matching the CloudFetchResultHandler pattern instead of bespoke fetch/retry logic
- MetricsAggregator: add a closing flag so batch-triggered fire-and-forget flushes are suppressed during close(), ensuring a single awaited flushForClose() drains all remaining metrics without racing past process.exit()

Co-authored-by: samikshya-chand_data
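A minimal sketch of what routing the telemetry HTTP call through the shared retry policy could look like. connectionProvider.getRetryPolicy().invokeWithRetry() is named in the commit message; the interface shapes, the generic signature, and the fetch-based request are illustrative assumptions, not the driver's actual API.

```typescript
// Hypothetical shapes standing in for the driver's connection-provider types.
interface RetryPolicySketch<T> {
  invokeWithRetry(operation: () => Promise<T>): Promise<T>;
}

interface ConnectionProviderSketch {
  getRetryPolicy<T>(): Promise<RetryPolicySketch<T>>;
}

// Telemetry export routed through the shared retry policy, so retries, backoff,
// and proxy/TLS settings come from the connection stack rather than bespoke
// per-exporter fetch/retry logic.
async function sendTelemetryBatch(
  connectionProvider: ConnectionProviderSketch,
  url: string,
  payload: object,
): Promise<unknown> {
  const retryPolicy = await connectionProvider.getRetryPolicy<unknown>();
  return retryPolicy.invokeWithRetry(() =>
    fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    }),
  );
}
```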
Thanks for your contribution! To satisfy the DCO policy in our contributing guide, every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase.
Replace the completeStatementForClose/flushForClose toggle pattern with a closing flag that suppresses batch-triggered fire-and-forget flushes. Set closing=true first so completeStatement still works normally via addPendingMetric (closed is still false), then seal with closed=true and drain with a single awaited flush(false).

Co-authored-by: samikshya-chand_data
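The close() sequencing described above, as a minimal sketch. The field and method names (closing, closed, addPendingMetric/completeStatement, flush) follow the commit message; the class shape, batch threshold, and flush body are assumptions.

```typescript
// Sketch of the closing/closed flag pattern for draining metrics on close().
class MetricsAggregatorSketch {
  private closing = false; // suppresses batch-triggered fire-and-forget flushes
  private closed = false;  // seals the aggregator against new metrics
  private pending: unknown[] = [];

  completeStatement(metric: unknown): void {
    if (this.closed) return;   // sealed: late arrivals are dropped
    this.pending.push(metric); // still accepted while closing (closed is false)
    if (!this.closing && this.pending.length >= 10) {
      void this.flush(true);   // fire-and-forget batch flush, suppressed during close()
    }
  }

  async close(): Promise<void> {
    this.closing = true;     // 1. stop fire-and-forget flushes first
    // ...complete any still-open statements via completeStatement() here...
    this.closed = true;      // 2. seal against new metrics
    await this.flush(false); // 3. single awaited drain of everything remaining
  }

  private async flush(fireAndForget: boolean): Promise<void> {
    const batch = this.pending.splice(0);
    // Hand the batch to the exporter; omitted here.
    void batch;
    void fireAndForget;
  }
}
```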
vikrantpuppala approved these changes on Apr 21, 2026
samikshya-db added a commit that referenced this pull request on Apr 21, 2026
…e, MetricsAggregator

Picks up #362 (shared connection stack + close() flush race fix) merged to main. Resolves the add/add conflicts by preferring main's versions of the three infrastructure files, consistent with the prior merge strategy.

Co-authored-by: samikshya-chand_data
samikshya-db added a commit that referenced this pull request on Apr 28, 2026
…iring, error telemetry
Rebase the regressed exporter/aggregator/feature-flag-cache on main's
hardened versions and re-apply only the genuinely new functionality
(CONNECTION_CLOSE event, chunk-timing aggregation) on top. Closes the
critical findings from the multi-reviewer audit:
- SSRF guard, redactSensitive, sanitizeProcessName, hasAuthorization,
auth-missing warn-once — all restored via main's telemetryUtils.
- MetricsAggregator memory bounds (maxPendingMetrics with error-preferring
drop, maxErrorsPerStatement, statementTtlMs eviction) restored.
- FeatureFlagCache in-flight fetch dedup and TTL clamp [60s, 3600s]
restored (see the sketch after this list); lib/telemetry/urlUtils.ts deleted.
- close() now properly awaits aggregator drain — fixes the close()/flush
race that PR #362 already fixed once.
- Driver version reads lib/version.ts via buildUserAgentString instead
of hardcoded '1.0.0'; uuidv4() restored in place of Math.random().
- TelemetryTerminalError re-exported from lib/index.ts.
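As an illustration of the FeatureFlagCache behavior restored above, here is a minimal sketch of in-flight fetch dedup plus a TTL clamp to [60s, 3600s]. Only the class name, the dedup/clamp behavior, and the clamp bounds come from the commit message; the method names and flag payload shape are assumptions.

```typescript
// Sketch: deduplicated feature-flag fetch with the server TTL clamped to [60s, 3600s].
const MIN_TTL_MS = 60_000;
const MAX_TTL_MS = 3_600_000;

type Flags = Record<string, boolean>;

class FeatureFlagCacheSketch {
  private cached?: { flags: Flags; expiresAt: number };
  private inFlight?: Promise<Flags>;

  constructor(private fetchFlags: () => Promise<{ flags: Flags; ttlMs: number }>) {}

  async get(): Promise<Flags> {
    if (this.cached && Date.now() < this.cached.expiresAt) {
      return this.cached.flags;
    }
    // In-flight dedup: concurrent callers share one outstanding fetch.
    if (!this.inFlight) {
      this.inFlight = this.fetchFlags()
        .then(({ flags, ttlMs }) => {
          // Clamp the server-supplied TTL so a bad value can't pin or thrash the cache.
          const clamped = Math.min(Math.max(ttlMs, MIN_TTL_MS), MAX_TTL_MS);
          this.cached = { flags, expiresAt: Date.now() + clamped };
          return flags;
        })
        .finally(() => {
          this.inFlight = undefined;
        });
    }
    return this.inFlight;
  }
}
```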
Type-safe wiring:
- Added optional getTelemetryEmitter() / getTelemetryAggregator() to
IClientContext; removed all 7 `(this.context as any)` casts.
- Six copy-pasted event listeners in DBSQLClient.initializeTelemetry
collapsed into one `Object.values(TelemetryEventType)` loop (see the sketch
after this list) — closes the listener-name mismatch that silently dropped
error events.
- mapAuthType now covers all 6 authType values instead of defaulting
everything to 'pat'.
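A minimal sketch of the single registration loop. TelemetryEventType and the CONNECTION_CLOSE/STATEMENT_COMPLETE/ERROR event names appear elsewhere in this commit message; the remaining enum members and the emitter/aggregator method names are assumptions.

```typescript
import { EventEmitter } from 'events';

// Assumed event-type enum; only some member names are taken from this PR.
enum TelemetryEventType {
  ConnectionOpen = 'CONNECTION_OPEN',
  ConnectionClose = 'CONNECTION_CLOSE',
  StatementComplete = 'STATEMENT_COMPLETE',
  Error = 'ERROR',
}

// One loop over the enum values guarantees every event type has a listener,
// so a renamed or forgotten listener can no longer silently drop events.
function wireTelemetry(
  emitter: EventEmitter,
  aggregator: { record(type: string, payload: unknown): void },
): void {
  for (const eventType of Object.values(TelemetryEventType)) {
    emitter.on(eventType, (payload: unknown) => aggregator.record(eventType, payload));
  }
}
```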
TelemetryClient now owns the host-scoped resources:
- TelemetryClientProvider is a process-wide singleton (getInstance()); see the sketch after this list.
- TelemetryClient owns DatabricksTelemetryExporter, MetricsAggregator,
CircuitBreakerRegistry, and FeatureFlagCache for its host. Implements
IClientContext itself so the owned components have a stable context
that survives any single DBSQLClient's close.
- DBSQLClient instances on the same host share the breaker counters,
feature-flag cache, exporter, and HTTP batches. Fixes the per-instance
breaker-fragmentation noted in iter-2 architecture review.
- Each DBSQLClient still holds its own TelemetryEventEmitter (respects
per-client telemetryEnabled); emitters bridge into the shared aggregator.
- Exporter falls back to context.getAuthProvider() when no explicit auth
provider is passed, so the shared exporter resolves auth through the
TelemetryClient's FIFO of registered DBSQLClients.
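A minimal sketch of the host-scoped ownership model, assuming the class names from this commit message (TelemetryClientProvider.getInstance(), a per-host TelemetryClient); the map-based lookup and constructor shapes are illustrative.

```typescript
// Owns the exporter, aggregator, circuit-breaker registry and feature-flag cache
// for one host, so they survive any single DBSQLClient's close().
class TelemetryClientSketch {
  constructor(public readonly host: string) {}
}

// Process-wide singleton that hands out one TelemetryClient per host, so
// DBSQLClient instances on the same host share breaker counters, feature flags,
// the exporter, and HTTP batches.
class TelemetryClientProviderSketch {
  private static instance?: TelemetryClientProviderSketch;

  private clientsByHost = new Map<string, TelemetryClientSketch>();

  static getInstance(): TelemetryClientProviderSketch {
    if (!TelemetryClientProviderSketch.instance) {
      TelemetryClientProviderSketch.instance = new TelemetryClientProviderSketch();
    }
    return TelemetryClientProviderSketch.instance;
  }

  getClientForHost(host: string): TelemetryClientSketch {
    let client = this.clientsByHost.get(host);
    if (!client) {
      client = new TelemetryClientSketch(host);
      this.clientsByHost.set(host, client);
    }
    return client;
  }
}
```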
Error telemetry wired across operation entry points:
- Re-added emitErrorEvent(error) on DBSQLOperation; uses
ExceptionClassifier.isTerminal() to classify.
- fetchChunk, cancel, close, getMetadata wrap their bodies in try/catch
that calls emitErrorEvent before re-throwing (see the sketch after this
list). Verified end-to-end against a real Azure Databricks workspace: a
failed query produces STATEMENT_COMPLETE + ERROR (with redacted stack)
on the wire.
- Removed the await getMetadata() call from emitStatementComplete —
eliminates the extra Thrift RPC on every close (F19) AND prevents
spurious error telemetry from getMetadata's wrapper firing during
close-cleanup of an already-failed operation.
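A minimal sketch of the try/catch wrapping, using fetchChunk as the example. emitErrorEvent and ExceptionClassifier.isTerminal() are named in this commit message; the surrounding class shape and the injected helpers are assumptions.

```typescript
// Sketch: classify and report the error before re-throwing so the caller still
// sees the failure; the same pattern applies to cancel/close/getMetadata.
class DBSQLOperationSketch {
  constructor(
    private emitErrorEvent: (error: Error, isTerminal: boolean) => void,
    private isTerminal: (error: Error) => boolean, // e.g. ExceptionClassifier.isTerminal
    private doFetchChunk: () => Promise<unknown>,
  ) {}

  async fetchChunk(): Promise<unknown> {
    try {
      return await this.doFetchChunk();
    } catch (error) {
      if (error instanceof Error) {
        this.emitErrorEvent(error, this.isTerminal(error));
      }
      throw error;
    }
  }
}
```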
Other:
- Iterating Map.keys() while mutating is now safe via a snapshot taken in close().
- STATEMENT_COMPLETE no longer zeroes accumulated chunk metrics when
the emit doesn't supply them (matches sibling-field guards).
- Tests for the rebased modules restored from main; provider tests
updated for the singleton API; deleted unused TelemetryExporterStub.
484 unit tests passing. Diff vs main: ~+2110/-383, down from the
original PR's +3640/-1173.
Co-authored-by: Isaac
Summary
Targeted fixes addressing Vikrant's Apr 21 review comments on #325.
- Shared connection stack (DatabricksTelemetryExporter, FeatureFlagCache): sendRequest and fetchWithRetry now use connectionProvider.getRetryPolicy().invokeWithRetry(), matching the CloudFetchResultHandler pattern. Removes bespoke per-file fetch/retry logic.
- MetricsAggregator close() flush race: adds a closing flag so batch-triggered fire-and-forget flushes are suppressed during the close loop, ensuring flushForClose() is the single awaited drain with no escaping promises.

Test plan
This pull request was AI-assisted by Isaac.