Fix LoginWithFailover missing parser state check#4140
Fix LoginWithFailover missing parser state check#4140paulmedynski wants to merge 4 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a failover login behavior discrepancy in SqlConnectionInternal.LoginWithFailover() by aligning its error-handling logic with LoginNoFailover(), ensuring login-phase transient/server errors are thrown for the outer ConnectRetryCount retry loop instead of triggering failover alternation.
Changes:
- Added a
_parser?.State is not TdsParserState.Closedcheck inLoginWithFailover()’sSqlExceptioncatch block. - Documented the rationale in-code to distinguish login-phase errors from network-level failures during failover alternation.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4140 +/- ##
==========================================
- Coverage 65.96% 64.44% -1.53%
==========================================
Files 275 270 -5
Lines 42993 65779 +22786
==========================================
+ Hits 28361 42391 +14030
- Misses 14632 23388 +8756
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
mdaigle
left a comment
There was a problem hiding this comment.
On second read, I would expect to see some tests requiring modification due to this change. Can you add test coverage if it's missing to prove the behavior is what we expect?
371647f to
24d5a77
Compare
| IsEnabledTransientError = true, | ||
| Number = 40613, | ||
| // Use non-fatal severity so break/doom logic does not short-circuit the path. | ||
| ErrorClass = 16, |
There was a problem hiding this comment.
This was the key to observing the old incorrect behaviour, where a non-doomed connection failed-over to the secondary due to a transient non-fatal login error.
…sabled failover tests
mdaigle
left a comment
There was a problem hiding this comment.
This seems like the correct behavior. My final thought is that it may make sense to add an app context switch to allow customers to revert to the old behavior. I could easily imagine someone taking advantage of the old behavior to get auto-switching to a backup when they perform maintenance on the primary.
Alternatively, we should call this change out specifically in our release notes and make sure to update this documentation to be clearer about network vs. TDS error handling and ensure that other drivers are in alignment: https://learn.microsoft.com/en-us/sql/database-engine/database-mirroring/connect-clients-to-a-database-mirroring-session-sql-server?view=sql-server-ver17
| IsEnabledTransientError = true, | ||
| Number = 40613, | ||
| // Use non-fatal severity so break/doom logic does not short-circuit the path. | ||
| ErrorClass = 16, |
Fixes #4139
Description
LoginWithFailover()inSqlConnectionInternal.cswas missing the_parser?.Statecheck thatLoginNoFailover()already has. This caused transient errors (40613, 42108, 42109) to trigger failover alternation instead of being thrown immediately for the outerConnectRetryCountloop to handle.The code comment in
LoginWithFailover()even notes: "The logic in this method is paralleled by the logic in LoginNoFailover. Changes to either one should be examined to see if they need to be reflected in the other."Changes
Added the same
_parser?.State is not TdsParserState.Closedcheck toLoginWithFailover()catch block, consistent withLoginNoFailover():Behavioral change
When using
Failover Partner, login-phase SQL failures that arrive while the parser remains open are now treated as login failures (outerConnectRetryCountpath) instead of network-level failover signals.Before this fix:
LoginWithFailover()could enter failover alternation.After this fix:
Public docs analysis
Reviewed public Microsoft docs related to failover and login behavior:
SqlConnectionStringBuilder.FailoverPartnerAPI docs explain high-level failover partner usage and mirroring prerequisites.Current docs do not explicitly define this internal boundary:
This change aligns implementation with intended internal behavior and with
LoginNoFailover()logic.Testing