fix(store): [OCISDEV-834] survive permanently-closed NATS connections #628
Draft
kobergj wants to merge 1 commit into
The nats-js and nats-js-kv store backends used the NATS client defaults, which give up reconnecting after 60 attempts (~2 min). Any NATS outage longer than that left the client permanently closed, surfacing as 'nats: connection closed' on every subsequent cache operation until the process was restarted.

Reconnect forever (MaxReconnect = -1) with a 5s wait between attempts, set a proper client name, and log the previously-silent connection state transitions.

Additionally, loadAttributes in the decomposedfs messagepack metadata backend no longer fails a successful disk read when the cache write-back fails: the error is logged and the disk data is returned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Julian Koberg <julian.koberg@kiteworks.com>
✅ Snyk checks have passed. No issues have been found so far.
Proactive hardening for the NATS dead-connection class of failures tracked in OCISDEV-834. It complements the go-micro `nats-js-kv` plugin `hasConn()` fix (already shipping in oCIS via owncloud/ocis#12401 / #12402 / #12404): this side prevents the NATS client from giving up in the first place and makes the failure visible.

Problem
The `nats-js` and `nats-js-kv` store backends used the NATS client defaults, `MaxReconnect = 60` and `ReconnectWait = 2s`. Any NATS outage longer than about 2 minutes (60 attempts × 2s) exhausts the reconnect budget, the client closes permanently, and every subsequent cache operation fails with `nats: connection closed` until the process is restarted. None of these state transitions were logged, so the failure was silent in Loki.

Changes
- `pkg/store/store.go`: factor the shared NATS option setup into `defaultNatsOptions()` (used by both the `nats-js` and `nats-js-kv` branches) and set:
  - `MaxReconnect = -1`: reconnect forever
  - `ReconnectWait = 5s`
  - `Name = "reva-store"` (was the literal `"TODO"`)
  - `DisconnectedErrCB` / `ReconnectedCB` / `ClosedCB` to log the previously-silent state transitions
- `pkg/storage/utils/decomposedfs/metadata/messagepack_backend.go`: `loadAttributes` no longer fails a successful disk read when the cache write-back (`PushToCache`) fails. The error is logged and the data read from disk is returned. The failing write-back is what made all spaces invisible in storage-users when the cache connection was dead (SE-1220).

Test

White-box regression test (`messagepack_backend_internal_test.go`) confirming that a disk read succeeds despite a failing cache write-back.

Notes for reviewer
The `loadAttributes` fix is intentionally scoped to the read path. `saveAttributes` (the write path) also returns its `PushToCache` error; this is arguably the same class of issue, but it is left out of scope here. Happy to fold it in if you'd prefer.

Refs OCISDEV-834.