Idle auto-cancel: cancel Pro subs after 2 idle periods, with reversible keep flow#178
Open
btc wants to merge 37 commits into
Open
Idle auto-cancel: cancel Pro subs after 2 idle periods, with reversible keep flow#178btc wants to merge 37 commits into
btc wants to merge 37 commits into
Conversation
A small Go library plus a sibling code generator that turn AIP-134 PATCH RPCs into safe, dynamic Postgres UPDATE statements. The proto message and FieldMask carry intent on the wire; an aippatch.yaml plus codegen produce a typed Mapping[T] per resource; a runtime Apply[T] call validates the mask, builds the SQL, executes it, and returns the post-update proto. Built first at drill/thirdparty/aippatch/ with UpdateProfile as the v0 target; designed for clean extraction into a standalone module reusable across Spanda LLC products. Codegen consumes proto FileDescriptorSet plus parsed sql/migrations via pg_query_go plus aippatch.yaml policy; output is committed *.gen.go files reviewable in PRs and guarded by a --check mode in CI. v0 codec set covers scalars, timestamps, and enums — sufficient to retire drill's hand-rolled UpdateProfile handler. JSONB, declarative validators, declarative authz, AIP-193 error mapping, and AIP-154 ETag are roadmap items; each is backwards-compatible with v0 call sites.
Auto-cancel a Pro subscription via cancel_at_period_end=true when the user has been idle for two consecutive billing periods. No settings toggle — this is the default Sabermatic behavior. Establishes a Spanda tenet: earn money for ongoing value delivered. Trigger fires at Stripe invoice.upcoming (~T-7) when MAX(last_active) across auth_sessions falls before the previous period's start, with launch-day grandfathering via users.idle_eligible_after. Reversal via email-link click (signed token, single-use) or auto-reverse on any authenticated activity during the cancel window. sub_cancel_is_auto distinguishes our auto-cancels from user-initiated portal cancels. Migrations 015–018: auth_sessions soft-delete, users sub-state cache, stripe_webhook_dedup, keep_link_token_uses. Feature package home at internal/feat/idleunsub/.
Both opus and sonnet review agents returned ~30 findings. Major changes:
HIGH
- Acknowledge AIP-134 empty-mask deviation explicitly (Wire conformance
note); ErrorOnEmpty stays as v0 default.
- Fix s.b.Pool() — it is a method, not a field.
- Drop "matches sqlc's DBTX" claim; document it is a strict subset
satisfied by *pgxpool.Pool, *pgx.Conn, and pgx.Tx.
- Remove mustValidate panic helper from generated file; show clean
Mapping literal plus generated InitPatches() error per drill's
no-panic-at-init rule.
- Replace ub.SQL("RETURNING *") with ub.Returning(boundCols...) —
first-class API, sends only mapped columns, excludes unmapped sensitive
columns from the wire.
- Add AutoSet mechanism (yaml auto_set: { updated_at: NOW() }) so PATCH
bumps updated_at like the existing sqlc UPDATEs do; without this every
PATCH would leave updated_at stale.
- Use proto.CloneOf instead of proto.Clone(...).(T); add nil op.Message
guard.
- Add buf build -o buf.binpb to make codegen — current buf generate does
not emit a descriptor set.
MEDIUM
- Add Transactions subsection: Apply takes any DBTX including pgx.Tx.
- Add audit-logging hooks (returned diff) to v1 roadmap.
- Document CGO requirement for aippatchgen explicitly.
- Add NOT NULL extraction to migration replay; nullable columns deferred
to v1.
- Define module-extraction trigger as "second project consumes aippatch
in production" (v0.5).
- Audit existing test wording in rollout plan.
- Split smallint into separate type-table row with int16-to-int32
widening on decode.
- Add validated bool to Mapping; Apply rejects unvalidated mappings with
Internal.
LOW
- Document v0 nested-mask-path constraint and emit codegen diagnostic.
- List github.com/google/uuid in runtime imports.
- Add bytes/float/double to deferred-types table.
- State codecs are global and reusable across resources.
- Make algorithm step explicit that unmatched proto fields without
skip:true are diagnostic.
- Add go get step to rollout plan.
- Note RETURNING uses bound-columns list (privacy benefit).
- Correct codec key description (yaml name, not protoreflect name).
- Add grep-for-callers step before deleting UpdateDisplayName.
NIT
- Use op.Mask.GetPaths() throughout.
- Clarify two-phase ordering (field-number processing, alphabetical emit).
- Spell out "foreign-key constraints".
- Document yaml string-to-Go-enum mapping for empty_mask.
- Diagram label uses thirdparty/aippatch/cmd/aippatchgen.
- Spanda replication wording: runtime/codegen are drill-free, not the yaml.
Spec grew from 690 to ~770 lines.
Round 2 surfaced one structural bug (m.codecs referenced but missing from Mapping struct), pg_query_go v5/v6 drift, several real correctness issues, and several smaller items. Fixes: HIGH - Add Codecs map[string]EnumCodec plumbing: shared Codecs registry in generated init.gen.go, passed into Mapping.Validate(codecs) which copies the relevant subset into unexported m.codecs. Apply now references m.codecs that actually exists. - Add EnumCodec type to public API (ProtoEnum, ToText, FromText). - Scrub pg_query_go to v6 throughout (latest published; v6.2.2). - Tone down Returning(boundCols) "privacy" claim — bound non-writable columns ARE transmitted (e.g. email); only unmapped columns are excluded. MEDIUM - Fix typed-nil trap: any(op.Message) == nil is false for a nil *T; switch to op.Message.ProtoReflect().IsValid(). - Sort op.Where keys before iterating to keep SQL deterministic (otherwise pgx prepared-statement cache misses on every call). - Validate op.Where keys exist in m.bindingsByColumn — prevents attacker-controlled identifier injection via the map-key surface. - Tighten AutoSet conflict check: reject if column is *any* binding (writable or not), not just writable. - Add pg_query_go-based validation for AutoSet SQL literal: parse and reject multi-statement / non-expression input. - Remove faulty test-wording audit step — verified existing tests assert via connect.CodeOf(err), not on message text. - Document soft-deleted user wire change: today's UpdateProfile returns Internal; aippatch returns NotFound. Add to Wire conformance note, rollout plan, and Risks. - Fix cmd/server/main.go → cmd/drill/main.go (drill's actual binary path). LOW - UpdateAllWritable empty-mask path: explicit Internal in v0 (was silently emitting AutoSet-only UPDATE). - Fix uuid.UUID(v).String() → uuid.UUID(v.([16]byte)).String() in type table. - Switch validated bool to atomic.Bool for race safety under -race. - Replace fictional maskHasPath with slices.Contains in handler example. - Add field-number-vs-alphabetical ordering explanation to algorithm. - Add GetPaths nil-safety note. - Note PK appears in Bindings (no dedup with PK column). - AutoSet test pattern: SELECT before / Apply / SELECT after — within a tx, NOW() is constant, so the framework's separate query advances it. - Remove dead yaml override comments (free_full_educators_used skip entry was never visited). NIT - AIP-203 → AIP-134 §Update_Mask citation. - "stale" → "superseded" sqlc deletion wording. - Architecture diagram pg_query_go shows version. Spec grew from ~770 to ~830 lines.
Both opus and sonnet round-3 reviews returned "Approve with minor revisions"
— no HIGH severity findings. Sonnet's "H1" was a redundant nil check that
opus correctly classified NIT (the IsValid check actually works); fixed as
cosmetic.
MEDIUM (both reviewers)
- Validate(codecs) now explicitly fails when a Binding references an
enum codec name not in the supplied codecs map. Closes the silent
failure mode where dangling refs would only surface at first Apply.
- Validate doc spells out FromText construction, dangling-codec check,
and validated-stays-false-on-error.
- Pin huandu/go-sqlbuilder to @v1.36.0 in `go get` step (UpdateBuilder.
Returning was added in that release).
LOW
- UpdateAllWritable empty-mask path now returns CodeUnimplemented (was
Internal); AIP-aligned for "feature not implemented".
- SET-clause iteration now walks Mapping.Bindings order (alphabetical)
filtered by mask membership, instead of client-supplied path order.
Makes SQL deterministic regardless of how clients order the mask
paths — pgx prepared-statement cache hits, and golden tests are
stable.
- Add codegen step rejecting EnumCodec maps with duplicate ToText
values (would produce a lossy FromText).
- Note `sqlbuilder.PostgreSQL.NewUpdateBuilder()` is required (default
flavor would emit MySQL-style `?` placeholders).
- Fix AutoSet test pattern — NOW() is constant within a transaction so
SELECT-before/Apply/SELECT-after inside BeginFunc would see equal
values. Use clock_timestamp() OR top-level statements OR capture
pre-Apply NOW() and assert post >= captured.
NIT
- Drop dead `op.Message.ProtoReflect() == nil` clause; ProtoReflect
never returns nil for protoc-gen-go types.
- Hoist ProtoReflect() to a `src` local instead of calling repeatedly.
- Use `ub.IsNull(m.SoftDelete)` instead of string concatenation.
- Add bullet for ALTER TABLE ADD/DROP CONSTRAINT and other unlisted
AlterTableCmd kinds being ignored in v0.
- Sort AutoSet alphabetically by Column for stable diff (in addition
to Bindings).
- Move nested-mask-path rejection out of proto-field-iteration loop
(proto field names never have dots; the check belongs at yaml
validation time on writable/overrides keys).
- Document CGO first-build cost (~3 min) and CI cache requirement.
- Align spec on Makefile's existing `generate` target (was claiming a
nonexistent `codegen` target).
Final spec is implementation-ready. Both reviewers gave "Approve with
minor revisions"; ending the review loop after round 3 per CLAUDE.md
("up to 3 rounds, end early if clean").
15 tasks across 6 phases covering migrations 015–018, the internal/feat/idleunsub/ feature package (token sign/verify, trigger evaluation, KeepSubscription, AutoReverse, email composition), Stripe webhook integration, the public /sub/keep endpoint, the auth-middleware AutoReverse hook, the user-info RPC + AckKeptBanner, and the React KeptBanner component. Acceptance criteria + manual smoke test + spec out-of-scope list at the end. Spec: docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.md
Round 4: opus + sonnet both rated "Approve with minor revisions". Both flagged the same M1 (UpdateAllWritable error-code inconsistency across four sections). Opus added M2 (codegen column-uniqueness) and several LOWs/NITs; sonnet added decode fd-nil-guard. MEDIUM - Reconcile UpdateAllWritable across all four sections that mentioned it: Errors section now lists CodeUnimplemented; Roadmap aligns with Apply walkthrough's defense-in-depth check; yaml-mapping prose says codegen is the primary rejection point. - Codegen step 10 added: verify column uniqueness across emitted bindings to catch overlapping override.column entries that would produce duplicate RETURNING and SET clauses Postgres rejects. - Codegen step 9 added explicitly: yaml empty_mask: update_writable is rejected at codegen, not just at runtime. LOW - decode() in Apply step 7 now guards fd == nil before passing to decode, symmetric with encode() in step 2. - AutoSet NOW() gotcha explained inline in the canonical yaml example (before, only in Testing strategy — readers who scrolled past would write a flaky test). - Op.Where doc strings now state values must be scalar / pgx-bindable; list comparators (IN, !=, <) deferred to v1+. - Generalize "call patches.InitPatches()" error message to "your generated InitPatches()" — the runtime is consumer-agnostic. - PK-without-proto-field case explained: column simply doesn't appear in Bindings; WHERE uses m.PK for identity regardless. NIT - Decisions row 8 now matches the wire-conformance "permanent per resource" claim (was contradictorily saying "relax later"). - "drop" in algorithm step 3.iii reworded to "emit no binding; field is registered as intentionally skipped" so the relationship to step 4 is unambiguous. - Risks #1 first-build cost wording softened from "~3 minutes" to "several minutes (varies by runner)". - Validate concurrency note added: callers serialize startup; sync.Once is left to the caller if needed.
Add revoked_at column to auth_sessions and index on (user_id, last_active DESC). Replace hard-delete (DeleteAuthSession, DeleteUserAuthSessions) with soft-revoke (RevokeAuthSession, RevokeUserAuthSessions) so session rows are preserved for the new GetUserLastActive activity query. Account deletion (DeleteAccount) retains hard-delete for GDPR data minimization.
Service skeleton plus the cancel-decision flow per spec §4.1/4.2: fetch sub from Stripe, dedup by webhook event_id, period-key dedup against prior subscription_kept/auto_canceled rows, evaluate the idle predicate (MAX(last_active) < periodStart - one interval) under a per-user row lock, write the audit row + cache flags inside a tx, then call UpdateSubscriptionCancel(idempotency_key=event.ID) outside the tx. Tests cover the happy path plus eight early-return shapes (trialing, already-canceled, grandfathered, retry-storm dedup, post-keep dedup, no-activity-history, still-active, first-period grace). They use a fake StripeClient and a per-test fresh Postgres via the package's testutil.SharedPostgres TestMain. Adds a GetUserByStripeCustomer query to map Stripe customers to users on the webhook path (nullable text param matches the column).
- Add per-symbol nolint:unused directives (with Task 8 rationale) for 7 symbols in metrics.go and metadata.go that are not yet wired up - Honor Price.Recurring.IntervalCount in subtractInterval (was always subtracting exactly 1 interval regardless of billing cycle multiplier) - Add TODO comment on enqueueCancelEmail stub documenting Task 9's required test injection - Strengthen DedupesRetryStorm test: three-step scenario isolates the webhook-dedup gate from the period-dedup gate; documents that evt_2's TryClaimWebhookEvent row rolls back with the TX on period-dedup hit - Happy-path test asserts stripe_subscription_id remains NULL (documents spec §4.1 single-population-path invariant) - Hoist metadata.go's free-floating doc comment to attach to cancelMetadata
- Document empty idempotency key intent on both UpdateSubscriptionCancel reversal call sites (KeepSubscription and AutoReverse) - Wrap KeepSubscription and AutoReverse real-reversal DB writes in a single transaction so ClearUserAutoCancelState + InsertSubscriptionKeptEvent commit-or-fail atomically - Add stuck-state comment on StripeSubscriptionID.Valid early-return in AutoReverse - Add Task 9 email TODO comment in TestKeepSubscription_Idempotent - Add TestAutoReverse_StripeUpdateFails: Stripe error leaves gates ON for self-heal - Fix unlambda in NewTokenSigner (gocritic: replace lambda with time.Now directly)
- Wire real TokenSigner (32-byte random key) in setupHappyPath; signer was nil, causing enqueueCancelEmail to silently exercise the error branch - Add recordingMailer to capture sent messages; assert cancel/kept email subject, To, and body content in happy-path tests - Assert require.Empty(fx.Mailer.msgs) in skip/no-op tests - Rename enqueueKeptEmail's unused subID param to _ (removes _ = subID smell) - Add sub_id to kept-email error log lines in KeepSubscription and AutoReverse - Add TestComposeCancelEmail_HTMLEscapes: verifies DisplayName is HTML-escaped - Add <a href=...> assertion to TestComposeCancelEmail_Renders - Use .UTC() before .Format() in composeCancelEmail and composeKeptEmail (defensive: robust to callers that pass non-UTC time)
Extract the duplicated makeIdleunsubSvc/makeIdleunsubOverride helpers from auth_test.go and billing_test.go into a single exported NewServiceWithFake constructor in idleunsubtest. Add concurrency safety docs to FakeStripe.
Replaces the synchronous mailer.Send call in HandleInvoiceUpcoming, KeepSubscription, and AutoReverse with a River-backed enqueue. Mailgun delivery latency must not block the Stripe webhook (30s timeout) or the auth-middleware AutoReverse path (5s context). Spec §4.1 step 6, §4.3 step 6, and §9 metric idleunsub.email.enqueue all describe this as an async enqueue, not a synchronous send. - Add idleunsub.EmailEnqueuer interface; replace the mailer field on Service with an enqueuer. composeCancelEmail/composeKeptEmail still produce email.Message, then idleunsub copies fields into a jobs.SendEmailArgs and calls EnqueueSendEmail. - internal/backend/idleunsub_enqueuer.go: production adapter that wraps the Backend's River client (via the Jobs interface) and forwards to river.Insert with the standard SendEmailInsertOpts. - backend.go: defer idleunsub.NewService construction until after riverClient exists so the enqueuer can be wired with a live client. - idleunsubtest: add RecordingEnqueuer (mutex-protected, captures every jobs.SendEmailArgs). NewServiceWithFake now wires this enqueuer instead of NullMailer so the helper continues to compile. - All idleunsub unit + E2E tests updated to assert on enqueuer.Snapshot() rather than mailer.Msgs. - Drop unused subID arg from enqueueKeptEmail.
Round-1 multi-reviewer fixes (Opus + Sonnet) on the idle-auto-cancel branch. Eight changes; none alter the externally observable contract. - ClearStaleKeptBanners worker + daily periodic job. Spec §4.5 mandates 14-day banner auto-clear; the SQL existed (sql/queries/users.sql:138) but had no caller. Wired as a 24h River periodic job alongside the existing maintenance jobs in backend.New. - AutoReverse comment fix: the empty-idempotency-key justification was copy-pasted from KeepSubscription and incorrectly referenced keep_link_token_uses single-use enforcement. AutoReverse converges via the gate re-read at the top of the function instead. - KeepSubscription: document the deliberate Stripe-before-DB ordering. If the post-Stripe DB write fails, AutoReverse on the user's next authed request silently corrects via the cache-drift path (spec §4.3) — the user keeps their sub; only the banner/email are lost. - Integration test for spec §8.2 concurrent invoice.upcoming. Two goroutines fire HandleStripeWebhook with distinct event IDs; the per-user FOR UPDATE lock + period-keyed dedup must serialize them to exactly one Stripe Update, one audit row, and one webhook dedup row (the loser's row is rolled back inside its TX). - Boundary unit test: last_active EXACTLY equal to threshold must NOT trigger a cancel (spec §8.1 calls out '<' as strict). - TouchAuthSession error: log the warning instead of dropping. Per CLAUDE.md "Error handling over nolint" — failure is non-fatal but shouldn't be invisible. - Cache email templates at package init via template.Must. Embedded templates are validated by package tests, so the panic-on-parse contract is safe at runtime; per-Send cost drops to two Execute calls instead of two ParseFS round-trips. - subtractInterval default-case slog.Warn so a future Stripe schema change surfaces in logs instead of silently falling through to monthly arithmetic.
Fix ClearStaleKeptBanners SQL to use MAX(created_at) GROUP BY so a user's banner is only cleared when their MOST RECENT subscription_kept event is older than 14 days; the prior subquery matched any stale event, wrongly clearing a fresh banner when an older event also existed. Route the ClearStaleKeptBanners periodic job to QueueMaintenance instead of QueueDefault so it does not compete with user-facing work. Remove RecordingMailer from idleunsubtest (no callers; production uses RecordingEnqueuer for email assertions). Add TestClearStaleKeptBanners_MostRecentEventGates covering the multi-event edge: stale-only cleared, recent-plus-stale retained, recent-only retained.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Auto-cancel a Pro subscription via Stripe
cancel_at_period_end=truewhen the user has been idle for two consecutive billing periods. No settings toggle — this is the default Sabermatic behavior. Reversible via email-link click or any authenticated activity during the cancel window.Architecture: Event-driven via Stripe
invoice.upcomingwebhook (~T-7 days before each renewal). Cancel decision happens inside a per-user-locked DB transaction (SELECT ... FOR UPDATE) with a genericstripe_webhook_deduptable for retry-storm protection and period-keyeduser_eventsrows for multi-firing dedup. Reversal paths verify Stripe state before acting (handles partial-failure cache drift). Activity signal =MAX(last_active)from soft-deletedauth_sessions.Spec:
docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.mdPlan:
docs/superpowers/plans/2026-05-07-idle-auto-cancel.mdWhat's in the box
auth_sessionssoft-delete,userssub-state cache columns,stripe_webhook_deduptable,keep_link_token_usestable.internal/feat/idleunsub/package: HMAC-SHA256 keep-token sign/verify,Service.HandleInvoiceUpcoming(cancel decision in a per-user-locked tx),Service.KeepSubscription(link-click reversal),Service.AutoReverse(auth-middleware reversal with cache-drift correction), email composition + 4 templates.invoice.upcomingandcustomer.subscription.createdthrough the new service;customer.subscription.updated/deletedsync the cache./sub/keeppublic endpoint: token verify → single-use claim → reversal. Token IS the auth.Backend.AuthenticateSession: fire-and-forget goroutine on authed requests with gates set.pending_kept_bannerfield onUserproto,AckKeptBannerRPC,<KeptBanner>React component with optimistic dismiss + getMe cache invalidation.SendEmailJob(not synchronous Mailgun on the webhook hot path).QueueMaintenanceclears banners older than 14 days (gated byMAX(created_at)per user).Test coverage
idleunsub(handler, service, token, email, dedup, race).TestE2E_CancelAndKeepViaLink: webhook → cancel → email → click → reverse → replay-idempotent.TestE2E_AutoReverseOnLogin: signup → seed → login → auth → goroutine completes.TestE2E_ConcurrentInvoiceUpcomingSerializes: two concurrent webhook deliveries, assertsSELECT FOR UPDATEserializes them.Test plan
make testgreen locally (last run: backend-race -count=1all 39 packages OK; frontend tsc/lint/vitest clean; buf lint clean).<KeptBanner>per CLAUDE.md UI rule:users.pending_kept_banner = TRUEvia psql.KEEP_TOKEN_HMAC_KEY(32-byte secret) in production env before deploy.Deployment notes
KEEP_TOKEN_HMAC_KEY. Empty key = degraded mode (cancel decisions still fire; keep-link emails skipped;/sub/keepreturns invalid for any token — fail closed).userstable (DEFAULT NOW() onidle_eligible_aftertriggers a one-time table rewrite — fine at current scale).idle_eligible_after = NOW()at migration time); the trigger fires on their nextinvoice.upcomingif idle.Out of scope (per spec §10)
auth_sessionsandkeep_link_token_uses(boy-scout cleanup, not blocking)invoice.paid, etc.)