Skip to content

Fix audit findings (non-docs): spec-conformance, bugs, security, refactor#112

Merged
nficano merged 9 commits into
mainfrom
fix/audit-issues-2026-06-11
Jun 12, 2026
Merged

Fix audit findings (non-docs): spec-conformance, bugs, security, refactor#112
nficano merged 9 commits into
mainfrom
fix/audit-issues-2026-06-11

Conversation

@nficano

@nficano nficano commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the non-docs audit findings across arcp-core, arcp-runtime, arcp-client, and the middleware adapters: spec-conformance gaps, security leaks, resource/lifecycle bugs, a high-severity resume bug, and a partial refactor of SessionLoop. All modules build and mvn test is green (run with -Darcp.skip.spotless=true locally on JDK 25, per the pom note; CI runs Spotless on JDK 21).

Spec-conformance

Security

Bugs / lifecycle

Refactor / testing

Notes / partial

Test plan

  • mvn -DskipTests -Darcp.skip.spotless=true install
  • mvn -Darcp.skip.spotless=true test (all modules green; one pre-existing-style timing flake in SubscribeReplayTest is auto-reran green by the repo's surefire config)
  • CI Spotless/format check on JDK 21

Summary by CodeRabbit

Release Notes

  • New Features

    • Sessions can now be automatically parked and resumed when transport connections unexpectedly drop, allowing clients to reconnect and replay missed events.
    • Added host allowlisting for enhanced security control over WebSocket connections.
    • Job listings now support cursor-based pagination.
    • Introduced configurable timeout for blocking job submission operations.
  • Bug Fixes

    • Improved handling of transport write failures and connection termination.
    • Enhanced duplicate chunk detection and rejection.
    • Better error handling for unknown credential schemes.
  • Tests

    • Added comprehensive test coverage for session resumption, job lifecycle, pagination, and credential handling.

nficano and others added 4 commits June 11, 2026 21:58
Protocol/spec conformance:
- Lease-expiry termination now emits final_status "error" with LEASE_EXPIRED (#74)
- Enforce max_runtime_sec via a watchdog emitting TIMEOUT/"timed_out" (#89)
- Release idempotency claim on failed accept so identical retries are not
  poisoned with DUPLICATE_KEY (#90)
- Canonical idempotency fingerprint: sort JsonNode properties (#91)
- Subscriber events use the subscriber session's own event_seq space (#76)
- Invalid/undecodable job.submit now returns INVALID_REQUEST (#78)
- Idempotent replay returns the budget captured at acceptance (#79)
- add session.close/session.closed (alias session.bye) ack (#96)
- add job.cancelled ack; cancel of unknown job -> JOB_NOT_FOUND, no silent drops (#95)
- Unknown credential scheme decodes to UNKNOWN instead of throwing (#97)
- list_jobs maps only the requested page (#83)
- Lease.contains compares cost.budget numerically; add LeaseSubset validator (#77)

Security:
- credential_rotated is no longer fanned out to subscribers (#75)
- subscribe history replay re-envelopes events with the subscriber's session id
  and seq, and redacts credential_rotated (#94)
- rotateCredential tracks/revokes surplus minted credentials and fails loudly on
  empty issue rather than aliasing a revoked handle (#98)

Bugs:
- runJob catch blocks guard terminal emission to avoid duplicate job.error (#92)
- terminal job.result/job.error fan out to subscriber sessions (#93)
- bound per-job event history and evict terminal jobs after the resume window;
  unlink closed sessions from subscriber lists (#101)
- top-level errors echo the originating request id; client correlates them so a
  list/subscribe error never fails an unrelated submit (#102)
- handshake uses CAS and volatile heartbeat handle to avoid resurrecting a closed
  session or leaking a heartbeat task (#103)
- start worker before scheduling watchdog; cancelled() covers all terminal states (#104)
- client completes subscribe publishers when a job terminates (#105)
- blocking submit is bounded by a timeout and rejects dispatch-thread reentrancy;
  add submitAsync (#106)
- ArcpRuntime.accept inserts the session before start() to avoid zombie entries (#107)
- cost.budget.* gauges are not treated as spend; guard null metric value (#108)
- ResultStream rejects divergent duplicate pending chunks, tolerates identical (#111)

Adds protocol/unit tests covering the above (#33).

Co-authored-by: Cursor <cursoragent@cursor.com>
- Jetty: enforce allowedHosts with a servlet filter returning 403 before the
  WebSocket upgrade, instead of a builder field that was never read (#99)
- Spring Boot: enforce arcp.middleware.allowed-hosts with a HandshakeInterceptor
  rejecting disallowed Hosts with 403 (#99)
- Jakarta: record the host decision per handshake and close the session with
  VIOLATED_POLICY before ArcpRuntime.accept, instead of merely clearing response
  headers (which the container ignored) (#100)
- Vert.x: observe the async write result and fail the transport on write error so
  the runtime detects a dead session instead of silently dropping events (#110)

Co-authored-by: Cursor <cursoragent@cursor.com>
An unexpected transport drop now parks the session for the resume window instead
of cancelling its in-flight jobs: job ownership, event history, and the resume
buffer are retained and registered by resume token. A session.hello carrying a
valid resume_token (and matching principal) reattaches the new transport to the
existing session identity, replays buffered envelopes with event_seq greater than
last_event_seq, and continues delivering live events; an unknown or expired token
returns RESUME_WINDOW_EXPIRED. Explicit session.close still cancels in-flight
jobs. Parked sessions are torn down when the resume window elapses or the runtime
closes. Adds resume + explicit-close tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ng (#80, #82)

Begin splitting the oversized SessionLoop by moving two cohesive responsibilities
into their own tested types: IdempotencyFingerprint (canonical 7.2 hashing) and
JobListing (6.6 filter/sort/paging). SessionLoop no longer contains the
list-jobs implementation directly. Partial: the full job-execution/submit/
subscribe extraction and the under-800-line target remain follow-up work,
deferred to avoid regression risk to the behavioral fixes in this PR.

Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@nficano, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 49 minutes and 29 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2247ce10-b599-4bbb-9035-563447edcf8d

📥 Commits

Reviewing files that changed from the base of the PR and between 5e0e13d and 5ef793a.

📒 Files selected for processing (1)
  • arcp-runtime/src/main/java/dev/arcp/runtime/session/SessionLoop.java
📝 Walkthrough

Walkthrough

This PR implements session resume resilience, credential scheme flexibility, host allowlist security, and idempotent job submission. The changes span message protocol extensions, client-side blocking behavior with timeout, lease delegation validation, bounded event history, idempotency fingerprinting, session parking across unexpected transport drops, and host validation across multiple middleware frameworks.

Changes

Session Resume, Resilience, and Credential Security

Layer / File(s) Summary
Message Protocol Extensions & Credential Scheme Flexibility
arcp-core/src/main/java/dev/arcp/core/messages/*, arcp-core/src/main/java/dev/arcp/core/credentials/CredentialScheme.java, arcp-core/src/test/java/dev/arcp/core/*
SessionClosed and JobCancelled message records added to sealed Message interface. Message.Type enum gains SESSION_CLOSED and JOB_CANCELLED discriminators; SESSION_BYE wire value updated to session.close with legacy session.bye mapping for backward compatibility. CredentialScheme introduces UNKNOWN constant and changes fromWire to return UNKNOWN instead of throwing for unrecognized schemes. Messages.decode extended to handle new message types. All message kinds round-trip tested.
Lease Subset Delegation Validation
arcp-core/src/main/java/dev/arcp/core/lease/Lease.java, arcp-core/src/main/java/dev/arcp/core/lease/LeaseSubset.java, arcp-core/src/test/java/dev/arcp/core/lease/*
New LeaseSubset utility enforces strict delegation constraints: validates child namespace presence in parent, special-cases cost.budget with per-currency numeric amount comparison (not glob patterns), checks non-budget pattern coverage via Lease.contains, and validates temporal expiry constraints. Lease.contains extended to branch on cost.budget namespace. Tests verify budget subset enforcement, valid delegation, and constraint violations.
Job Event History Redesign with Idempotency
arcp-runtime/src/main/java/dev/arcp/runtime/session/JobRecord.java, arcp-runtime/src/main/java/dev/arcp/runtime/idempotency/IdempotencyFingerprint.java, arcp-runtime/src/main/java/dev/arcp/runtime/idempotency/IdempotencyStore.java, arcp-runtime/src/main/java/dev/arcp/runtime/session/JobListing.java, arcp-runtime/src/test/java/dev/arcp/runtime/*
JobRecord replaces unbounded envelope history with bounded deque of RecordedEvent entries (producerSeq + JobEvent), enforces DEFAULT_HISTORY_CAPACITY=1024 via FIFO eviction, adds watchdog and budget snapshot storage. IdempotencyFingerprint canonicalizes JobSubmit JSON (sorted keys, exact decimals), computes SHA-256 digest. IdempotencyStore.release() conditionally removes stale idempotency claims. JobListing provides pagination with Base64-url cursor encoding, deterministic sorting, and filter support. Tests cover fingerprint key-order invariance, cursor handling, pagination, and audit scenarios.
Client-side Blocking Submit with Timeout & Error Correlation
arcp-client/src/main/java/dev/arcp/client/ArcpClient.java, arcp-client/src/main/java/dev/arcp/client/ResultStream.java, arcp-client/src/test/java/dev/arcp/client/*
ArcpClient.Builder exposes submitTimeout(Duration). Blocking submit(jobSubmit, traceId) forbids execution from dispatch thread (via inDispatch ThreadLocal), waits for job.accepted using Future.get with timeout, removes pending-submit on timeout, normalizes failures to IllegalStateException. submitAsync returns future directly (no blocking). handleError rewritten to correlate top-level errors to listJobs via envelope id or pending submits. handleAccepted filters credentials via withRecognizedCredentials to exclude unknown schemes. completeLiveSubscriber helper ensures consistent subscriber shutdown. ResultStream adds duplicate chunk detection: tolerates byte-identical retransmissions, rejects divergent duplicates. Tests verify live subscriber completion, async resolution, cross-session subscription, and unknown resume token rejection.
Host Allowlist Security Across Frameworks
arcp-middleware-spring-boot/src/main/java/dev/arcp/middleware/spring/*, arcp-middleware-jakarta/src/main/java/dev/arcp/middleware/jakarta/*, arcp-runtime-jetty/src/main/java/dev/arcp/runtime/jetty/*, arcp-middleware-vertx/src/main/java/dev/arcp/middleware/vertx/VertxWebSocketTransport.java, .github/workflows/ci.yml
Spring Boot adds HostAllowlistHandshakeInterceptor WebSocket interceptor returning 403 for disallowed hosts. Jakarta refactors rejection via ThreadLocal bridge in modifyHandshake/getEndpointInstance to spec-compliant early termination in ArcpJakartaEndpoint.onOpen. Jetty adds HostAllowlistFilter servlet filter blocking disallowed hosts before upgrade, installed on root context for REQUEST dispatcher type. Vert.x improves reliability with async failure callback on writeTextMessage, surfacing write errors. CI Maven command updated to run package before javadoc:javadoc for inter-module resolution.
SessionLoop Resume Support with Parking & Event Buffering
arcp-runtime/src/main/java/dev/arcp/runtime/session/SessionLoop.java, arcp-runtime/src/test/java/dev/arcp/runtime/SessionLoopAuditTest.java, arcp-runtime/src/test/java/dev/arcp/runtime/SessionResumeTest.java
SessionLoop adds Phase.PARKED state, volatile transport field, parkExpiry/explicitClose/resumeDelegate/heartbeatInterval fields. Transport errors route through onTransportDropped(reason): unexpected drops park if resumable; explicit close shuts down. Handshake enhanced to reattach parked sessions, replay buffered events, reschedule heartbeat via resumeOnto(...)/rejectResume() helpers. Message dispatch handles SessionClosed/JobCancelled as client-only, correlates errors to envelopes. handleListJobs uses JobListing.page(...) for pagination. Idempotency uses IdempotencyFingerprint.of(mapper, submit) with stale-claim release/reclaim on retry. Job events recorded with per-subscriber seq allocation, credential_rotated redacted for non-owners. sendWithId(...) buffers sequenced messages; while PARKED, messages buffered instead of sent. Comprehensive audit and resume tests.
ArcpRuntime Resume Session Management
arcp-runtime/src/main/java/dev/arcp/runtime/ArcpRuntime.java
New resumable map (concurrent, token-keyed) stores parked SessionLoop instances. Public APIs: parkResumable(token, loop), takeResumable(token), removeResumable(token, loop) conditionally removes only if loop matches. accept(Transport) inserts loop into sessions before calling start(), removes if start results in CLOSED phase (prevents zombie entries). close() shuts down all parked sessions and clears resumable registry before active-session cleanup.
Credential Revocation & Resume Examples
arcp-runtime/src/main/java/dev/arcp/runtime/credentials/CredentialBinding.java, examples/resume/src/main/java/dev/arcp/examples/resume/Main.java, recipes/stream-resume/src/main/java/dev/arcp/recipes/streamresume/Main.java
CredentialBinding.revokeMinted(IssuedCredential) records credential identity in revocation store before delegating to internal revocation. Examples explicitly model transport drop, capture resume token/sequence, wait for PARKED phase, reconnect with saved token/sequence for history replay. Demonstrates fault-tolerant session lifecycle with parking vs. explicit-close semantics.
ResultStream Duplicate Chunk Idempotency
arcp-client/src/main/java/dev/arcp/client/ResultStream.java, arcp-client/src/test/java/dev/arcp/client/ResultStreamTest.java
ResultStream.accept(...) checks for duplicate chunkSeq: tolerates byte-identical retransmissions, throws DuplicateChunkException for divergent duplicates of still-pending chunks. Test verifies both behaviors.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 Across the network drops and bounces we hop,
Parked in patience till transport returns from the stop.
Fingerprints guard against duplicate claims,
While credentials dance through unknown schemes' games.
Budgets align in delegated grace,
And leases find their proper place.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/audit-issues-2026-06-11

Comment thread arcp-client/src/main/java/dev/arcp/client/ArcpClient.java
Comment thread arcp-runtime/src/main/java/dev/arcp/runtime/session/JobListing.java Fixed
nficano and others added 3 commits June 11, 2026 22:55
CI's spotless:check gate (JDK 21) flagged formatting drift in the audit-fix
commits, which were verified locally with -Darcp.skip.spotless=true on JDK 25.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The resume semantics introduced for #22 broke the resume example and the
stream-resume recipe in two ways:

- ArcpClient dropped a top-level RESUME_WINDOW_EXPIRED error that arrives
  before session.welcome as uncorrelated, leaving connect() to hang until
  its timeout. The handshake is now rejected with the decoded error so
  callers can fall back to a fresh session (and the in-memory transport
  never completes the client's inbound stream on a runtime-side close, so
  the existing transport-closed fallback could not catch this).

- examples/resume and recipes/stream-resume closed the first client
  gracefully, which cancels the session (§6.7) and invalidates its resume
  token. They now authenticate with a stable bearer principal, drop the
  transport without session.close so the runtime parks the session for the
  resume window, and await the parked phase before reconnecting.

Adds a ClientAuditTest regression test asserting connect() surfaces
RESUME_WINDOW_EXPIRED promptly for an unknown token.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Integer.parseInt on a valid-Base64 but non-numeric cursor threw a raw
NumberFormatException whose parser message was echoed to the client in
INVALID_REQUEST. Both decode failures now surface as a uniform
"invalid cursor" IllegalArgumentException. Addresses the code-quality
review finding on decodeCursor.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
case JobSubscribe sub -> handleSubscribe(sub);
case JobSubscribe sub -> handleSubscribe(envelope, sub);
case JobUnsubscribe unsub -> handleUnsubscribe(unsub);
case SessionClosed ignored -> log.warn("client-only message received: {}", m);
Comment thread arcp-runtime/src/main/java/dev/arcp/runtime/session/SessionLoop.java Outdated
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

nficano and others added 2 commits June 11, 2026 23:07
javadoc:javadoc alone does not compile, so arcp-runtime's javadoc
resolved arcp-core from the runner's cached SNAPSHOT — which lacks
classes a PR adds upstream (here JobCancelled/SessionClosed from #95,
#96). Run package in the same invocation so inter-module dependencies
come from the reactor. Verified against a cold local repository.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
@nficano nficano merged commit d512fbc into main Jun 12, 2026
5 of 6 checks passed
nficano added a commit that referenced this pull request Jun 12, 2026
…oc, 80% coverage gate (#114)

* fix(runtime): make credential issuance non-blocking on the dispatch thread (#109)

acceptJob ran the provisioner's CompletableFuture to completion with
join() inside onNext, so a slow upstream key-minting call starved every
inbound message on the session: pings went unanswered, heartbeat.onInbound
never advanced, and a stall beyond 2x heartbeat_interval_sec had the
session reaped as HEARTBEAT_LOST mid-submit. One slow job.submit also
head-of-line-blocked cancels and acks for every other job.

Acceptance is now split: the prologue (resolve, register) stays on the
dispatch thread; the epilogue (attach credentials, job.accepted, worker
start, watchdogs) runs when issuance completes, chained through a
per-session acceptSequence so job.accepted preserves submit order — the
FIFO correlation clients rely on, which is why this was deferred in #112.
Idempotent replays join the same chain so a replayed acceptance cannot
overtake the original and always observes the captured budget (#79).
If the session dies or the job is cancelled while issuance is in flight,
freshly minted credentials are revoked and nothing is announced; the
lease-expiry watchdog measures its delay from the current clock so
issuance latency cannot postpone an absolute expires_at.

AsyncAcceptanceTest covers both acceptance criteria: pongs continue
through a >2x-interval provisioner stall, and a slow first issuance does
not let a later acceptance overtake it.

Closes #109

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: correct spec-section citations to existing sections

Closes #81

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: complete Javadoc for exported APIs and enforce doclint

Document every public/protected type, constructor, method, record
component, and enum constant across the ten published modules
(arcp-core, arcp-client, arcp-runtime, arcp, arcp-otel,
arcp-runtime-jetty, arcp-middleware-jakarta,
arcp-middleware-spring-boot, arcp-middleware-vertx, arcp-tck):
~575 doclint findings fixed, including two broken {@link} references.

Enforce it in the root pom: maven-javadoc-plugin now runs with
doclint `all` (including `missing`) and failOnWarnings=true, so the
CI javadoc job fails on any undocumented exported API. The gate is
overridable for local triage via -Darcp.javadoc.failOnWarnings=false.

A few doclint "use of default constructor" warnings required adding
explicit, documented no-arg constructors with semantics identical to
the implicit ones; no other code changed.

Closes #34

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(middleware): cover host-allowlist, handshake, and transport-failure branches (#33)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(runtime): raise branch coverage for session/job/credential paths (#33)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(client,core): raise branch coverage for client correlation, transport, and core wire paths (#33)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(otel): cover tracing-transport branches; enforce 80% line+branch coverage (#33)

arcp-otel was the last module under the bar (66.7% branch): the new
ArcpOtelBranchTest covers send-failure span recording, optional envelope
attributes, trace-context inject (no-op propagator, merge into existing
extensions, non-object extensions left untouched) and extract (malformed
shapes fall back to Context.current, valid traceparent becomes the
receive span's parent), publisher caching, close ordering, and upstream
completion propagation.

With every library module now >=80% on both counters (line 97.4%,
branch 94.7% overall), jacoco:check enforces a 0.80 line+branch BUNDLE
minimum at verify time. Library modules opt in via
arcp.skip.coverage.check=false; examples/recipes and the sourceless
arcp umbrella stay exempt, and check self-skips when tests were skipped
(no execution data), so -DskipTests CI jobs are unaffected.

Closes #33

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment