Fix audit findings (non-docs): bugs, spec-conformance, security#77
Conversation
Bumps Microsoft.Extensions.Hosting.Abstractions from 9.0.0 to 10.0.8 Bumps Microsoft.NET.Test.Sdk from 18.5.1 to 18.6.0 --- updated-dependencies: - dependency-name: Microsoft.Extensions.Hosting.Abstractions dependency-version: 10.0.8 dependency-type: direct:production update-type: version-update:semver-major dependency-group: dotnet-dependencies - dependency-name: Microsoft.NET.Test.Sdk dependency-version: 18.6.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: dotnet-dependencies ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 6.0.1 to 7.0.0. - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](codecov/codecov-action@e79a696...fb8b358) --- updated-dependencies: - dependency-name: codecov/codecov-action dependency-version: 7.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
…#71, #72, #73, #74, #75, #76) - #71 idempotent job.submit replay no longer re-runs the agent (skip Resolve/RunAsync on replay) - #72 lease glob "/prefix/**" keeps the path-boundary separator so siblings are not authorized - #73 server-rejected job.submit/list_jobs now fault the awaiting client instead of hanging - #74 cancel authority is session-scoped (fail-closed), not principal-scoped - #75 add job.cancelled ack and JOB_NOT_FOUND for unknown jobs - #76 populate session.jobs last_event_seq from a per-job high-water mark Co-authored-by: Cursor <cursoragent@cursor.com>
#39, #40, #41, #42, #43, #45, #46, #47, #67, #68) Runtime / spec-conformance: - #37 root running jobs at a runtime-scoped token so session teardown (heartbeat loss, graceful close, transport drop) no longer terminates in-flight jobs (spec §6.4, §6.7) - #41 add session.close/session.closed wire types; the runtime acks a graceful close with session.closed (session.bye kept as a deprecated alias) (spec §6.7) - #40 dispatch now surfaces session.error{INTERNAL_ERROR} for unexpected exceptions (spec §12) - #46 advertise model.use independently of credential provisioning (spec §9.7) Event delivery / ordering: - #39 serialize event_seq assignment with the outbound enqueue via a per-session emit gate so wire order is strictly monotonic under concurrent emitters (spec §8.3) - #38 subscriber fan-out is back-pressure-aware: on a full channel the subscription is torn down deterministically instead of silently dropping an already-sequenced event (spec §8.3) - #44 make the subscribe history/live-fan-out boundary exact via an atomic register+snapshot and a per-job event index, so a mid-stream subscriber sees each event exactly once (spec §7.6) Authorization / security: - #42 gate lease operations on remaining budget (BUDGET_EXHAUSTED) before the pattern check (spec §9.6) - #43 deny-by-default for uncovered tool.call/agent.delegate, with an explicit PermissiveUnleasedOperations opt-in (spec §9.3) - #45 list_jobs fails closed: an empty/absent principal sees only what the policy permits (spec §6.6, §14) Performance / correctness: - #67 keyset pagination over (created_at, job_id): stable ties and page work bounded to limit+1 - #68 AgentRegistry no longer exposes its mutable version dictionary; ToInventory snapshots under lock - #47 client detects event_seq gaps and raises a broken-session signal (spec §8.3) Co-authored-by: Cursor <cursoragent@cursor.com>
…eases - Add unit tests for the budget authorization gate (#42) and AgentRegistry concurrency (#68). - Add integration tests for job survival across session teardown (#37), session.close/closed (#41), INTERNAL_ERROR on unexpected dispatch failure (#40), fail-closed empty-principal listing (#45), model.use advertisement (#46), keyset pagination stability (#67), deny-by-default tool.call (#43), strictly-monotonic event_seq under concurrent emitters (#39), exactly-once subscribe boundary (#44), and client event_seq gap detection (#47). - Grant tool.call/agent.delegate leases in the CostBudget, Delegate, and multi-agent-budget samples/recipes so they remain runnable under deny-by-default (#43); enable permissive mode in the event round-trip test which exercises event kinds rather than lease enforcement. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Warning Review limit reached
More reviews will be available in 8 minutes and 26 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (22)
WalkthroughThis PR implements nine major spec-conformance and correctness fixes for the ARCP protocol runtime, client, and core libraries. Key changes: deny-by-default lease enforcement (§9.3), job cancellation with session-scoped authority and acknowledgement (§7.4, §7.6), atomic monotonic event sequencing under concurrent emitters (§8.3), prevention of idempotent replay double-execution (§7.2), subscription boundary atomicity (§8.3), stable keyset pagination, and session-close protocol alignment (§6.7). All changes are extensively tested with integration and unit tests validating authorization, event ordering, pagination stability, and error propagation. ChangesAudit fixes and spec compliance: deny-by-default leases, job cancellation, event sequencing, idempotent replay
🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
|
| } | ||
| } | ||
|
|
||
| closed.Should().NotBeNull("the runtime must acknowledge session.close with session.closed"); |
| if (cancelledEnv is not null && errorEnv is not null) break; | ||
| } | ||
|
|
||
| cancelledEnv.Should().NotBeNull(); |
| cancelledEnv.Should().NotBeNull(); | ||
| ((JobCancelledPayload)cancelledEnv!.Payload!).JobId.Should().Be(jobId); | ||
| sawCancelledBeforeError.Should().BeTrue("the cancel ack must precede the terminal job.error"); | ||
| errorEnv.Should().NotBeNull(); |
| } | ||
| } | ||
|
|
||
| errEnv.Should().NotBeNull(); |
| var handle = await c.SubmitAsync("fanout"); | ||
|
|
||
| var seqs = new List<long>(); | ||
| var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10)); |
| var sub = await watcher.SubscribeAsync(handle.JobId, history: true); | ||
|
|
||
| var received = new List<int>(); | ||
| var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10)); |
| foreach (var job in FilterByPrincipal(requesterPrincipal, policy)) | ||
| { | ||
| if (!MatchesFilter(job, filter, statusSet)) continue; | ||
| var key = JobKey.From(job); | ||
| if (after is { } a && JobKey.Compare(key, a) <= 0) continue; // at or before the cursor | ||
| InsertBounded(page, job, take + 1); | ||
| } |
| foreach (var historic in history) | ||
| { | ||
| if (fromSeq is { } f && historic.JobEventIndex is { } idx && idx <= f) continue; | ||
| var rekeyed = (subscriberSeesOwnerSecrets ? historic : RedactForNonOwner(historic, job)) | ||
| with | ||
| { SessionId = SessionId.Value }; | ||
| var stamped = EventLog.Append(rekeyed); | ||
| await WriteToOutboundAsync(stamped, cancellationToken).ConfigureAwait(false); | ||
| } |
| catch (Exception sendEx) | ||
| { | ||
| _logger.LogError(sendEx, "Failed to send INTERNAL_ERROR for type {Type}", env.Type); | ||
| } |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/Arcp.Runtime/JobManager.cs (1)
113-183:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep idempotency records alive for the full configured replay window.
SubmitAsyncnow advertises replay matching viaIdempotencyWindowSec, but terminal cleanup still deletes the_idempotencyentry when the job is GC’d afterTerminalJobRetentionSec. With the current defaults (3600vs600), a completed job starts replaying as a fresh submit after 10 minutes, which reopens the double-execution bug this change is meant to close.Please decouple idempotency-record expiry from terminal job retention, or otherwise ensure the record survives until the full idempotency window elapses.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Arcp.Runtime/JobManager.cs` around lines 113 - 183, SubmitAsync correctly records idempotency entries in _idempotency but the cleanup logic currently removes those entries when jobs are GC’d (after TerminalJobRetentionSec), causing idempotency to expire sooner than _idempotencyWindowSec; update the cleanup/terminalization path that removes entries from _idempotency so it only deletes an IdempotencyRecord when its CreatedAt is older than _idempotencyWindowSec (or store an explicit expiry timestamp on IdempotencyRecord and compare to _time.GetUtcNow()), rather than removing on job terminal retention; reference SubmitAsync, _idempotency, IdempotencyRecord, TerminalJobRetentionSec and _idempotencyWindowSec to locate and change the garbage-collection/removal logic accordingly.src/Arcp.Runtime/SessionState.cs (2)
119-135:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
session.closednow bypasses the session's single-writer send path.
SenderLoopis still draining_outboundwhen this direct_transport.SendAsync(...)runs, so the transport can see concurrent writers andsession.closedcan overtake frames that were already queued before the close. The old channel-based design serialized all transport writes through one place; this change breaks that invariant.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Arcp.Runtime/SessionState.cs` around lines 119 - 135, The direct call to _transport.SendAsync for the SessionClosed envelope breaks the single-writer invariant because SenderLoop is still draining _outbound; instead, enqueue the SessionClosed Envelope into the _outbound channel (using the same Envelope/SessionByePayload used now), then complete the writer (keep _outbound.Writer.TryComplete()), await SenderLoop to finish flushing the channel so the transport write is serialized by SenderLoop, and only after SenderLoop completes call _cts.Cancel() and _transport.CloseAsync(reason, ...); remove the direct _transport.SendAsync(...) for session.closed so all writes go through SenderLoop.
146-152:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDispose still skips owned resources after any prior close.
CloseAsyncandRunAsyncboth setIsClosed = true, so this early return prevents_cts.Dispose()and_emitGate.Dispose()from running on the normal lifecycle path. That leaves every closed session holding onto its cancellation source and gate until GC.Suggested fix
+ private bool _disposed; + public async ValueTask DisposeAsync() { - if (IsClosed) return; - await CloseAsync().ConfigureAwait(false); + if (_disposed) return; + _disposed = true; + if (!IsClosed) + await CloseAsync().ConfigureAwait(false); _cts.Dispose(); _emitGate.Dispose(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Arcp.Runtime/SessionState.cs` around lines 146 - 152, DisposeAsync currently returns early when IsClosed is true and thus never disposes owned fields; update DisposeAsync so it always disposes _cts and _emitGate even if IsClosed is true: call CloseAsync() only if not already closed (or skip it when IsClosed), but ensure _cts.Dispose() and _emitGate.Dispose() run unconditionally (or guarded by a separate disposed flag) so resources are released after CloseAsync/RunAsync set IsClosed; reference DisposeAsync, CloseAsync, RunAsync, IsClosed, _cts, and _emitGate when making the change.
🧹 Nitpick comments (1)
tests/Arcp.IntegrationTests/IdempotencyTests.cs (1)
101-110: ⚡ Quick winExercise the replay before the first run completes.
This test waits for
first.Resultbefore submitting the duplicate, so it only proves “completed-job replay” behavior. Issue#71is specifically about replaying an already-accepted job; moving the secondSubmitAsyncahead ofawait first.Resultwould catch a regression that only re-runs in-flight jobs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/Arcp.IntegrationTests/IdempotencyTests.cs` around lines 101 - 110, The test currently awaits first.Result before submitting the duplicate, so change the sequence to submit the duplicate idempotent request using c.SubmitAsync("counter", ..., idempotencyKey: "dup") before awaiting first.Result.WaitAsync(...) to exercise replay of an in-flight job; keep the assertion that second.JobId.Value equals first.JobId.Value, then await first.Result (and/or the second result) and preserve the runCount.Should().Be(1) check after the short Task.Delay to ensure no replay occurred.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Arcp.Client/ArcpClient.Dispatch.cs`:
- Around line 149-167: The current handler calls OnError on every accepted job
handle (_handles) for any session.error; change it so session.error is
distinguished between request-scoped rejections (for job.submit and
session.list_jobs) and truly session-fatal errors: when the error is
request-scoped only dequeue and call OnError on _pendingSubmits and fail entries
in _listJobsRequests (using JobHandle.ToException), but do NOT iterate _handles;
when it is a session-fatal error, set a session-failure flag (e.g.
_sessionBroken) to stop further dispatch and then iterate _handles and call
OnError; update the dispatch logic to check _sessionBroken before processing
incoming events so already-accepted jobs are only failed on actual session-level
fatal errors.
In `@src/Arcp.Client/JobHandle.cs`:
- Around line 92-111: The ToException method currently doesn't handle
ErrorCode.InternalError and falls back to ArcpException; update the switch in
JobHandle.ToException to include ErrorCode.InternalError => new
InternalErrorException(message, detail) so INTERNAL_ERROR maps to the concrete
InternalErrorException type (ensure InternalErrorException is
referenced/imported where used).
In `@src/Arcp.Runtime/JobManager.Listing.cs`:
- Around line 55-59: When requesterPrincipal is empty, do not consult
policy.CanObserve; change the branch that currently constructs AuthPrincipal and
filters _jobs to instead return an empty IEnumerable immediately (i.e., return
Enumerable.Empty<Job>() or equivalent) to ensure fail-closed behavior; update
the logic around requesterPrincipal and AuthPrincipal in the method containing
_jobs and SubmitterPrincipal to unconditionally return no jobs for empty/absent
requesterPrincipal, and add a regression test using a permissive policy to
assert that listing jobs with an empty requesterPrincipal yields an empty
sequence.
In `@src/Arcp.Runtime/SessionState.Dispatch.cs`:
- Around line 40-62: The catch in DispatchAsync currently treats all exceptions
(including OperationCanceledException) as INTERNAL_ERROR and attempts to
SendAsync a session.error; change it to detect cooperative cancellation and
avoid emitting an INTERNAL_ERROR: if the exception is OperationCanceledException
or cancellationToken.IsCancellationRequested (or Exception.InnerException is an
OCE), rethrow or return immediately instead of logging/sending the
session.error; otherwise proceed with the existing _logger.LogError and
SendAsync of the Envelope (SessionErrorPayload / ErrorCode.InternalError).
Ensure checks reference DispatchAsync, SendAsync, CloseAsync, Envelope,
SessionErrorPayload, SessionId and ErrorCode.InternalError so the altered catch
only handles unexpected failures.
In `@tests/Arcp.IntegrationTests/AuditFixesTests.cs`:
- Around line 24-39: StartServer currently returns a live ArcpServer that
launches background work (via server.AcceptAsync) and is never disposed; change
StartServer to return a small disposable harness (e.g., TestServerHost
implementing IAsyncDisposable) that encapsulates the ArcpServer and the
MemoryTransport client pair, starts AcceptAsync internally, and on
Dispose/DisposeAsync cancels/awaits the AcceptAsync task and calls
ArcpServer.DisposeAsync (or a Stop/Shutdown method) to ensure background
sweeper/session work is stopped; update callers to await using the returned
harness. Ensure references to StartServer, ArcpServer, AcceptAsync, and
MemoryTransport.Pair are used to locate and modify the code.
- Around line 165-176: The object-initializer blocks passed to
ArcpClient.ConnectAsync (the ArcpClientOptions instances in the calls that
create `alice` and `bob`) are currently single-line and failing dotnet format;
reformat each initializer so properties are on separate lines (one line per
property) with proper indentation and trailing commas as per repository style
(e.g., break the `new ArcpClientOptions { Client = new ClientInfo { Name =
"...", Version = "..." }, Token = "..." }` into a multi-line initializer for
ArcpClientOptions and a nested multi-line initializer for ClientInfo) so dotnet
format passes.
---
Outside diff comments:
In `@src/Arcp.Runtime/JobManager.cs`:
- Around line 113-183: SubmitAsync correctly records idempotency entries in
_idempotency but the cleanup logic currently removes those entries when jobs are
GC’d (after TerminalJobRetentionSec), causing idempotency to expire sooner than
_idempotencyWindowSec; update the cleanup/terminalization path that removes
entries from _idempotency so it only deletes an IdempotencyRecord when its
CreatedAt is older than _idempotencyWindowSec (or store an explicit expiry
timestamp on IdempotencyRecord and compare to _time.GetUtcNow()), rather than
removing on job terminal retention; reference SubmitAsync, _idempotency,
IdempotencyRecord, TerminalJobRetentionSec and _idempotencyWindowSec to locate
and change the garbage-collection/removal logic accordingly.
In `@src/Arcp.Runtime/SessionState.cs`:
- Around line 119-135: The direct call to _transport.SendAsync for the
SessionClosed envelope breaks the single-writer invariant because SenderLoop is
still draining _outbound; instead, enqueue the SessionClosed Envelope into the
_outbound channel (using the same Envelope/SessionByePayload used now), then
complete the writer (keep _outbound.Writer.TryComplete()), await SenderLoop to
finish flushing the channel so the transport write is serialized by SenderLoop,
and only after SenderLoop completes call _cts.Cancel() and
_transport.CloseAsync(reason, ...); remove the direct _transport.SendAsync(...)
for session.closed so all writes go through SenderLoop.
- Around line 146-152: DisposeAsync currently returns early when IsClosed is
true and thus never disposes owned fields; update DisposeAsync so it always
disposes _cts and _emitGate even if IsClosed is true: call CloseAsync() only if
not already closed (or skip it when IsClosed), but ensure _cts.Dispose() and
_emitGate.Dispose() run unconditionally (or guarded by a separate disposed flag)
so resources are released after CloseAsync/RunAsync set IsClosed; reference
DisposeAsync, CloseAsync, RunAsync, IsClosed, _cts, and _emitGate when making
the change.
---
Nitpick comments:
In `@tests/Arcp.IntegrationTests/IdempotencyTests.cs`:
- Around line 101-110: The test currently awaits first.Result before submitting
the duplicate, so change the sequence to submit the duplicate idempotent request
using c.SubmitAsync("counter", ..., idempotencyKey: "dup") before awaiting
first.Result.WaitAsync(...) to exercise replay of an in-flight job; keep the
assertion that second.JobId.Value equals first.JobId.Value, then await
first.Result (and/or the second result) and preserve the runCount.Should().Be(1)
check after the short Task.Delay to ensure no replay occurred.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 73bcc1ff-8477-4ab7-9cb1-a2bd5c013f95
📒 Files selected for processing (35)
recipes/multi-agent-budget/Program.cssamples/CostBudget/Program.cssamples/Delegate/Program.cssrc/Arcp.Client/ArcpClient.Dispatch.cssrc/Arcp.Client/ArcpClient.cssrc/Arcp.Client/JobHandle.cssrc/Arcp.Client/PublicAPI.Unshipped.txtsrc/Arcp.Core/Envelope/Envelope.cssrc/Arcp.Core/Envelope/MessageTypeRegistry.cssrc/Arcp.Core/Messages/JobCancelledPayload.cssrc/Arcp.Core/Messages/MessageTypeNames.cssrc/Arcp.Core/PublicAPI.Unshipped.txtsrc/Arcp.Runtime/Agents/AgentRegistry.cssrc/Arcp.Runtime/ArcpServer.cssrc/Arcp.Runtime/ArcpServerOptions.cssrc/Arcp.Runtime/Job.cssrc/Arcp.Runtime/JobContext.cssrc/Arcp.Runtime/JobManager.Listing.cssrc/Arcp.Runtime/JobManager.cssrc/Arcp.Runtime/Leases/LeaseManager.cssrc/Arcp.Runtime/PublicAPI.Unshipped.txtsrc/Arcp.Runtime/SessionState.Dispatch.cssrc/Arcp.Runtime/SessionState.Jobs.cssrc/Arcp.Runtime/SessionState.Outbound.cssrc/Arcp.Runtime/SessionState.cstests/Arcp.IntegrationTests/AuditFixesTests.cstests/Arcp.IntegrationTests/CancellationAckTests.cstests/Arcp.IntegrationTests/ClientGapDetectionTests.cstests/Arcp.IntegrationTests/EndToEndTests.cstests/Arcp.IntegrationTests/IdempotencyTests.cstests/Arcp.IntegrationTests/JobContextEventsTests.cstests/Arcp.IntegrationTests/JobListingTests.cstests/Arcp.IntegrationTests/SubscriptionOrderingTests.cstests/Arcp.UnitTests/AuditFixesUnitTests.cstests/Arcp.UnitTests/LeaseTests.cs
| foreach (var h in _handles.Values) | ||
| { | ||
| h.OnError(new JobErrorPayload | ||
| { | ||
| Code = err.Code, | ||
| Message = err.Message, | ||
| Retryable = err.Retryable, | ||
| Detail = err.Detail, | ||
| }); | ||
| h.OnError(jobError); | ||
| } | ||
|
|
||
| // A submission rejected before acceptance lives in _pendingSubmits, not _handles, and a | ||
| // list_jobs request lives in _listJobsRequests. session.error is not correlated to a | ||
| // specific request id, so the safe contract is to fault every outstanding request — leaving | ||
| // them pending would hang SubmitAsync/ListJobsAsync until the caller's token fires. | ||
| while (_pendingSubmits.TryDequeue(out var pending)) | ||
| { | ||
| pending.OnError(jobError); | ||
| } | ||
|
|
||
| foreach (var key in _listJobsRequests.Keys) | ||
| { | ||
| if (_listJobsRequests.TryRemove(key, out var tcs)) | ||
| tcs.TrySetException(JobHandle.ToException(err.Code, err.Message, err.Detail)); | ||
| } |
There was a problem hiding this comment.
Don’t fail every accepted job on a request-level session.error.
Now that session.error is the rejection path for pending job.submit and session.list_jobs, this method will also call OnError(...) on every already-accepted handle in _handles. One bad submit or list request can therefore terminally fail unrelated running jobs even if the server keeps streaming their events/results. Split request-scoped rejection handling from truly session-fatal errors, or mark the session broken and stop further dispatch before you fail accepted handles.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/Arcp.Client/ArcpClient.Dispatch.cs` around lines 149 - 167, The current
handler calls OnError on every accepted job handle (_handles) for any
session.error; change it so session.error is distinguished between
request-scoped rejections (for job.submit and session.list_jobs) and truly
session-fatal errors: when the error is request-scoped only dequeue and call
OnError on _pendingSubmits and fail entries in _listJobsRequests (using
JobHandle.ToException), but do NOT iterate _handles; when it is a session-fatal
error, set a session-failure flag (e.g. _sessionBroken) to stop further dispatch
and then iterate _handles and call OnError; update the dispatch logic to check
_sessionBroken before processing incoming events so already-accepted jobs are
only failed on actual session-level fatal errors.
| /// <summary>Map a wire error code to the most specific <see cref="ArcpException"/> subtype so | ||
| /// callers can <c>catch</c> on the concrete type (e.g. <see cref="DuplicateKeyException"/>).</summary> | ||
| internal static ArcpException ToException(string code, string message, string? detail) => code switch | ||
| { | ||
| ErrorCode.DuplicateKey => new DuplicateKeyException(message, detail), | ||
| ErrorCode.AgentNotAvailable => new AgentNotAvailableException(message, detail), | ||
| ErrorCode.AgentVersionNotAvailable => new AgentVersionNotAvailableException(message, detail), | ||
| ErrorCode.LeaseSubsetViolation => new LeaseSubsetViolationException(message, detail), | ||
| ErrorCode.PermissionDenied => new PermissionDeniedException(message, detail), | ||
| ErrorCode.JobNotFound => new JobNotFoundException(message, detail), | ||
| ErrorCode.InvalidRequest => new InvalidRequestException(message, detail), | ||
| ErrorCode.Unauthenticated => new UnauthenticatedException(message, detail), | ||
| ErrorCode.BudgetExhausted => new BudgetExhaustedException(message, detail), | ||
| ErrorCode.LeaseExpired => new LeaseExpiredException(message, detail), | ||
| ErrorCode.ResumeWindowExpired => new ResumeWindowExpiredException(message, detail), | ||
| ErrorCode.HeartbeatLost => new HeartbeatLostException(message, detail), | ||
| ErrorCode.Timeout => new Arcp.Core.Errors.TimeoutException(message, detail), | ||
| ErrorCode.Cancelled => new CancelledException(message, detail), | ||
| _ => new ArcpException(code, message, detail), | ||
| }; |
There was a problem hiding this comment.
Map INTERNAL_ERROR to InternalErrorException.
ToException(...) is now the central wire-to-client exception map, but ErrorCode.InternalError still falls through to plain ArcpException. That loses the concrete SDK type on the newly added INTERNAL_ERROR path.
Suggested fix
ErrorCode.ResumeWindowExpired => new ResumeWindowExpiredException(message, detail),
ErrorCode.HeartbeatLost => new HeartbeatLostException(message, detail),
+ ErrorCode.InternalError => new InternalErrorException(message, detail),
ErrorCode.Timeout => new Arcp.Core.Errors.TimeoutException(message, detail),
ErrorCode.Cancelled => new CancelledException(message, detail),
_ => new ArcpException(code, message, detail),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// <summary>Map a wire error code to the most specific <see cref="ArcpException"/> subtype so | |
| /// callers can <c>catch</c> on the concrete type (e.g. <see cref="DuplicateKeyException"/>).</summary> | |
| internal static ArcpException ToException(string code, string message, string? detail) => code switch | |
| { | |
| ErrorCode.DuplicateKey => new DuplicateKeyException(message, detail), | |
| ErrorCode.AgentNotAvailable => new AgentNotAvailableException(message, detail), | |
| ErrorCode.AgentVersionNotAvailable => new AgentVersionNotAvailableException(message, detail), | |
| ErrorCode.LeaseSubsetViolation => new LeaseSubsetViolationException(message, detail), | |
| ErrorCode.PermissionDenied => new PermissionDeniedException(message, detail), | |
| ErrorCode.JobNotFound => new JobNotFoundException(message, detail), | |
| ErrorCode.InvalidRequest => new InvalidRequestException(message, detail), | |
| ErrorCode.Unauthenticated => new UnauthenticatedException(message, detail), | |
| ErrorCode.BudgetExhausted => new BudgetExhaustedException(message, detail), | |
| ErrorCode.LeaseExpired => new LeaseExpiredException(message, detail), | |
| ErrorCode.ResumeWindowExpired => new ResumeWindowExpiredException(message, detail), | |
| ErrorCode.HeartbeatLost => new HeartbeatLostException(message, detail), | |
| ErrorCode.Timeout => new Arcp.Core.Errors.TimeoutException(message, detail), | |
| ErrorCode.Cancelled => new CancelledException(message, detail), | |
| _ => new ArcpException(code, message, detail), | |
| }; | |
| /// <summary>Map a wire error code to the most specific <see cref="ArcpException"/> subtype so | |
| /// callers can <c>catch</c> on the concrete type (e.g. <see cref="DuplicateKeyException"/>).</summary> | |
| internal static ArcpException ToException(string code, string message, string? detail) => code switch | |
| { | |
| ErrorCode.DuplicateKey => new DuplicateKeyException(message, detail), | |
| ErrorCode.AgentNotAvailable => new AgentNotAvailableException(message, detail), | |
| ErrorCode.AgentVersionNotAvailable => new AgentVersionNotAvailableException(message, detail), | |
| ErrorCode.LeaseSubsetViolation => new LeaseSubsetViolationException(message, detail), | |
| ErrorCode.PermissionDenied => new PermissionDeniedException(message, detail), | |
| ErrorCode.JobNotFound => new JobNotFoundException(message, detail), | |
| ErrorCode.InvalidRequest => new InvalidRequestException(message, detail), | |
| ErrorCode.Unauthenticated => new UnauthenticatedException(message, detail), | |
| ErrorCode.BudgetExhausted => new BudgetExhaustedException(message, detail), | |
| ErrorCode.LeaseExpired => new LeaseExpiredException(message, detail), | |
| ErrorCode.ResumeWindowExpired => new ResumeWindowExpiredException(message, detail), | |
| ErrorCode.HeartbeatLost => new HeartbeatLostException(message, detail), | |
| ErrorCode.InternalError => new InternalErrorException(message, detail), | |
| ErrorCode.Timeout => new Arcp.Core.Errors.TimeoutException(message, detail), | |
| ErrorCode.Cancelled => new CancelledException(message, detail), | |
| _ => new ArcpException(code, message, detail), | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/Arcp.Client/JobHandle.cs` around lines 92 - 111, The ToException method
currently doesn't handle ErrorCode.InternalError and falls back to
ArcpException; update the switch in JobHandle.ToException to include
ErrorCode.InternalError => new InternalErrorException(message, detail) so
INTERNAL_ERROR maps to the concrete InternalErrorException type (ensure
InternalErrorException is referenced/imported where used).
| if (string.IsNullOrEmpty(requesterPrincipal)) | ||
| { | ||
| var anonymous = new AuthPrincipal(string.Empty); | ||
| return _jobs.Values.Where(j => policy.CanObserve(j.SubmitterPrincipal, anonymous)); | ||
| } |
There was a problem hiding this comment.
Make empty/absent principals return no jobs.
This branch still defers to policy.CanObserve(...), so an allow-all or overly broad policy can expose every principal’s jobs when requesterPrincipal is empty. That is not fail-closed and conflicts with the session.list_jobs contract described for #45. Return an empty sequence here unconditionally, and add a regression test with a permissive policy so this cannot reopen.
Suggested fix
if (string.IsNullOrEmpty(requesterPrincipal))
- {
- var anonymous = new AuthPrincipal(string.Empty);
- return _jobs.Values.Where(j => policy.CanObserve(j.SubmitterPrincipal, anonymous));
- }
+ return Enumerable.Empty<Job>();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (string.IsNullOrEmpty(requesterPrincipal)) | |
| { | |
| var anonymous = new AuthPrincipal(string.Empty); | |
| return _jobs.Values.Where(j => policy.CanObserve(j.SubmitterPrincipal, anonymous)); | |
| } | |
| if (string.IsNullOrEmpty(requesterPrincipal)) | |
| return Enumerable.Empty<Job>(); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/Arcp.Runtime/JobManager.Listing.cs` around lines 55 - 59, When
requesterPrincipal is empty, do not consult policy.CanObserve; change the branch
that currently constructs AuthPrincipal and filters _jobs to instead return an
empty IEnumerable immediately (i.e., return Enumerable.Empty<Job>() or
equivalent) to ensure fail-closed behavior; update the logic around
requesterPrincipal and AuthPrincipal in the method containing _jobs and
SubmitterPrincipal to unconditionally return no jobs for empty/absent
requesterPrincipal, and add a regression test using a permissive policy to
assert that listing jobs with an empty requesterPrincipal yields an empty
sequence.
| catch (Exception ex) | ||
| { | ||
| // Spec §12: surface an unexpected failure as session.error{INTERNAL_ERROR} so the | ||
| // peer is not left waiting forever for an acknowledgement that never arrives. | ||
| _logger.LogError(ex, "Dispatch error for type {Type}", env.Type); | ||
| try | ||
| { | ||
| await SendAsync(new Envelope | ||
| { | ||
| Type = MessageTypeNames.SessionError, | ||
| SessionId = SessionId.Value, | ||
| Payload = new SessionErrorPayload | ||
| { | ||
| Code = ErrorCode.InternalError, | ||
| Message = "Internal error while processing request", | ||
| Retryable = true, | ||
| }, | ||
| }, cancellationToken).ConfigureAwait(false); | ||
| } | ||
| catch (Exception sendEx) | ||
| { | ||
| _logger.LogError(sendEx, "Failed to send INTERNAL_ERROR for type {Type}", env.Type); | ||
| } |
There was a problem hiding this comment.
Don't report cooperative cancellation as INTERNAL_ERROR.
DispatchAsync and the downstream SendAsync/CloseAsync calls all take cancellationToken, so a normal session shutdown can throw OperationCanceledException here. This catch will currently log it as a dispatch failure and try to emit a retryable session.error, which is the wrong wire outcome for an expected close path.
Suggested fix
+ catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
+ {
+ break;
+ }
catch (Exception ex)
{
// Spec §12: surface an unexpected failure as session.error{INTERNAL_ERROR} so the
// peer is not left waiting forever for an acknowledgement that never arrives.
_logger.LogError(ex, "Dispatch error for type {Type}", env.Type);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/Arcp.Runtime/SessionState.Dispatch.cs` around lines 40 - 62, The catch in
DispatchAsync currently treats all exceptions (including
OperationCanceledException) as INTERNAL_ERROR and attempts to SendAsync a
session.error; change it to detect cooperative cancellation and avoid emitting
an INTERNAL_ERROR: if the exception is OperationCanceledException or
cancellationToken.IsCancellationRequested (or Exception.InnerException is an
OCE), rethrow or return immediately instead of logging/sending the
session.error; otherwise proceed with the existing _logger.LogError and
SendAsync of the Envelope (SessionErrorPayload / ErrorCode.InternalError).
Ensure checks reference DispatchAsync, SendAsync, CloseAsync, Envelope,
SessionErrorPayload, SessionId and ErrorCode.InternalError so the altered catch
only handles unexpected failures.
| private static (ArcpServer server, MemoryTransport clientT) StartServer(Action<ArcpServer> configure, | ||
| IBearerVerifier? auth = null, IJobAuthorizationPolicy? policy = null, TimeProvider? time = null) | ||
| { | ||
| var opts = new ArcpServerOptions | ||
| { | ||
| Runtime = new RuntimeInfo { Name = "test-runtime", Version = "1.0.0" }, | ||
| Auth = auth, | ||
| AuthorizationPolicy = policy ?? new SamePrincipalPolicy(), | ||
| TimeProvider = time ?? TimeProvider.System, | ||
| }; | ||
| var server = new ArcpServer(opts); | ||
| configure(server); | ||
| var (client, srv) = MemoryTransport.Pair(); | ||
| _ = Task.Run(() => server.AcceptAsync(srv)); | ||
| return (server, client); | ||
| } |
There was a problem hiding this comment.
Make the test helper own ArcpServer disposal.
StartServer returns a live ArcpServer, but most callers discard it immediately. Since ArcpServer spins up background work on construction, these tests leave the sweeper/session infrastructure running past the end of the case, which can bleed state across the integration suite. Please wrap this in a disposable test harness or ensure every caller await usings the returned server.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/Arcp.IntegrationTests/AuditFixesTests.cs` around lines 24 - 39,
StartServer currently returns a live ArcpServer that launches background work
(via server.AcceptAsync) and is never disposed; change StartServer to return a
small disposable harness (e.g., TestServerHost implementing IAsyncDisposable)
that encapsulates the ArcpServer and the MemoryTransport client pair, starts
AcceptAsync internally, and on Dispose/DisposeAsync cancels/awaits the
AcceptAsync task and calls ArcpServer.DisposeAsync (or a Stop/Shutdown method)
to ensure background sweeper/session work is stopped; update callers to await
using the returned harness. Ensure references to StartServer, ArcpServer,
AcceptAsync, and MemoryTransport.Pair are used to locate and modify the code.
Correct XML comments, guides, OTel/conformance tables, and README for all open docs audit findings. Fix dotnet format whitespace in integration tests. Co-authored-by: Cursor <cursoragent@cursor.com>
* origin/dependabot/nuget/dotnet-dependencies-955cd15038: Bump the dotnet-dependencies group with 2 updates
* origin/dependabot/github_actions/codecov/codecov-action-7.0.0: chore(deps): bump codecov/codecov-action from 6.0.1 to 7.0.0
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment Thanks for integrating Codecov - We've got you covered ☂️ |
Summary
Fixes all open non-docs audit findings in the C# SDK — bugs, spec-conformance gaps, and security issues — grouped by theme below. Build is green and the full suite (
dotnet test ARCP.slnx) passes (138 tests).Correctness / bugs
job.submitreplay re-executes the job a second time (§7.2) #71 — idempotentjob.submitreplay no longer re-runs the agent (skip resolve/run on replay).job.submit/session.list_jobsnever reaches the awaiting caller — the client hangs #73 — server-rejectedjob.submit/session.list_jobsnow fault the awaiting client instead of hanging.event_seqassignment and outbound enqueue are atomic, so wire order is strictly monotonic under concurrent emitters.job.subscribehistory/live-fan-out boundary is exact (atomic register+snapshot + per-job event index): a mid-stream subscriber sees each event exactly once.session.error{INTERNAL_ERROR}(§12).event_seqgaps and raises a broken-session signal (§8.3).Spec-conformance
session.close/session.closedwire types; the runtime acks a graceful close withsession.closed(session.byekept as a deprecated alias) (§6.7).BUDGET_EXHAUSTED) before the pattern check (§9.6).tool.call/agent.delegate, with an explicitPermissiveUnleasedOperationsopt-in (§9.3).model.useis advertised independently of credential provisioning (§9.7).job.cancelledack, and cancelling an unknown job is silently dropped (§7.4, §12) #75 — add thejob.cancelledack andJOB_NOT_FOUNDfor unknown jobs (§7.4, §12).session.jobsentries never populatelast_event_seq(§6.6) #76 —session.jobsentries populatelast_event_seq(§6.6).Security
"/prefix/**"strips the boundary slash, authorizing sibling paths (§9.2, §9.3) #72 — lease glob"/prefix/**"keeps the path-boundary separator so sibling paths are not authorized (§9.2, §9.3).session.list_jobsfails closed: an empty/absent principal sees only what the policy permits, never the full cross-principal set (§6.6, §14).Performance / clean code
session.list_jobsuses a stable(created_at, job_id)keyset cursor; page materialization is bounded tolimit + 1and ties are stable.AgentRegistryno longer exposes its mutable version dictionary;ToInventory()snapshots each agent under its lock.Test plan
dotnet build ARCP.slnx— succeeds, 0 warnings/errors.dotnet test ARCP.slnx— 138 passed, 0 failed (unit, integration, conformance, aspnetcore).session.close/session.closed, INTERNAL_ERROR, fail-closed listing, model.use advertisement, keyset pagination stability, deny-by-default tool.call, monotonic event_seq, exactly-once subscribe boundary, client gap detection); plus the pre-existing audit tests for Idempotentjob.submitreplay re-executes the job a second time (§7.2) #71–session.jobsentries never populatelast_event_seq(§6.6) #76.Notes
CostBudget,Delegate,multi-agent-budget) were updated to request the appropriatetool.call/agent.delegateleases so they remain runnable under the new deny-by-default behavior (tool.call / agent.delegate are not gated when the lease omits the namespace (§9.3) #43).Summary by CodeRabbit
New Features
EventSeqGapDetectedevent andIsSessionBrokenpropertyImprovements