feat(sbp2): add session registry and command flow #19
2823267 to 9113139
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91131392ab
    if (lock_ != nullptr) {
        IOLockFree(lock_);
        lock_ = nullptr;
Unlock the registry lock before freeing it
When the registry is destroyed, this frees lock_ while the IOLockGuard created at the top of the destructor is still alive. As the function returns, the guard destructor will call IOLockUnlock on the already-freed lock pointer it captured, which can crash or corrupt memory during driver shutdown/unload. End the guard scope before calling IOLockFree or free the lock after it has been unlocked.
Fixed in the current branch head (166b8a7). The registry destructor now ends the lock guard scope before freeing lock_, so the guard cannot unlock a freed IOLock during shutdown.
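The scoping fix described above can be sketched outside the kernel as follows. This is a minimal illustration, not the PR's code: `std::mutex` and `std::lock_guard` stand in for `IOLock` and `IOLockGuard`, and `SessionRegistry`, `AddSession`, `SessionCount`, and the `teardownCount` parameter are hypothetical names used only to make the example testable.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the registry. std::mutex replaces IOLock so the
// sketch builds in user space; the real driver uses IOLockAlloc/IOLockFree.
class SessionRegistry {
public:
    explicit SessionRegistry(int* teardownCount)
        : lock_(new std::mutex), teardownCount_(teardownCount) {}

    SessionRegistry(const SessionRegistry&) = delete;
    SessionRegistry& operator=(const SessionRegistry&) = delete;

    ~SessionRegistry() {
        if (lock_ == nullptr) {
            return;
        }
        {
            // The guard lives only inside this inner block.
            std::lock_guard<std::mutex> guard(*lock_);
            sessions_ = 0;
            ++*teardownCount_;
        }  // guard unlocks here, while lock_ is still a valid object

        delete lock_;  // plays the role of IOLockFree(lock_)
        lock_ = nullptr;
    }

    void AddSession() {
        std::lock_guard<std::mutex> guard(*lock_);
        ++sessions_;
    }

    int SessionCount() {
        std::lock_guard<std::mutex> guard(*lock_);
        return sessions_;
    }

private:
    std::mutex* lock_;
    int sessions_ = 0;
    int* teardownCount_;
};
```

If the `delete` were moved inside the inner block, the guard's destructor would unlock a destroyed mutex, which is the same class of bug the review flagged.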
    orb->SetFetchAgentWriteRetries(testFetchAgentWriteRetries_);
    orb->SetAppended(true);
    outstandingORBs_[MakeORBKey(orb->GetORBAddress())] = orb;
After a command ORB is prepared and inserted into outstandingORBs_, no code starts the SBP2CommandORB timer. In the case where the fetch-agent write succeeds but the device never writes a status block, the registry leaves commandInFlight set and callers polling GetCommandResult wait forever despite SubmitCommand having configured a timeout on the ORB. Start the ORB timer when the ORB is actually submitted to the fetch agent/chain, and cancel it on normal status completion as this code already does.
Fixed in the current branch head (166b8a7). Submitted command ORBs now start their timeout once the fetch-agent write succeeds and the target can fetch the ORB; normal completion and failure paths still cancel the timer.
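The timer lifecycle described above (arm on successful fetch-agent write, cancel on status completion, expire otherwise) might look roughly like this. It is a simplified model under stated assumptions: the clock is driven manually through `Tick`, and `CommandORB`, `CommandResult`, and `Submit` are hypothetical names, not the actual `SBP2CommandORB` interface.

```cpp
#include <cassert>
#include <cstdint>

enum class CommandResult { Pending, Completed, TimedOut };

// Hypothetical ORB model. Time is passed in explicitly so the behaviour is
// deterministic; a real driver would use a kernel timer instead.
class CommandORB {
public:
    explicit CommandORB(uint64_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Arm the timeout. Intended to be called only after the fetch-agent
    // write has succeeded, i.e. once the target can fetch the ORB.
    void StartTimer(uint64_t nowMs) {
        deadlineMs_ = nowMs + timeoutMs_;
        armed_ = true;
    }

    void CancelTimer() { armed_ = false; }
    bool TimerArmed() const { return armed_; }

    // Normal completion: the device wrote a status block.
    void OnStatusBlock() {
        if (result_ == CommandResult::Pending) {
            CancelTimer();
            result_ = CommandResult::Completed;
        }
    }

    // Advance the model clock; expire the command if the deadline passed.
    void Tick(uint64_t nowMs) {
        if (result_ == CommandResult::Pending && armed_ && nowMs >= deadlineMs_) {
            armed_ = false;
            result_ = CommandResult::TimedOut;
        }
    }

    CommandResult Result() const { return result_; }

private:
    uint64_t timeoutMs_;
    uint64_t deadlineMs_ = 0;
    bool armed_ = false;
    CommandResult result_ = CommandResult::Pending;
};

// Hypothetical submit step: the timer starts only when the write succeeded.
inline bool Submit(CommandORB& orb, bool fetchAgentWriteOk, uint64_t nowMs) {
    if (!fetchAgentWriteOk) {
        return false;
    }
    orb.StartTimer(nowMs);
    return true;
}
```

With the timer armed at submit time, a caller polling for the result sees `TimedOut` once the deadline passes instead of waiting on `Pending` indefinitely.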
    CleanupCommandResources(*rec);
    });

    if (!record->session->SubmitORB(orb.get())) {
Avoid submit callbacks while holding the registry lock
This calls SubmitORB while still holding the registry IOLock, but SubmitORB can synchronously invoke the ORB completion callback on immediate submit failure (for example when WriteBlock returns an empty handle via AppendORBImmediate → FailSubmittedORB). That callback immediately tries to take the same lock_, so a transient transport submission failure deadlocks the user-client request path instead of returning an error. Move the submission outside the locked section or ensure synchronous submit failures do not call back under this lock.
Fixed in the current branch head (166b8a7). SubmitCommand now prepares and records the command under the registry lock, releases that lock before calling SubmitORB, and cleans up safely if submission fails. This avoids synchronous submit-failure callbacks re-entering the registry lock.
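The lock ordering described above can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `FakeSession`, `Registry`, `OnORBFailed`, and `OutstandingCount` are hypothetical, and `std::mutex` stands in for the registry `IOLock`. The fake transport calls the failure callback synchronously, which is the situation the review describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>

// Hypothetical transport. On an immediate submit failure it invokes the
// completion callback synchronously, before SubmitORB returns.
struct FakeSession {
    bool failSubmit = false;

    bool SubmitORB(uint64_t orbKey,
                   const std::function<void(uint64_t)>& onFailed) {
        if (failSubmit) {
            onFailed(orbKey);
            return false;
        }
        return true;
    }
};

class Registry {
public:
    bool SubmitCommand(FakeSession& session, uint64_t orbKey) {
        {
            // Prepare and record the command under the lock.
            std::lock_guard<std::mutex> guard(lock_);
            outstanding_[orbKey] = true;
        }  // lock released before the transport is touched

        // The callback may run synchronously and take lock_ itself; that is
        // safe because this thread no longer holds it.
        const bool ok = session.SubmitORB(
            orbKey, [this](uint64_t key) { OnORBFailed(key); });

        if (!ok) {
            // Idempotent cleanup in case the callback did not already run.
            std::lock_guard<std::mutex> guard(lock_);
            outstanding_.erase(orbKey);
        }
        return ok;
    }

    std::size_t OutstandingCount() {
        std::lock_guard<std::mutex> guard(lock_);
        return outstanding_.size();
    }

private:
    void OnORBFailed(uint64_t key) {
        std::lock_guard<std::mutex> guard(lock_);
        outstanding_.erase(key);
    }

    std::mutex lock_;
    std::map<uint64_t, bool> outstanding_;
};
```

Had `SubmitORB` been called inside the guarded block, the synchronous `OnORBFailed` call would try to lock a non-recursive mutex already held by the same thread, which is the deadlock the review points out.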
eff2132 to 76a8f1d
76a8f1d to 166b8a7
The failing check on this PR is from the C++ coverage processing step, not from the SBP-2 code or tests. The run completed all 480 tests successfully, then failed while merging LLVM profile data. I split the CI fix into #20. That PR isolates LLVM profile outputs for both test discovery and test execution, and its Build and Test check passes. Once #20 lands, this PR should only need a rerun/rebase against the updated workflow.
Thanks @gly11, this looks like a great starting point for the SBP-2 implementation. I’m going to merge this as a foundation. The next step will be to adapt and split the SBP-2 pieces so they align better with the newer protocol/device architecture we now have on the DICE branch. That follow-up restructuring can happen separately; this PR gives us a useful base to build from.
I have a few follow-up draft branches from the old stack, but I do not want to open follow-up PRs against the wrong base or architecture. Should small independent fixes still target main, or would you prefer follow-up work to target DICE while the newer protocol/device architecture is being developed there? I can re-split the queue so SBP-2-related work waits for the DICE-aligned structure, while only truly independent app/debug fixes go to main if that is preferred.
Preferably wait a bit. Or, if you have some tokens to burn and time to debug/test, read below :). I have some ideas for how to reorganize the different protocols; the main idea can be seen in f19d2d2. Briefly: decouple audio leakage from discovery, and clearly separate how protocols should be loaded. At the same time, I'm finishing the full Bus/IRM manager implementation. So the goal is not to grow the driver for every single device, but to make it hardware-agnostic where possible: follow the specs first, quirks later. So for SBP-2 I see it like this (it's a draft, just the core idea): Ping me on Discord; we could chat about it more there!
That makes sense. I’ll test the DICE branch with my Nikon hardware next. If I find any issues, I’ll report them in an issue or open a focused PR with a fix. I’ll also keep an eye on Discord for follow-up discussion. |
Summary
This PR builds on the foundation changes merged in #18 and splits out the SBP-2 session and command core from the larger SBP-2 bring-up branch. The branch has been rebased onto current main, so the visible diff is now limited to the SBP-2 session layer.
SBP-2 Session Core
ORB and Addressing
Tests
Why this split
This PR intentionally excludes local debug UI, diagnostic handlers, install-helper changes, diagnostic scripts, and documentation experiments. Those can be reviewed separately or kept local. The async discovery and bus-reset foundations landed in #18; this layer adds SBP-2 session and command behavior on top of that.
Verification
After #18 merged, this branch was rebased onto current upstream/main; git diff --check upstream/main..pr/sbp2-session-core passes.