feat(installer): post-enroll tunnel verification + structured errors #1796
Closed
irvingouj@Devolutions (irvingoujAtDevolution) wants to merge 11 commits into
Adds an optional Agent Tunnel wizard step to the Devolutions Agent installer so admins can enroll the agent in a Gateway QUIC tunnel as part of MSI install (UI or unattended).
Surfaces three MSI public properties for unattended installs:
- AGENT_TUNNEL_ENROLLMENT_STRING (dgw-enroll:v1:<base64> from DVLS/Hub/Gateway)
- AGENT_TUNNEL_ADVERTISE_SUBNETS (CSV CIDR; empty = none)
- AGENT_TUNNEL_ADVERTISE_DOMAINS (CSV DNS suffixes; empty = auto-detect only)
Wires a new deferred elevated custom action (EnrollAgentTunnel) that runs Before StartServices when AGENT_TUNNEL_FEATURE is being installed. It base64-decodes the enrollment payload, shells out to `devolutions-agent.exe enroll <url> <token> <name> [subnets]` with a 60s timeout, and redacts the token in the session log.
Advertise domains are persisted by patching `Tunnel.AdvertiseDomains` in agent.json post-enrollment, matching the agreed direction that domain config lives in the file rather than as a CLI flag.
The Tunnel feature itself is opt-in (isEnabled:false, allowChange:true); the dialog is skipped when the feature isn't selected. An empty enrollment string also skips tunnel setup, allowing the installer to be used without touching the tunnel.
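As a rough illustration of the property handling described above, the sketch below parses a CSV CIDR list the way AGENT_TUNNEL_ADVERTISE_SUBNETS is described (empty means none) and redacts a token before a command line is logged. The function names `parse_subnets` and `redact` and the `<redacted>` placeholder are hypothetical; the actual custom action is C# and may validate differently.

```rust
use std::net::IpAddr;

/// Parse a comma-separated CIDR list; an empty string yields no subnets.
/// Hypothetical helper, not the installer's actual code.
fn parse_subnets(csv: &str) -> Result<Vec<(IpAddr, u8)>, String> {
    let mut subnets = Vec::new();
    for entry in csv.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        let (addr, prefix) = entry
            .split_once('/')
            .ok_or_else(|| format!("missing prefix length in '{}'", entry))?;
        let addr: IpAddr = addr
            .parse()
            .map_err(|_| format!("invalid address in '{}'", entry))?;
        let prefix: u8 = prefix
            .parse()
            .map_err(|_| format!("invalid prefix length in '{}'", entry))?;
        // IPv4 prefixes go up to 32 bits, IPv6 prefixes up to 128.
        let max = if addr.is_ipv4() { 32 } else { 128 };
        if prefix > max {
            return Err(format!("prefix length out of range in '{}'", entry));
        }
        subnets.push((addr, prefix));
    }
    Ok(subnets)
}

/// Replace every occurrence of the token before the command line is logged.
/// The "<redacted>" placeholder is an assumption.
fn redact(command_line: &str, token: &str) -> String {
    if token.is_empty() {
        return command_line.to_owned();
    }
    command_line.replace(token, "<redacted>")
}
```

Rejecting a malformed list before shelling out keeps a typo in an unattended install from surfacing later as an opaque enrollment failure.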
WixSharp's runtime dialog loader threw at AgentTunnelDialog init (MSI 1603) because the tableLayoutPanel had RowCount=8 but the new gateway URL controls were placed at rows 8/9/10.
…add Agent name field
- Propagate AGENT_TUNNEL_* properties to deferred CA via Secure MSI Property declarations + explicit UsesProperties string. The deferred CA was previously seeing empty values because the wizard-set properties never crossed the UAC boundary.
- Treat empty enrollment string as install failure (was silent skip). EnrollAgentTunnel CA now returns ActionResult.Failure and surfaces session.Message(InstallMessage.Error, ...) on the empty case and on enrollment timeout, non-zero exit, and exception paths.
- Add optional Agent name field to AgentTunnelDialog. Resolution order at install time: dialog value > JWT jet_agent_name claim > computer name. Avoids "missing required --name" failures when the JWT lacks the claim.
- Update Wizard.ShouldSkip-gated dialog so blank enrollment is blocked at UI validation (previously the dialog let users click Next on empty).
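The agent-name resolution order (dialog value, then the JWT jet_agent_name claim, then the computer name) can be sketched as a simple fallback chain. `resolve_agent_name` is a hypothetical name, and treating whitespace-only values as absent is an assumption, not something the commit message states.

```rust
/// Resolve the agent name: dialog value first, then the JWT claim,
/// then the computer name. Blank values are skipped (an assumption).
fn resolve_agent_name(
    dialog: Option<&str>,
    jwt_claim: Option<&str>,
    computer_name: &str,
) -> String {
    dialog
        .into_iter()
        .chain(jwt_claim)
        .map(str::trim)
        .find(|name| !name.is_empty())
        .unwrap_or(computer_name)
        .to_owned()
}
```

Because the computer name is always available, the chain never produces an empty name, which is what avoids the "missing required --name" failure.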
Captures the root cause behind silent enrollment-success-but-no-tunnel failures we hit during integration testing, the constraints we've confirmed with the team, and the proposed redesign:
- Decouple Gateway's cryptographic identity (server cert SAN) from its network reachability (the host agents dial). Replace single conf.hostname with AgentTunnel.AdvertisedNames (multi-SAN, label-able).
- Agent derives its QUIC endpoint from the host it enrolled through (jet_gw_url) + a quic_port returned by the gateway, instead of accepting whatever hostname the gateway dictates.
- Gateway validates enrollment URL host against AdvertisedNames upfront, with a structured 400 response carrying error/message/help.
- New agent.exe verify-tunnel subcommand wired into the MSI CA so install success means the tunnel is actually up, not just that a cert was written. Errors expose a structured kind/detail/next_step triple.
- DVLS enrollment-string UI becomes a dropdown over AdvertisedNames (refreshed from gateway diagnostics) instead of a free-text URL box.
Includes a 9-entry error catalog with operator-facing next-step text, non-goals (single-use enforcement, gateway farms; deferred), migration path, and a 5-PR implementation plan. Includes Codex's review.
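Two of the redesign points lend themselves to a short sketch: checking the enrollment host against AdvertisedNames, and building the QUIC endpoint from that host plus the gateway-returned quic_port. The helper names are hypothetical, and the matching rules shown (ASCII case-insensitive, trailing dot ignored, bare IPv6 literals bracketed) are assumptions rather than the design doc's wording.

```rust
/// Check that `host` is one of the gateway's advertised names.
/// Case-insensitive comparison and trailing-dot handling are assumptions.
fn is_advertised(host: &str, advertised_names: &[&str]) -> bool {
    let host = host.trim_end_matches('.');
    advertised_names
        .iter()
        .any(|name| name.trim_end_matches('.').eq_ignore_ascii_case(host))
}

/// Build the QUIC endpoint from the host the agent enrolled through and
/// the port returned by the gateway. Assumes `enrollment_host` is a bare
/// host (no scheme, no brackets); IPv6 literals are wrapped in brackets.
fn quic_endpoint(enrollment_host: &str, quic_port: u16) -> String {
    if enrollment_host.contains(':') {
        format!("[{}]:{}", enrollment_host, quic_port)
    } else {
        format!("{}:{}", enrollment_host, quic_port)
    }
}
```

The point of the split is that the agent only ever dials a host it already reached during enrollment, so a gateway misconfiguration cannot redirect it to an unreachable name.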
Adds `agent.exe verify-tunnel --timeout <secs>` which performs one QUIC
handshake plus one RouteAdvertise + Heartbeat/HeartbeatAck round-trip and
exits 0 on success or 1 on any classified failure. The last line of stderr
is a single-line JSON triple `{kind, detail, next_step}` consumed by the
installer custom action to surface actionable error dialogs.
Implements the 9+1-kind error catalog from the design doc (section 6):
enrollment_host_not_advertised, dns_resolution_failed, udp_unreachable,
tls_san_mismatch, tls_spki_pin_mismatch, quic_handshake_timeout,
route_advertise_timeout, enrollment_token_expired,
enrollment_token_signature_invalid, and the unexpected_error catch-all
which always carries a correlation_id and log path.
On Windows the triple is also written to the Event Log under the
DevolutionsAgent source with kind/detail/next_step as named properties
so monitoring tools can parse failures without scraping text.
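A minimal sketch of the structured-error contract, assuming a hand-rolled serializer: a failure kind maps to its catalog string, the triple is rendered as one line of JSON, and the exit code is 0 or 1. Only four of the catalog kinds are listed to keep it short; the type and function names are hypothetical, and the real agent may well use a JSON library instead.

```rust
/// A subset of the catalog kinds named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FailureKind {
    DnsResolutionFailed,
    UdpUnreachable,
    QuicHandshakeTimeout,
    UnexpectedError,
}

impl FailureKind {
    fn as_str(self) -> &'static str {
        match self {
            FailureKind::DnsResolutionFailed => "dns_resolution_failed",
            FailureKind::UdpUnreachable => "udp_unreachable",
            FailureKind::QuicHandshakeTimeout => "quic_handshake_timeout",
            FailureKind::UnexpectedError => "unexpected_error",
        }
    }
}

/// Escape a string for use inside a JSON string literal, so the
/// rendered triple always stays on a single line.
fn escape_json(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render the `{kind, detail, next_step}` triple as single-line JSON.
fn render_triple(kind: FailureKind, detail: &str, next_step: &str) -> String {
    format!(
        "{{\"kind\":\"{}\",\"detail\":\"{}\",\"next_step\":\"{}\"}}",
        kind.as_str(),
        escape_json(detail),
        escape_json(next_step)
    )
}

/// Exit 0 on success, 1 on any classified failure.
fn exit_code(result: Result<(), FailureKind>) -> i32 {
    if result.is_ok() { 0 } else { 1 }
}
```

Escaping newlines matters here because the consumer reads only the last line of stderr; an unescaped line break in `detail` would truncate the JSON.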
…y-url override
After `agent.exe up` succeeds, the agent-tunnel installation now invokes
`agent.exe verify-tunnel --timeout 10` and only reports success if a real
QUIC handshake + RouteAdvertise/Heartbeat round-trip completes. The CA
parses the structured JSON triple from agent stderr and surfaces
`{kind, detail, next_step}` via `session.Message(InstallMessage.Error, ...)`
so installer failure dialogs contain an actionable next step instead of
"setup failed".
The 10s timeout is hardcoded by design (no MSI property, no escape
hatch); a few extra seconds of wall-clock budget guard against a
misbehaving process. MSI rollbacks engage on any non-zero exit.
Also drops the Gateway URL override field from the AgentTunnel dialog
(textbox, label, hint, RowCount/RowStyles, property declaration,
deferred-CA UsesProperties wiring, localization strings in en-us and
fr-fr). With the identity refactor the enrollment JWT is the single
source of truth for the agent-facing URL — overriding it server-side
would defeat the host validation against AdvertisedNames on the gateway.
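The consuming side of that contract can be sketched as follows: take the last non-empty line of the agent's stderr and, on a non-zero exit, turn it into the text for the error dialog. This is written in Rust only for illustration (the real custom action is C#), the function names are hypothetical, and the fallback message for a missing triple is an assumption.

```rust
/// Return the last non-empty line of stderr, where the JSON triple is expected.
fn last_stderr_line(stderr: &str) -> Option<&str> {
    stderr
        .lines()
        .rev()
        .map(str::trim)
        .find(|line| !line.is_empty())
}

/// Decide whether the install may proceed. On failure, return the text
/// to surface in the error dialog; the caller then fails the action so
/// the MSI rolls back.
fn verify_outcome(exit_code: i32, stderr: &str) -> Result<(), String> {
    if exit_code == 0 {
        return Ok(());
    }
    let message = match last_stderr_line(stderr) {
        // Looks like the structured triple: pass it on for parsing/display.
        Some(line) if line.starts_with('{') && line.ends_with('}') => line.to_owned(),
        // Fallback wording is an assumption, not the installer's actual text.
        _ => format!(
            "verify-tunnel exited with code {} without a structured error",
            exit_code
        ),
    };
    Err(message)
}
```

Reading only the final line lets the agent keep logging free-form diagnostics to stderr without breaking the installer's parsing.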
Closing: not authorized; will be reopened after explicit owner approval.
Summary
Third of three PRs implementing the agent-tunnel identity refactor
described in AGENT_TUNNEL_IDENTITY_DESIGN.md. This PR is agent + installer.
Before this PR the installer reported success as soon as `agent.exe up`
wrote a cert to disk, even if the tunnel was unreachable (wrong host
returned, UDP blocked, SAN mismatch, etc.). After this PR the installer
only reports success when a real QUIC handshake plus one route-advertise
round-trip completes, turning enrollment from "certs were written" into
"the tunnel is reachable and usable".
Scope (per spec PR 3):
- `agent.exe verify-tunnel --timeout <secs>` subcommand. Performs one
  QUIC handshake + one `RouteAdvertise` + `Heartbeat`/`HeartbeatAck`
  round-trip and exits 0 on success, 1 on any classified failure.
- Error catalog (the classified kinds plus the `unexpected_error`
  catch-all that always carries correlation_id + log path). Emits a
  single-line JSON triple `{kind, detail, next_step}` on stderr as the
  last line before exit.
- Windows Event Log surface (source `DevolutionsAgent`, named properties)
  so monitoring tools can parse failures structurally.
- `CA.VerifyAgentTunnel` MSI custom action: invokes
  `agent.exe verify-tunnel --timeout 10` after `EnrollAgentTunnel`
  succeeds, parses the JSON triple, surfaces `kind`/`detail`/`next_step`
  through `session.Message(InstallMessage.Error, ...)`. MSI rolls back
  on any non-zero exit. 10s budget hardcoded; no MSI property, no
  skip-verify escape hatch.
- Drops the Gateway URL override field from the AgentTunnel dialog
  (textbox, label, hint, RowCount, localization strings, property
  declaration, deferred-CA wiring). With this design the JWT is the
  single source of truth for the agent-facing URL; overriding it
  server-side would defeat the host validation introduced in PR 1.
Dependency
This PR depends on #1795 (agent derives endpoint from JWT host), which
itself depends on #1794 (gateway returns `quic_port`).
Spec
See AGENT_TUNNEL_IDENTITY_DESIGN.md (PR 3 section + section 6 error
catalog + section 7 Windows Event Log surface).
Test plan
- `cargo test -p devolutions-agent` passes (45 lib + 7 bin tests)
- `dotnet build package/AgentWindowsManaged` succeeds
- … dialog shows the `next_step` text and MSI rolls back
- … success, agent shows online in DVLS within 30 seconds