Skip to content

feat: gateway fork — compute-driver-socket + credentials-driver-socket flags, OIDC#1

Merged
pdettori merged 15 commits intomvpfrom
feat/gateway-driver-sockets-oidc
Apr 28, 2026
Merged

feat: gateway fork — compute-driver-socket + credentials-driver-socket flags, OIDC#1
pdettori merged 15 commits intomvpfrom
feat/gateway-driver-sockets-oidc

Conversation

@pdettori
Copy link
Copy Markdown

@pdettori pdettori commented Apr 28, 2026

Summary

Implements kagenti/kagenti#1353 — the gateway-side plumbing for the MVP gateway-per-tenant architecture.

  • Cherry-pick OIDC auth (upstream PR feat(auth): add OIDC/Keycloak authentication with RBAC and scope-based permissions NVIDIA/OpenShell#935): OIDC/Keycloak JWT authentication with RBAC, scope-based permissions, and e2e tests (10 commits)
  • --compute-driver-socket: New External compute driver variant that connects to a pre-existing Unix domain socket (for sidecar-based compute drivers)
  • --credentials-driver-socket: Credentials driver client that delegates credential resolution to an out-of-process driver over gRPC/UDS, with credentials_driver.proto contract

New CLI flags

Flag Env var Purpose
--compute-driver-socket <path> OPENSHELL_COMPUTE_DRIVER_SOCKET Connect to external compute driver via UDS
--credentials-driver-socket <path> OPENSHELL_CREDENTIALS_DRIVER_SOCKET Connect to credentials driver via UDS

Proto contract

proto/credentials_driver.proto defines the CredentialsDriver gRPC service:

  • ResolveCredential(name){token, expires_at_ms, token_type}
  • ListCredentials()[{name, scopes}]

This is the canonical location for the contract (gateway defines it, drivers implement it). See design discussion on the issue.

Architecture context

Part of the MVP implementation plan — gateway-per-tenant with sidecar compute and credentials drivers communicating over Unix domain sockets.

Test plan

  • CI passes (cargo build, cargo test, cargo clippy, cargo fmt) — 1,269 tests, 0 failures
  • OIDC e2e tests from PR feat(auth): add OIDC/Keycloak authentication with RBAC and scope-based permissions NVIDIA/OpenShell#935 pass — 10/10 passed on Kind cluster with Keycloak (RBAC, scopes, client credentials)
  • ComputeDriverKind::External parses correctly — config::tests::compute_driver_kind_parses_external ... ok
  • configured_compute_driver accepts External — tests::configured_compute_driver_accepts_external ... ok
  • External compute driver errors when socket path missing — validated by configured_compute_driver + runtime check in build_compute_runtime()
  • Credentials driver connects and resolves over UDS — integration test with mock driver (follow-up)

OIDC e2e test details (Kind cluster validation)

Validated on local Kind cluster with Keycloak (openshell realm), gateway deployed with OIDC enabled:

Test Result
TestRbac::test_admin_can_create_provider PASSED
TestRbac::test_user_cannot_create_provider PASSED
TestRbac::test_user_can_list_sandboxes PASSED
TestRbac::test_unauthenticated_request_rejected PASSED
TestRbac::test_health_does_not_require_auth PASSED
TestScopes::test_sandbox_scoped_token_can_list_sandboxes PASSED
TestScopes::test_sandbox_scoped_token_cannot_list_providers PASSED
TestScopes::test_openshell_all_grants_full_access PASSED
TestScopes::test_no_openshell_scopes_denied PASSED
TestClientCredentials::test_ci_token_can_list_sandboxes PASSED

Fix applied: Updated Provider construction in tests to use ObjectMeta(name=...) wrapper matching current datamodel.proto schema (commit cbb79cf7).

Closes kagenti/kagenti#1353

Assisted-By: Claude (Anthropic AI) noreply@anthropic.com

Copy link
Copy Markdown
Author

@pdettori pdettori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15 commits, all signed-off. Rust, Python, Markdown, and DCO checks passing; container builds and license-headers check still pending at time of review.

The OIDC auth implementation is thorough — well-tested (10 e2e tests), good separation of authentication (oidc.rs) from authorization (authz.rs), proper anti-spoofing header stripping, and clean handling of all three auth modes (unauthenticated, sandbox-secret, and Bearer). The new --compute-driver-socket and --credentials-driver-socket flags are minimal and well-scoped.

No blocking issues. Three comments inline (one suggestion, two nits).

issuer: ""
# Expected audience claim for the API resource server.
# This should match the server's --oidc-audience, NOT the CLI client ID.
audience: "openshell-cli"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The comment on line 119 says audience "should match the server's --oidc-audience, NOT the CLI client ID" but the default value here is openshell-cli. This is self-contradictory and will confuse operators configuring a non-Keycloak OIDC provider.

Either drop the "NOT the CLI client ID" clause (if openshell-cli is intentionally the right default for the test Keycloak realm), or change the default to "" and update the comment to clarify when to use the CLI client ID vs a dedicated resource-server audience.

async fn connect_once(socket_path: &std::path::Path) -> Result<Channel> {
let socket_path = socket_path.to_path_buf();
let display_path = socket_path.clone();
Endpoint::from_static("http://[::]:50051")
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The dummy TCP address "http://[::]:50051" is required by Endpoint::from_static but is silently ignored because the connect_with_connector closure overrides transport with the Unix socket. A one-line comment (// Dummy address required by tonic; actual transport is the UDS connector below.) would prevent future readers from wondering if this port is significant. Same applies to :50052 in credentials.rs.

async fn connect_once(socket_path: &Path) -> Result<Channel> {
let socket_path = socket_path.to_path_buf();
let display_path = socket_path.clone();
Endpoint::from_static("http://[::]:50052")
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: connect_uds duplicates the 100×100ms retry loop from compute/external.rs::connect — same structure, same last_error accumulation. Fine for MVP scope. Worth a follow-up to extract a shared uds::connect_with_retry(path, attempts, interval) helper so the retry policy lives in one place.

mrunalp and others added 15 commits April 28, 2026 14:18
Add OAuth2/OIDC authentication to the gateway server with role-based
access control, CLI login flows, and full deployment plumbing.

Server: JWT validation against configurable OIDC issuer (oidc.rs),
JWKS key caching with TTL and rotation handling, method classification
(unauthenticated/sandbox-secret/dual-auth/bearer), identity extraction
with provider-agnostic Identity type, and RBAC enforcement via
AuthzPolicy with configurable admin/user roles and auth-only mode.

CLI: browser-based Authorization Code + PKCE flow, Client Credentials
flow for CI/automation, token storage with refresh, gateway add/login/
logout commands, OIDC bearer token injection over mTLS transport,
discovery endpoint for auto-configuration.

Security: sandbox-secret scope restriction on UpdateConfig (policy
sync only), anti-spoofing header stripping, dual-auth fallthrough
from sandbox-secret to Bearer token.

Deployment: OIDC config wired through DeployOptions, Docker env vars,
Helm values/templates, HelmChart manifest, cluster-entrypoint.sh, and
bootstrap scripts. Keycloak dev server script with pre-configured
realm (test users, roles, PKCE client, CI client).

Tested with Keycloak. The roles claim path and role names are
configurable to support other OIDC providers.

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add opt-in scope enforcement on top of existing OIDC role-based access
control. When --oidc-scopes-claim is set, the server extracts scopes
from the JWT and checks them per-method against an exhaustive scope map.

Scopes: sandbox:read, sandbox:write, provider:read, provider:write,
config:read, config:write, inference:read, inference:write, and
openshell:all (wildcard). Methods not in the scope map require
openshell:all. Scopes layer on top of roles and cannot escalate
privilege. Auth-only mode (empty role names) still enforces scopes
when enabled.

Server: scopes_claim in OidcConfig, scope extraction from JWT
(space-delimited and JSON array formats), standard OIDC scope
filtering, scope check in AuthzPolicy after role check.

CLI: --oidc-scopes on gateway add/start stored in metadata and
consumed by gateway login, --oidc-scopes-claim on gateway start
forwarded to server, scopes parameter in browser and client
credentials OAuth2 flows with openid deduplication.

Deployment: oidc_scopes_claim wired through DeployOptions, docker.rs,
Helm, bootstrap scripts, and cluster entrypoint.

Keycloak: realm config updated with built-in OIDC scopes and 9
OpenShell client scopes as optional on openshell-cli and openshell:all
as default on openshell-ci.

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add GetInferenceBundle to sandbox-secret methods so sandbox inference
route refresh works under OIDC. Make GetSandboxConfig dual-auth so CLI
users can read sandbox settings with Bearer tokens.

Preserve OIDC gateway metadata on restart — a bare gateway start
without --oidc-* flags no longer erases the stored OIDC registration.

Document CI client ID requirement (openshell-ci vs openshell-cli) in
the testing guide. Add security note about auth-only mode blast radius
for GitHub Actions.

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Move OpenShell/GetSandboxConfig from sandbox-secret-only to dual-auth
so CLI users can read sandbox settings with Bearer tokens while sandbox
supervisors continue using the shared secret.

Add sandbox secret interceptor to the inference bundle fetch path so
GetInferenceBundle works under OIDC-enabled gateways. Extract shared
interceptor constructor to avoid duplication.

Add GetSandboxConfig to the config:read scope map so scope enforcement
applies consistently when scopes are enabled.

Refactor OIDC metadata preservation into apply_oidc_gateway_metadata()
with explicit resume semantics — only preserve existing OIDC metadata
on real resume paths, not on fresh deployments.

Update architecture docs and testing guide to reflect the corrected
method classifications and add new test coverage for interceptor
injection, scope requirements, metadata preservation, and dual-auth
classification.

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Replace hand-written PKCE generation, authorization URL construction,
token exchange, client credentials, and token refresh with the oauth2
crate's typed API.

Eliminates sha2, hex, and getrandom dependencies from the CLI. The
custom urlencoded() helper and manual form POST logic are replaced by
BasicClient methods with proper type-state safety.

Discovery and the callback server remain custom since the oauth2 crate
does not provide OIDC discovery or a localhost redirect listener.

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Group oidc.rs, authz.rs, identity.rs, and the auth HTTP endpoints
under src/auth/ module directory. No behavioral changes.

  auth/mod.rs      — module root, re-exports HTTP router
  auth/oidc.rs     — JWT validation, JWKS caching, method classification
  auth/authz.rs    — role and scope authorization policy
  auth/identity.rs — provider-agnostic Identity type
  auth/http.rs     — /auth/connect and /auth/oidc-config endpoints

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
The oauth2 crate defaults to BasicAuth (HTTP Basic header) but Keycloak
and most OIDC providers expect client_secret_post (credentials in the
request body). Set AuthType::RequestBody explicitly to match the
pre-refactor behavior.

Also re-export Identity, IdentityProvider, and JwksCache from the auth
module so ServerState's public API remains nameable by external consumers.

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Pass --oidc-scopes to gateway start so the metadata includes requested
scopes after cluster bootstrap. Without this, users had to manually
edit metadata.json to set scopes for gateway login.

Usage: OPENSHELL_OIDC_SCOPES="openshell:all" mise run cluster
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add 10 end-to-end tests covering OIDC authentication against a live
K3s cluster with Keycloak:

RBAC (5 tests): admin can create providers, user cannot, user can list
sandboxes, unauthenticated requests rejected, health probe works
without auth.

Scopes (4 tests): sandbox-scoped token can list sandboxes but not
providers, openshell:all grants full access, no-scopes token denied.

Client credentials (1 test): CI token via client_credentials grant.

Tests are opt-in via OPENSHELL_E2E_OIDC=1 and OPENSHELL_E2E_OIDC_SCOPES=1
env vars. They derive the Keycloak URL from gateway metadata to match
the server's configured issuer.

Run with:

  OPENSHELL_E2E_OIDC=1 OPENSHELL_E2E_OIDC_SCOPES=1 \
  PYTHONPATH=python uv run pytest e2e/python/oidc/ -v

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add blank lines before lists and fenced code blocks to satisfy
markdownlint MD031 and MD032 rules.

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add the CredentialsDriver gRPC contract (ResolveCredential,
ListCredentials) as proto/credentials_driver.proto, mirroring the
existing compute_driver.proto pattern. Wire it into the openshell-core
build.rs protobuf compilation and expose the generated Rust module.

Proto sourced from kagenti/openshell-credentials-keycloak; the gateway
fork is the canonical owner per the MVP design (Section 3.3).

Ref: kagenti/kagenti#1353

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add ComputeDriverKind::External that connects to a pre-existing Unix
domain socket instead of spawning a driver subprocess. This enables
out-of-process compute drivers running as sidecar containers.

Changes:
- Add External variant to ComputeDriverKind enum with parser/display
- Add --compute-driver-socket CLI flag (+ OPENSHELL_COMPUTE_DRIVER_SOCKET env)
- Add --credentials-driver-socket CLI flag (+ OPENSHELL_CREDENTIALS_DRIVER_SOCKET env)
- Add compute_driver_socket and credentials_driver_socket to Config
- Add compute::external module with UDS connection logic (retry loop)
- Wire External arm in build_compute_runtime() reusing RemoteComputeDriver
- Add tests for External variant parsing and config acceptance

Ref: kagenti/kagenti#1353

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
…iring

Add CredentialsDriverHandle that connects to an out-of-process
credentials driver over UDS. The handle provides resolve_credential()
and list_credentials() methods wrapping the CredentialsDriver gRPC
contract. When --credentials-driver-socket is set, the gateway connects
at startup and stores the handle in ServerState for use by handlers.

Ref: kagenti/kagenti#1353

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
The Provider proto message uses ObjectMeta for metadata (including name)
rather than a top-level name field. Update test constructions to match
the current datamodel.proto schema.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
@pdettori pdettori force-pushed the feat/gateway-driver-sockets-oidc branch from cbb79cf to 1c05cc8 Compare April 28, 2026 18:18
@pdettori pdettori merged commit 98e4758 into mvp Apr 28, 2026
8 checks passed
@pdettori pdettori deleted the feat/gateway-driver-sockets-oidc branch April 28, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants