Conversation
cb2dbd3 to
74e16d4
Compare
pdettori
left a comment
There was a problem hiding this comment.
15 commits, all signed-off. Rust, Python, Markdown, and DCO checks passing; container builds and license-headers check still pending at time of review.
The OIDC auth implementation is thorough — well-tested (10 e2e tests), good separation of authentication (oidc.rs) from authorization (authz.rs), proper anti-spoofing header stripping, and clean handling of all three auth modes (unauthenticated, sandbox-secret, and Bearer). The new --compute-driver-socket and --credentials-driver-socket flags are minimal and well-scoped.
No blocking issues. Three comments inline (one suggestion, two nits).
| issuer: "" | ||
| # Expected audience claim for the API resource server. | ||
| # This should match the server's --oidc-audience, NOT the CLI client ID. | ||
| audience: "openshell-cli" |
There was a problem hiding this comment.
Suggestion: The comment on line 119 says audience "should match the server's --oidc-audience, NOT the CLI client ID" but the default value here is openshell-cli. This is self-contradictory and will confuse operators configuring a non-Keycloak OIDC provider.
Either drop the "NOT the CLI client ID" clause (if openshell-cli is intentionally the right default for the test Keycloak realm), or change the default to "" and update the comment to clarify when to use the CLI client ID vs a dedicated resource-server audience.
| async fn connect_once(socket_path: &std::path::Path) -> Result<Channel> { | ||
| let socket_path = socket_path.to_path_buf(); | ||
| let display_path = socket_path.clone(); | ||
| Endpoint::from_static("http://[::]:50051") |
There was a problem hiding this comment.
Nit: The dummy TCP address "http://[::]:50051" is required by Endpoint::from_static but is silently ignored because the connect_with_connector closure overrides transport with the Unix socket. A one-line comment (// Dummy address required by tonic; actual transport is the UDS connector below.) would prevent future readers from wondering if this port is significant. Same applies to :50052 in credentials.rs.
| async fn connect_once(socket_path: &Path) -> Result<Channel> { | ||
| let socket_path = socket_path.to_path_buf(); | ||
| let display_path = socket_path.clone(); | ||
| Endpoint::from_static("http://[::]:50052") |
There was a problem hiding this comment.
Nit: connect_uds duplicates the 100×100ms retry loop from compute/external.rs::connect — same structure, same last_error accumulation. Fine for MVP scope. Worth a follow-up to extract a shared uds::connect_with_retry(path, attempts, interval) helper so the retry policy lives in one place.
Add OAuth2/OIDC authentication to the gateway server with role-based access control, CLI login flows, and full deployment plumbing. Server: JWT validation against configurable OIDC issuer (oidc.rs), JWKS key caching with TTL and rotation handling, method classification (unauthenticated/sandbox-secret/dual-auth/bearer), identity extraction with provider-agnostic Identity type, and RBAC enforcement via AuthzPolicy with configurable admin/user roles and auth-only mode. CLI: browser-based Authorization Code + PKCE flow, Client Credentials flow for CI/automation, token storage with refresh, gateway add/login/ logout commands, OIDC bearer token injection over mTLS transport, discovery endpoint for auto-configuration. Security: sandbox-secret scope restriction on UpdateConfig (policy sync only), anti-spoofing header stripping, dual-auth fallthrough from sandbox-secret to Bearer token. Deployment: OIDC config wired through DeployOptions, Docker env vars, Helm values/templates, HelmChart manifest, cluster-entrypoint.sh, and bootstrap scripts. Keycloak dev server script with pre-configured realm (test users, roles, PKCE client, CI client). Tested with Keycloak. The roles claim path and role names are configurable to support other OIDC providers. Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add opt-in scope enforcement on top of existing OIDC role-based access control. When --oidc-scopes-claim is set, the server extracts scopes from the JWT and checks them per-method against an exhaustive scope map. Scopes: sandbox:read, sandbox:write, provider:read, provider:write, config:read, config:write, inference:read, inference:write, and openshell:all (wildcard). Methods not in the scope map require openshell:all. Scopes layer on top of roles and cannot escalate privilege. Auth-only mode (empty role names) still enforces scopes when enabled. Server: scopes_claim in OidcConfig, scope extraction from JWT (space-delimited and JSON array formats), standard OIDC scope filtering, scope check in AuthzPolicy after role check. CLI: --oidc-scopes on gateway add/start stored in metadata and consumed by gateway login, --oidc-scopes-claim on gateway start forwarded to server, scopes parameter in browser and client credentials OAuth2 flows with openid deduplication. Deployment: oidc_scopes_claim wired through DeployOptions, docker.rs, Helm, bootstrap scripts, and cluster entrypoint. Keycloak: realm config updated with built-in OIDC scopes and 9 OpenShell client scopes as optional on openshell-cli and openshell:all as default on openshell-ci. Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add GetInferenceBundle to sandbox-secret methods so sandbox inference route refresh works under OIDC. Make GetSandboxConfig dual-auth so CLI users can read sandbox settings with Bearer tokens. Preserve OIDC gateway metadata on restart — a bare gateway start without --oidc-* flags no longer erases the stored OIDC registration. Document CI client ID requirement (openshell-ci vs openshell-cli) in the testing guide. Add security note about auth-only mode blast radius for GitHub Actions. Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Move OpenShell/GetSandboxConfig from sandbox-secret-only to dual-auth so CLI users can read sandbox settings with Bearer tokens while sandbox supervisors continue using the shared secret. Add sandbox secret interceptor to the inference bundle fetch path so GetInferenceBundle works under OIDC-enabled gateways. Extract shared interceptor constructor to avoid duplication. Add GetSandboxConfig to the config:read scope map so scope enforcement applies consistently when scopes are enabled. Refactor OIDC metadata preservation into apply_oidc_gateway_metadata() with explicit resume semantics — only preserve existing OIDC metadata on real resume paths, not on fresh deployments. Update architecture docs and testing guide to reflect the corrected method classifications and add new test coverage for interceptor injection, scope requirements, metadata preservation, and dual-auth classification. Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Replace hand-written PKCE generation, authorization URL construction, token exchange, client credentials, and token refresh with the oauth2 crate's typed API. Eliminates sha2, hex, and getrandom dependencies from the CLI. The custom urlencoded() helper and manual form POST logic are replaced by BasicClient methods with proper type-state safety. Discovery and the callback server remain custom since the oauth2 crate does not provide OIDC discovery or a localhost redirect listener. Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Group oidc.rs, authz.rs, identity.rs, and the auth HTTP endpoints under src/auth/ module directory. No behavioral changes. auth/mod.rs — module root, re-exports HTTP router auth/oidc.rs — JWT validation, JWKS caching, method classification auth/authz.rs — role and scope authorization policy auth/identity.rs — provider-agnostic Identity type auth/http.rs — /auth/connect and /auth/oidc-config endpoints Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
The oauth2 crate defaults to BasicAuth (HTTP Basic header) but Keycloak and most OIDC providers expect client_secret_post (credentials in the request body). Set AuthType::RequestBody explicitly to match the pre-refactor behavior. Also re-export Identity, IdentityProvider, and JwksCache from the auth module so ServerState's public API remains nameable by external consumers. Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Pass --oidc-scopes to gateway start so the metadata includes requested scopes after cluster bootstrap. Without this, users had to manually edit metadata.json to set scopes for gateway login. Usage: OPENSHELL_OIDC_SCOPES="openshell:all" mise run cluster Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add 10 end-to-end tests covering OIDC authentication against a live K3s cluster with Keycloak: RBAC (5 tests): admin can create providers, user cannot, user can list sandboxes, unauthenticated requests rejected, health probe works without auth. Scopes (4 tests): sandbox-scoped token can list sandboxes but not providers, openshell:all grants full access, no-scopes token denied. Client credentials (1 test): CI token via client_credentials grant. Tests are opt-in via OPENSHELL_E2E_OIDC=1 and OPENSHELL_E2E_OIDC_SCOPES=1 env vars. They derive the Keycloak URL from gateway metadata to match the server's configured issuer. Run with: OPENSHELL_E2E_OIDC=1 OPENSHELL_E2E_OIDC_SCOPES=1 \ PYTHONPATH=python uv run pytest e2e/python/oidc/ -v Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add blank lines before lists and fenced code blocks to satisfy markdownlint MD031 and MD032 rules. Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add the CredentialsDriver gRPC contract (ResolveCredential, ListCredentials) as proto/credentials_driver.proto, mirroring the existing compute_driver.proto pattern. Wire it into the openshell-core build.rs protobuf compilation and expose the generated Rust module. Proto sourced from kagenti/openshell-credentials-keycloak; the gateway fork is the canonical owner per the MVP design (Section 3.3). Ref: kagenti/kagenti#1353 Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Add ComputeDriverKind::External that connects to a pre-existing Unix domain socket instead of spawning a driver subprocess. This enables out-of-process compute drivers running as sidecar containers. Changes: - Add External variant to ComputeDriverKind enum with parser/display - Add --compute-driver-socket CLI flag (+ OPENSHELL_COMPUTE_DRIVER_SOCKET env) - Add --credentials-driver-socket CLI flag (+ OPENSHELL_CREDENTIALS_DRIVER_SOCKET env) - Add compute_driver_socket and credentials_driver_socket to Config - Add compute::external module with UDS connection logic (retry loop) - Wire External arm in build_compute_runtime() reusing RemoteComputeDriver - Add tests for External variant parsing and config acceptance Ref: kagenti/kagenti#1353 Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
…iring Add CredentialsDriverHandle that connects to an out-of-process credentials driver over UDS. The handle provides resolve_credential() and list_credentials() methods wrapping the CredentialsDriver gRPC contract. When --credentials-driver-socket is set, the gateway connects at startup and stores the handle in ServerState for use by handlers. Ref: kagenti/kagenti#1353 Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
The Provider proto message uses ObjectMeta for metadata (including name) rather than a top-level name field. Update test constructions to match the current datamodel.proto schema. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
cbb79cf to
1c05cc8
Compare
Summary
Implements kagenti/kagenti#1353 — the gateway-side plumbing for the MVP gateway-per-tenant architecture.
--compute-driver-socket: NewExternalcompute driver variant that connects to a pre-existing Unix domain socket (for sidecar-based compute drivers)--credentials-driver-socket: Credentials driver client that delegates credential resolution to an out-of-process driver over gRPC/UDS, withcredentials_driver.protocontractNew CLI flags
--compute-driver-socket <path>OPENSHELL_COMPUTE_DRIVER_SOCKET--credentials-driver-socket <path>OPENSHELL_CREDENTIALS_DRIVER_SOCKETProto contract
proto/credentials_driver.protodefines theCredentialsDrivergRPC service:ResolveCredential(name)→{token, expires_at_ms, token_type}ListCredentials()→[{name, scopes}]This is the canonical location for the contract (gateway defines it, drivers implement it). See design discussion on the issue.
Architecture context
Part of the MVP implementation plan — gateway-per-tenant with sidecar compute and credentials drivers communicating over Unix domain sockets.
Test plan
cargo build,cargo test,cargo clippy,cargo fmt) — 1,269 tests, 0 failuresComputeDriverKind::Externalparses correctly —config::tests::compute_driver_kind_parses_external ... okconfigured_compute_driveraccepts External —tests::configured_compute_driver_accepts_external ... okconfigured_compute_driver+ runtime check inbuild_compute_runtime()OIDC e2e test details (Kind cluster validation)
Validated on local Kind cluster with Keycloak (
openshellrealm), gateway deployed with OIDC enabled:TestRbac::test_admin_can_create_providerTestRbac::test_user_cannot_create_providerTestRbac::test_user_can_list_sandboxesTestRbac::test_unauthenticated_request_rejectedTestRbac::test_health_does_not_require_authTestScopes::test_sandbox_scoped_token_can_list_sandboxesTestScopes::test_sandbox_scoped_token_cannot_list_providersTestScopes::test_openshell_all_grants_full_accessTestScopes::test_no_openshell_scopes_deniedTestClientCredentials::test_ci_token_can_list_sandboxesFix applied: Updated Provider construction in tests to use
ObjectMeta(name=...)wrapper matching currentdatamodel.protoschema (commitcbb79cf7).Closes kagenti/kagenti#1353
Assisted-By: Claude (Anthropic AI) noreply@anthropic.com