GraphQL-aware L7 inspection: operation-type and field-level policy rules #1022

@zredlined

Problem Statement

OpenShell's L7 enforcement matches HTTP method, URL path, and query parameters. That's sufficient for REST APIs where the destructive intent is encoded in the URL (DELETE /repos/.../branches/main). It is not sufficient for GraphQL, JSON-RPC, SOAP, and similar body-encoded operation languages, where the destructive vs. read-only distinction lives in the request body.

Concrete motivating case (publicly reported, tomshardware.com link): an agent with a valid Railway bearer token issued

POST https://backboard.railway.app/graphql/v2
{"query":"mutation { volumeDelete(volumeId: \"...\") }"}

and deleted a production database. The same POST /graphql/v2 URL also serves legitimate read traffic (query { volume(...) }), so the two cannot be distinguished without parsing the body. With current L7 rules an operator must choose between (a) blanket-denying POST /graphql/v2, which also blocks legitimate reads and writes, or (b) allowing it and accepting the destructive-call risk.

This gap will surface for any GraphQL or JSON-RPC API where destructive operations are encoded in the request body — which is the common case across GraphQL-shaped systems.

Proposed Design

Add Graphql as a peer variant to the existing L7Protocol enum (crates/openshell-sandbox/src/l7/mod.rs), alongside Rest and Sql. New module crates/openshell-sandbox/src/l7/graphql.rs implements:

  1. Body capture — buffer up to a bounded size (proposed 64 KiB default, configurable per-endpoint, number to be benchmarked before locking) of the POST body using existing framing logic (parse_body_length in l7/rest.rs). Bodies exceeding the bound fail closed.
  2. GraphQL parsing — minimal-cost parse of the JSON envelope ({ query, variables, operationName }) and the GraphQL document to extract: operation type (query / mutation / subscription), top-level operation name, and the set of root fields invoked.
  3. Rule matching — extend L7Allow and L7DenyRule (proto/sandbox.proto) with three new optional fields scoped to the GraphQL protocol:
    • operation_type (query / mutation / subscription / *)
    • operation_name (glob)
    • fields (set of field-name globs; rule matches if any selected field matches any element)
  4. Policy composition — same allow/deny ordering as the REST path: deny rules take precedence, allows are additive.
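The body-capture step (item 1) can be sketched with the standard library alone. This is a minimal illustration, not the proposed implementation: capture_body, CaptureError, and DEFAULT_MAX_BODY are hypothetical names, and the real code would sit on top of the existing framing logic in l7/rest.rs rather than a generic reader.

```rust
use std::io::{self, Read};

/// Proposed default bound from the design (64 KiB); still to be benchmarked.
const DEFAULT_MAX_BODY: usize = 64 * 1024;

#[derive(Debug)]
enum CaptureError {
    /// Body exceeds the configured bound; the caller must deny the request.
    TooLarge,
    /// Underlying read failed; also treated as a denial by the caller.
    Io(io::Error),
}

/// Buffer at most `max` bytes of `body`.
/// Reading one byte past the bound distinguishes "exactly max bytes"
/// from "more than max bytes" without buffering the whole oversized body.
fn capture_body<R: Read>(body: R, max: usize) -> Result<Vec<u8>, CaptureError> {
    let mut buf = Vec::new();
    body.take((max as u64).saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(CaptureError::Io)?;
    if buf.len() > max {
        return Err(CaptureError::TooLarge); // fail closed
    }
    Ok(buf)
}

fn main() {
    let small = br#"{"query":"mutation { volumeDelete(volumeId: \"...\") }"}"#;
    assert!(capture_body(&small[..], DEFAULT_MAX_BODY).is_ok());

    let oversized = vec![b'x'; DEFAULT_MAX_BODY + 1];
    assert!(matches!(
        capture_body(&oversized[..], DEFAULT_MAX_BODY),
        Err(CaptureError::TooLarge)
    ));
}
```

Returning an error instead of a truncated buffer keeps the fail-closed decision explicit at the call site: a truncated body is never handed to the parser.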

Example policy fragment, authored as a Provider Profile (Discussion #865):

- host: backboard.railway.app
  port: 443
  protocol: graphql
  rules:
    - allow:
        operation_type: query        # all reads
    - allow:
        operation_type: mutation
        fields: ["volumeCreate", "deploymentTrigger"]
  deny_rules:
    - operation_type: mutation
      fields: ["*Delete", "*Destroy", "volumeDelete", "projectDelete"]

The OPA Rego rules in crates/openshell-sandbox/data/sandbox-policy.rego extend by analogy with the REST path; no new evaluation engine.
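The matching semantics described above (optional operation_type, glob on operation_name, any-field glob match on fields, deny before allow) can be sketched in plain Rust. This is an illustration only: the actual evaluation is proposed to live in Rego, the type and function names here are hypothetical, the glob supports only `*`, and denying when no allow rule matches is an assumption about what "allows are additive" implies.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpType { Query, Mutation, Subscription }

/// One parsed GraphQL operation, as the matcher sees it.
struct Operation<'a> {
    op_type: OpType,
    name: Option<&'a str>,
    root_fields: Vec<&'a str>,
}

impl<'a> Operation<'a> {
    fn anonymous(op_type: OpType, root_fields: &[&'a str]) -> Self {
        Operation { op_type, name: None, root_fields: root_fields.to_vec() }
    }
}

/// Hypothetical in-memory rule. `None` or an empty list means "unconstrained".
#[derive(Default)]
struct Rule {
    operation_type: Option<OpType>, // None plays the role of `*`
    operation_name: Option<String>, // glob
    fields: Vec<String>,            // globs
}

/// Glob match where `*` matches any run of characters, including none.
fn glob_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0, 0);
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi, ti)); // remember the star, try matching zero chars
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = backtrack {
            pi = sp + 1; // let the last star absorb one more character
            ti = st + 1;
            backtrack = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    p[pi..].iter().all(|&c| c == b'*')
}

fn rule_matches(rule: &Rule, op: &Operation<'_>) -> bool {
    let type_ok = rule.operation_type.map_or(true, |t| t == op.op_type);
    let name_ok = rule.operation_name.as_deref().map_or(true, |pat| {
        op.name.map_or(false, |n| glob_match(pat, n))
    });
    // "rule matches if any selected field matches any element"
    let fields_ok = rule.fields.is_empty()
        || op.root_fields.iter().any(|f| rule.fields.iter().any(|pat| glob_match(pat, f)));
    type_ok && name_ok && fields_ok
}

/// Deny rules take precedence; otherwise at least one allow must match (assumed).
fn is_allowed(allow: &[Rule], deny: &[Rule], op: &Operation<'_>) -> bool {
    !deny.iter().any(|r| rule_matches(r, op)) && allow.iter().any(|r| rule_matches(r, op))
}

/// The example policy fragment above, transcribed by hand.
fn example_policy() -> (Vec<Rule>, Vec<Rule>) {
    let strings = |xs: &[&str]| xs.iter().map(|s| s.to_string()).collect::<Vec<_>>();
    let allow = vec![
        Rule { operation_type: Some(OpType::Query), ..Default::default() },
        Rule {
            operation_type: Some(OpType::Mutation),
            fields: strings(&["volumeCreate", "deploymentTrigger"]),
            ..Default::default()
        },
    ];
    let deny = vec![Rule {
        operation_type: Some(OpType::Mutation),
        fields: strings(&["*Delete", "*Destroy", "volumeDelete", "projectDelete"]),
        ..Default::default()
    }];
    (allow, deny)
}

fn main() {
    let (allow, deny) = example_policy();
    let read = Operation::anonymous(OpType::Query, &["volume"]);
    let delete = Operation::anonymous(OpType::Mutation, &["volumeDelete"]);
    assert!(is_allowed(&allow, &deny, &read));
    assert!(!is_allowed(&allow, &deny, &delete));
}
```

Under the any-field semantics, an allow rule admits a mutation as soon as one of its root fields is listed, so a mutation that mixes an allowed field with an unlisted one is only stopped if a deny rule catches the unlisted field. That may be worth deciding explicitly before the rule schema is locked.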

Parser choice: apollo-parser

Phase 1 needs a Rust GraphQL document parser. Three candidates were compared on dependency footprint, maintenance health, error-recovery semantics, and AST fit for our use (operation type, operation name, top-level field set). Data is from crates.io as of April 2026:

| | apollo-parser 0.8.5 | async-graphql-parser 7.2.1 | graphql-parser 0.4.1 |
|---|---|---|---|
| Last stable release | 2026-02-25 | 7.2.1 stable; 8.0.0-rc.5 dated 2026-04-21 | 2024-12-03 |
| Versions published | 41 | 209 | 8 |
| Recent downloads (90d) | ~93k | ~4.2M | ~4.4M |
| License | MIT OR Apache-2.0 | MIT OR Apache-2.0 | MIT OR Apache-2.0 |
| Production deps | 3 (memchr, rowan, thiserror) | 4 (async-graphql-value, pest, serde, serde_json) | 2 (combine, thiserror) |
| Source size | ~209 KB / ~8.1k LoC Rust | server-framework parser internals | ~36 KB / ~3.9k LoC Rust |
| AST shape | CST (rowan, lossless) | AST | AST |
| Error recovery | yes: "lexing and parsing does not fail or panic"; "always produces a CST" | partial | none |
| Maintainer | apollographql org | async-graphql server project | graphql-rust org / individual |

Recommendation: apollo-parser.

  • Smallest credible production-dep footprint for our use case. graphql-parser is technically lighter (two deps versus three) but stagnant; the maintenance gap matters more than one extra crate on a security-critical path.
  • Maintained by the team that authors the GraphQL Federation spec — the strongest available signal that the parser will track GraphQL spec evolution.
  • Active and recent: 41 versions, last release Feb 2026. graphql-parser shipped 8 versions in 8 years and last released December 2024.
  • Error-recovery semantics fit the policy path. Producing a CST even on malformed input lets the deny path emit specific diagnostics ("malformed mutation field; rejected by deny rule X") rather than blanket "parse error → block". Useful for the policy advisor / agent inbox flow.

Why not the others:

  • async-graphql-parser is the parser internals of a server framework, not a standalone library. Its release cycle is bound to the server's (currently in 8.0.0-rc churn — five release candidates visible). Pulls pest parser-generator runtime as a production dep.
  • graphql-parser is small and has no parser-generator dep, but maintenance is thin (8 versions across 8 years, last release December 2024). Its combine-based parser has no error recovery: malformed input fails wholesale, which is acceptable for a fail-closed posture but loses the diagnostic benefit.

Selection assumes Phase 1 only needs the AST root: operation kind (query / mutation / subscription), operation name, and the set of top-level selected fields. All three crates expose this. If Phase 2+ needs execution-time information (variable resolution, fragment expansion across the full document graph), re-evaluate before that phase lands.
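To make concrete how little of the document the operation-kind check inspects, here is a deliberately naive, std-only scan that reports the kind of each top-level operation and returns None on anything it does not understand. It is an illustration, not a substitute for apollo-parser: the function names are hypothetical, bracket types are not checked against each other, and type-system definitions are rejected outright.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpKind { Query, Mutation, Subscription }

/// Skip a GraphQL string whose opening quote is at `i`.
/// Returns the index just past the closing quote, or None if unterminated.
fn skip_string(b: &[u8], i: usize) -> Option<usize> {
    if b[i..].starts_with(b"\"\"\"") {
        // Block string: ends at the next `"""` that is not escaped as `\"""`.
        let mut j = i + 3;
        while j + 3 <= b.len() {
            if b[j] == b'\\' && b[j + 1..].starts_with(b"\"\"\"") {
                j += 4;
            } else if b[j..].starts_with(b"\"\"\"") {
                return Some(j + 3);
            } else {
                j += 1;
            }
        }
        return None;
    }
    let mut j = i + 1;
    while j < b.len() {
        match b[j] {
            b'\\' => j += 2,
            b'"' => return Some(j + 1),
            b'\n' => return None, // plain strings cannot span lines
            _ => j += 1,
        }
    }
    None
}

/// Kinds of all operations defined at the top level of `doc`, in order.
/// None means "could not classify"; the caller should fail closed.
fn top_level_op_kinds(doc: &str) -> Option<Vec<OpKind>> {
    let b = doc.as_bytes();
    let mut kinds = Vec::new();
    let mut depth = 0usize; // nesting of (), [], {} inside a definition
    let mut in_def = false; // true from a definition's start to its closing `}`
    let mut i = 0;
    while i < b.len() {
        let c = b[i];
        match c {
            b'#' => while i < b.len() && b[i] != b'\n' { i += 1 }, // comment
            b'"' => {
                if !in_def { return None; }
                i = skip_string(b, i)?;
            }
            b'(' | b'[' | b'{' => {
                if !in_def {
                    if c != b'{' { return None; }
                    kinds.push(OpKind::Query); // shorthand `{ ... }` is a query
                    in_def = true;
                }
                depth += 1;
                i += 1;
            }
            b')' | b']' | b'}' => {
                depth = depth.checked_sub(1)?;
                if depth == 0 && c == b'}' { in_def = false; }
                i += 1;
            }
            _ if in_def || c.is_ascii_whitespace() || c == b',' => i += 1,
            _ if c.is_ascii_alphabetic() || c == b'_' => {
                let start = i;
                while i < b.len() && (b[i].is_ascii_alphanumeric() || b[i] == b'_') { i += 1; }
                match &doc[start..i] {
                    "query" => kinds.push(OpKind::Query),
                    "mutation" => kinds.push(OpKind::Mutation),
                    "subscription" => kinds.push(OpKind::Subscription),
                    "fragment" => {} // not an operation, but still a definition
                    _ => return None,
                }
                in_def = true;
            }
            _ => return None,
        }
    }
    if in_def || depth != 0 { None } else { Some(kinds) }
}

fn main() {
    let doc = "mutation { volumeDelete(volumeId: \"...\") }";
    assert_eq!(top_level_op_kinds(doc), Some(vec![OpKind::Mutation]));
}
```

Comments, string contents, and nested selection sets are skipped so that a keyword appearing inside them is not mistaken for an operation. A production implementation would read the same facts from the parser's tree instead of re-deriving them.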

Implementation phases

  • Phase 1: Graphql protocol variant, body capture with bounded buffering, operation-type matching only (query / mutation / subscription). Closes the Railway-class case.
  • Phase 2: operation-name and root-field matching.
  • Phase 3: generalize the "body inspector" trait so JSON-RPC, SOAP, and protobuf inspection can be added without new sandbox releases; they become Provider Profile contributions.
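One possible shape for the Phase 3 "body inspector" trait is sketched below. Everything here is hypothetical (BodyInspector, InspectedOp, Registry, and the toy line-based protocol are invented for illustration); the point is only that each protocol reduces a body to the same small set of facts, and that an unknown protocol or malformed body yields no facts so the caller can fail closed.

```rust
use std::collections::HashMap;

/// Protocol-neutral facts a policy rule would match on.
#[derive(Debug, PartialEq)]
struct InspectedOp {
    kind: String,         // e.g. "mutation" for GraphQL
    targets: Vec<String>, // e.g. root fields or method names
}

#[derive(Debug)]
struct Malformed;

trait BodyInspector {
    fn inspect(&self, body: &[u8]) -> Result<Vec<InspectedOp>, Malformed>;
}

/// Toy inspector for a made-up protocol with one "<kind> <target>" per line.
/// It stands in for real GraphQL / JSON-RPC / SOAP inspectors.
struct LineInspector;

impl BodyInspector for LineInspector {
    fn inspect(&self, body: &[u8]) -> Result<Vec<InspectedOp>, Malformed> {
        let text = std::str::from_utf8(body).map_err(|_| Malformed)?;
        let mut ops = Vec::new();
        for line in text.lines() {
            let (kind, target) = line.split_once(' ').ok_or(Malformed)?;
            ops.push(InspectedOp {
                kind: kind.to_string(),
                targets: vec![target.to_string()],
            });
        }
        Ok(ops)
    }
}

/// Inspectors keyed by the endpoint's `protocol` literal.
struct Registry {
    inspectors: HashMap<&'static str, Box<dyn BodyInspector>>,
}

impl Registry {
    /// None means "no facts available": unknown protocol or malformed body.
    fn inspect(&self, protocol: &str, body: &[u8]) -> Option<Vec<InspectedOp>> {
        self.inspectors.get(protocol)?.inspect(body).ok()
    }
}

fn demo_registry() -> Registry {
    let mut inspectors: HashMap<&'static str, Box<dyn BodyInspector>> = HashMap::new();
    inspectors.insert("toy", Box::new(LineInspector));
    Registry { inspectors }
}

fn main() {
    let registry = demo_registry();
    let ops = registry.inspect("toy", b"delete volume-1").expect("well-formed body");
    assert_eq!(ops[0].kind, "delete");
    assert!(registry.inspect("soap", b"<Envelope/>").is_none());
}
```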

Alternatives Considered

  • Coarse POST deny on /graphql* — works today, blocks the Railway attack, but also blocks all legitimate writes. Operationally a non-starter for any team that needs GraphQL writes at all.
  • Rely on server-side enforcement only — correct in principle, and the primary control will always live on the side of the protected resource. Does not help when the upstream service ships destructive operations without confirmation gates, which the Railway incident demonstrates is common in practice.
  • Externalize to a request-inspecting sidecar — e.g., proxy through an Envoy filter or OPA-with-http.send. Adds another service on the deployment path, and the sandbox already has the body bytes in flight; in-process parsing is strictly cheaper.
  • Require GraphQL APIs to expose a typed REST surface — not actionable; many production systems are GraphQL-native by design.
  • Body-blind allow with audit-only logging — defense-in-depth value is near zero; agent destructive calls succeed, postmortem is the only artifact.

Agent Investigation

Codebase surveyed by agent prior to filing:

  • proto/sandbox.proto: L7Allow (method, path, query, command) and L7DenyRule (same shape) define the current rule surface. No body-related field exists. NetworkEndpoint.protocol is a free-form string with a comment indicating "rest", "sql", or empty.
  • crates/openshell-sandbox/src/l7/mod.rs: L7Protocol enum is { Rest, Sql }. parse() accepts only those literals — adding Graphql is a minimal change.
  • crates/openshell-sandbox/src/l7/rest.rs: implements body framing (parse_body_length, Content-Length / chunked) for HTTP correctness. The bytes are already buffered; no body content matching exists. grep -rn "match_body|body_match|graphql|operation_type" crates/openshell-sandbox/src/l7/ returns zero hits.
  • crates/openshell-sandbox/data/sandbox-policy.rego: REST and SQL evaluation paths are siblings; adding a GraphQL evaluation path is structurally analogous.
  • Discussion #865, "Provider Enhancements -- Declarative Profiles, Auto-Injected Policy, Multi-Provider Inference" (Provider Profiles): the rule syntax above slots cleanly into the existing endpoints[].rules / endpoints[].deny_rules shape that profile YAML already supports.

No new dependency is required for Phase 1 beyond serde_json (already in tree) and apollo-parser (see Parser choice section above).

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request

Labels: area:policy (Policy engine and policy lifecycle work), topic:l7 (Application-layer policy and inspection work)
