GraphQL-aware L7 inspection: operation-type and field-level policy rules

## Problem Statement

OpenShell's L7 enforcement matches HTTP method, URL path, and query parameters. That's sufficient for REST APIs where the destructive intent is encoded in the URL (`DELETE /repos/.../branches/main`). It is **not** sufficient for GraphQL, JSON-RPC, SOAP, and similar body-encoded operation languages, where the destructive vs. read-only distinction lives in the request body.

Concrete motivating case (publicly reported, [tomshardware.com link](https://www.tomshardware.com/tech-industry/artificial-intelligence/claude-powered-ai-coding-agent-deletes-entire-company-database-in-9-seconds-backups-zapped-after-cursor-tool-powered-by-anthropics-claude-goes-rogue)): an agent with a valid Railway bearer token issued

```
POST https://backboard.railway.app/graphql/v2
{"query":"mutation { volumeDelete(volumeId: \"...\") }"}
```

and deleted a production database. The same `POST /graphql/v2` URL also serves legitimate read traffic (`query { volume(...) }`); the two cannot be distinguished without parsing the body. With current L7 rules an operator must choose between (a) blanket-denying `POST /graphql/v2`, which blocks legitimate reads and writes alike, or (b) allowing it and accepting the destructive-call risk.

This gap will surface for any GraphQL or JSON-RPC API where destructive operations are encoded in the request body — which is the common case across GraphQL-shaped systems.



## Proposed Design

Add `Graphql` as a peer variant to the existing `L7Protocol` enum (`crates/openshell-sandbox/src/l7/mod.rs`), alongside `Rest` and `Sql`. New module `crates/openshell-sandbox/src/l7/graphql.rs` implements:

1. **Body capture** — buffer up to a bounded size (proposed 64 KiB default, configurable per-endpoint, *number to be benchmarked before locking*) of the POST body using existing framing logic (`parse_body_length` in `l7/rest.rs`). Bodies exceeding the bound fail closed.
2. **GraphQL parsing** — minimal-cost parse of the JSON envelope (`{ query, variables, operationName }`) and the GraphQL document to extract: operation type (`query` / `mutation` / `subscription`), top-level operation name, and the set of root fields invoked.
3. **Rule matching** — extend `L7Allow` and `L7DenyRule` (`proto/sandbox.proto`) with three new optional fields scoped to the GraphQL protocol:
   - `operation_type` (`query` / `mutation` / `subscription` / `*`)
   - `operation_name` (glob)
   - `fields` (set of field-name globs; rule matches if any selected field matches any element)
4. **Policy composition** — same allow/deny ordering as the REST path: deny rules take precedence, allows are additive.
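The bounded-capture step (item 1) can be sketched with the standard library alone. This is a minimal illustration, not the sandbox's actual code: `capture_body`, `CaptureError`, and `DEFAULT_MAX_BODY` are hypothetical names, and the 64 KiB figure is only the proposed default. The sketch reads one byte past the limit so that an oversized body is detected and rejected rather than silently truncated.

```rust
use std::io::{self, Read};

/// Proposed default bound (64 KiB); hypothetical constant, not yet benchmarked.
const DEFAULT_MAX_BODY: usize = 64 * 1024;

#[derive(Debug)]
enum CaptureError {
    /// The body exceeded the bound; the caller must deny the request.
    TooLarge,
    /// The underlying read failed; the caller must also deny.
    Io(io::Error),
}

/// Buffers at most `max` bytes of the body.
/// Returns `TooLarge` if the body holds more than `max` bytes.
fn capture_body<R: Read>(body: R, max: usize) -> Result<Vec<u8>, CaptureError> {
    let mut buf = Vec::new();
    // Read one byte past the limit so an oversized body is detectable
    // without ever buffering more than `max + 1` bytes.
    body.take(max as u64 + 1)
        .read_to_end(&mut buf)
        .map_err(CaptureError::Io)?;
    if buf.len() > max {
        return Err(CaptureError::TooLarge);
    }
    Ok(buf)
}
```

Calling `capture_body(reader, DEFAULT_MAX_BODY)` yields either the full body or an error; in both error cases the policy path would deny, which is the fail-closed behavior described in item 1.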

Example policy fragment, authored as a Provider Profile (Discussion #865):

```yaml
- host: backboard.railway.app
  port: 443
  protocol: graphql
  rules:
    - allow:
        operation_type: query        # all reads
    - allow:
        operation_type: mutation
        fields: ["volumeCreate", "deploymentTrigger"]
  deny_rules:
    - operation_type: mutation
      fields: ["*Delete", "*Destroy", "volumeDelete", "projectDelete"]
```
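To make the matching semantics concrete, here is a rough Rust sketch that evaluates the example profile above against already-parsed requests. All type and function names (`Rule`, `Request`, `glob_match`, `is_allowed`, `example_policy`) are hypothetical, the glob syntax is assumed to support only `*`, and a request that matches no allow rule is assumed to be denied; none of this is confirmed against the real Rego rules.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpType {
    Query,
    Mutation,
    Subscription,
}

/// One allow or deny rule; `None` or an empty list means "match anything".
struct Rule {
    operation_type: Option<OpType>,
    fields: Vec<&'static str>,
}

/// The facts extracted from a parsed request body.
struct Request<'a> {
    op: OpType,
    root_fields: Vec<&'a str>,
}

/// Matches `text` against `pattern`, where `*` stands for any run of bytes.
fn glob_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0, 0);
    // Position to resume from when a literal comparison fails after a `*`.
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            backtrack = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((bp, bt)) = backtrack {
            // Let the last `*` absorb one more byte and retry.
            pi = bp;
            ti = bt + 1;
            backtrack = Some((bp, bt + 1));
        } else {
            return false;
        }
    }
    // Trailing `*`s match the empty string.
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}

impl Rule {
    /// A rule matches if the operation type fits and any selected root field
    /// matches any glob in `fields`.
    fn matches(&self, req: &Request) -> bool {
        let type_ok = self.operation_type.map_or(true, |t| t == req.op);
        let fields_ok = self.fields.is_empty()
            || req
                .root_fields
                .iter()
                .any(|f| self.fields.iter().any(|g| glob_match(g, f)));
        type_ok && fields_ok
    }
}

/// Deny rules take precedence; allows are additive; no match means deny (assumed).
fn is_allowed(allow: &[Rule], deny: &[Rule], req: &Request) -> bool {
    if deny.iter().any(|r| r.matches(req)) {
        return false;
    }
    allow.iter().any(|r| r.matches(req))
}

/// The example profile, transcribed by hand as (allow rules, deny rules).
fn example_policy() -> (Vec<Rule>, Vec<Rule>) {
    let allow = vec![
        Rule { operation_type: Some(OpType::Query), fields: vec![] },
        Rule {
            operation_type: Some(OpType::Mutation),
            fields: vec!["volumeCreate", "deploymentTrigger"],
        },
    ];
    let deny = vec![Rule {
        operation_type: Some(OpType::Mutation),
        fields: vec!["*Delete", "*Destroy", "volumeDelete", "projectDelete"],
    }];
    (allow, deny)
}
```

With this policy, `mutation { volumeDelete }` is denied and `query { volume }` is allowed. Under the any-match semantics, a mutation that selects `volumeCreate` alongside an unlisted field is still allowed; whether allow rules should instead require every root field to match is a question this sketch does not settle.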

The OPA Rego rules in `crates/openshell-sandbox/data/sandbox-policy.rego` extend by analogy with the REST path; no new evaluation engine.

### Parser choice: `apollo-parser`

Phase 1 needs a Rust GraphQL document parser. Three candidates were compared on dependency footprint, maintenance health, error-recovery semantics, and AST fit for our use (operation type, operation name, top-level field set). Data is from crates.io as of April 2026:

| | `apollo-parser` 0.8.5 | `async-graphql-parser` 7.2.1 | `graphql-parser` 0.4.1 |
|---|---|---|---|
| Last stable release | 2026-02-25 | 7.2.1 stable; 8.0.0-rc.5 dated 2026-04-21 | 2024-12-03 |
| Versions published | 41 | 209 | 8 |
| Recent downloads (90d) | ~93k | ~4.2M | ~4.4M |
| License | MIT OR Apache-2.0 | MIT OR Apache-2.0 | MIT OR Apache-2.0 |
| Production deps | 3 (`memchr`, `rowan`, `thiserror`) | 4 (`async-graphql-value`, `pest`, `serde`, `serde_json`) | 2 (`combine`, `thiserror`) |
| Source size | ~209 KB / ~8.1k LoC Rust | server-framework parser internals | ~36 KB / ~3.9k LoC Rust |
| AST shape | CST (`rowan`, lossless) | AST | AST |
| Error recovery | yes — "lexing and parsing does not fail or `panic`"; "always produces a CST" | partial | none |
| Maintainer | apollographql org | async-graphql server project | graphql-rust org / individual |

**Recommendation: `apollo-parser`.**

- Smallest credible production-dep footprint for our use case. `graphql-parser` is technically lighter (two deps) but stagnant; the maintenance gap matters more than two crates on a security-critical path.
- Maintained by the organization that authors the Apollo Federation spec, the strongest available signal that the parser will track GraphQL spec evolution.
- Active and recent: 41 versions, last release Feb 2026. `graphql-parser` shipped 8 versions in 8 years and last released December 2024.
- Error-recovery semantics fit the policy path. Producing a CST even on malformed input lets the deny path emit specific diagnostics ("malformed mutation field; rejected by deny rule X") rather than blanket "parse error → block". Useful for the policy advisor / agent inbox flow.

Why not the others:

- `async-graphql-parser` is the parser internals of a server framework, not a standalone library. Its release cycle is bound to the server's (currently in 8.0.0-rc churn — five release candidates visible). Pulls `pest` parser-generator runtime as a production dep.
- `graphql-parser` is small and has no parser-generator dep, but maintenance is thin (8 versions across 8 years, last release December 2024). `combine`-based AST output has no error recovery — malformed input fails wholesale, acceptable for fail-closed posture but loses the diagnostic benefit.

Selection assumes policy matching only needs the root of each operation: operation kind (`query` / `mutation` / `subscription`), operation name, and the set of top-level selected fields. Phase 1 matches on the first; Phase 2 adds the other two. All three crates expose this. If a later phase needs execution-time information (variable resolution, fragment expansion across the full document graph), re-evaluate before that phase lands.

### Implementation phases

- **Phase 1** — `Graphql` protocol variant, body capture with bounded buffering, operation-type matching only (`query` / `mutation` / `subscription`). Closes the Railway-class case.
- **Phase 2** — operation-name and root-field matching.
- **Phase 3** — generalize the "body inspector" trait so JSON-RPC, SOAP, and protobuf inspection can be added without new sandbox releases — they become Provider Profile contributions.


## Related

- Discussion #865 — Provider Enhancements (the rule-authoring surface)
- Issue #896 — tracking issue for #865
- Issue #822 — deny rules in network policy schema
- RFC 0001 — Agent-Driven Policy Management (sets up the inbox where mutation-deny rules would be reviewed and approved)


## Alternatives Considered

- **Coarse `POST` deny on `/graphql*`** — works today, blocks the Railway attack, but also blocks all legitimate writes. Operationally a non-starter for any team that needs GraphQL writes at all.
- **Rely on server-side enforcement only** — correct in principle, and the primary control will always live on the side of the protected resource. Does not help when the upstream service ships destructive operations without confirmation gates, which the Railway incident demonstrates is common in practice.
- **Externalize to a request-inspecting sidecar** — e.g., proxy through an Envoy filter or OPA-with-`http.send`. Adds another service on the deployment path, and the sandbox already has the body bytes in flight; in-process parsing is strictly cheaper.
- **Require GraphQL APIs to expose a typed REST surface** — not actionable; many production systems are GraphQL-native by design.
- **Body-blind allow with audit-only logging** — defense-in-depth value is near zero; agent destructive calls succeed, postmortem is the only artifact.

### Agent Investigation

Codebase surveyed by agent prior to filing:

- `proto/sandbox.proto`: `L7Allow` (`method`, `path`, `query`, `command`) and `L7DenyRule` (same shape) define the current rule surface. No body-related field exists. `NetworkEndpoint.protocol` is a free-form string with comment indicating `"rest"`, `"sql"`, or empty.
- `crates/openshell-sandbox/src/l7/mod.rs`: `L7Protocol` enum is `{ Rest, Sql }`. `parse()` accepts only those literals — adding `Graphql` is a minimal change.
- `crates/openshell-sandbox/src/l7/rest.rs`: implements body framing (`parse_body_length`, Content-Length / chunked) for HTTP correctness. The bytes are already buffered; no body content matching exists. `grep -rn "match_body|body_match|graphql|operation_type" crates/openshell-sandbox/src/l7/` returns zero hits.
- `crates/openshell-sandbox/data/sandbox-policy.rego`: REST and SQL evaluation paths are siblings; adding a GraphQL evaluation path is structurally analogous.
- Discussion #865 (Provider Profiles): the rule syntax above slots cleanly into the existing `endpoints[].rules` / `endpoints[].deny_rules` shape that profile YAML already supports.
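For illustration, the `L7Protocol` change noted in the survey might look roughly like the following. This is a guess at the shape only: the real enum's derives, the actual signature of `parse()`, and how empty or unknown strings are handled are all assumptions here.

```rust
/// Hypothetical mirror of the enum in `l7/mod.rs` with the proposed variant added.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum L7Protocol {
    Rest,
    Sql,
    Graphql,
}

impl L7Protocol {
    /// Maps an endpoint's `protocol` string to a variant.
    /// Assumption: exact lowercase literals only; anything else yields `None`.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "rest" => Some(Self::Rest),
            "sql" => Some(Self::Sql),
            "graphql" => Some(Self::Graphql),
            _ => None,
        }
    }
}
```

Under these assumptions, `L7Protocol::parse("graphql")` returns `Some(L7Protocol::Graphql)`, matching the `protocol: graphql` key used in the example profile.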

No new dependency is required for Phase 1 beyond `serde_json` (already in tree) and `apollo-parser` (see Parser choice section above).

### Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request
