A technical deep dive into the OSSRedact local privacy gateway, written for readers who want to understand or audit the design.
OSSRedact is a local privacy gateway: an HTTP proxy that sits in front of cloud LLM APIs. On
egress it redacts PII and secrets in the request's free-text fields to stable placeholders; on
the response it rehydrates those placeholders back to the real values. The local client sees
real data; the cloud model sees only placeholders in the request fields the gateway scans.
(Cryptographically-bound reasoning material -- extended-thinking blocks and OpenAI encrypted_content --
is passed through opaque and not re-scanned; it is model-generated from already-redacted input, so it
carries only placeholders. See §1.)
The wire formats supported today are Anthropic /v1/messages, OpenAI-compatible
/v1/chat/completions (routing Codex, Hermes, Pi, omp, opencode, and other OpenAI-compatible clients via openai_adapter.py),
and OpenAI /v1/responses (the API the current Codex CLI speaks, via responses_adapter.py) -- all through the same
redact/rehydrate contract. The egress-proxy code now lives in this repo under appliance/; the GPU NER
gate service it calls is version-controlled under gate/ and deployed on the GPU host (drift-guarded by deploy/check-gate-drift.sh). Tool-specific wiring is documented in docs/ADAPTERS.md. Point any tool at the gateway with:
ANTHROPIC_BASE_URL=http://127.0.0.1:8011
It works under a Claude Max subscription: billing stays on Max, no API key is needed, and the auth header is forwarded verbatim.
The detection model runs locally on-device (deployed always-on tier: xlm-roberta-base as dynamic-INT8 ONNX on CPU via onnxruntime; the Intel NPU / OpenVINO FP16 tier is preserved as a drop-in alternate). There is no cloud detection call, which is what makes this true data sovereignty rather than another cloud DLP hop.
Going fully local for data sovereignty is too expensive (256GB+ VRAM to run a SOTA model at home). OSSRedact takes the other route: filter private data out, use cloud SOTA, redact on egress and rehydrate transparently. Two users motivate the design:
- The hobbyist who wants data sovereignty but cannot afford GPUs.
- The employee who unknowingly leaks client PII through configured CLI/API-endpoint clients today. Browser and desktop-app interception are roadmap items.
The redaction-proxy concept already exists. og-local/OutGate (BSL license) and rehydra-sdk (MIT) both proxy these wire formats with round-trip streaming rehydration. OSSRedact does not claim to be first or only. Its distinct contribution is:
- A trained French-Quebec + English PII NER model (competitors use generic Presidio/regex).
- Running that model locally on-device (CPU INT8 always-on; NPU/OpenVINO available as an alternate tier): no cloud detection call, true data sovereignty.
- An always-on deterministic secrets + structured-PII floor.
- Quebec Law 25 framing.
                                                 host (loopback by default, on-device)
local tool                                       +-------------------------------------------------+
(Claude Code, Codex,                             |                                                 |
 Hermes, Pi, opencode)                           |   :8011 egress proxy                            |
          |                                      |     - extract / gate / merge / rehydrate        |
          | ANTHROPIC_BASE_URL                   |     - holds session+project entity map          |
          | = http://127.0.0.1:8011              |                                                 |
          +--------------------------------------+----------------+                                |
                                                 |                v                                |
                                                 |   :8001 gate + NER engine                       |
                                                 |     - Tier-0 regex+Luhn / secrets+entropy       |
                                                 |     - on-device NER (CPU INT8 ONNX;             |
                                                 |       NPU/OpenVINO alt)                         |
                                                 |                |                                |
                                                 +----------------|--------------------------------+
                                                                  v
                                                 on-device CPU INT8 (NPU/OpenVINO alt)
                                                                  |
cloud LLM API <----- auth header verbatim, placeholders only -----+
(api.anthropic.com /v1/messages)
- `:8011` egress proxy is the front door the local tool points at. It owns the request pipeline, the entity map, and stream rehydration.
- `:8001` gate + NER engine owns detection: the deterministic Tier-0 / secrets layer and the on-device NER pass.
- Both bind `127.0.0.1` by default. Tailnet or LAN exposure is an explicit operator opt-in via `GATEWAY_HOST` / `CPU_GATE_HOST`; the gateway should not be exposed to the open LAN or internet. Only the egress to the cloud LLM API leaves the host, carrying placeholders and the verbatim auth header.
request in
|
[1] extract redactable text fields
|
[2] cheap deterministic gate (ALWAYS, microseconds)
|
[3] empty path: no scannable text and no known entity? --yes--> forward unchanged
| no
[4] on-device NER pass over extracted non-trivial fields
|
[5] union merge (connected-component) + session+project entity map
|
[6] forward upstream, auth header verbatim
|
[7] stream-rehydrate the SSE response
|
response out (real values restored to the local client)
The proxy pulls the fields that can carry model-visible user data: system, messages,
tool_result text/JSON, tool_use input, Anthropic document text, and tool schema
descriptions/literal values. It never rewrites tool/function names, schema property names, images,
binary file bytes, or the model name. It also passes cryptographically-bound reasoning material through
opaque -- Anthropic thinking / redacted_thinking blocks (a signature MAC over the content) and OpenAI
reasoning encrypted_content (a ciphertext blob) -- neither redacting nor rehydrating it, because it must
round-trip byte-for-byte or the signature/decryption breaks; it is model-generated from already-redacted
input, so it holds only placeholders. This boundary matters: rewriting routing or schema structure would
break the request, while structured argument values are user data and must be scanned.
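The boundary described above can be sketched as a small walker over an Anthropic `/v1/messages` body. This is a minimal illustration, not the code in `appliance/`: the function names `iter_redactable` and `walk_values` are hypothetical, and the sketch covers only a string `system`, text blocks, `tool_use` input values, and `tool_result` text. Document blocks and tool schema descriptions are left out for brevity.

```python
from typing import Any, Iterator

# Block types passed through opaque (they must round-trip byte-for-byte).
OPAQUE_BLOCKS = {"thinking", "redacted_thinking", "image"}


def walk_values(node: Any, path: str) -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every string VALUE; dict keys are never yielded."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from walk_values(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from walk_values(value, f"{path}[{i}]")


def iter_redactable(body: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (path, text) for fields that can carry model-visible user data."""
    if isinstance(body.get("system"), str):
        yield "system", body["system"]
    for m, message in enumerate(body.get("messages", [])):
        content = message.get("content")
        if isinstance(content, str):
            yield f"messages[{m}].content", content
            continue
        for b, block in enumerate(content or []):
            path = f"messages[{m}].content[{b}]"
            kind = block.get("type")
            if kind in OPAQUE_BLOCKS:
                continue  # never redacted, never rehydrated
            if kind == "text":
                yield f"{path}.text", block["text"]
            elif kind == "tool_use":
                # Argument values are user data; the tool name is routing.
                yield from walk_values(block.get("input"), f"{path}.input")
            elif kind == "tool_result":
                inner = block.get("content")
                if isinstance(inner, str):
                    yield f"{path}.content", inner
                else:
                    for j, part in enumerate(inner or []):
                        if part.get("type") == "text":
                            yield f"{path}.content[{j}].text", part["text"]
```

The model name, tool names, and JSON property names never appear in the output, so a later redaction step cannot rewrite them by accident.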
Every extracted field passes through the Tier-0 deterministic gate, every time, with no exceptions. Two scans run here:
- Tier-0 PII: regex + Luhn check. This owns the catastrophic structured categories (payment cards via Luhn, SIN/NAS, etc.).
- Secrets / entropy scan: ported gitleaks-style patterns plus a Shannon-entropy backstop, with UUID / git-SHA / sequential false-positive filters.
This runs in microseconds, so it is cheap enough to be unconditional. It is also the reliable floor: because NER recall is below 100%, the deterministic layer is what anchors coverage of the catastrophic categories (secrets, cards, SIN), independent of the model.
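As an illustration of the regex + Luhn idea for payment cards, here is a minimal sketch. The candidate pattern and the function names are assumptions made for this example; the deployed Tier-0 patterns are not reproduced here.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated by
# single spaces or hyphens. The regex only proposes candidates.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_payment_cards(text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of digit runs that pass the Luhn check."""
    spans = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            spans.append(match.span())
    return spans
```

The checksum is what keeps this cheap check precise: an arbitrary 16-digit order number usually fails Luhn and is left alone, while a real card number passes regardless of what the NER model thinks.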
If a request has no scannable extracted text and no prior session entity to backstop, it is forwarded unchanged. Normal non-empty text fields proceed to the local NER pass even when Tier-0 finds nothing, because the NER-only labels (person, organization, address) have no deterministic floor and short structural values can carry them.
The on-device NER model runs on every extracted non-trivial text field. Repeated system prompts and prior turns are cached, but code-like fields and short structural values are not skipped solely because Tier-0 is clean. That trade keeps the privacy posture fail-safe for NER-only labels such as person names.
The model processes text in 256-token windows at about 34ms per window.
Detections from the deterministic layers and the NER pass are combined with a union merge so that overlapping or adjacent spans from different detectors become one clean span. The merge is connected-component: spans that touch or overlap are grouped into a single component and redacted as a whole. This is what prevents fragment leaks. (See the dedicated section below.)
The merged entities then flow through the session + project entity map (AES-GCM at rest), which assigns the same placeholder to the same value across turns and applies the known-entity backstop. (See the dedicated section below.)
The rewritten request (placeholders in place of real values) is forwarded to the cloud LLM API. The client's auth header is forwarded verbatim, which is what lets a Claude Max subscription work without an API key.
The upstream response is a Server-Sent Events stream. As deltas arrive, the proxy reverses the
placeholder map so the local client receives real values. This requires reassembling placeholders
that split across deltas and rehydrating tool_use argument JSON at the value level. (See the
Streaming SSE rehydration section below.)
The fast path is the performance backbone. The principle is: never pay for a model when a microsecond-scale deterministic check can clear the request.
field --> Tier-0 regex+Luhn + secrets/entropy
|
any hit? ----no, and not natural-language----> forward unchanged (0 model cost)
|
yes / natural-language
|
v
targeted on-device NER pass
Two consequences:
- A clean request is forwarded with zero model inference. The measured clean fast-path latency is 1.7ms median.
- The NER model is invoked only where it can add detections the deterministic layer cannot, which keeps on-device inference proportional to actual PII risk rather than total traffic volume.
The on-device NER pass is the always-on detection workhorse. It runs xlm-roberta-base as a
dynamic-INT8 ONNX model on CPU via onnxruntime (the Intel NPU / OpenVINO FP16 tier is preserved as
a drop-in alternate).
extracted non-trivial text field
|
chunk into 256-token windows
|
+----+----+----+ ...
| w0 | w1 | w2 | ~34ms per window
+----+----+----+
|
per-window NER spans --> union merge (step 5)
- Targeting: every extracted non-trivial text field enters this pass. Repeated system prompts and prior turns are cached; empty/non-text requests can still forward unchanged.
- Chunking: text is processed in 256-token windows, at about 34ms per window.
- Why the base tier is the right always-on tier: the base model matches the GPU large model on recall at far lower latency, so the always-on tier runs the base model on-device (CPU INT8, ~42ms; the NPU/OpenVINO tier is the preserved alternate).
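The chunking step can be sketched as follows. Two loud assumptions: whitespace-separated words stand in for the model's subword tokens (the real tokenizer will produce more tokens per word), and windows are consecutive with no overlap, since the text above does not describe an overlap. The latency estimate also assumes windows are processed one after another at the quoted ~34ms each.

```python
import math
import re
from typing import Iterator

WINDOW_TOKENS = 256
MS_PER_WINDOW = 34  # approximate per-window figure quoted above


def windows(text: str, size: int = WINDOW_TOKENS) -> Iterator[tuple[int, int]]:
    """Yield (start, end) character offsets of consecutive token windows."""
    tokens = [m.span() for m in re.finditer(r"\S+", text)]
    for i in range(0, len(tokens), size):
        chunk = tokens[i:i + size]
        # Character offsets let per-window NER spans map back to the field.
        yield chunk[0][0], chunk[-1][1]


def estimated_ms(token_count: int) -> int:
    """Rough sequential cost: one inference per started window."""
    return math.ceil(token_count / WINDOW_TOKENS) * MS_PER_WINDOW
```

Under these assumptions a 600-token field needs three windows, so roughly 102ms of inference. Keeping character offsets per window is what allows the per-window spans to be handed to the union merge in field coordinates.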
Multiple detectors (Tier-0 regex+Luhn, the secrets/entropy scan, and the NER pass) can each fire on the same region of text, often with partially overlapping spans. A person's name might be caught as two adjacent tokens by NER, while a structured detector catches an embedded digit run. If you redact each span independently, the gaps between them can leak fragments of the real value: the placeholders end up interleaved with unredacted characters.
The union merge treats spans as nodes in a graph and draws an edge between any two spans that overlap or are adjacent. Each connected component is then redacted as a single span.
detector spans on the same region:
  [---- A ----]
            [----- B -----]
                           [-- C --]
connected-component union:
  [============ merged ============]   <-- one placeholder, no gaps
Because the whole connected component is replaced atomically, there are no fragment leaks: no sliver of the real value survives between two adjacent redactions.
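For intervals on a line, the connected components of the overlap-or-adjacent graph can be found with a sort and a single sweep. The sketch below assumes half-open character spans, treats "adjacent" as touching (`end == start`), and keeps the first detector's label for the merged span. Those three choices are assumptions for illustration, not a statement of the shipped merge rules.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int  # exclusive
    label: str


def union_merge(spans: list[Span]) -> list[Span]:
    """Merge overlapping or touching spans into one span per component."""
    merged: list[Span] = []
    for span in sorted(spans):  # sorts by start, then end
        if merged and span.start <= merged[-1].end:
            last = merged[-1]
            # Extend the current component; never shrink it.
            merged[-1] = Span(last.start, max(last.end, span.end), last.label)
        else:
            merged.append(span)
    return merged
```

With spans A = (0, 12), B = (8, 22), and C = (22, 30), the sweep produces a single span (0, 30), which is then replaced by one placeholder.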
Once a value is identified, it must map to a stable placeholder so that the same real value gets the same placeholder across every turn of a conversation. Without this, the cloud model would see a different token for the same entity each turn and lose coreference; rehydration would also be ambiguous.
The entity map is scoped to session and project, with AES-GCM at rest. The same value yields the same placeholder across turns within that scope.
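A minimal in-memory sketch of stable placeholder assignment for one scope is shown here. The class name `EntityMap`, its method names, and the `<LABEL_NNN>` format are assumptions for illustration; AES-GCM encryption at rest and persistence are deliberately omitted.

```python
class EntityMap:
    """In-memory value <-> placeholder table for one (session, project) scope."""

    def __init__(self) -> None:
        self.placeholder_by_value: dict[tuple[str, str], str] = {}
        self.value_by_placeholder: dict[str, str] = {}
        self.counters: dict[str, int] = {}

    def placeholder_for(self, label: str, value: str) -> str:
        """Return the existing placeholder for a value, minting one if new."""
        key = (label, value)
        if key not in self.placeholder_by_value:
            self.counters[label] = self.counters.get(label, 0) + 1
            placeholder = f"<{label.upper()}_{self.counters[label]:03d}>"
            self.placeholder_by_value[key] = placeholder
            self.value_by_placeholder[placeholder] = value
        return self.placeholder_by_value[key]
```

The reverse table is what rehydration reads; keeping both directions in one structure guarantees they cannot drift apart.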
NER recall is below 100%. A value the model catches on turn 1 might be missed on turn 5 (different phrasing, a chunk boundary, an adversarial gluing). The known-entity backstop closes this gap: once a value has been identified, it is recorded, and any later occurrence of that value is re-redacted deterministically even if the NER model misses it that turn.
Turn 1 (user): "Marie Tremblay called about her file."
NER detects "Marie Tremblay" --> placeholder PERSON_1, recorded in the entity map.
Cloud model sees: "PERSON_1 called about her file."
Turn 5 (user): "...and tremblay, marie still hasn't sent the form"
NER misses this lowercased, reordered occurrence.
WITHOUT the backstop: "Marie Tremblay" leaks to the cloud verbatim.
WITH the backstop: the known value is matched and re-redacted to PERSON_1
before the request leaves the host.
The backstop turns a one-time successful detection into a durable guarantee for the rest of the session/project scope.
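One way to implement a backstop that also catches the reordered, lowercased form from the example is case-insensitive matching over token permutations. This is a sketch under assumptions: the source does not specify the backstop's normalization rules, and `backstop_pattern` and `apply_backstop` are hypothetical names.

```python
import itertools
import re


def backstop_pattern(value: str) -> re.Pattern[str]:
    """Match a known value case-insensitively, in any token order."""
    tokens = value.split()
    # Permutations are only enumerated for short values to bound the regex.
    orders = itertools.permutations(tokens) if len(tokens) <= 3 else [tuple(tokens)]
    alternatives = [
        r"[\s,]+".join(re.escape(token) for token in order) for order in orders
    ]
    body = "|".join(alternatives)
    return re.compile(rf"(?<!\w)(?:{body})(?!\w)", re.IGNORECASE)


def apply_backstop(text: str, known: dict[str, str]) -> str:
    """Re-redact every known value, longest first, before the text leaves."""
    for value in sorted(known, key=len, reverse=True):
        if not value.strip():
            continue  # an empty value would match everywhere
        placeholder = known[value]
        text = backstop_pattern(value).sub(lambda _m, p=placeholder: p, text)
    return text
```

With `{"Marie Tremblay": "<PERSON_001>"}` recorded on turn 1, the turn-5 text "tremblay, marie still hasn't sent the form" is rewritten to use `<PERSON_001>` without any model call. Names glued into identifiers with no separator are not matched by this pattern, which is consistent with the glued-identifier limitation discussed later.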
The gateway is a single process (:8011, one async event loop; no per-client workers). It does
not fork a process or thread per client -- many simultaneous agents are handled as interleaved
coroutines on one loop. While one request is awaiting the on-device NER pass or the upstream API,
the loop serves another. So "many agents at once" means request multiplexing, not CPU parallelism,
and each in-flight request keeps its own body, context, and placeholder map as coroutine-local state
-- there is no shared per-request mutable global, so two requests never see each other's text in
memory.
PII isolation between agents is enforced by the session -> entity-map-file mapping, not by memory
or process separation. Each request derives a (session, project) key, and that pair selects exactly
one encrypted map file; an agent can only read, write, or rehydrate against its own file.
- `session` comes from the client's session header (`x-claude-code-session-id`, or `x-session-id` on the OpenAI routes); absent that, a hash of the system prompt; absent both, a unique per-request ephemeral id (so two truly handle-less first-turn flows never share one map).
- The map path is `sha256(project \0 session)`, so two different sessions hash to different files, different placeholder counters, and different replay maps. Agent A's `<PERSON_001>` and agent B's `<PERSON_001>` are minted from independent counters in independent files and can never collide or cross-rehydrate.
A tool that sends a stable session header (Claude Code does on every request) therefore gets a per-conversation map fully isolated from every other client on the same gateway.
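The session fallback chain and the map-path derivation can be sketched as follows. Assumptions: header names arrive lowercased, the `sys-` and `ephemeral-` prefixes and the `.map` extension are invented for this example, and the function names are hypothetical.

```python
import hashlib
import uuid
from pathlib import Path
from typing import Optional


def derive_session(headers: dict[str, str], system_prompt: Optional[str]) -> str:
    """Session header first, then a system-prompt hash, then an ephemeral id."""
    for name in ("x-claude-code-session-id", "x-session-id"):
        if headers.get(name):
            return headers[name]
    if system_prompt:
        return "sys-" + hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
    return "ephemeral-" + uuid.uuid4().hex


def map_path(root: Path, project: str, session: str) -> Path:
    """Hash project and session with a NUL separator into one file name."""
    material = b"\x00".join([project.encode("utf-8"), session.encode("utf-8")])
    return root / (hashlib.sha256(material).hexdigest() + ".map")
```

The NUL separator matters: without it, project `ab` with session `c` and project `a` with session `bc` would hash the same bytes and share a map file.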
When one agent fires parallel requests on the same session (parallel tool calls, sub-agents), they
share one map file and must be serialized at the mint stage. Detection (the awaited NER calls) runs
outside the lock so the model stays busy; then the load -> mint -> save cycle runs under a
per-(session, project) lock: an in-process re-entrant lock plus a cross-process fcntl.flock
on a sidecar .lock file, with an atomic file replace. The second request to take the lock loads the
file the first just saved, sees its placeholders, and reuses them -- so the same value gets one
stable placeholder even under concurrency. This is also what keeps the upstream prompt cache warm:
an unstable placeholder would change the redacted prefix bytes every turn and bust the cache, which
re-bills the whole conversation as uncached input.
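The load -> mint -> save cycle under a two-level lock might look like the sketch below. It simplifies in three labeled ways: the in-process lock is a plain (non-re-entrant) `threading.Lock`, the map is stored as plaintext JSON rather than AES-GCM ciphertext, and the names `map_lock` and `mint` are hypothetical. It is POSIX-only because it relies on `fcntl.flock`.

```python
import fcntl
import json
import os
import threading
from contextlib import contextmanager
from pathlib import Path

_guard = threading.Lock()
_local_locks = {}  # map path -> in-process lock


@contextmanager
def map_lock(path: Path):
    """Hold the in-process lock, then the cross-process flock on a sidecar."""
    with _guard:
        local = _local_locks.setdefault(str(path), threading.Lock())
    with local, open(f"{path}.lock", "w") as sidecar:
        fcntl.flock(sidecar, fcntl.LOCK_EX)  # blocks other processes
        try:
            yield
        finally:
            fcntl.flock(sidecar, fcntl.LOCK_UN)


def mint(path: Path, label: str, value: str) -> str:
    """Load the map, reuse or mint a placeholder, and save atomically."""
    with map_lock(path):
        table = json.loads(path.read_text()) if path.exists() else {}
        values = table.setdefault(label, {})
        if value not in values:
            values[value] = f"<{label.upper()}_{len(values) + 1:03d}>"
            tmp = path.with_name(path.name + ".tmp")
            tmp.write_text(json.dumps(table))
            os.replace(tmp, path)  # atomic rename on POSIX
        return values[value]
```

Because every caller reloads the file after taking the lock, the second of two concurrent requests sees the placeholder the first one saved and reuses it instead of minting a new one.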
The placeholder->value map handed to the response rehydrator is scoped to the placeholders that actually appear in this request's outbound body, not the full session map. The upstream model can only emit a placeholder it received, so this is sufficient for rehydration (including cross-turn, since re-sent history carries its placeholders) while guaranteeing a request can never rehydrate a value it did not send. This is the isolation boundary when two header-less clients share a system prompt (and therefore one map file): each request still only reconstitutes its own values, and an unknown placeholder is left raw rather than mapped to another flow's value.
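The request-scoped replay can be sketched in a few lines. The placeholder regex assumes the `<LABEL_NNN>` shape used in the examples above, and `request_replay` and `rehydrate` are hypothetical names.

```python
import re

# Assumed placeholder shape: <LABEL_NNN>, e.g. <PERSON_001>.
PLACEHOLDER = re.compile(r"<[A-Z_]+_\d{3}>")


def request_replay(outbound_body: str, session_map: dict[str, str]) -> dict[str, str]:
    """Keep only the placeholders that actually appear in this request."""
    present = set(PLACEHOLDER.findall(outbound_body))
    return {p: v for p, v in session_map.items() if p in present}


def rehydrate(text: str, replay: dict[str, str]) -> str:
    """Restore known placeholders; leave unknown ones exactly as received."""
    return PLACEHOLDER.sub(lambda m: replay.get(m.group(), m.group()), text)
```

A placeholder that exists in the session map but was not sent in this request is treated the same as a hallucinated one: it stays raw in the response.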
- One event loop: CPU-bound work between awaits (regex scans, the merge, JSON (de)serialization) briefly stalls all in-flight requests; there is no CPU parallelism in the proxy itself.
- One on-device NER engine is the throughput ceiling -- every non-trivial field from every agent funnels through it, scanned in 256-token windows.
- Per-`(session, project)` mint serialization: heavy same-session concurrency serializes at the lock; different sessions take different locks and do not contend.
- Cross-process safety (multiple gateway instances sharing one map directory) rests on `fcntl.flock` -- host-local advisory locking; it does not extend across hosts or a network filesystem without working flock semantics.
- For strict multi-tenant separation (different parties' data through one gateway), key the session on a per-client/auth fingerprint as well, so distinct clients never share a map file even when they share a system prompt.
The response is an SSE stream of incremental deltas. The proxy rehydrates placeholders back to real values as the stream flows to the local client. Three concerns drive the design.
A placeholder token can be split across two deltas. Rehydrating each delta independently would fail to match the placeholder and pass a broken token to the client.
delta n : "... please contact PER"
delta n+1 : "SON_1 about the file"
naive per-delta rehydration: "PER" and "SON_1" both pass through unmatched.
with tail-buffer:
hold back the suffix that could be the start of a placeholder ("PER"),
prepend it to the next delta, then match "PERSON_1" and rehydrate.
The proxy keeps a tail buffer: it holds back the trailing portion of a delta that could be the beginning of a placeholder, prepends it to the next delta, and only emits text once it is sure no placeholder straddles the boundary.
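A tail buffer of this kind can be sketched as below. It assumes the angle-bracket `<LABEL_NNN>` placeholder shape, which makes "could be the start of a placeholder" easy to test: any trailing run beginning with `<` and containing only label characters and digits is held back. The class and method names are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"<[A-Z_]+_\d{3}>")
# A trailing run that could still grow into a placeholder,
# e.g. "<", "<PER", "<PERSON_0".
PARTIAL = re.compile(r"<[A-Z_]*\d{0,3}\Z")


class StreamRehydrator:
    def __init__(self, replay: dict[str, str]) -> None:
        self.replay = replay
        self.tail = ""

    def _swap(self, text: str) -> str:
        # Unknown placeholders are left exactly as received.
        return PLACEHOLDER.sub(lambda m: self.replay.get(m.group(), m.group()), text)

    def feed(self, delta: str) -> str:
        """Return the text that is safe to emit now; hold back the rest."""
        text = self.tail + delta
        partial = PARTIAL.search(text)
        cut = partial.start() if partial else len(text)
        self.tail = text[cut:]
        return self._swap(text[:cut])

    def flush(self) -> str:
        """Emit whatever is still held when the stream ends."""
        out, self.tail = self.tail, ""
        return out
```

Feeding `"please contact <PER"` emits `"please contact "` and holds `"<PER"`; the next delta completes the placeholder and it is rehydrated as a whole. Calling `flush()` at end of stream ensures a stray `<` that never became a placeholder is still delivered.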
For tool_use blocks, arguments stream as input_json_delta fragments that are not individually
valid JSON. The proxy accumulates these fragments, then rehydrates the tool arguments at the
value level so that a placeholder appearing inside a JSON string value is restored to its real
value without corrupting the JSON structure.
input_json_delta : {"to": "EM
input_json_delta : AIL_1", "subject": "..."}
accumulate --> {"to": "EMAIL_1", "subject": "..."}
value-level rehydrate --> {"to": "marie@example.com", "subject": "..."}
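The accumulate-then-rehydrate step can be sketched like this. It assumes the `<LABEL_NNN>` placeholder shape and leaves JSON keys untouched; the function names are hypothetical. Parsing first and substituting inside string values is what keeps the output valid when a real value contains quotes or backslashes.

```python
import json
import re
from typing import Any

PLACEHOLDER = re.compile(r"<[A-Z_]+_\d{3}>")


def rehydrate_values(node: Any, replay: dict[str, str]) -> Any:
    """Restore placeholders inside string values; keys and structure stay."""
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: replay.get(m.group(), m.group()), node)
    if isinstance(node, dict):
        return {key: rehydrate_values(value, replay) for key, value in node.items()}
    if isinstance(node, list):
        return [rehydrate_values(value, replay) for value in node]
    return node  # numbers, booleans, null


def rehydrate_tool_input(fragments: list[str], replay: dict[str, str]) -> str:
    """Join the streamed fragments, parse once, rehydrate, and re-serialize."""
    raw = "".join(fragments) or "{}"  # a tool with no arguments streams nothing
    args = json.loads(raw)
    return json.dumps(rehydrate_values(args, replay), ensure_ascii=False)
```

A naive text substitution on the raw fragments would produce invalid JSON for a value such as `Marie "MT" Tremblay`; re-serializing through `json.dumps` escapes it correctly.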
The cloud model could emit a placeholder-looking string that was never in the egress map (a hallucination). Rehydrating it against nothing, or guessing, would be unsafe. The policy is: a placeholder is only rehydrated if it is in this request's replay -- the placeholders that appeared in the request's own outbound body (see Concurrency and multi-agent isolation). A placeholder-shaped token with no entry is left as-is, not invented and not mapped to a real value -- which also guarantees a request can never rehydrate a value it did not itself send.
Secrets are handled by a deterministic layer that runs in the deployed appliance as part of the always-on cheap gate (step 2), independent of the NER model. It has two parts:
- Pattern matching: ported gitleaks-style patterns for known credential shapes.
- Shannon-entropy backstop: high-entropy strings that match no known pattern are flagged as likely secrets, with UUID / git-SHA / sequential false-positive filters so that legitimate high-entropy-but-not-secret strings are not over-redacted.
Because this layer is deterministic and always-on, it is the reliable floor for the catastrophic secret categories regardless of model recall or policy configuration.
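The entropy backstop and its false-positive filters can be illustrated with a short sketch. The minimum length (20) and threshold (4.0 bits per character) are assumptions chosen for this example, not the deployed values, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)
HEX_HASH_RE = re.compile(r"^(?:[0-9a-f]{40}|[0-9a-f]{64})$", re.IGNORECASE)


def shannon_entropy(token: str) -> float:
    """Bits per character of the token's empirical character distribution."""
    counts = Counter(token)
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def is_sequential(token: str) -> bool:
    """True for runs like 'abcdef...' or '987654...' (constant step of 1)."""
    steps = {ord(b) - ord(a) for a, b in zip(token, token[1:])}
    return steps in ({1}, {-1})


def looks_like_secret(token: str, min_len: int = 20, threshold: float = 4.0) -> bool:
    """Flag long high-entropy tokens that are not UUIDs, hashes, or sequences."""
    if len(token) < min_len:
        return False
    if UUID_RE.match(token) or HEX_HASH_RE.match(token) or is_sequential(token):
        return False
    return shannon_entropy(token) >= threshold
```

The filters run before the entropy test on purpose: a 40-hex git SHA or an alphabet run can score high on entropy, so they have to be excluded by shape rather than by score.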
Policy is layered. The resolution order for PII categories is:
session > project > default
- session overrides project overrides default. A per-session PII config wins over a per-project config, which wins over the built-in default.
On top of that ordering, three rules are fixed and not subject to policy:
- Secrets and credentials always redact.
api_key,password,access_tokenand the like are redacted regardless of any policy setting. Policy can never turn the secrets floor off. - Operational labels excluded by default.
file_path,username,organizationare excluded by default to avoid breaking the coding use case (redacting every file path or username would make the gateway unusable for coding agents). - Hash allowlist. Git commit / content hashes (40-hex and 64-hex) are allowlisted so they are never redacted. These look high-entropy but are not PII, and redacting them would break diffs and references.
for each candidate category:
    if category in {api_key, password, access_token, ...secrets}: REDACT (always, non-negotiable)
    elif value matches 40/64-hex git/content hash: KEEP (allowlist)
    elif category in {file_path, username, organization}: KEEP (default exclusion)
    else: resolve(session > project > default)
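The same decision order, written as runnable Python. This sketch follows the pseudocode branch by branch; it assumes each policy layer is a plain `dict` mapping a label to a boolean, and that a category no layer mentions is redacted (fail closed). Whether an operator can opt the excluded operational labels back in is not stated above, so the sketch keeps them excluded ahead of policy resolution, as the pseudocode does. The function name `should_redact` is hypothetical.

```python
import re
from typing import Optional

SECRET_LABELS = frozenset({"api_key", "password", "access_token", "secret"})
DEFAULT_EXCLUDED = frozenset({"file_path", "username", "organization"})
HASH_RE = re.compile(r"(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{64})")

Policy = Optional[dict[str, bool]]


def should_redact(
    label: str,
    value: str,
    session: Policy = None,
    project: Policy = None,
    default: Policy = None,
) -> bool:
    if label in SECRET_LABELS:
        return True  # secrets floor: policy can never turn this off
    if HASH_RE.fullmatch(value):
        return False  # git commit / content hash allowlist
    if label in DEFAULT_EXCLUDED:
        return False  # operational labels kept for the coding use case
    for layer in (session, project, default):  # session > project > default
        if layer is not None and label in layer:
            return layer[label]
    return True  # assumption: unlisted categories fail closed
```

For example, a session policy of `{"password": False}` has no effect on a `password` detection, while `{"email": False}` at session level wins over `{"email": True}` at project level.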
A 3-tier NER suite with a French-Quebec + English focus. This bilingual Quebec PII focus is the moat: competitors lean on generic Presidio/regex.
| Tier | Model | Runtime | Role |
|---|---|---|---|
| CPU | xlm-roberta-base | dynamic-INT8 ONNX on CPU (onnxruntime) | the deployed always-on workhorse |
| NPU | xlm-roberta-base | OpenVINO FP16 IR, Intel NPU (alternate tier) | preserved drop-in alternate |
| GPU | xlm-roberta-large | GPU | highest-capacity tier |
Underneath the NER suite sit two deterministic layers (in the deployed appliance):
- Tier-0 (regex + Luhn) owns the catastrophic structured categories.
- Deterministic secrets layer (gitleaks-style patterns + Shannon-entropy backstop with UUID/git-SHA/sequential FP filters).
Repo scope vs deployed appliance. This repository contains the detection library and CLI (gate/privacy_gate.py: Tier-0 regex+Luhn floor, NER tier wrappers, merge, redact/rehydrate), the training code, the validation code, and the egress proxy (appliance/: the :8011 always-on gateway, SSE stream rehydration, the AES-GCM session/project entity map, the known-entity backstop, and the deterministic secrets/entropy layer). The GPU NER gate service (gate/gate_service_gpu.py) is now version-controlled here too; the running instance is deployed on the GPU host, with deploy/check-gate-drift.sh guarding host-vs-repo drift. The pipeline described in this document is that of the deployed appliance.
The shipped model uses 20 labels (training/labels_v20.json):
account_number, address, card_cvv, card_expiry, date_of_birth, email, file_path,
government_id, iban, ip_address, organization, password, payment_card, person,
phone_number, postal_code, secret, sensitive_account_id, tax_id, username.
The prior 23-label scheme was consolidated: bank_account + routing_number folded into account_number / sensitive_account_id; api_key + access_token folded into secret / password; sensitive_date folded into date_of_birth; phone renamed phone_number; postal renamed postal_code; ip renamed ip_address.
Recall here means leak-prevention. clean_fp means over-redaction on negative rows.
Measured on the synthetic held-out corpus (7,498 synthetic rows, 20 labels, "full" config = Tier-0 floor + neural model, 0 train overlap, unseen document structures -- an anti-saturation held-out built ONLY from structural variants never seen in training). Source: validation/RESULT-v11r9c.md (the v11r5 baseline it improves on is validation/RESULT-v11.md). Both tiers now ship the v11r9c revision -- the GPU/large as pii-gpu-xlmr-large-v11r9c and the CPU/base as pii-gpu-xlmr-base-v11r9c, retrained on the same cumulative corpus so the base now carries the organization/address fix too (base address recall ~0.93).
The privacy metric is full-stack catastrophic DETECTION recall: any detected span is redacted regardless of which label it gets -- an intra-catastrophic mislabel is still a redaction, not a leak.
| Tier | Model | Catastrophic full-stack detection recall | Overall labeled recall | Overall precision | clean_fp |
|---|---|---|---|---|---|
| GPU | xlm-r-large-v11r9c | 0.9954 | 0.9882 | 0.9615 | 34 / 7498 rows |
| CPU | xlm-r-base-v11r9c | 0.9941 | 0.9777 | 0.9139 | 48 / 7498 rows |
The v11r9c catastrophic recall essentially holds vs the prior v11r5 large model (0.9964 -> 0.9954, -0.001). The reason v11r9c ships is the structural-form leak it closes: organization recall ~0.10 -> 1.00 and address recall ~0.60 -> 0.95 on the synthetic corpus. The honest tradeoff is more over-redaction on digit-ID-shaped tokens: clean false positives on no-PII rows rise from 12 to 34 (per-label precision dips on government_id ~0.87, phone_number ~0.84, sensitive_account_id ~0.88, account_number ~0.94, date_of_birth ~0.96). This is the safe failure direction -- over-redaction never leaks PII, it only costs a coding agent a little context when a benign number is ID-shaped -- so for a privacy firewall whose prime directive is "never leak," closing the org/address leak is worth the extra over-redaction. FR is not weaker than EN: the Quebec-French moat holds on unseen structure.
Earlier results on the v6 generation sets (in-distribution held-out, train and val shared document layouts). Kept for reference; do not use these as current figures.
| Model | ALL-CAPS gate | tabular test | v6 val | canonical | clean_fp |
|---|---|---|---|---|---|
| NPU xlm-r-base | 0.955 | 0.968 | 0.990 | 0.986 | 0 |
| GPU xlm-r-large | 0.955 | (identical to NPU) | 0.990 | (identical) | 0 |
Key result: the base model equals GPU large on recall at about 4x lower latency, which is why the base model is the always-on tier (deployed as CPU INT8).
Presidio configured with English + French large spaCy, union, same sets, same metric:
| Test | OSSRedact recall | Presidio recall | OSSRedact clean_fp | Presidio clean_fp |
|---|---|---|---|---|
| ALL-CAPS gate | 0.955 | 0.779 | 0 | (n/a) |
| v6 val | 0.990 | 0.759 | 0 | 343 |
| canonical | 0.986 | 0.798 | 0 | 508 |
OSSRedact wins recall by roughly 18 to 23 points (0.955 vs 0.779, 0.990 vs 0.759, 0.986 vs 0.798) and has far fewer false positives.
A generated corpus of 5,000 FR + EN documents (bank statements, financing forms, email threads,
CSV exports, .env, code), redacted entirely locally:
- 218,931 PII spans redacted.
- Zero email, SIN, account-ID, or credit-card leaks in the redacted output on this synthetic corpus (verified against ground truth); this is a synthetic-corpus result, not a real-world zero-leak guarantee.
- Adversarial cases included (ALL-CAPS, NBSP-separated IDs, mixed FR/EN, long unbroken lines, look-alike decoys). A NBSP-separated-SIN gap in cue-less cells was surfaced, fixed, and re-verified at zero SIN leaks.
- On our synthetic code-context corpus, recall is 1.000 across JSON, YAML, SQL, CSV, logs, .env, and code comments, in FR + EN. This is the realistic, structured-PII case.
- Honest adversarial caveat (see Limitations #11): full names glued into camelCase / snake_case identifiers are a harder, adversarial case and are under-detected -- 0.882 on that variant. The glued/adversarial number, not the structured 1.000, is the one to lead with when reasoning about worst-case identifier leaks.
| Path | Latency |
|---|---|
| appliance clean fast-path | 1.7ms median |
| PII-bearing request | 23.5ms median |
| on-device per 256-token window | about 34ms |
Stated plainly:
- Models are trained and validated entirely on synthetic Québec data, and every "zero leak" / "verified" claim in this document is scoped to our synthetic held-out corpus -- it is not a real-world zero-leak guarantee. Broader real-world domains are future work.
- The deterministic Tier-0 floor does not cover every category. It deterministically covers (model-independent hard floor): secrets / API keys, payment cards (Luhn), IBAN, SIN / government IDs, emails, IP addresses, and file paths. Address and organization have NO deterministic floor -- they rely entirely on the NER model. v11r9c now covers them well on the synthetic corpus (org 1.0, address 0.95), but that coverage is model-dependent, not a hard guarantee like the Tier-0 categories.
- #11 -- Glued / adversarial identifiers. Full names glued into code identifiers (camelCase / snake_case) are an adversarial case and are under-detected: 0.882 on the synthetic adversarial variant, versus 1.000 on structured code-context PII. Lead with the realistic structured number for typical traffic, but treat glued identifiers as a known worst-case gap, not a solved one.
- Bare long digit runs glued to adjacent letters can be missed unless a financial / identity cue is nearby.
- French and English only by design. Multilingual is an explicit future axis, not v1.
- Recall is below 100%. The deterministic layer is the reliable floor only for the catastrophic structured categories (secrets, cards, SIN, IBAN, emails, IP, file paths); the model-dependent categories (person, organization, address) have no such floor.
Charts: ./charts/fig1, fig3, fig5 (png).
- The appliance is built, running as a systemd service, and verified end-to-end: a real Claude Code session through the proxy redacts and rehydrates transparently.
- Not yet published.
- The workbench UI is built. Anthropic `/v1/messages`, OpenAI-compatible `/v1/chat/completions`, and OpenAI `/v1/responses` adapters are live; CLI wiring for Codex, Hermes, Pi, omp, and opencode is documented in `docs/ADAPTERS.md`. (Both the egress-proxy code under `appliance/` and the GPU NER gate service under `gate/` are now version-controlled; `deploy/check-gate-drift.sh` guards host vs repo drift.)