Add local open-source ATR detection guardrail sample (HookProvider)#273
Draft
eeee2345 wants to merge 1 commit into
Draft
Add local open-source ATR detection guardrail sample (HookProvider)#273eeee2345 wants to merge 1 commit into
eeee2345 wants to merge 1 commit into
Conversation
Adds python/03-integrate/guardrails/agent-threat-rules: a HookProvider that screens agent inputs (BeforeInvocationEvent) and tool calls (BeforeToolCallEvent) with Agent Threat Rules (ATR, MIT), an open detection ruleset. Fully local and deterministic (no API key, no network call), complementing the hosted (WonderFence) and model-based (LlamaFirewall, NeMo) guardrail samples. Signed-off-by: Adam Lin <adam@agentthreatrule.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This adds a guardrail sample under python/03-integrate/guardrails/, alongside the existing alice-wonderfence, llama-firewall, and nvidia-nemo samples.
What it adds: an ATRGuardrailHook that runs Agent Threat Rules (ATR, https://github.com/Agent-Threat-Rule/agent-threat-rules, MIT) against incoming turns and tool arguments. ATR is an open detection ruleset for AI-agent threats; the pyatr engine matches with pattern rules in-process, so the check needs no API key, no network call, and sends no agent data off the host.
Enforcement points (stable hooks API):
How it complements the existing samples: WonderFence calls a hosted service; LlamaFirewall and NeMo use model-based scanners. ATR is local and deterministic, which fits data-residency-sensitive deployments and can run as a fast offline first layer.
Status -- opening as a draft on purpose: the detection core and the hook decision logic are tested locally against the real pyatr engine (injection input and injected tool arguments are blocked; benign input and benign tool args are allowed; no false positives on the benign cases). I have not yet run main.py end-to-end against Amazon Bedrock from my own environment, so I am marking this a draft until that live run is confirmed. Happy to do it, or for a maintainer with Bedrock access to verify. Flagging honestly rather than claiming an e2e I have not run.
Files: guardrail.py, main.py, README.md (use-cases template), requirements.txt. Self-contained, no infrastructure to clean up.
Happy to adjust naming, placement, or the enforcement-point choice.