Add local open-source ATR detection guardrail sample (HookProvider) #273

Draft
eeee2345 wants to merge 1 commit into strands-agents:main from eeee2345:add-atr-guardrail-sample
@eeee2345

This adds a guardrail sample under python/03-integrate/guardrails/, alongside the existing alice-wonderfence, llama-firewall, and nvidia-nemo samples.

What it adds: an ATRGuardrailHook that runs Agent Threat Rules (ATR, https://github.com/Agent-Threat-Rule/agent-threat-rules, MIT) against incoming turns and tool arguments. ATR is an open detection ruleset for AI-agent threats; the pyatr engine matches with pattern rules in-process, so the check needs no API key, no network call, and sends no agent data off the host.

Enforcement points (stable hooks API):

  • BeforeInvocationEvent -> event.cancel when a rule at or above min_severity matches the user turn.
  • BeforeToolCallEvent -> event.cancel_tool when a rule matches the tool arguments.
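
A minimal sketch of the decision logic behind these two enforcement points, for reviewers who want the shape without opening guardrail.py. It is deliberately self-contained: the event classes, the two regex rules, the severity ordering, and the `DemoGuardrailHook` name are all hypothetical stand-ins, not the Strands event types, the pyatr API, or the real ATR ruleset. It also assumes `min_severity` applies to tool arguments as well as to the user turn.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity ordering, for illustration only.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class Rule:
    rule_id: str
    severity: str
    pattern: re.Pattern


# Two made-up pattern rules standing in for the real ATR ruleset.
RULES = [
    Rule("demo-prompt-injection", "high",
         re.compile(r"ignore (all )?(previous|prior) instructions", re.I)),
    Rule("demo-credential-path", "medium",
         re.compile(r"\.ssh/id_rsa|/etc/shadow")),
]


@dataclass
class InvocationEvent:
    """Stand-in for BeforeInvocationEvent (not the Strands class)."""
    text: str
    cancel: Optional[str] = None


@dataclass
class ToolCallEvent:
    """Stand-in for BeforeToolCallEvent (not the Strands class)."""
    tool_input: dict
    cancel_tool: Optional[str] = None


class DemoGuardrailHook:
    """Local, deterministic screening: regex rules only, no network call."""

    def __init__(self, min_severity: str = "medium") -> None:
        self.threshold = SEVERITY_ORDER[min_severity]

    def _first_match(self, text: str) -> Optional[Rule]:
        # Return the first rule at or above the threshold that matches.
        for rule in RULES:
            if (SEVERITY_ORDER[rule.severity] >= self.threshold
                    and rule.pattern.search(text)):
                return rule
        return None

    def on_invocation(self, event: InvocationEvent) -> None:
        rule = self._first_match(event.text)
        if rule is not None:
            event.cancel = f"Blocked by rule {rule.rule_id}"

    def on_tool_call(self, event: ToolCallEvent) -> None:
        # Serialize the arguments so nested values are scanned too.
        rule = self._first_match(json.dumps(event.tool_input))
        if rule is not None:
            event.cancel_tool = f"Blocked by rule {rule.rule_id}"
```

In the actual sample these two methods would be registered as callbacks through the hooks API, and the match step would be delegated to the pyatr engine rather than to a hard-coded rule list.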

How it complements the existing samples: WonderFence calls a hosted service; LlamaFirewall and NeMo use model-based scanners. ATR is local and deterministic, which fits data-residency-sensitive deployments and can run as a fast offline first layer.

Status -- opening as a draft on purpose: the detection core and the hook decision logic are tested locally against the real pyatr engine. Injection input and injected tool arguments are blocked; benign input and benign tool arguments are allowed. I have not yet run main.py end-to-end against Amazon Bedrock from my own environment, so I am keeping this a draft until that live run is confirmed. Happy to do it myself, or for a maintainer with Bedrock access to verify. I would rather flag this than claim an end-to-end run I have not done.

Files: guardrail.py, main.py, README.md (use-cases template), requirements.txt. Self-contained, no infrastructure to clean up.

Happy to adjust naming, placement, or the enforcement-point choice.

Adds python/03-integrate/guardrails/agent-threat-rules: a HookProvider that
screens agent inputs (BeforeInvocationEvent) and tool calls
(BeforeToolCallEvent) with Agent Threat Rules (ATR, MIT), an open detection
ruleset. Fully local and deterministic (no API key, no network call),
complementing the hosted (WonderFence) and model-based (LlamaFirewall, NeMo)
guardrail samples.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>