Skip to content

feat/issue65 llm providers#69

Merged
ar7casper merged 15 commits into
masterfrom
feat/issue65-llm-providers
Jun 16, 2026
Merged

feat/issue65 llm providers#69
ar7casper merged 15 commits into
masterfrom
feat/issue65-llm-providers

Conversation

@ar7casper

Copy link
Copy Markdown
Collaborator

ar7casper and others added 2 commits May 24, 2026 15:08
…ti-provider support (issue #65)

Replaces the hardcoded AnthropicClient wrapper used across 15+ files with a
Protocol-based adapter layer. Each pipeline phase (analyze, enhance, verify,
report, dynamic_test, llm_reach, app_context) resolves to a (provider, model)
pair via a registry built from ~/.config/openant/config.json.

Adapter framework (libs/openant-core/utilities/llm/):
* adapter.py    LLMAdapter Protocol, unified content blocks, 5-class error taxonomy
* config.py     v2 schema with v1->v2 migration of legacy api_key
* builtins.py   frozen openant-default config pinning today's Claude defaults
* registry.py   PhaseRegistry with eager adapter instantiation + probe validation
* helpers.py    simple_text + lookup_pricing shortcuts

Shipped adapters (providers/):
* anthropic.py  reference impl, full tool calling, per-model pricing
* openai.py     Chat Completions, splits one-user-with-N-tool-results into N role=tool messages
* google.py     Gemini via google-genai, function_call/function_response Part translation

All three support tool calling. 33-test contract harness runs against all three.

Setup wizard (apps/openant-cli/cmd/setup.go):
* `openant setup llm` walks per-phase provider+model selection
* Per-phase per-provider model defaults + known-models hint
* 1-token probe per unique (provider, model) before save
* OpenAI ChatGPT/Codex-vs-API-key heads-up

Go CLI:
* --model opus/sonnet replaced by --llm-config <name> on analyze/scan/enhance/verify/dynamic-test/report
* internal/config/config.go gains WriteLLMConfig + raw-map round-trip preserving v2 fields
* set-api-key refactored to share the probe helper now used by the wizard

Pipeline migration:
* Every AnthropicClient call site (15+ files) replaced with PhaseBinding threading
* Scanner builds the registry once and probes upfront; standalone verbs build their own
* Defensive _coerce_to_str in reporter so non-string fields from GPT-style responses
  don't crash report generation (analyze prompt also tightened to require string types)

Closes #65.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README:
* Replace "Using a non-Anthropic provider" with "Setting up an LLM"
  that leads with the `openant setup llm` wizard
* Shipped-adapters table (Anthropic, OpenAI, Google) with
  subscription-vs-API caveats per provider
* 7-phase sample config (with `app_context`)
* New Roadmap section: more adapters, OAuth subscription auth,
  cross-provider tool-call quirks, more languages, hosted scan service

CHANGELOG: bring [2026-05-24] entry in line with reality —
add OpenAI/Google adapters, the setup wizard, the contract harness,
the reporter coercion fix, and the seventh `app_context` phase that
the original entry didn't mention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ar7casper and others added 8 commits June 8, 2026 13:09
…layer

Critical/High:
- C1: carry the tool name on ToolResultBlock so the Gemini adapter sends
  the real function name on function_response (Gemini matches by name,
  not the synthesised gemini_<name>_<idx> id)
- H1: wire OpenAI + Google adapters into the global rate limiter via a
  shared providers/_ratelimit helper (was Anthropic-only); fix the two
  false "adapter handles it internally" caller comments
- H3: send max_completion_tokens for OpenAI reasoning models (o1/o3/o4)
- H5: warn once on malformed tool_call.arguments instead of silently
  collapsing to {}

Medium/Low:
- M6: warn once when the Anthropic adapter drops an unknown block kind
- M7: reset_warning_state() clears every per-adapter warn set, wired
  into reset_global_tracker()
- M9: drift-guard test pins MODEL_PRICING to AnthropicAdapter.pricing
- M1: remove dead imports (agent LLMRateLimitError, finding_verifier
  LLMError + orphaned get_rate_limiter)
- M10: drop the stale `model: opus|sonnet` docstring in scan_repository
- config _optional_str raises on non-strings; llm_reachability re-raises
  LLMAuthError; report _extract_usage warns on missing pricing; hoist
  registry import sys

Tests: 344 passed, 72 skipped; ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- C2: redact ?key= from Gemini probe transport errors before they reach
  stderr (was leaking the API key on any network failure)
- H3: send max_completion_tokens in the OpenAI probe for o1/o3/o4 models
- H2: document that the v2 per-provider key check is intentionally left
  to Python's registry.validate() at scan start
- M3: accept the key when set-api-key's probe returns 404
  (model_not_found) — auth already succeeded
- M4: write config atomically (temp file + rename), with a Windows-safe
  backup/restore so a failed replace never loses the original
- M5: merge provider entries in WriteLLMConfig, preserving hand-authored
  sibling fields
- M8: delete the tautological adapterShipped() + dead caller; fix the
  stale "only anthropic ships" comments

Tests: go vet + build + test ./... all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…erifier truncation guard

Round-2 review found adapter defects the prior review missed:
- Gemini output tokens now count candidates + thoughts_token_count, so
  cost isn't undercounted for thinking models (gemini-2.5-*)
- Gemini empty-candidates (a prompt-level safety block) now raises
  LLMResponseError with the block reason instead of returning a silent
  empty end_turn that read as a clean pass
- finding_verifier mirrors the enhancer's guard: a truncated response
  (max_tokens, no tool call) returns "incomplete" instead of sending an
  empty user message that the next complete() rejects
- both agentic loops echo only Text+ToolUse blocks (future block-kind
  re-serialization safety)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…olish

- setup.go: skip the per-provider probe when the key is blank, so the
  wizard's "leave blank to read from environment" option actually
  completes instead of dead-ending on a 401
- llm_probe_google.go: tighten the key-redaction regex so it stops at the
  URL's closing quote instead of swallowing the delimiter
- root.go: fix a stale doc reference (requireAPIKeyFor -> requireAPIKey)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kKey

Pre-existing issues surfaced during the PR #69 round-2 review (not
introduced by that PR), fixed here as security/robustness hardening:
- repository_index.read_file_section now confines the model-controlled
  file_path to the repo root (is_relative_to, which also collapses
  symlink escapes) — closes an arbitrary host-file read via the agent's
  read_file_section tool
- config.MaskKey returns "****" for keys shorter than 8 chars — no more
  out-of-range panic via `config set api-key --stdin` -> `config show`,
  and no full-key reveal for 3-5 char keys
- config.go: the *Names doc comments no longer claim "sorted" (they don't)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntry-point bindings

Round-3 review fixes (test-first, all proven):
- H1: restore report phase default to Opus (claude-opus-4-6); it was silently downgraded to Sonnet, contradicting the no-behavior-change guarantee
- H2: fix ContextEnhancer() callers (5 parser pipelines + context_enhancer __main__) to build a registry and pass the enhance binding
- H3: OpenAI adapter routes system->developer for reasoning models; drop o1-mini/o1-preview; add gpt-4.1/o3/o4-mini pricing; guard empty choices
- M1: raise dependency floors (openai>=1.45.0, google-genai>=1.0.0)
- M2: python -m report summary/disclosures pass the now-required binding
- M3: reporter coerces the data_flow container before iterating
- M4 (py): report-data subparser registers --llm-config
- M5: config validation exempts anthropic for the env-key path
- L1/L4/L5/L10: dynamic_tester binding threading, error diagnostics, stale docstrings, tightened assertion

Python: 390 passed, 72 skipped; ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…g, set-api-key, probe test

Round-3 review fixes (test-first):
- H3 (Go): drop o1-mini from the wizard known OpenAI models, matching the Python adapter
- M4 (Go): runHTMLReport forwards --llm-config via the new buildReportDataArgs helper
- L7: cover probeAllPhases returning an error on a failed probe
- L8: set-api-key soft-passes transient 429/5xx (only 401/403 reject a key)
- L9: remove the dangling openant llm-config list reference from scan help

go test ./... ok; go vet/build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- M6: correct the four-file-recipe phrasing — adding a provider also needs Go-side wizard/probe wiring
- L6: fix the contract-test count (12 methods / 36 cases across 3 adapters)

(The companion HOW_TO_ADD_AN_ADAPTER.md also gained a Go-wiring section, but it lives under the gitignored docs/ path and stays local.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gadievron

Copy link
Copy Markdown
Collaborator

Reviewed the adapter framework in depth (multiple passes, plus a head-to-head against another project that did this same native-SDK migration). Architecture is solid — clean Protocol, PhaseBinding wiring, backward-compat preserved, credentials baseline sound (0600, masking, traversal-blocked, the Go side redacts key= from probe URLs). Good call using native SDKs rather than a gateway lib. Static analysis (semgrep + CodeQL, security + quality, both languages) is clean, so the items below are behavioral asymmetries between the three adapters, not anything a scanner flags.

In the new adapter framework — worth fixing:

  • HIGH — Anthropic has no empty-content guard. A well-formed refusal/empty completion yields content=(), stop_reason=end_turn, simple_text() -> ''. openai.py:365 (empty choices) and google.py:346 (empty candidates) both raise LLMResponseError; anthropic.py:305-356 doesn't. Because the verify phase then defaults to agreeing with Stage 1 on empty text, a refusal can silently become a clean security verdict (false negative). Suggest mirroring the OpenAI/Google guard in AnthropicAdapter._response_to_unified.
  • MED — refusal / content-filter is normalized to end_turn (warn-only) in all three adapters. For a SAST tool a masked refusal is a masked false-negative. A typed LLMRefusalError raised on refusal finish reasons keeps it from reading as a normal short answer (distinct from the empty-guard above — this fires on a populated refusal).
  • MED — Google max_retries is silently dropped (google.py:158 _ = max_retries). The docstring says the SDK doesn't expose retries, but HttpOptions.retry_options=HttpRetryOptions(attempts=...) does (and you already build HttpOptions at :154). Anthropic/OpenAI forward it; Google should too — and the docstring is misleading.
  • MODERATE — the setup wizard echoes the pasted API key. setup.go:412 -> readLine (bufio.ReadString) prints input as typed; there's no term.ReadPassword in cmd/ (charmbracelet/x/term is already an indirect dep). set-api-key <key> also takes the key as an argv (shell history / ps). A no-echo read closes both.
  • LOW — Anthropic reads response.usage unguarded (anthropic.py:329, outside the try) -> a usage-less response raises an unmapped AttributeError; OpenAI :422 / Google :360 use getattr(..., None).
  • LOW — error mapping passes raw exception strings through. All three adapters do raise LLMError(str(exc)) with no redaction layer; provider 400/401 bodies can echo request context. Worth scrubbing secrets before these propagate to logs/reports.

Several of these are the same shape — handled in two adapters, missed in the third (Anthropic is the odd one out on the empty-guard and on usage; Google on retries) — so they're cheap to fix together.

Two pre-existing issues this PR's completion path now feeds (flagging because they're high-impact, not asking you to redesign here): the verify phase defaults to "agree with Stage 1" on every degenerate path — incomplete/empty completion (finding_verifier.py:380, where the empty-completion case lands), no tool calls (:448), max-iterations reached (:464), and even a finish call that omits the agree field (:925, get("agree", True)). For a security verifier that biases systematically toward false negatives, so the empty-completion case above is one of four inputs that silently rubber-stamp Stage 1. Separately, untrusted analyzed source is interpolated into the verifier prompt inside a bare triple-backtick fence (verification_prompts.py:106-125) with no escaping, so a crafted source file can break out of the fence and steer the verdict. Both are worth a fix independent of this PR.

Happy to share repros for any of these.

ar7casper and others added 4 commits June 16, 2026 11:37
… handling

- Anthropic: empty-content guard, usage getattr, refusal stop_reason -> LLMRefusalError
- OpenAI: content_filter finish_reason -> LLMRefusalError
- Google: SAFETY/RECITATION/... -> LLMRefusalError; forward max_retries via
  HttpRetryOptions(attempts=max_retries+1) for parity with the other adapters
- New LLMRefusalError(LLMResponseError) in the taxonomy
- New _redact.py: scrub provider keys from adapter error strings and the chained
  cause (RedactedCause) so secrets don't leak via tracebacks/logs

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…erdict

- finding_verifier: degenerate verify paths no longer auto-agree with Stage 1;
  mark VerificationResult.incomplete and set result["error"] on adapter raise
- reporter: incomplete -> stage2_verdict "unverified" (not "rejected");
  "unverified" is disclosure-eligible so a couldn't-verify finding still surfaces
- verifier/scanner/schemas: needs_review counter; degenerate findings no longer
  folded into the "safe" count
- report/{generator,__main__}: align disclosure gate with "unverified"

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ompts

- New prompts/_fence.py safe_code_fence(): opens a fence longer than any backtick
  run in the untrusted source (CommonMark), so crafted source can't escape the
  fence and steer the analyst/verifier verdict; None-tolerant
- verification_prompts (Stage 2) and vulnerability_analysis (Stage 1) both use it

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pi-key

- promptSecret() reads the key without echo via charmbracelet/x/term (promoted to
  a direct dep), falling back to the reader when stdin isn't a TTY
- set-api-key [key] arg now optional -> no-echo prompt; help warns that passing
  the key as an argument leaks it via shell history / ps

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ar7casper ar7casper requested a review from gadievron as a code owner June 16, 2026 08:39
The redact_secrets() tests must feed key-shaped inputs, which the default
gitleaks generic-api-key rule flags even though no real secret exists. Add a
.gitleaks.toml that keeps the full default ruleset (useDefault) and narrowly
allowlists ONLY secrets that both (a) live in the two redaction-test files and
(b) carry an unmistakable fake marker (1234567890 / LEAKED / EXAMPLE / FAKE).
Verified the AND condition is enforced: the same marker elsewhere is still
reported.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ar7casper ar7casper merged commit 49d9802 into master Jun 16, 2026
9 checks passed
@ar7casper ar7casper deleted the feat/issue65-llm-providers branch June 16, 2026 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants