Skip to content

fix(webfetch): flag non-2xx responses with a status banner (keep the body)#69

Merged
ysyneu merged 2 commits into
feat/ai-srefrom
audit-fix/audit-2026-06-15-webfetch-non2xx-error
Jun 15, 2026
Merged

fix(webfetch): flag non-2xx responses with a status banner (keep the body)#69
ysyneu merged 2 commits into
feat/ai-srefrom
audit-fix/audit-2026-06-15-webfetch-non2xx-error

Conversation

@ysyneu

@ysyneu ysyneu commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

What

On a non-2xx final HTTP response, webfetch now keeps the body and prepends a status banner to the result content:

[HTTP 404 Not Found] The server returned an error status; the response body
below may be an API error message or an error/login page.

{"error":"not found","code":1404}

The agent sees BOTH the error status AND the body, instead of either being misled into reading an error/login page as legitimate content (old behavior) or losing the body entirely (the first revision of this PR).

Why lossless (revision)

The first version of this PR dropped the body on non-2xx and returned an error. That loses valuable content: APIs commonly return non-2xx with a useful JSON error body (400/422/404 with {"error":...}), and an agent debugging an API call needs that message. Note the original 123K login page from the audit was HTTP 200, so it never hit the non-2xx path anyway — on the non-2xx path the body is almost always worth keeping. So we keep it and just flag the status.

Root cause

From prod audit audit-2026-06-15 (session sess_iBhNqeVErwQ4vYXKsVKxLw, steps 54/56/57): the runner's fetchBody (environment/webfetch.go) read/converted the response body with no indication of the HTTP status. A 401/403 auth wall, a redirect-to-login, or any 4xx/5xx error page was returned to the model as if it were real page content, with nothing signaling that the fetch had failed at the HTTP layer.

fc-safari's own localWebFetch rejects non-2xx outright, but the runner path — used by AI-SRE cloud sandboxes — surfaced nothing. This PR makes the runner status-aware without throwing away the (often useful) body.

Implementation

  • environment/webfetch.go
    • fetchedPage gains a statusCode int field.
    • fetchBody always reads the body (no early return on non-2xx) and records resp.StatusCode.
    • WebFetch prepends the [HTTP <code> <text>] banner to result.Content when the status is non-2xx, after processor.Process so the banner stays visible even when the body spilled to a file.
  • environment/webfetch_test.go
    • TestFetchBody_PreservesBodyAndStatusOnNon2xx (renamed from the earlier reject test): a 403/404/500 response is preserved — no error, body returned, page.statusCode equals the served status.
    • (A WebFetch-level banner test was prototyped but not kept: WebFetch runs a pre-flight validateURL SSRF check with no test bypass hook, so it refuses the loopback httptest server; adding a hook would be net-new scaffolding. The fetchBody test verifies statusCode is captured correctly, and the banner is a trivial deterministic string prepend over it.)

Scope honesty

The banner fires on true 4xx/5xx. It does NOT specially handle GitHub private-repo pages that return HTTP 200 with a login body — those look successful at the HTTP layer and would need fragile content heuristics, which we deliberately do not add. That 200-login case is handled separately by a prompt-side change in fc-safari (P2-1).

Verification

$ env -u GOROOT go build ./...
(exit 0)

$ env -u GOROOT go test ./environment/...
ok  	github.com/flashcatcloud/flashduty-runner/environment	1.409s

$ env -u GOROOT go test ./environment/ -run TestFetchBody -v
--- PASS: TestFetchBody_ReadsSlowStreamedBody (0.47s)
--- PASS: TestFetchBody_PreservesBodyAndStatusOnNon2xx (0.00s)
PASS

$ env -u GOROOT go vet ./environment/...            # clean
$ gofmt -l environment/webfetch.go environment/webfetch_test.go     # clean
$ gofumpt -l environment/webfetch.go environment/webfetch_test.go   # clean

ysyneu added 2 commits June 15, 2026 11:45
The runner's webfetch read and converted the response body regardless of
status code, so a 401/403 auth wall, a redirect-to-login, or any 4xx/5xx
error page was turned into markdown and returned/spilled as if it were real
page content. fc-safari's localWebFetch already rejects non-2xx; the runner
path (used by ai-sre cloud sandboxes) did not.

Add a status-code guard in fetchBody right after Do() succeeds and before the
body is read/converted. The client already follows redirects via
safeCheckRedirect (max 5 hops), so the checked response is terminal.
Dropping the entire body on non-2xx loses valuable content: APIs commonly
return non-2xx with a useful JSON error body (400/422/404 with {"error":...}).
The 123K login page from the audit was HTTP 200, so it never reached this
path anyway — on the non-2xx path the body is almost always worth keeping.

Make it lossless instead of rejecting:
- fetchedPage gains a statusCode field; fetchBody always reads the body and
  records resp.StatusCode (no early return on non-2xx).
- WebFetch prepends an [HTTP <code> <text>] banner to the processed content
  when the status is non-2xx, so the agent sees BOTH the error status AND the
  body. The banner is added after Process so it stays visible even when the
  body spilled to a file.

Test now asserts the opposite of before: a 403/404/500 response is preserved
(no error, body returned, statusCode matches).
@ysyneu ysyneu changed the title fix(webfetch): error on non-2xx responses instead of returning the body fix(webfetch): flag non-2xx responses with a status banner (keep the body) Jun 15, 2026
@ysyneu ysyneu merged commit 4f2f551 into feat/ai-sre Jun 15, 2026
5 checks passed
@ysyneu ysyneu deleted the audit-fix/audit-2026-06-15-webfetch-non2xx-error branch June 15, 2026 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant