Skip to content

fix(parsers/go): classify entry points by signature, not return type or bare next(#129

Open
gadievron wants to merge 1 commit into
masterfrom
fix/go-classifier-heuristics
Open

fix(parsers/go): classify entry points by signature, not return type or bare next(#129
gadievron wants to merge 1 commit into
masterfrom
fix/go-classifier-heuristics

Conversation

@gadievron

Copy link
Copy Markdown
Collaborator

fix(parsers/go): classify entry points by signature, not return type or bare next(

Base: master · Type: bug fix · Findings: F10 + F7 (MED)

What

parsers/go/go_parser/extractor.go:

  • F10isHTTPHandler no longer matches httpHandlerPatterns against the
    return type (dropped || pattern.MatchString(returnsStr)); it matches
    parameters only.
  • F7isMiddleware no longer treats a bare next( substring as middleware;
    the next( heuristic is now gated on the function having an HTTP request
    signature (hasHTTPSignature && strings.Contains(code, "next(")). The
    http.Handler/http.HandlerFunc return and next.ServeHTTP signals are kept.

Why

Both heuristics over-matched, seeding false remote-web entry points:

  • A factory func Logger() gin.HandlerFunc has no request param but matched the
    bare HandlerFunc pattern against its return type → tagged http_handler. A
    handler is defined by what it receives, not what it returns.
  • Any function whose body contains next( — Go 1.23 range-over-func iterators
    (iter.Pullnext()), lexers, cursors — was tagged middleware.

Tests

parsers/go/go_parser/extractor_classify_test.go (new): factory returning
http.HandlerFunc is not http_handler; iterator using next() is not
middleware; a genuine (w, r) handler and genuine next.ServeHTTP middleware
are still detected.

  • RED (pre-fix): both negative tests failed.
  • GREEN (post-fix): 4 passed; go test ./... passes.

Reachability impact (verified empirically, base vs patched binary)

Function-id sets identical (no extraction loss). Reclassified units sampled and
confirmed all fake:

  • samber/lo: middleware 1 → 0 (CutPrefix, a pull-iterator).
  • gin-gonic/gin: http_handler 108 → 89; all 19 removed are factories
    returning HandlerFunc (Logger, Recovery, BasicAuth, …) with no request
    param. All 89 retained handlers carry a genuine request/context param.

Zero genuine handlers or middleware lost.

Scope / siblings (deferred, documented)

Same defect class but out of scope here (tightening risks false negatives,
needs its own corpus): isHTTPHandler body heuristics w.Write( / Handler =
(:289-290) over-match any local w with .Write(; isCLIHandler os.Args
body match (:318). Tracked for a follow-up.

Upstream coordination

Only PR #126 also edits extractor.go (different functions: typeToString
generics + duplicateIDWarning). No hunk overlap with isHTTPHandler/
isMiddleware; rebase trivially if #126 lands first. Binary go_parser not
included — reviewers/CI rebuild from source.

Author notes

  • Fix lines: isHTTPHandler param-only loop; isMiddleware hasHTTPSignature
    gate on next(.
  • Input it now handles: middleware factories and next()-using iterators that
    previously became false entry points.
  • Likely pushback: "could a real handler be return-only?" — net/http and
    gin/echo/fiber handlers always take the request/context as a param; a return-only
    match is a factory. Body heuristics still catch param-less real handlers.

…or bare next(

Two over-broad heuristics in classifyUnitType seeded false remote-web entry
points: isHTTPHandler matched HTTP type patterns (incl. HandlerFunc) against the
RETURN type, tagging factories like func Logger() gin.HandlerFunc as http_handler
(F10); isMiddleware tagged any body containing the substring next( as middleware,
catching Go 1.23 iterators / lexers / cursors (F7). Match handler patterns only
against parameters, and gate the next( heuristic on an HTTP request signature.
Genuine handlers and middleware remain classified (covered by new tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gadievron

Copy link
Copy Markdown
Collaborator Author

Findings F7 + F10 (MED). Coordinates with #126 (pr/parser-generic-anon-class-keys), which edits the same file extractor.go but different functions (typeToString/duplicateIDWarning) — no hunk overlap; rebase trivially if #126 lands first. Reachability-verified: removed seeds are all fake (gin HandlerFunc factories, lo CutPrefix iterator); zero genuine handlers/middleware lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant