feat: ReggieOption/@RegexPattern fallback substrate + PIKEVM routing groundwork#81
Conversation
Eliminates B10/B15 FallbackPatternDetector predicates and partially
eliminates B16 by routing the affected DFA_*_WITH_GROUPS patterns to
PIKEVM_CAPTURE before the DFA state-count ladder:
- B10: optional prefix before capturing group (e.g. -?(-?.{3}).)
- B15: capturing group in quantified alternation (e.g. (a|b){2,})
- B16 (partial): nullable outer quantifier on capturing group with
non-nullable content (e.g. (a)?); patterns where both the outer
quantifier and group content are nullable (e.g. (0*-?){0,}) still
fall back to JDK via the new hasNullableGroupContentWithNullableQuantifier
predicate.
Both the capture-ambiguous TDFA path and the non-ambiguous DFA-with-groups
path now have the three gates before the DFA strategy ladder. Fuzz gate:
findings=0 (9530 patterns, 76240 inputs).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add PIKEVM gate inside the capturing TDFA isAnchorConditionDiluted() block: patterns where both branches share a leading character but one branch carries a start-anchor guard (e.g. ^x|x(y)) now route to PIKEVM_CAPTURE instead of the JDK fallback. PikeVM evaluates ^/\A correctly against the search-region origin since commit 0acfc66. Patterns with optional quantifiers, nullable branches, or leading end-anchors still fall through to the anchorConditionDiluted JDK path. Fuzz gate confirms zero divergences with the new routing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NFABytecodeGenerator: add zero-length early-accept before bounds/regionMatches in generateBackreferenceCheck; groupLen==0 trivially succeeds (vacuous match) - FallbackPatternDetector: replace broad hasNullableBackrefGroup B7 guard with narrowed hasAmbiguouslyNullableBackrefGroup that only falls back when the group body can capture strings of length > 1 (unbounded contamination risk); groups with max capture length <= 1 (e.g. a?, [x]?) are safe with the early-accept - BackrefEngineGapsTest: enable b7_nullableBackrefGroupInOptimizedNfa Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- RuntimeCompiler: replace CapturePolicy import with ReggieOption/UnsupportedPatternException - Add cacheKeyFor() helper (flag-aware cache key) and fallbackOrThrow() helper - Gate all 6 JavaRegexFallbackMatcher construction sites behind ALLOW_JDK_FALLBACK flag - compileHybrid() receives ReggieOptions to propagate fallback policy - UnsupportedPatternException propagates through catch(Exception) via explicit re-throw - 34 test files updated: add allowJdkFallback() for patterns requiring JDK fallback - New FallbackPolicyTest: throwsByDefault, delegatesWhenFallbackEnabled, nativePatternUnaffected Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…otations ReggieOption moved from reggie-runtime to reggie-annotations so the annotation type can reference it without a circular dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
…esign docs Adds the 2026-06-11..18 achievement report (capture-correctness + O(n) drain work), the PR decomposition plan for that work (PRs #81-#85), the Reggie 1.0.0 public-release readiness assessment, and the design/plan docs from the design->adversarial-review->plan loops. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cb755568e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (sb == null) { | ||
| sb = new StringBuilder(pattern); | ||
| } | ||
| sb.append(' ').append(o.name()); |
There was a problem hiding this comment.
Use collision-free option cache keys
When any option is enabled, this builds the L1 cache key by appending a literal space plus the enum name to the raw regex. Since regex patterns can contain that same suffix, compiling "abc ALLOW_JDK_FALLBACK" with default options and then "abc" with allowJdkFallback() maps both calls to the same key; compile() then returns the first cached matcher without verifying the pattern, so the caller can silently match against the wrong regex. Use an unambiguous separator/structured key like the previous NUL-delimited option encoding.
Useful? React with 👍 / 👎.
| INVOKESTATIC, | ||
| "com/datadoghq/reggie/runtime/RuntimeCompiler", | ||
| "compilePikeVm", | ||
| "(Ljava/lang/String;Ljava/lang/String;)Lcom/datadoghq/reggie/runtime/ReggieMatcher;", | ||
| false); |
There was a problem hiding this comment.
Do not cache PikeVM matchers in generated stubs
For @RegexPattern methods resolved as PIKEVM, this invocation constructs a PikeVMMatcher and the surrounding lazy-init code stores it in the volatile method field. PikeVMMatcher owns mutable per-call thread-list/capture buffers, and the runtime compiler deliberately returns a fresh PikeVM matcher on each compile() call to avoid sharing that state; generated providers will instead share one instance across callers, so concurrent matches through the same provider can race and corrupt results. The stub should create a fresh matcher per call or wrap it with thread-safe isolation.
Useful? React with 👍 / 👎.
| if ((strategy == PatternAnalyzer.MatchingStrategy.DFA_UNROLLED_WITH_GROUPS | ||
| || strategy == PatternAnalyzer.MatchingStrategy.DFA_SWITCH_WITH_GROUPS) | ||
| && hasNullableOuterQuantifierOnCapturingGroup(ast)) { | ||
| return "capturing group with nullable outer quantifier: " | ||
| + "TDFA POSIX last-match span incorrect for zero-width last iteration"; | ||
| || strategy == PatternAnalyzer.MatchingStrategy.DFA_SWITCH_WITH_GROUPS | ||
| || strategy == PatternAnalyzer.MatchingStrategy.PIKEVM_CAPTURE) | ||
| && hasNullableGroupContentWithNullableQuantifier(ast)) { |
There was a problem hiding this comment.
Check PIKEVM fallback cases before returning native
Adding PIKEVM_CAPTURE to this fallback predicate does not actually protect those patterns: RuntimeCompiler returns immediately for result.strategy == PIKEVM_CAPTURE before invoking FallbackPatternDetector.needsFallback, and the annotation resolver similarly returns DELEGATE_PIKEVM before checking this predicate. A nullable-content/nullable-outer pattern that routes to PIKEVM, such as a named variant like (?<x>0*-?){0,}, will therefore still use the engine this comment says diverges instead of throwing or delegating to JDK fallback.
Useful? React with 👍 / 👎.
What does this PR do?
Introduces the
ReggieOptionflag substrate (moved intoreggie-annotations), replacesCapturePolicywithEnumSet<ReggieOption>inReggieOptions, and lands a throw-by-default fallback policy: a pattern that cannot be served by a native strategy now throwsUnsupportedPatternExceptionunless the caller opts in viaallowJdkFallback(). Adds the@RegexPatterncompile-time delegating-stub processor (classifies methods native / delegate-pikevm / delegate-fallback) and acompilePikeVmstaging entrypoint + name-map codec. The differential fuzz oracle compiles withALLOW_JDK_FALLBACKso coverage is preserved.Motivation
Turn previously-silent JDK fallback into an explicit, auditable policy — a prerequisite for honestly claiming native coverage and ReDoS-safety.
Related Issue(s)
Part of the 2026-06 capture-correctness & performance effort. First of a 5-PR stack.
De-risks (does not close) #28, #33, #34, #37: the throw-by-default policy turns the silent-wrong-answers those issues report into an explicit
UnsupportedPatternException(or correct only via opt-inallowJdkFallback()), instead of a silent wrong result.Change Type
Checklist
./gradlew build)Performance Impact
None (policy/substrate change).
Additional Notes
Base of a stacked train (PR1→PR5). Contains worktree-agent merge commits (kept as history). Cuts at
5cb7555.🤖 Generated with Claude Code