Skip to content

feat: ReggieOption/@RegexPattern fallback substrate + PIKEVM routing groundwork#81

Merged
jbachorik merged 23 commits into
mainfrom
pr/1-reggieoption-fallback-substrate
Jun 18, 2026
Merged

feat: ReggieOption/@RegexPattern fallback substrate + PIKEVM routing groundwork#81
jbachorik merged 23 commits into
mainfrom
pr/1-reggieoption-fallback-substrate

Conversation

@jbachorik

@jbachorik jbachorik commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Introduces the ReggieOption flag substrate (moved into reggie-annotations), replaces CapturePolicy with EnumSet<ReggieOption> in ReggieOptions, and lands a throw-by-default fallback policy: a pattern that cannot be served by a native strategy now throws UnsupportedPatternException unless the caller opts in via allowJdkFallback(). Adds the @RegexPattern compile-time delegating-stub processor (classifies methods native / delegate-pikevm / delegate-fallback) and a compilePikeVm staging entrypoint + name-map codec. The differential fuzz oracle compiles with ALLOW_JDK_FALLBACK so coverage is preserved.

Motivation

Turn previously-silent JDK fallback into an explicit, auditable policy — a prerequisite for honestly claiming native coverage and ReDoS-safety.

Related Issue(s)

Part of the 2026-06 capture-correctness & performance effort. First of a 5-PR stack.

De-risks (does not close) #28, #33, #34, #37: the throw-by-default policy turns the silent-wrong-answers those issues report into an explicit UnsupportedPatternException (or correct only via opt-in allowJdkFallback()), instead of a silent wrong result.

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring (no functional change)
  • Documentation
  • Test improvement
  • Build/CI change

Checklist

  • I have read the CONTRIBUTING.md guidelines
  • All existing tests pass (./gradlew build)
  • I have added tests for my changes
  • I have updated documentation (if applicable)
  • My commits are signed

Performance Impact

None (policy/substrate change).

Additional Notes

Base of a stacked train (PR1→PR5). Contains worktree-agent merge commits (kept as history). Cuts at 5cb7555.

🤖 Generated with Claude Code

jbachorik and others added 23 commits June 11, 2026 17:28
Eliminates B10/B15 FallbackPatternDetector predicates and partially
eliminates B16 by routing the affected DFA_*_WITH_GROUPS patterns to
PIKEVM_CAPTURE before the DFA state-count ladder:

- B10: optional prefix before capturing group (e.g. -?(-?.{3}).)
- B15: capturing group in quantified alternation (e.g. (a|b){2,})
- B16 (partial): nullable outer quantifier on capturing group with
  non-nullable content (e.g. (a)?); patterns where both the outer
  quantifier and group content are nullable (e.g. (0*-?){0,}) still
  fall back to JDK via the new hasNullableGroupContentWithNullableQuantifier
  predicate.

Both the capture-ambiguous TDFA path and the non-ambiguous DFA-with-groups
path now have the three gates before the DFA strategy ladder. Fuzz gate:
findings=0 (9530 patterns, 76240 inputs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add PIKEVM gate inside the capturing TDFA isAnchorConditionDiluted()
block: patterns where both branches share a leading character but one
branch carries a start-anchor guard (e.g. ^x|x(y)) now route to
PIKEVM_CAPTURE instead of the JDK fallback. PikeVM evaluates ^/\A
correctly against the search-region origin since commit 0acfc66.

Patterns with optional quantifiers, nullable branches, or leading
end-anchors still fall through to the anchorConditionDiluted JDK path.
Fuzz gate confirms zero divergences with the new routing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NFABytecodeGenerator: add zero-length early-accept before bounds/regionMatches
  in generateBackreferenceCheck; groupLen==0 trivially succeeds (vacuous match)
- FallbackPatternDetector: replace broad hasNullableBackrefGroup B7 guard with
  narrowed hasAmbiguouslyNullableBackrefGroup that only falls back when the group
  body can capture strings of length > 1 (unbounded contamination risk); groups
  with max capture length <= 1 (e.g. a?, [x]?) are safe with the early-accept
- BackrefEngineGapsTest: enable b7_nullableBackrefGroupInOptimizedNfa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- RuntimeCompiler: replace CapturePolicy import with ReggieOption/UnsupportedPatternException
- Add cacheKeyFor() helper (flag-aware cache key) and fallbackOrThrow() helper
- Gate all 6 JavaRegexFallbackMatcher construction sites behind ALLOW_JDK_FALLBACK flag
- compileHybrid() receives ReggieOptions to propagate fallback policy
- UnsupportedPatternException propagates through catch(Exception) via explicit re-throw
- 34 test files updated: add allowJdkFallback() for patterns requiring JDK fallback
- New FallbackPolicyTest: throwsByDefault, delegatesWhenFallbackEnabled, nativePatternUnaffected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…otations

ReggieOption moved from reggie-runtime to reggie-annotations so the
annotation type can reference it without a circular dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik added the AI Generated or assisted by AI label Jun 18, 2026
@codecov-commenter

Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

jbachorik added a commit that referenced this pull request Jun 18, 2026
…esign docs

Adds the 2026-06-11..18 achievement report (capture-correctness + O(n) drain
work), the PR decomposition plan for that work (PRs #81-#85), the Reggie 1.0.0
public-release readiness assessment, and the design/plan docs from the
design->adversarial-review->plan loops.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik marked this pull request as ready for review June 18, 2026 20:09
@jbachorik jbachorik merged commit ee13466 into main Jun 18, 2026
9 checks passed
@jbachorik jbachorik deleted the pr/1-reggieoption-fallback-substrate branch June 18, 2026 20:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5cb755568e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (sb == null) {
sb = new StringBuilder(pattern);
}
sb.append(' ').append(o.name());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use collision-free option cache keys

When any option is enabled, this builds the L1 cache key by appending a literal space plus the enum name to the raw regex. Since regex patterns can contain that same suffix, compiling "abc ALLOW_JDK_FALLBACK" with default options and then "abc" with allowJdkFallback() maps both calls to the same key; compile() then returns the first cached matcher without verifying the pattern, so the caller can silently match against the wrong regex. Use an unambiguous separator/structured key like the previous NUL-delimited option encoding.

Useful? React with 👍 / 👎.

Comment on lines +189 to +193
INVOKESTATIC,
"com/datadoghq/reggie/runtime/RuntimeCompiler",
"compilePikeVm",
"(Ljava/lang/String;Ljava/lang/String;)Lcom/datadoghq/reggie/runtime/ReggieMatcher;",
false);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not cache PikeVM matchers in generated stubs

For @RegexPattern methods resolved as PIKEVM, this invocation constructs a PikeVMMatcher and the surrounding lazy-init code stores it in the volatile method field. PikeVMMatcher owns mutable per-call thread-list/capture buffers, and the runtime compiler deliberately returns a fresh PikeVM matcher on each compile() call to avoid sharing that state; generated providers will instead share one instance across callers, so concurrent matches through the same provider can race and corrupt results. The stub should create a fresh matcher per call or wrap it with thread-safe isolation.

Useful? React with 👍 / 👎.

Comment on lines 239 to +242
if ((strategy == PatternAnalyzer.MatchingStrategy.DFA_UNROLLED_WITH_GROUPS
|| strategy == PatternAnalyzer.MatchingStrategy.DFA_SWITCH_WITH_GROUPS)
&& hasNullableOuterQuantifierOnCapturingGroup(ast)) {
return "capturing group with nullable outer quantifier: "
+ "TDFA POSIX last-match span incorrect for zero-width last iteration";
|| strategy == PatternAnalyzer.MatchingStrategy.DFA_SWITCH_WITH_GROUPS
|| strategy == PatternAnalyzer.MatchingStrategy.PIKEVM_CAPTURE)
&& hasNullableGroupContentWithNullableQuantifier(ast)) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Check PIKEVM fallback cases before returning native

Adding PIKEVM_CAPTURE to this fallback predicate does not actually protect those patterns: RuntimeCompiler returns immediately for result.strategy == PIKEVM_CAPTURE before invoking FallbackPatternDetector.needsFallback, and the annotation resolver similarly returns DELEGATE_PIKEVM before checking this predicate. A nullable-content/nullable-outer pattern that routes to PIKEVM, such as a named variant like (?<x>0*-?){0,}, will therefore still use the engine this comment says diverges instead of throwing or delegating to JDK fallback.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated or assisted by AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants