Skip RBS rewrite when no markers are present#2616
Conversation
c06244a to
3ad4466
Compare
3ad4466 to
0422f6d
Compare
| def possible_rbs_runtime_rewrite_syntax?(source) | ||
| return true if source.include?("#:") || source.include?("#|") | ||
| return false unless source.include?("# @") | ||
|
|
||
| RBS_ANNOTATION_MARKERS.any? { |marker| source.include?(marker) } | ||
| end |
There was a problem hiding this comment.
This seems to be scanning the source string multiple times, can we turn the search into a Regexp.union and do a single scan through the source?
There was a problem hiding this comment.
I think something like:
SEARCH_REGEXP = Regexp.union(TYPED_FILE_PATTERN, Regexp.escape("#:"), Regexp.escape("#|"), *RBS_ANNOTATION_MARKERS.map { Regexp.escape(it) })
def should_rewrite?
source.index(SEARCH_REGEXP)
endshould give you the decision to rewrite or not with a single pass.
There was a problem hiding this comment.
Good call on the redundant scans. I just pushed an amend.
However, I kept typed_file? as a separate stage rather than folding into the union. A single union would introduce OR semantics, so a typed file with no markers would match and I think we'd want to avoid the expensive Spoom translator execution in this case.
There was a problem hiding this comment.
Ah, right, I forgot that the typed check wasn't being OR'ed. Good call.
0422f6d to
3b07e94
Compare
3b07e94 to
d0dd91c
Compare
paracycle
left a comment
There was a problem hiding this comment.
Looks good to me, but have one question
| "# @overridable", | ||
| "# @without_runtime", | ||
| ].freeze #: Array[String] | ||
| RBS_REWRITE_PATTERN = Regexp.union(["#:", "#|"] + RBS_ANNOTATION_MARKERS).freeze #: Regexp |
There was a problem hiding this comment.
Are you sure that none of these have special Regexp characters, or does Regexp.union do the right thing? I am surprised that we don't need Regexp.escape here.
There was a problem hiding this comment.
Just sanity checked this. Looks like Regexp.union runs each String arg through Regexp.escape before joining.
Looking at the Ruby implementation, it looks like Regexp.escape and Regexp.quote share the same C function.
- https://github.com/ruby/ruby/blob/master/re.c#L4874-L4875
quoteandescapemap torb_reg_s_quote
- https://github.com/ruby/ruby/blob/master/re.c#L4385
- union calls
rb_reg_s_quotefor string case
- union calls
Check to show union's source against manually escaped and joined version:
markers = ["#:", "#|", "# @abstract", "# @interface", "# @sealed", "# @final", "# @requires_ancestor:", "# @override", "# @overridable", "# @without_runtime"]
pattern = Regexp.union(markers)
pattern.source == markers.map { |m| Regexp.escape(m) }.join("|")
# => true
pattern.source
# => "\#:|\#\||\#\ @abstract|\#\ @interface|\#\ @sealed|\#\ @final|\#\ @requires_ancestor:|\#\ @override|\#\ @overridable|\#\ @without_runtime"Documentation doesn't explicitly state this behavior, but the example seems to highlight this escaping behavior as well: https://docs.ruby-lang.org/en/3.4/Regexp.html#method-c-union
Summary
Avoid running Spoom's RBS-to-Sorbet-sig translator for typed Ruby files that cannot contain runtime-rewriteable RBS syntax.
Tapioca currently calls
Spoom::Sorbet::Translate.rbs_comments_to_sorbet_sigsfor every typed Ruby file loaded through the RBS rewriter. A large portion of typed files in Core do not contain RBS signatures or supported RBS annotations, so the translator often parses source only to return it unchanged.This keeps the existing
typed:gate and adds a conservative marker gate. Translation now runs only when source contains one of:#:#|# @abstract# @interface# @sealed# @final# @requires_ancestor:# @override# @overridable# @without_runtimeFalse positives are safe because they still use the existing translator. False negatives would be unsafe, so the annotation list is intentionally aligned with Spoom's currently supported runtime-rewrite annotations.