perf: Replace better_html dependency with herb#462
Closed
dduugg wants to merge 1 commit into
Closed
Conversation
better_html has been deprecated in favor of herb (https://herb-tools.dev), the modern HTML+ERB toolchain it points users toward. herb has a native C parse step and exposes Herb.extract_ruby, which returns the Ruby code from an ERB template with whitespace padding so character positions match the original file. That lets us delete the AST-walk-and-concat-code-nodes path in Parsers::Erb (and the comment that explicitly disclaimed correct source locations); the new implementation hands the extracted string to the existing Ruby parser. The Parser::SyntaxError rescue moves away too because Parsers::Ruby already wraps that case. Other changes: - Drop the parser_class: keyword from Parsers::Erb.new (it injected a BetterHtml::Parser-shaped object; no Herb analogue, and the documented custom-ERB-parser hook in USAGE.md is subclassing + overriding parse_buffer). - Drop the `require "rails/railtie"` workaround in two parser tests; herb has no Rails dependency. - Add a small Sorbet shim for Herb.extract_ruby (only API we use); tapioca can't auto-generate herb's RBI in this env because of an unrelated sorbet-runtime/`sig` ordering issue inside herb.
Author
|
Superseded by #447 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What are you trying to accomplish?
better_htmlwas deprecated by Shopify in favor of Herb — its README says so explicitly and points users at the modern ecosystem. Packwerk has been carryingbetter_htmlpurely to parse.erbfiles (Parsers::Erb). This PR swaps that dependency forherb.What approach did you choose and why?
Herb exposes
Herb.extract_ruby(source), which returns just the Ruby code from an ERB template. That replaces, in one C-extension call, the entire pure-Ruby pipeline the existingParsers::Erbused: parse to aBetterHtml::ParserAST → walk the tree → pluck:codenodes → skip ERB-comment subtrees → join code strings with\n→ re-parse as Ruby.The new
Parsers::Erbis ~25 lines:Other notable changes
parser_class:keyword arg fromParsers::Erb.new. It was aBetterHtml::Parserinjection seam with no Herb analogue. The documented hook for users with custom ERB-parsing needs is subclassingPackwerk::Parsers::Erband overriding#parse_buffer(perUSAGE.md), and that path is unaffected.require "rails/railtie"workaround in two parser tests. The# TODO: make better_html not require Railscomment can go: Herb has no Rails dependency.BetterHtml::Parser#ast. The syntax-error case now uses the realinvalid.erbfixture (more honest); theEncodingErrorcase stubsHerb.extract_rubydirectly.sorbet/rbi/shims/herb.rbi, an 8-line shim covering only the single API we use (Herb.extract_ruby).bin/tapioca gem herbcan't auto-generate herb's RBI in this configuration: tapioca'srbs/rewriter.rbinstalls aRequireHooks.source_transformthat rewrites#:RBS comments intosig do … endblocks at load time (viaSpoom::Sorbet::Translate.rbs_comments_to_sorbet_sigs). Herb uses overloaded#:annotations — two#:lines on a single method, e.g.Herb::DiffResult#eachfor the with-block / without-block split. Spoom emits two consecutivesigcalls, andsorbet-runtime's_declare_sig_internalrejects that with"You called sig twice without declaring a method in between", aborting gem load. Filed upstream as Shopify/spoom#913 with a standalone reproducer; hand-rolling a shim is the workaround until that's resolved.extract_rubywhitespace-pads its output by default, so the Ruby parser's line/column on each node corresponds to the real.erbsource. Livebin/packwerk checkterminal output for ERB violations now has accurate line numbers.package_todo.ymldoesn't store locations so the stored todos are unaffected, and the existing code's comment showed the maintainers had already decided this wasn't worth fixing on its own — but since we get it as a side effect, we can also delete that comment.Performance
I benchmarked the new parser against the old one on (a) a synthetic micro-benchmark and (b) a real Rails monolith's ERB corpus (1,582 ERB files / ~2.9 MB pulled from a private codebase of ~46,000 Ruby+ERB files, applying the same exclude rules as that codebase's
packwerk.yml).Synthetic, same content scaled up:
better_htmlherbReal ERB corpus, 1,582 files:
better_htmlherbAllocations (medium input, GC paused, 500 iterations):
better_htmlherbPer-file gains in real workloads are more modest than the synthetic numbers (mostly because most
.erbfiles in practice are small partials), but the allocation drop is significant either way — less GC pressure across parallel workers during abin/packwerk check.What should reviewers focus on?
parser_class:keyword fromParsers::Erb.newacceptable? It's not markedprivate_constant, so it's technically reachable, but it was very tightly coupled toBetterHtml::Parser's shape. Subclassing-and-overridingparse_bufferremains as the supported customization path.sorbet/rbi/shims/herb.rbiand switch tobin/tapioca gem herb.USAGE.md(which currently inherits fromPackwerk::Parsers::Erband callssuperwith a buffer) deserves a refresh in this PR or a follow-up — the existing example still works.Type of Change
Additional Release Notes
Dependency replaced:
better_html→herb. Anyone who was instantiatingPackwerk::Parsers::Erbwith theparser_class:keyword (an undocumented injection seam) will need to migrate; the supportedparse_buffer-override pattern documented inUSAGE.mdcontinues to work unchanged.Checklist
better_htmldirectly; the gem swap is transparent for the documented public APIs.)