perf: Replace better_html dependency with herb by dduugg · Pull Request #462 · Shopify/packwerk

dduugg · 2026-05-11T21:12:42Z

What are you trying to accomplish?

better_html was deprecated by Shopify in favor of Herb — its README says so explicitly and points users at the modern ecosystem. Packwerk has been carrying better_html purely to parse .erb files (Parsers::Erb). This PR swaps that dependency for herb.

What approach did you choose and why?

Herb exposes Herb.extract_ruby(source), which returns just the Ruby code from an ERB template. That replaces, in one C-extension call, the entire pure-Ruby pipeline the existing Parsers::Erb used: parse to a BetterHtml::Parser AST → walk the tree → pluck :code nodes → skip ERB-comment subtrees → join code strings with \n → re-parse as Ruby.

The new Parsers::Erb is ~25 lines:

def parse_buffer(buffer, file_path:)
  ruby_source = Herb.extract_ruby(buffer.source)
  @ruby_parser.call(io: StringIO.new(ruby_source), file_path: file_path)
rescue EncodingError => e
  result = ParseResult.new(file: file_path, message: e.message)
  raise Parsers::ParseError, result
end

Other notable changes

Dropped the parser_class: keyword arg from Parsers::Erb.new. It was a BetterHtml::Parser injection seam with no Herb analogue. The documented hook for users with custom ERB-parsing needs is subclassing Packwerk::Parsers::Erb and overriding #parse_buffer (per USAGE.md), and that path is unaffected.
Removed the require "rails/railtie" workaround in two parser tests. The # TODO: make better_html not require Rails comment can go: Herb has no Rails dependency.
Reworked the two error-injection tests that previously stubbed BetterHtml::Parser#ast. The syntax-error case now uses the real invalid.erb fixture (more honest); the EncodingError case stubs Herb.extract_ruby directly.
Added sorbet/rbi/shims/herb.rbi, an 8-line shim covering only the single API we use (Herb.extract_ruby). bin/tapioca gem herb can't auto-generate herb's RBI in this configuration: tapioca's rbs/rewriter.rb installs a RequireHooks.source_transform that rewrites #: RBS comments into sig do … end blocks at load time (via Spoom::Sorbet::Translate.rbs_comments_to_sorbet_sigs). Herb uses overloaded #: annotations — two #: lines on a single method, e.g. Herb::DiffResult#each for the with-block / without-block split. Spoom emits two consecutive sig calls, and sorbet-runtime's _declare_sig_internal rejects that with "You called sig twice without declaring a method in between", aborting gem load. Filed upstream as Shopify/spoom#913 with a standalone reproducer; hand-rolling a shim is the workaround until that's resolved.
(Minor, falls out for free) Herb's extract_ruby whitespace-pads its output by default, so the Ruby parser's line/column on each node corresponds to the real .erb source. Live bin/packwerk check terminal output for ERB violations now has accurate line numbers. package_todo.yml doesn't store locations so the stored todos are unaffected, and the existing code's comment showed the maintainers had already decided this wasn't worth fixing on its own — but since we get it as a side effect, we can also delete that comment.
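
To illustrate why whitespace padding preserves source locations, here is a minimal pure-Ruby sketch. It is not Herb's implementation (nor better_html's): `join_extract` and `pad_extract` are hypothetical helpers built on a naive regex, they ignore edge cases, and they exist only to contrast the two strategies.

```ruby
# Hypothetical helpers for illustration only; not Herb's or better_html's code.
ERB_TAG = /<%=?(.*?)%>/m

# Join strategy: collect the code fragments and join them with newlines.
# The original line numbers are lost.
def join_extract(source)
  source.scan(ERB_TAG).map { |(code)| code.strip }.join("\n")
end

# Padding strategy: blank out everything that is not Ruby code, keeping
# newlines, so each fragment stays at its original line and column.
def pad_extract(source)
  out = source.gsub(/[^\n]/, " ")
  pos = 0
  while (m = ERB_TAG.match(source, pos))
    out[m.begin(1)...m.end(1)] = m[1] # same length, so offsets are preserved
    pos = m.end(0)
  end
  out
end

template = "<h1>Title</h1>\n<p>\n  <%= Order.count %>\n</p>\n"

joined = join_extract(template)
padded = pad_extract(template)

joined_line = joined.lines.index { |l| l.include?("Order") } + 1
padded_line = padded.lines.index { |l| l.include?("Order") } + 1
puts "joined: line #{joined_line}, padded: line #{padded_line}"
# prints "joined: line 1, padded: line 3"
```

With the padded output, a Ruby parser reports the constant reference on line 3, which is where it appears in the template.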

Performance

I benchmarked the new parser against the old one on (a) a synthetic micro-benchmark and (b) a real Rails monolith's ERB corpus (1,582 ERB files / ~2.9 MB pulled from a private codebase of ~46,000 Ruby+ERB files, applying the same exclude rules as that codebase's packwerk.yml).

Synthetic, same content scaled up:

Input	Iters	`better_html`	`herb`	Reduction
~1 KB	5,000	3.27 s	0.82 s	75%
~47 KB	500	16.16 s	3.37 s	79%
~466 KB	50	27.48 s	3.07 s	89%

Real ERB corpus, 1,582 files:

Mode	`better_html`	`herb`	Reduction
Serial parse, all files	1.68 s	0.89 s	47%
Parallel parse (8 cores)	0.23 s	0.18 s	22%
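
The Reduction column in both timing tables is consistent with 1 - herb / better_html, rounded to a whole percent. A small sketch that recomputes it from the figures above (the `reduction` helper and the row labels are names introduced here, not part of the PR):

```ruby
# Recompute the "Reduction" column from the timings reported above.
# `reduction` is a helper name introduced for this sketch.
def reduction(old_seconds, new_seconds)
  ((1 - new_seconds / old_seconds) * 100).round
end

timings = {
  "~1 KB synthetic"   => [3.27, 0.82],
  "~47 KB synthetic"  => [16.16, 3.37],
  "~466 KB synthetic" => [27.48, 3.07],
  "serial corpus"     => [1.68, 0.89],
  "parallel corpus"   => [0.23, 0.18],
}

timings.each do |label, (old_s, new_s)|
  puts format("%-18s %d%%", label, reduction(old_s, new_s))
end
```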

Allocations (medium input, GC paused, 500 iterations):

Metric	`better_html`	`herb`	Reduction
Total objects	162,755	29,664	82%
Strings	24,184	3,278	86%
Arrays	53,714	7,157	87%
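
The benchmark script itself is not included in this PR description, so the following is only a sketch of one way to count allocations with GC paused, using Ruby's built-in `GC.stat(:total_allocated_objects)`. The `count_allocations` helper and the placeholder workload are assumptions for illustration.

```ruby
# Sketch only: the PR's actual benchmark harness is not shown.
# Counts objects allocated while the block runs, with GC disabled so that
# collection does not interleave with the measurement.
def count_allocations(iterations)
  GC.start
  was_disabled = GC.disable # returns true if GC was already disabled
  before = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable unless was_disabled
end

# Placeholder workload; swap in the parser call under test.
allocated = count_allocations(500) { "a,b,c".split(",") }
puts "objects allocated: #{allocated}"
```

Running the same helper around each parser on identical input would give a total comparable to the "Total objects" row; a per-type breakdown needs a separate tool.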

Per-file gains in real workloads are more modest than the synthetic numbers (mostly because most .erb files in practice are small partials), but the allocation drop is significant either way — less GC pressure across parallel workers during a bin/packwerk check.

What should reviewers focus on?

Is dropping the parser_class: keyword from Parsers::Erb.new acceptable? It's not marked private_constant, so it's technically reachable, but it was very tightly coupled to BetterHtml::Parser's shape. Subclassing-and-overriding parse_buffer remains as the supported customization path.
The Sorbet shim: a tapioca-generated RBI would be preferable. Tracked upstream at Shopify/spoom#913 — once that's resolved we can delete sorbet/rbi/shims/herb.rbi and switch to bin/tapioca gem herb.
Whether the "Custom ERB parser" example in USAGE.md (which currently inherits from Packwerk::Parsers::Erb and calls super with a buffer) deserves a refresh in this PR or in a follow-up. The existing example still works.

Type of Change

Bugfix
New feature
Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Additional Release Notes

Breaking change (fix or feature that would cause existing functionality to change)

Dependency replaced: better_html → herb. Anyone who was instantiating Packwerk::Parsers::Erb with the parser_class: keyword (an undocumented injection seam) will need to migrate; the supported parse_buffer-override pattern documented in USAGE.md continues to work unchanged.

Checklist

I have updated the documentation accordingly. (No user-facing docs reference better_html directly; the gem swap is transparent for the documented public APIs.)
I have added tests to cover my changes. (Updated the parser tests to exercise the new path, including a real-fixture syntax-error case.)
It is safe to rollback this change.

better_html has been deprecated in favor of herb (https://herb-tools.dev), the modern HTML+ERB toolchain it points users toward. herb has a native C parse step and exposes Herb.extract_ruby, which returns the Ruby code from an ERB template with whitespace padding so character positions match the original file.

That lets us delete the AST-walk-and-concat-code-nodes path in Parsers::Erb (and the comment that explicitly disclaimed correct source locations); the new implementation hands the extracted string to the existing Ruby parser. The Parser::SyntaxError rescue moves away too because Parsers::Ruby already wraps that case.

Other changes:
- Drop the parser_class: keyword from Parsers::Erb.new (it injected a BetterHtml::Parser-shaped object; no Herb analogue, and the documented custom-ERB-parser hook in USAGE.md is subclassing + overriding parse_buffer).
- Drop the `require "rails/railtie"` workaround in two parser tests; herb has no Rails dependency.
- Add a small Sorbet shim for Herb.extract_ruby (only API we use); tapioca can't auto-generate herb's RBI in this env because of an unrelated sorbet-runtime/`sig` ordering issue inside herb.

dduugg · 2026-05-14T14:59:18Z

Superseded by #447

dduugg marked this pull request as ready for review May 11, 2026 21:39

dduugg requested a review from a team as a code owner May 11, 2026 21:39

dduugg changed the title ~~Replace better_html dependency with herb~~ perf: Replace better_html dependency with herb May 12, 2026

dduugg closed this May 14, 2026

dduugg mentioned this pull request May 14, 2026

RBSCommentsToSorbetSigs emits two consecutive sig blocks for overloaded #: annotations, which sorbet-runtime rejects at load time Shopify/spoom#913

Open

dduugg wants to merge 1 commit into Shopify:main from dduugg:replace-better-html-with-herb