perf: Replace better_html dependency with herb #462

Closed
dduugg wants to merge 1 commit into Shopify:main from dduugg:replace-better-html-with-herb

dduugg commented May 11, 2026

What are you trying to accomplish?

better_html was deprecated by Shopify in favor of Herb — its README says so explicitly and points users at the modern ecosystem. Packwerk has been carrying better_html purely to parse .erb files (Parsers::Erb). This PR swaps that dependency for herb.

What approach did you choose and why?

Herb exposes Herb.extract_ruby(source), which returns just the Ruby code from an ERB template. That replaces, in one C-extension call, the entire pure-Ruby pipeline the existing Parsers::Erb used: parse to a BetterHtml::Parser AST → walk the tree → pluck :code nodes → skip ERB-comment subtrees → join code strings with \n → re-parse as Ruby.

The new Parsers::Erb is ~25 lines:

def parse_buffer(buffer, file_path:)
  ruby_source = Herb.extract_ruby(buffer.source)
  @ruby_parser.call(io: StringIO.new(ruby_source), file_path: file_path)
rescue EncodingError => e
  result = ParseResult.new(file: file_path, message: e.message)
  raise Parsers::ParseError, result
end

Other notable changes

  • Dropped the parser_class: keyword arg from Parsers::Erb.new. It was a BetterHtml::Parser injection seam with no Herb analogue. The documented hook for users with custom ERB-parsing needs is subclassing Packwerk::Parsers::Erb and overriding #parse_buffer (per USAGE.md), and that path is unaffected.
  • Removed the require "rails/railtie" workaround in two parser tests. The # TODO: make better_html not require Rails comment can go: Herb has no Rails dependency.
  • Reworked the two error-injection tests that previously stubbed BetterHtml::Parser#ast. The syntax-error case now uses the real invalid.erb fixture (more honest); the EncodingError case stubs Herb.extract_ruby directly.
  • Added sorbet/rbi/shims/herb.rbi, an 8-line shim covering only the single API we use (Herb.extract_ruby). bin/tapioca gem herb can't auto-generate herb's RBI in this configuration: tapioca's rbs/rewriter.rb installs a RequireHooks.source_transform that rewrites #: RBS comments into sig do … end blocks at load time (via Spoom::Sorbet::Translate.rbs_comments_to_sorbet_sigs). Herb uses overloaded #: annotations — two #: lines on a single method, e.g. Herb::DiffResult#each for the with-block / without-block split. Spoom emits two consecutive sig calls, and sorbet-runtime's _declare_sig_internal rejects that with "You called sig twice without declaring a method in between", aborting gem load. Filed upstream as Shopify/spoom#913 with a standalone reproducer; hand-rolling a shim is the workaround until that's resolved.
  • (Minor, falls out for free) Herb's extract_ruby whitespace-pads its output by default, so the Ruby parser's line/column on each node corresponds to the real .erb source. Live bin/packwerk check terminal output for ERB violations now has accurate line numbers. package_todo.yml doesn't store locations so the stored todos are unaffected, and the existing code's comment showed the maintainers had already decided this wasn't worth fixing on its own — but since we get it as a side effect, we can also delete that comment.
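
To illustrate why whitespace padding keeps locations accurate, here is a minimal pure-Ruby sketch. It is not Herb's implementation: `pad_extract_ruby` and `ERB_TAG` are hypothetical names, the regex covers only the common `<% %>`, `<%= %>`, `<%- -%>`, and `<%# %>` forms, and it does not handle multiple tags on one line the way a real extractor would need to.

```ruby
# Hypothetical illustration only; Herb.extract_ruby does this natively in C.
# Matches ERB tags, capturing the optional marker (=, -, #) and the tag body.
ERB_TAG = /<%(?!%)(=|-|#)?(.*?)(-)?%>/m

def pad_extract_ruby(source)
  # Start from a copy where every character except newlines is a space,
  # so the result has the same length and the same line breaks as the input.
  output = source.gsub(/[^\n]/, " ")

  source.scan(ERB_TAG) do
    match = Regexp.last_match
    next if match[1] == "#" # ERB comments contribute no Ruby code

    # Copy the Ruby code back at its original character offset.
    code = match[2]
    output[match.begin(2), code.length] = code
  end

  output
end

source = "<div>\n  <%= user.name %>\n  <%# note %>\n</div>\n"
padded = pad_extract_ruby(source)
puts padded.index("user.name") == source.index("user.name") # true
```

Because the Ruby code sits at the same offsets as in the template, any line/column a Ruby parser reports for the padded string also points at the right place in the .erb file.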

Performance

I benchmarked the new parser against the old one on (a) a synthetic micro-benchmark and (b) a real Rails monolith's ERB corpus (1,582 ERB files / ~2.9 MB pulled from a private codebase of ~46,000 Ruby+ERB files, applying the same exclude rules as that codebase's packwerk.yml).

Synthetic, same content scaled up:

| Input | Iters | better_html | herb | Reduction |
|---|---|---|---|---|
| ~1 KB | 5,000 | 3.27 s | 0.82 s | 75% |
| ~47 KB | 500 | 16.16 s | 3.37 s | 79% |
| ~466 KB | 50 | 27.48 s | 3.07 s | 89% |

Real ERB corpus, 1,582 files:

| Mode | better_html | herb | Reduction |
|---|---|---|---|
| Serial parse, all files | 1.68 s | 0.89 s | 47% |
| Parallel parse (8 cores) | 0.23 s | 0.18 s | 22% |

Allocations (medium input, GC paused, 500 iterations):

| | better_html | herb | Reduction |
|---|---|---|---|
| Total objects | 162,755 | 29,664 | 82% |
| Strings | 24,184 | 3,278 | 86% |
| Arrays | 53,714 | 7,157 | 87% |

Per-file gains in real workloads are more modest than the synthetic numbers (mostly because most .erb files in practice are small partials), but the allocation drop is significant either way — less GC pressure across parallel workers during a bin/packwerk check.
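
For anyone wanting to reproduce this kind of comparison, the sketch below shows one way the Reduction column and a GC-paused allocation count could be computed. It is not the harness used for the numbers above; `reduction_percent` and `allocations_during` are hypothetical helper names, and the block passed to `allocations_during` stands in for a parser call.

```ruby
# Percentage reduction from a baseline measurement to a new one,
# rounded to the nearest whole percent.
def reduction_percent(before, after)
  ((1 - after.to_f / before) * 100).round
end

# Number of objects allocated while the block runs. GC is paused for the
# duration so nothing is collected mid-measurement, then re-enabled.
def allocations_during
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

puts reduction_percent(3.27, 0.82)       # 75
puts reduction_percent(162_755, 29_664)  # 82

# Stand-in workload; a real run would call the parser under test here.
count = allocations_during { 500.times { "template".dup } }
puts count >= 500                        # true
```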

What should reviewers focus on?

  • Is dropping the parser_class: keyword from Parsers::Erb.new acceptable? It's not marked private_constant, so it's technically reachable, but it was very tightly coupled to BetterHtml::Parser's shape. Subclassing-and-overriding parse_buffer remains as the supported customization path.
  • The Sorbet shim: a tapioca-generated RBI would be preferable. Tracked upstream at Shopify/spoom#913 — once that's resolved we can delete sorbet/rbi/shims/herb.rbi and switch to bin/tapioca gem herb.
  • Whether the "Custom ERB parser" example in USAGE.md (which currently inherits from Packwerk::Parsers::Erb and calls super with a buffer) deserves a refresh in this PR or a follow-up. The existing example still works.

Type of Change

  • Bugfix
  • New feature
  • Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Additional Release Notes

  • Breaking change (fix or feature that would cause existing functionality to change)

Dependency replaced: better_html → herb. Anyone who was instantiating Packwerk::Parsers::Erb with the parser_class: keyword (an undocumented injection seam) will need to migrate; the supported parse_buffer-override pattern documented in USAGE.md continues to work unchanged.

Checklist

  • I have updated the documentation accordingly. (No user-facing docs reference better_html directly; the gem swap is transparent for the documented public APIs.)
  • I have added tests to cover my changes. (Updated the parser tests to exercise the new path, including a real-fixture syntax-error case.)
  • It is safe to rollback this change.

better_html has been deprecated in favor of herb
(https://herb-tools.dev), the modern HTML+ERB toolchain it points
users toward. herb has a native C parse step and exposes
Herb.extract_ruby, which returns the Ruby code from an ERB template
with whitespace padding so character positions match the original
file.

That lets us delete the AST-walk-and-concat-code-nodes path in
Parsers::Erb (and the comment that explicitly disclaimed correct
source locations); the new implementation hands the extracted string
to the existing Ruby parser. The Parser::SyntaxError rescue goes
away too because Parsers::Ruby already wraps that case.

Other changes:

- Drop the parser_class: keyword from Parsers::Erb.new (it injected a
  BetterHtml::Parser-shaped object; no Herb analogue, and the
  documented custom-ERB-parser hook in USAGE.md is subclassing +
  overriding parse_buffer).
- Drop the `require "rails/railtie"` workaround in two parser tests;
  herb has no Rails dependency.
- Add a small Sorbet shim for Herb.extract_ruby (only API we use);
  tapioca can't auto-generate herb's RBI in this env because of an
  unrelated sorbet-runtime/`sig` ordering issue inside herb.
@dduugg dduugg marked this pull request as ready for review May 11, 2026 21:39
@dduugg dduugg requested a review from a team as a code owner May 11, 2026 21:39
@dduugg dduugg changed the title from "Replace better_html dependency with herb" to "perf: Replace better_html dependency with herb" May 12, 2026

dduugg commented May 14, 2026

Superseded by #447
