Avoid repeated parser table decode and cut parse setup overhead (issue #630) #631
Merged
ratmice merged 2 commits into softdevteam:master on Apr 30, 2026
Conversation
ltratt
reviewed
Apr 28, 2026
Member
I have one easy comment: @ratmice does this look OK to you?
Collaborator
I haven't yet had a chance to look, but I will try and have a gander this evening, in a couple of hours.
Collaborator
Looks like what it says on the tin, so this looks OK to me; couldn't help but join the bikeshed a little, though.
While profiling a workload that parses many small inputs in a tight loop, I found two sources of avoidable per-parse overhead in `lrpar`.

First, generated `parse()` functions from `lrpar_mod` were calling `_reconstitute(__GRM_DATA, __STABLE_DATA)` on every invocation, which meant re-decoding the grammar and state table every time, even though `RTParserBuilder::new` only borrows them. This change caches the reconstituted tables in generated code and reuses them across calls.
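The caching pattern can be sketched with a `OnceLock`. All names below (`GRM_DATA`, `reconstitute`, `tables`) are hypothetical stand-ins for illustration, not lrpar's actual generated identifiers:

```rust
use std::sync::OnceLock;

// Hypothetical stand-ins for the encoded grammar and state-table blobs
// that generated code embeds (in lrpar: `__GRM_DATA` / `__STABLE_DATA`).
static GRM_DATA: &[u8] = &[1, 2, 3];
static STABLE_DATA: &[u8] = &[4, 5, 6];

// Simulated decode step; in lrpar this is the `_reconstitute` call.
fn reconstitute(grm: &[u8], stable: &[u8]) -> (Vec<u8>, Vec<u8>) {
    (grm.to_vec(), stable.to_vec())
}

// Decode once per process and reuse the result on every subsequent call,
// instead of re-decoding inside each generated `parse()` invocation.
static TABLES: OnceLock<(Vec<u8>, Vec<u8>)> = OnceLock::new();

fn tables() -> &'static (Vec<u8>, Vec<u8>) {
    TABLES.get_or_init(|| reconstitute(GRM_DATA, STABLE_DATA))
}
```

`get_or_init` runs the decode closure at most once, even under concurrent callers, and every later call returns the same `'static` reference.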
Second, there were a couple of smaller setup costs in `lrpar::parser`:

- an already-borrowed callback was wrapped in a `Box<&dyn Fn(...)>`, introducing a heap allocation on every parse;
- lexing results were first collected into an intermediate `Vec<Result<...>>` and then walked again to build the lexeme vector.
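To illustrate the callback cost with a hypothetical parse-like helper (not lrpar's real signature): boxing a borrowed callback as `Box<&dyn Fn(...)>` allocates just to hold a reference on every call, while taking `&dyn Fn(...)` directly keeps dynamic dispatch with no allocation.

```rust
// Hypothetical stand-in for a parse-time callback consumer. Taking
// `&dyn Fn` keeps dynamic dispatch but performs no heap allocation;
// the older shape, `Box<&dyn Fn(u8) -> u8>`, would allocate a Box
// around the already-borrowed closure on every invocation.
fn apply_all(input: &[u8], cb: &dyn Fn(u8) -> u8) -> Vec<u8> {
    input.iter().map(|&b| cb(b)).collect()
}
```

Usage: `apply_all(&[1, 2, 3], &|b| b + 1)` returns `[2, 3, 4]` without ever boxing the closure.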
This PR removes those extra costs by:

- caching the reconstituted tables in a `OnceLock`;
- adding a `lrpar::ParserTables` wrapper so generated code does not need to name `lrtable` types directly.

Since the time spent in `_reconstitute` is proportional to grammar size, this change is particularly impactful for larger grammars.
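A rough sketch of what a table-owning wrapper could look like; the fields and methods below are invented for illustration and are not lrpar's real `ParserTables` API. The point of the pattern is that generated code only names one public type, keeping the internal `lrtable` types out of its surface:

```rust
// Hypothetical opaque wrapper owning the decoded tables. Generated code
// would hold one of these instead of naming internal table types.
pub struct ParserTables {
    grm: Vec<u8>,    // stand-in for the decoded grammar
    stable: Vec<u8>, // stand-in for the decoded state table
}

impl ParserTables {
    pub fn new(grm_data: &[u8], stable_data: &[u8]) -> Self {
        // In lrpar the decode (`_reconstitute`) would happen here.
        ParserTables {
            grm: grm_data.to_vec(),
            stable: stable_data.to_vec(),
        }
    }

    pub fn grammar_len(&self) -> usize {
        self.grm.len()
    }

    pub fn stable_len(&self) -> usize {
        self.stable.len()
    }
}
```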
However, even on a very small grammar, such as the `calc_actions` example, these changes brought a tight-loop parse benchmark down from roughly 2.06 µs/parse to 0.80 µs/parse.
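For context, a tight-loop microbenchmark of this general shape can be sketched as follows; the workload here is a trivial stand-in, whereas the real measurement parsed inputs with the `calc_actions` grammar:

```rust
use std::time::Instant;

// Minimal tight-loop timing harness: run `work` `iters` times and
// report the mean nanoseconds per iteration.
fn bench(iters: u32, work: &dyn Fn() -> usize) -> f64 {
    let start = Instant::now();
    let mut sink = 0usize;
    for _ in 0..iters {
        sink = sink.wrapping_add(work());
    }
    // Consume `sink` so the loop body cannot be optimized away entirely.
    assert!(sink < usize::MAX);
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}
```

Microbenchmarks at this granularity are noisy; a warm-up pass and repeated runs (or a harness such as criterion) give steadier numbers than a single loop.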