Improve serialization performance with memchr #740

Merged
mrobinson merged 1 commit into main from serialize-memchr on May 17, 2026

@mrobinson
Member

This change greatly improves the performance of serialization (by up to
95% on some benchmarks) by changing the way that escaping of HTML
entities works. It uses memchr to avoid creating a chars() iterator
on the output stream. When run with the benchmark from #739, I see these
results:

serialize "lipsum.html" time:   [6.4817 µs 6.5021 µs 6.5212 µs]
                        change: [−95.179% −95.013% −94.846%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild

serialize "lipsum-zh.html"
                        time:   [2.0815 µs 2.0888 µs 2.0947 µs]
                        change: [−91.533% −90.940% −90.407%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) low severe

serialize "medium-fragment.html"
                        time:   [7.7625 µs 7.7927 µs 7.8147 µs]
                        change: [−84.424% −83.952% −83.486%] (p = 0.00 < 0.05)
                        Performance has improved.

serialize "small-fragment.html"
                        time:   [879.01 ns 886.43 ns 892.78 ns]
                        change: [−89.813% −89.711% −89.610%] (p = 0.00 < 0.05)
                        Performance has improved.

serialize "tiny-fragment.html"
                        time:   [332.13 ns 332.78 ns 333.60 ns]
                        change: [−27.768% −27.617% −27.457%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

serialize "strong.html" time:   [5.4946 µs 5.4988 µs 5.5030 µs]
                        change: [−0.3133% −0.0322% +0.2349%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

In this case lipsum.html serialization time dropped from 122.81 µs
to 6.5021 µs.
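
As an illustration of the general technique (not the code from this PR), escaping can scan the UTF-8 bytes for the next character that needs escaping and copy everything before it in one bulk write, instead of walking the text one `char` at a time. In this sketch `find_first_of` and `escape_text` are hypothetical names, `find_first_of` is a plain-std stand-in for a vectorized memchr-style search, and only the ASCII characters `&`, `<` and `>` are handled; non-breaking-space and attribute-mode escaping are left out.

```rust
/// Stand-in for a memchr-style search (hypothetical helper): returns the index
/// of the first byte in `haystack` that appears in `needles`.
fn find_first_of(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| needles.contains(b))
}

/// Escape text content by copying unescaped runs in bulk (hypothetical helper).
fn escape_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while !rest.is_empty() {
        // All needles are ASCII, so a match index is always a char boundary.
        let idx = find_first_of(rest.as_bytes(), b"&<>").unwrap_or(rest.len());
        // Copy the run that needs no escaping in a single call.
        out.push_str(&rest[..idx]);
        if idx == rest.len() {
            break;
        }
        out.push_str(match rest.as_bytes()[idx] {
            b'&' => "&amp;",
            b'<' => "&lt;",
            _ => "&gt;",
        });
        // Skip the one-byte ASCII character that was just escaped.
        rest = &rest[idx + 1..];
    }
    out
}
```

Text with few special characters, such as the lipsum benchmarks, would spend most of its time in the search and the bulk copy, which is presumably where a vectorized search helps most.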

@github-actions github-actions Bot added the V-non-breaking A non-breaking change label May 15, 2026
@mrobinson
Member Author

I plan to extend this optimization to the XML serializer in a follow-up. To do that I need to write some XML benchmarks, which seems like enough work to split into a separate PR.

Member

@simonwuelker simonwuelker left a comment

Very nice speedup!

We solve a very similar problem for the data state in the HTML tokenizer with SIMD intrinsics. I wonder how that compares to two memchr calls in terms of performance.

    unsafe fn data_state_simd_fast_path(&self, input: &mut StrTendril) -> Option<SetResult> {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        let (mut i, mut n_newlines) = self.data_state_sse2_fast_path(input);

        #[cfg(target_arch = "aarch64")]
        let (mut i, mut n_newlines) = self.data_state_neon_fast_path(input);

        // Process any remaining bytes (less than STRIDE)
        while let Some(c) = input.as_bytes().get(i) {
            if matches!(*c, b'<' | b'&' | b'\r' | b'\0') {
                break;
            }
            if *c == b'\n' {
                n_newlines += 1;
            }

            i += 1;
        }

        let set_result = if i == 0 {
            let first_char = input.pop_front_char().unwrap();
            debug_assert!(matches!(first_char, '<' | '&' | '\r' | '\0'));

            // FIXME: Passing a bogus input queue is only relevant when c is \n, which can never happen in this case.
            // Still, it would be nice to not have to do that.
            // The same is true for the unwrap call.
            let preprocessed_char = self
                .get_preprocessed_char(first_char, &BufferQueue::default())
                .unwrap();
            SetResult::FromSet(preprocessed_char)
        } else {
            debug_assert!(
                input.len() >= i,
                "Trying to remove {:?} bytes from a tendril that is only {:?} bytes long",
                i,
                input.len()
            );
            let consumed_chunk = input.unsafe_subtendril(0, i as u32);
            input.unsafe_pop_front(i as u32);
            SetResult::NotFromSet(consumed_chunk)
        };

        self.current_line.set(self.current_line.get() + n_newlines);

        Some(set_result)
    }

Comment thread html5ever/src/serialize/mod.rs Outdated
Comment on lines +114 to +115
let result2 = memchr2(b'&', 0xC2, &slice[..result]).unwrap_or(slice.len());
result.min(result2)
Member

nit: If you do unwrap_or(result) for the memchr2 call then you don't need result.min(result2)
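
The suggestion works because the second search only looks at `&slice[..result]`: any hit is already smaller than `result`, and a miss should fall back to `result` itself. A minimal sketch comparing both forms follows. The `memchr2` and `memchr3` functions here are plain-std stand-ins that mirror the signatures of the memchr crate, the function names `next_escape_original` and `next_escape_suggested` are hypothetical, and the first search (`memchr3` over `<`, `>` and `"`) is an assumption, since only the two reviewed lines are visible in this thread.

```rust
/// Stand-in mirroring the memchr crate's `memchr2` signature.
fn memchr2(n1: u8, n2: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == n1 || b == n2)
}

/// Stand-in mirroring the memchr crate's `memchr3` signature.
fn memchr3(n1: u8, n2: u8, n3: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == n1 || b == n2 || b == n3)
}

/// The form under review: fall back to `slice.len()`, then take the minimum.
fn next_escape_original(slice: &[u8]) -> usize {
    // Assumed first search; the real needles are not shown in the thread.
    let result = memchr3(b'<', b'>', b'"', slice).unwrap_or(slice.len());
    let result2 = memchr2(b'&', 0xC2, &slice[..result]).unwrap_or(slice.len());
    result.min(result2)
}

/// The suggested form: fall back to `result`, so no `min` is needed.
fn next_escape_suggested(slice: &[u8]) -> usize {
    // Same assumed first search as above.
    let result = memchr3(b'<', b'>', b'"', slice).unwrap_or(slice.len());
    memchr2(b'&', 0xC2, &slice[..result]).unwrap_or(result)
}
```

Both functions return the same index for every input, because `result` never exceeds `slice.len()`.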

@mrobinson
Member Author

Thanks for the review!

> We solve a very similar problem for the data state in the HTML tokenizer with SIMD intrinsics. I wonder how that compares to two memchr calls in terms of performance.

I've been doing a lot of rough experimentation over the past couple of days with SSE3 and AVX2 versions of the parser optimization. It's quite possible that we could use a similar technique here (and it would benefit from not having to count newlines). It would be nice to move some of these routines into markup5ever utilities and to make them more general, though this would need to be done very carefully to avoid hurting performance.

I think you are ultimately correct in #703, though, that bigger wins are likely to come from structural changes to the API, such as supporting a mode that doesn't count newlines.

Signed-off-by: Martin Robinson <mrobinson@igalia.com>
@mrobinson mrobinson enabled auto-merge May 17, 2026 10:22
@mrobinson mrobinson added this pull request to the merge queue May 17, 2026
Merged via the queue into main with commit a32b0d2 May 17, 2026
9 checks passed
@github-actions github-actions Bot added V-non-breaking A non-breaking change and removed V-non-breaking A non-breaking change labels May 17, 2026