SmallThingz/zhtml

zhtml

High-throughput HTML parser + CSS selector engine for Zig.

⚠️ Conformance Warning

Performance numbers are not conformance claims. The parser is intentionally permissive and currently does not fully match browser-grade tree-construction behavior.

🏁 Performance

See the latest benchmark snapshot for more details.

Source: bench/results/latest.json (stable profile).

Parse Throughput (Average Across Fixtures)

ours     │████████████████████│ 1647.32 MB/s (100.00%)
lol-html │█████████████░░░░░░░│ 1062.59 MB/s (64.50%)
lexbor   │███░░░░░░░░░░░░░░░░░│ 234.51 MB/s (14.24%)

Conformance Snapshot

Profile           │ nwmatcher        │ qwery_contextual │ html5lib subset     │ WHATWG HTML parsing
strictest/fastest │ 20/20 (0 failed) │ 54/54 (0 failed) │ 524/600 (76 failed) │ 440/500 (60 failed)

Source: bench/results/external_suite_report.json

⚡ Features

  • 🔎 CSS selector queries: comptime, runtime, and cached runtime selectors.
  • 🧭 DOM navigation: parent, siblings, first/last child, and children iteration.
  • 💤 Lazy decode/normalize path: attribute/entity decode and text normalization happen on query-time APIs.
  • 🧪 Debug tooling: selector mismatch diagnostics and instrumentation wrappers.
  • 🧰 Parse profiles: strictest and fastest option bundles for benchmarks/workloads.
  • 🧵 Destructive parsing by default for throughput, with an opt-in non-destructive read-only mode.

🚀 Quick Start

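const std = @import("std");
const html = @import("html");
// Default options: destructive parsing, so the input buffer must be mutable.
const options: html.ParseOptions = .{};

test "basic parse + query" {
    // Copy the string literal into a mutable array that the parser may modify.
    var input = "<div id='app'><a class='nav' href='/docs'>Docs</a></div>".*;
    var doc = try options.parse(std.testing.allocator, &input);
    defer doc.deinit();

    // queryOne returns an optional; fail the test if nothing matched.
    const a = doc.queryOne("div#app > a.nav") orelse return error.TestUnexpectedResult;
    try std.testing.expectEqualStrings("/docs", a.getAttributeValue("href").?);
}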

Parsing goes through options.parse(...). Use const options: html.ParseOptions = .{ .non_destructive = true }; when the caller's bytes must remain unchanged, including file-backed memory maps. This mode reads the original source directly and does not make a full-source copy.
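A minimal sketch of the non-destructive mode, built only from the calls shown in the Quick Start. It assumes that .non_destructive is the only option that needs to change and that options.parse accepts the same buffer argument in this mode. The snapshot comparison is an illustration added here, not part of the library.

```zig
const std = @import("std");
const html = @import("html");

// Assumption: all other ParseOptions fields keep their defaults.
const options: html.ParseOptions = .{ .non_destructive = true };

test "non-destructive parse leaves the input bytes unchanged" {
    var input = "<div id='app'><a class='nav' href='/docs'>Docs</a></div>".*;
    // Arrays are values in Zig, so this is a byte-for-byte snapshot of the input.
    const snapshot = input;

    var doc = try options.parse(std.testing.allocator, &input);
    defer doc.deinit();

    // Queries work the same way as in the destructive Quick Start example.
    const a = doc.queryOne("div#app > a.nav") orelse return error.TestUnexpectedResult;
    try std.testing.expectEqualStrings("/docs", a.getAttributeValue("href").?);

    // The documented guarantee of this mode: the caller's bytes are untouched.
    try std.testing.expectEqualSlices(u8, &snapshot, &input);
}
```

See examples/non_destructive_parse.zig for the repository's own version of this workflow.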

⚙️ Build Configuration

  • -Dintlen=u16|u32|u64|usize selects the integer width used for document spans and node indexes.
  • Smaller widths reduce memory use but also reduce the maximum parseable input size.
  • u32 is the default. Use u64 for multi-gigabyte inputs.
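To make the size trade-off concrete, the following standalone sketch prints the largest value each width can hold. It assumes that spans are byte offsets stored in the selected integer type, so the maximum input size is bounded by that value. The helper name maxAddressableBytes is hypothetical, and the library may reserve some values and accept slightly less.

```zig
const std = @import("std");

/// Upper bound on the byte offsets representable by index type `T`.
/// Assumption: input size is limited by maxInt(T); the real limit may be lower.
fn maxAddressableBytes(comptime T: type) u64 {
    return std.math.maxInt(T);
}

pub fn main() void {
    // u16: 65535 bytes, just under 64 KiB.
    std.debug.print("u16: {d} bytes\n", .{maxAddressableBytes(u16)});
    // u32: 4294967295 bytes, just under 4 GiB.
    std.debug.print("u32: {d} bytes\n", .{maxAddressableBytes(u32)});
    // u64: 18446744073709551615 bytes, far beyond any practical input.
    std.debug.print("u64: {d} bytes\n", .{maxAddressableBytes(u64)});
}
```

Under this assumption the default u32 tops out just below 4 GiB, which is consistent with the advice to switch to u64 for multi-gigabyte inputs, for example with zig build -Dintlen=u64.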

📚 Documentation

🧪 Build and Validation

zig build test
zig build docs-check
zig build examples-check
zig build ship-check

📎 Examples

  • examples/basic_parse_query.zig
  • examples/runtime_selector.zig
  • examples/cached_selector.zig
  • examples/query_time_decode.zig
  • examples/inner_text_options.zig
  • examples/non_destructive_parse.zig

📜 License

MIT. See LICENSE.

About

A very fast but not fully compliant HTML parser written in Zig, with GiB/s+ throughput.
