FIX: Skip [u] inside links + collapse HTML whitespace per spec#36
Merged
Conversation
Discourse's BBCode plugin cooks `[u]…[/u]` from Markdown source but does not re-process BBCode inside Markdown link text, so `[[u]X[/u]](url)` stays literal. Drop the wrapper when rendering under a Url or Email ancestor (transitively, so it also covers `[url][b][u]X[/u][/b][/url]`). Imported HTML like `<a><u>Facebook</u></a>` now round-trips cleanly.
Match browser whitespace handling so authors can indent source HTML without ending up with stray spaces or line breaks in the rendered Markdown: - Collapse `\s+` runs in text nodes to a single space, except inside `<pre>`, `<code>`, `<textarea>`, or `<tt>` ancestors. - Drop leading whitespace at the start of an element's content. - Trim trailing whitespace from an element's last Text child after its children are processed. - Trim trailing whitespace on the parent's previous Text sibling before a block-level AST node starts, mirroring the CSS rule that whitespace collapses against block boundaries. Block-ness is opt-in via a new `AST::Block` marker module included by Paragraph, Heading, List, ListItem, Quote, Table, TableRow, TableCell, HorizontalRule, and Align. Code is intentionally not marked so inline `<code>` between text segments keeps its surrounding whitespace. The pre-existing test that treated raw `\n\n` in `<blockquote>` source as a paragraph break is updated to reflect the spec — newlines in HTML source are whitespace, not structural breaks.
The default HTML registry used inline lambdas for `<br>` and `<hr>`, which meant they didn't expose `element_class` like every other handler. As a result the `AST::Block`-based whitespace trim could not see that `<hr>` produces a block-level node, so trailing whitespace before `<hr>` was silently kept. Add a small `VoidHandler` (counterpart to `SimpleHandler` for void elements that take no children) and register `<br>`/`<hr>` through it. Trailing whitespace before `<hr>` now trims, matching browser layout. Proc handler support is left in the parser and registry — it stays for now and will be removed in a separate breaking change.
aadecbe to
914e9d1
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three related changes that surfaced from Outlook-style HTML pasted into Discourse:
[u]inside[url]/[email]: Discourse's BBCode plugin doesn't re-cook BBCode inside Markdown link text, so[[u]X[/u]](url)stayed literal.UnderlineTagnow drops the wrapper when rendering under aUrlorEmailancestor (transitively, so[url][b][u]X[/u][/b][/url]works too).<pre>/<code>/<textarea>/<tt>collapse\s+to a single space; leading whitespace at an element's start is dropped; trailing whitespace at an element's end is trimmed; whitespace immediately before a block-level child collapses against the boundary. Block-ness is opt-in via a new empty marker moduleAST::Blockincluded by Paragraph, Heading, List, ListItem, Quote, Table/Row/Cell, HorizontalRule, and Align — third-party AST extensions get the same treatment by including the module.<br>/<hr>useVoidHandlerinstead of inline lambdas: closes a hole where<hr>(a block-level element) didn't exposeelement_classand silently bypassed the block-trim. Proc-handler support is left in place; @gschlager will remove it as a breaking change in Redesign migration API around Conversion/Parse result types #30.The pre-existing test that treated raw
\n\nin<blockquote>source as a paragraph break is updated — per HTML spec, newlines in source are whitespace, not structural breaks. Use<p>or<br>for actual breaks.Test plan
bundle exec rspec— 3077 examples, 0 failuresbin/lint— no offensesbin/mutant run 'Markbridge::Parsers::HTML::Parser*' 'Markbridge::Renderers::Discourse::Tags::UnderlineTag*' 'Markbridge::Parsers::HTML::Handlers::VoidHandler*'— 100% mutation coverage (551 mutations, 0 alive)<a><u>Facebook</u></a>,<a>\n<u>Twitter</u></a>,<a href=…>\n<u>X</u></a>) now round-trip to clean Markdown