Stop trimming escaped spaces off the end regex by ratmice · Pull Request #635 · softdevteam/grmtools

ratmice · 2026-05-03T00:50:04Z

This is a first attempt at fixing #634. It continues to interpret an escaped space
as a separator between the regex and the token ident, but it no longer trims the escaped space.

This has pretty thorough testing of the trim function, but only minimal testing of the stream of tokens produced by lrlex in cttests. I understand if you want me to expand on that, but this seemed like a good start.

ratmice · 2026-05-03T04:05:53Z

+    // First loop over spaces
+    for ch in s.chars().rev().into_iter() {
+        if RE_SPACE_SEP.is_match(ch.encode_utf8(&mut cbuf)) {
+            last_char_width = ch.width().unwrap_or(0);


I think this is probably wrong: we want the width of the character in bytes, whereas this width function gives the width of the character in columns.

So I think this should be ch.len_utf8().

Should be fixed in 26e8886
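
To illustrate the bytes-versus-columns distinction, here is a minimal std-only sketch. The helper name `last_char_start` is hypothetical and not part of the PR; it only shows that a byte offset derived from len_utf8 always lands on a char boundary.

```rust
/// Byte offset at which the last character of `s` starts,
/// computed from that character's UTF-8 length (not its display width).
/// Hypothetical helper for illustration only.
fn last_char_start(s: &str) -> Option<usize> {
    s.chars().next_back().map(|ch| s.len() - ch.len_utf8())
}
```

For "a\u{3000}" (an ideographic space, 3 bytes in UTF-8) this yields offset 1, which is a valid slice point. Subtracting a column width of 2 instead would give offset 2, which falls inside the 3-byte character, and slicing there would panic.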

ltratt · 2026-05-03T07:00:19Z

+    let mut total_ws_bytes = 0;
+    let mut last_ws_char_len = 0;
+    // First loop over spaces
+    for ch in s.chars().rev().into_iter() {


Could this be something like s.chars().rev().into_iter().take_while(|x| RE_SPACE_SEP.is_match(...))?

I don't think so, for a couple of reasons. It seems like take_while takes ownership of the self iterator, and returns an iterator over only the matching elements.

What we'd really want is skip_while, so that the iterator resumes from where the predicate first returned false. Unfortunately skip_while doesn't lend itself to having a fold-like accumulator; this kind of needs something that does both.
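
A small sketch of one way the take_while route goes wrong, assuming `char::is_whitespace` as a stand-in for RE_SPACE_SEP (the function name is hypothetical). Borrowing the iterator with by_ref() avoids the ownership problem, but take_while still has to pull the first non-matching character to learn that it does not match, and that character is then lost to the caller.

```rust
/// Sums the bytes of trailing whitespace with `take_while`, then reports
/// the next character the shared iterator yields afterwards.
/// Assumption: `char::is_whitespace` stands in for the PR's RE_SPACE_SEP.
fn ws_bytes_then_next(s: &str) -> (usize, Option<char>) {
    let mut it = s.chars().rev();
    let ws_bytes: usize = it
        .by_ref()
        .take_while(|c| c.is_whitespace())
        .map(|c| c.len_utf8())
        .sum();
    // `take_while` has already consumed the first non-whitespace char,
    // so `it.next()` returns the one before it.
    (ws_bytes, it.next())
}
```

With the input `ab\` followed by two spaces, the backslash is swallowed and the next character seen is `b`, so the escape check that has to follow the whitespace loop would never see the backslash.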

Beyond that I looked at a couple of things,

try_reduce could work but is nightly only.

scan seemed like it could work, but I couldn't find a way that improved clarity.

So part of the problem is that we want to accumulate into a value,
and then continue iteration from that point exactly where accumulation stopped.

Here is a try_fold version after which we do still need to make a second iterator on a sub-slice of the string.

    // First loop over space characters
    let (last_ws_char_len, mut total_ws_bytes) = match s.chars().rev().into_iter().try_fold(
        (0, 0),
        |(last_ch_len, total_ws_bytes), ch| {
            if RE_SPACE_SEP.is_match(ch.encode_utf8(&mut cbuf)) {
                let this_ch_len = ch.len_utf8();
                ControlFlow::Continue((this_ch_len, total_ws_bytes + this_ch_len))
            } else {
                ControlFlow::Break((last_ch_len, total_ws_bytes))
            }
        },
    ) {
        // The nightly function into_value would help here
        // https://doc.rust-lang.org/stable/std/ops/enum.ControlFlow.html#method.into_value
        ControlFlow::Continue((x, y)) => (x, y),
        ControlFlow::Break((x, y)) => (x, y),
    };

I kind of think this fold might be an improvement, opinions?
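
For anyone wanting to try the fold in isolation, here is a self-contained sketch of the same idea. It assumes `char::is_whitespace` in place of RE_SPACE_SEP, and the function names `trailing_ws_lens` and `split_trailing_ws` are hypothetical, not taken from the PR.

```rust
use std::ops::ControlFlow;

/// Walks `s` from the end and returns
/// (byte length of the last whitespace char visited, total trailing whitespace bytes).
/// Assumption: `char::is_whitespace` stands in for the PR's RE_SPACE_SEP.
fn trailing_ws_lens(s: &str) -> (usize, usize) {
    let folded = s.chars().rev().try_fold((0usize, 0usize), |(last_len, total), ch| {
        if ch.is_whitespace() {
            let len = ch.len_utf8();
            ControlFlow::Continue((len, total + len))
        } else {
            // Stop at the first non-whitespace char, keeping the accumulator.
            ControlFlow::Break((last_len, total))
        }
    });
    // Both variants carry the same accumulator, so one arm covers both.
    match folded {
        ControlFlow::Continue(acc) | ControlFlow::Break(acc) => acc,
    }
}

/// Splits `s` into (everything before the trailing whitespace, the trailing whitespace).
/// The first half is the sub-slice a second iterator would run over.
fn split_trailing_ws(s: &str) -> (&str, &str) {
    let (_, total) = trailing_ws_lens(s);
    s.split_at(s.len() - total)
}
```

Since the total is a sum of len_utf8 values, `s.len() - total` always falls on a char boundary, which is what makes the follow-up sub-slice safe.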

~~Oops, I forgot to cargo test that try_fold implementation, after getting it to compile, so of course it doesn't seem to work yet.~~

Never mind, that was me messing up some other part of the function.

I was curious if AI had any ideas to improve the code, and it did have a novel idea:
use the trim_end_matches function to trim all trailing spaces, then re-add the escaped space if needed.

I ended up rewriting basically all the AI code, but this seems like it might be more idiomatic, or at least less loopy.

    fn trim_end_unescaped(s: &str) -> &str {
        let trimmed = s.trim_end_matches(matches_whitespace);
        if trimmed.len() == s.len() {
            return s;
        }
        // If the number of backslashes is odd then the first space in the
        // trimmed portion is escaped so re-add it.
        if trimmed.chars().rev().take_while(|&c| c == '\\').count() % 2 == 1 {
            // Panic safety: the trimmed portion is at least one char long.
            &s[..trimmed.len() + s[trimmed.len()..].chars().next().unwrap().len_utf8()]
        } else {
            trimmed
        }
    }
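
To make the behaviour of this variant easier to check by hand, here is a standalone sketch. `matches_whitespace` is not shown in the comment above, so the definition below is an assumption (plain `char::is_whitespace`); the real predicate in the PR may differ.

```rust
/// Assumed stand-in for the PR's `matches_whitespace` predicate.
fn matches_whitespace(c: char) -> bool {
    c.is_whitespace()
}

/// Trims trailing whitespace, but keeps the first trailing whitespace char
/// when it is preceded by an odd number of backslashes (i.e. it is escaped).
fn trim_end_unescaped(s: &str) -> &str {
    let trimmed = s.trim_end_matches(matches_whitespace);
    if trimmed.len() == s.len() {
        return s; // nothing was trimmed
    }
    let backslashes = trimmed.chars().rev().take_while(|&c| c == '\\').count();
    if backslashes % 2 == 1 {
        // Something was trimmed, so the removed suffix has at least one char.
        let first_ws = s[trimmed.len()..].chars().next().unwrap();
        &s[..trimmed.len() + first_ws.len_utf8()]
    } else {
        trimmed
    }
}
```

Under that assumption, `abc\` followed by three spaces keeps exactly one space, while `abc\\` followed by a space loses it, because the two backslashes escape each other rather than the space.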

ltratt · 2026-05-03T07:01:00Z

I have one shallow comment: otherwise, this looks good, particularly with the test suite!

Stop trimming escaped spaces off the end regex

124f613

ratmice linked an issue May 3, 2026 that may be closed by this pull request

regex/token name separator for regexes that end with trailing spaces #634

Open

use len_utf8 instead of unicode-width, rename locals

26e8886

ratmice wants to merge 2 commits into softdevteam:master from ratmice:regex_trailing_ws
