regex/token name separator for regexes that end with trailing spaces #634

@ratmice

This issue affects current grmtools. It came to light while considering the feature request in #597, but it seemed better to split it out into its own issue than to discuss it there.

If we take the following lex file (foo.l), in which the first rule's regex ends in a trailing space, current grmtools accepts it,
but it seems like it should probably reject it.

%%
\ \ "weirdness"
\n ;

The trailing space in \ \ is part of the regex, but it gets treated as the separator between the regex and the token name. It then seems that it perhaps gets trimmed off the regular expression before being passed to the regex crate.
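To make the expected behaviour concrete, here is a minimal sketch of an escape-aware split. It is not how lrlex works today; `split_rule` is a hypothetical helper that separates the regex from the token name at the first whitespace character not preceded by a backslash. For simplicity it ignores character classes and quoted strings, which a real implementation would also need to handle.

```rust
/// Hypothetical helper (not part of lrlex): split a lex rule line into
/// (regex, token name) at the first space or tab that is not backslash-escaped.
/// Returns None when no unescaped separator exists, i.e. the rule is rejected.
fn split_rule(line: &str) -> Option<(&str, &str)> {
    let mut escaped = false;
    for (i, c) in line.char_indices() {
        if escaped {
            // This character is consumed by the preceding backslash.
            escaped = false;
        } else if c == '\\' {
            escaped = true;
        } else if c == ' ' || c == '\t' {
            let name = line[i..].trim();
            if i == 0 || name.is_empty() {
                return None;
            }
            return Some((&line[..i], name));
        }
    }
    None
}

fn main() {
    // The rule from foo.l: every space is escaped, so there is no separator.
    assert_eq!(split_rule("\\ \\ \"weirdness\""), None);
    // With one extra space, the regex keeps its escaped trailing space.
    assert_eq!(
        split_rule("\\ \\  \"weirdness\""),
        Some(("\\ \\ ", "\"weirdness\""))
    );
    // An ordinary rule splits as usual.
    assert_eq!(split_rule("\\n ;"), Some(("\\n", ";")));
    println!("ok");
}
```

Under this assumption the foo.l above would be rejected for lacking a separator, while a variant with one extra space before the token name would pass the full two-space regex through untouched.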

$ echo " " | cargo run --quiet -p lrlex ./foo.l -
weirdness
$ echo "  " | cargo run --quiet -p lrlex ./foo.l -
weirdness
weirdness

I would have expected the echo with two spaces to match a single weirdness token, and the one with a single space not to match.

The following file is rejected, but by the regex crate; I assume the '\' is being sent to regex on its own.

%%
\ "rejectedByRegex"
\n ;
