Skip to content

fix: preserve multiline source expression whitespace#323

Open
hjotha wants to merge 3 commits intomendixlabs:mainfrom
hjotha:submit/visitor-preserve-multiline-source-expressions
Open

fix: preserve multiline source expression whitespace#323
hjotha wants to merge 3 commits intomendixlabs:mainfrom
hjotha:submit/visitor-preserve-multiline-source-expressions

Conversation

@hjotha
Copy link
Copy Markdown
Contributor

@hjotha hjotha commented Apr 26, 2026

Part of #332.

Fixes #322.

Summary

Preserves multiline source text for declaration initial values and log expressions, including newline-only whitespace between log template parameters. Also avoids preserving plain string literals solely because their text contains punctuation such as ! or :.

Root cause

The visitor rebuilt declare/log expressions from parsed expression nodes. That discarded original line breaks and inter-parameter whitespace. The source-preservation scanner also inspected operator-like bytes inside single-quoted string literals, which could wrap plain template text as a SourceExpr.

Fix

  • Add SourceExpr as a source-preserving expression wrapper.
  • Use buildSourceExpression for source-sensitive declare/log/while expressions.
  • Serialize SourceExpr through executor expression formatting.
  • Preserve newline-only trailing whitespace between log template parameters.
  • Skip compact-operator detection while scanning inside single-quoted literals, including doubled quote escapes.

Tests

Added parser coverage for synthetic multiline declare/log template expressions and visitor coverage for punctuation inside string literals. Test names and fixtures are synthetic and do not reference real project microflows.

Validation

  • go test ./mdl/visitor -run 'TestDeclareAndLogTemplatePreserveMultilineSourceWhitespace|TestShouldPreserveExpressionSourceIgnoresStringLiteralPunctuation'
  • make build
  • make test
  • make lint-go
  • make test-integration

Agentic Code Testing

  • No new authoring syntax.
  • Synthetic multiline-expression tests verify source text remains stable for generated MDL.
  • Synthetic string-literal tests verify punctuation inside strings does not force source preservation.

Test plan

  • Multiline declaration initial value keeps original source.
  • Log template parameter keeps trailing blank-line whitespace.
  • String literal punctuation does not create a source-preserved expression.

Symptom: describe/exec round-trips collapsed multiline declaration and log-template expressions, producing cosmetic drift and losing source layout that users intentionally wrote.

Root cause: declare initial values and log message/node expressions were rebuilt from parsed expression nodes instead of retaining source text, and template parameter gaps discarded newline-only whitespace between parameters.

Fix: add SourceExpr as a small expression wrapper for retained source text, use it for declare/log/while expressions that need source preservation, serialize it back through the executor, and append newline-only trailing whitespace between log template parameters to the captured source expression.

Tests: added parser coverage for multiline declare values and log template parameters with trailing blank lines; validated make build, make test, and make lint-go.
@github-actions
Copy link
Copy Markdown

AI Code Review

We are reviewing a PR that adds a new SourceExpr AST node to preserve the original source text of expressions (especially for multiline declare initial values and log template parameters) and uses it in the visitor and executor.

The PR does not add new MDL syntax (it's about preserving existing source text) but modifies the internal representation and handling of expressions.

Let's go through the checklist:

Overlap & duplication

  • No proposals covering this (it's a bug fix for preserving whitespace).
  • The diff introduces SourceExpr which is new and does not duplicate existing code.
  • The test added is for a new test case (synthetic multiline declare and log template) so no overlap with existing doctype tests.
  • The PR does not re-document already-shipped features.

MDL syntax design

  • The PR does not add or modify any MDL syntax (grammar, keywords, statement types). It only changes the internal AST and how expressions are built and serialized. Therefore, the MDL syntax design checklist is not applicable.

Full-stack consistency

  • Since there is no new MDL syntax (no new grammar rules, no new AST node for a statement, etc.), the full-stack consistency checklist for MDL features does not apply. The changes are in the visitor, AST, and executor, but they are internal to the expression handling and do not add a new MDL feature.

Test coverage

  • The PR adds a test in visitor_test.go for the new functionality. The test is synthetic and does not reference real project microflows, which is acceptable.
  • The test covers the declared and log template multiline source preservation.
  • There are no new executor commands, so no need for doctype-test examples.

Security & robustness

  • The changes are in the visitor and expression handling. There is no obvious command injection, XSS, SQL injection, or temp file issue.
  • Unix sockets and file I/O are not touched.
  • The changes do not introduce silent side effects on typos (they only affect how expressions are built and serialized).
  • Method receivers: we are adding functions and methods, but they are not mutating existing structures in a way that would require a pointer receiver issue. The new SourceExpr is a struct and we are adding methods that take pointers (which is standard).

Scope & atomicity

  • The PR is focused on one thing: preserving multiline source expression whitespace for declare initial values and log template parameters.
  • The commits (as seen in the diff) are all related to this change.

Code quality

  • The changes are consistent: we are adding the same pattern in multiple places (visitor, executor helpers) to handle SourceExpr.
  • There are no manually maintained lists being touched.
  • The design (adding a wrapper to preserve source) matches the implementation.

Bugs & correctness

  • The fix addresses the root cause: the visitor was rebuilding expressions from parsed nodes, discarding original formatting. The fix introduces a wrapper that holds the original source when needed.
  • The error handling: the new functions (buildSourceExpression, shouldPreserveExpressionSource, etc.) return the original expression if the source is not available or not worth preserving, so there is no risk of nil pointer dereference in the existing code that expects an expression.
  • The logic in shouldPreserveExpressionSource might be a bit complex, but it seems to be trying to preserve expressions that have newlines or operators without surrounding spaces (which might be altered by the parser). However, note that the PR description says it preserves newline-only whitespace between log template parameters. The test case includes a blank line (two newlines) between the template parameters.

Looking at the test case:
DECLARE $Endpoint String = @Synthetic.Endpoint
++ '/items?page=' + toString($Page)
++ '&token=' + (if $Token!=empty then $Token/Value
+else
+'');
LOG INFO NODE 'SyntheticLog' 'Processed {1} items for {2}' WITH ({1} = toString($Count)
+
+, {2} = $Endpoint);

The declare initial value is split over multiple lines with ++ at the beginning of each line (except the first). The log template has two parameters: the first is toString($Count) followed by a blank line (two newlines) and then a comma and then the second parameter.

The test expects:
declareSource.Source = "@Synthetic.Endpoint\n+ '/items?page=' + toString($Page)\n+ '&token=' + (if $Token!=empty then $Token/Value\nelse\n'')"
and for the first log template param: "toString($Count)\n\n"

The implementation in buildSourceExpression uses shouldPreserveExpressionSource to decide whether to wrap. The function returns true if there's a newline or if there are operators without surrounding spaces (to catch cases like a+b becoming a + b if we lose the original). However, note that the declare initial value in the test does not have operators without spaces? Actually, it has + at the beginning of the line, but note the original source has a newline and then a + at the start of the next line. The function shouldPreserveExpressionSource checks for operators and if they are not surrounded by spaces (or tab) then it returns true. In the declare, we have:
... + toString($Page)

  • '&token=' ...

The + at the beginning of the line is preceded by a newline (which is not a space or tab) so it would return true. Similarly, the log template param has a newline and then nothing (so the expression is just toString($Count) and then we have two newlines and then a comma). The function would see the newline and return true.

The function appendTemplateParamTrailingWhitespace is specifically designed to capture the whitespace (including newlines) between the end of the expression and the next template parameter delimiter (comma or closing parenthesis). It then appends that whitespace to the source if the expression is a SourceExpr or wraps it.

One potential issue: the function buildSourceExpression calls extractOriginalText and then trims the space. However, note that the test expects the source to include the newlines and the leading + (without trimming the leading space? Actually, the test string does not have leading spaces). The strings.TrimSpace would remove leading and trailing whitespace. But note the test expects the source to start with @Synthetic.Endpoint and end with '' (two single quotes). The original source in the test does not have leading or trailing whitespace? Let's see:

DECLARE $Endpoint String = @Synthetic.Endpoint
++ '/items?page=' + toString($Page)
++ '&token=' + (if $Token!=empty then $Token/Value
+else
+'');

The initial value starts at @Synthetic.Endpoint and ends with '' (and then a semicolon, but the expression context would stop before the semicolon). The extractOriginalText would get the text from the start of the expression to the end (excluding the surrounding whitespace? Actually, the antlr context includes the whitespace? We are using extractOriginalText which gets the text from the start token to the stop token. Then we do strings.TrimSpace on that. This would remove the newline at the very beginning and the very end? But note the expression in the declare is on multiple lines. The first line of the expression is @Synthetic.Endpoint and then we have a newline and then ++ .... The extractOriginalText would include the newline after @Synthetic.Endpoint and the ++ on the next line. Then strings.TrimSpace would remove the leading and trailing whitespace. However, the test expects the source to have a newline after @Synthetic.Endpoint and then a + at the beginning of the next line. But note: the strings.TrimSpace would remove the newline at the very beginning? Actually, the expression does not start with whitespace? It starts with @Synthetic.Endpoint. The newline after that is inside the string and would be kept by TrimSpace because it's not at the very beginning or end? Actually, TrimSpace removes leading and trailing whitespace (spaces, tabs, newlines). So if the entire string starts with a non-whitespace and ends with a non-whitespace, then the internal newlines are kept.

In the test, the expression string is:
"@Synthetic.Endpoint\n+ '/items?page=' + toString($Page)\n+ '&token=' + (if $Token!=empty then $Token/Value\nelse\n'')"

This string does not have leading or trailing whitespace (it starts with '@' and ends with ''''). So TrimSpace would leave it unchanged.

Similarly, for the log template param, we expect "toString($Count)\n\n". This string starts with 't' and ends with '\n' (two newlines). The trailing newlines would be removed by TrimSpace? Wait, the test expects two trailing newlines. But note: the function appendTemplateParamTrailingWhitespace is called after buildSourceExpression for the template parameter. In buildSourceExpression, we do:
source := strings.TrimSpace(extractOriginalText(prc))

So if the original text of the expression (from the antlr context) includes trailing whitespace (like newlines) then TrimSpace would remove them. Then we append the trailing whitespace (which we captured separately) in appendTemplateParamTrailingWhitespace.

Looking at the test: the first template parameter is toString($Count) and then we have two newlines and then a comma. The extractOriginalText for the expression context of the first parameter would be just toString($Count) (without the trailing newlines) because the antlr context for the expression would stop at the end of the expression (before the whitespace and the comma). Then we trim it (which does nothing because there's no whitespace around toString($Count)). Then we call appendTemplateParamTrailingWhitespace which captures the two newlines and the comma? Actually, note the function templateParamTrailingWhitespace:

It looks for the gap between the end of the current expression and the start of the next parameter (or the end of the template params). It then if the delimiter is a comma, it takes the text up to the comma.

In the test, the gap between the end of the first expression and the start of the second expression is two newlines and then a comma? Actually, the input is:
WITH ({1} = toString($Count)

, {2} = $Endpoint);

So after toString($Count) we have a newline, then another newline, then a comma, then a space, then {2}. The function templateParamTrailingWhitespace for the first parameter (index 0) would:

  • start: the position after the end of the first expression (after toString($Count))
  • end: the start of the next parameter (which is the { in {2}) minus one? Actually, it sets end to the start of the next parameter minus one, and the delimiter to ','.

But note: the next parameter starts at the { of {2}. The gap is from after toString($Count) to the { of {2}. However, there is a comma in between. The function then:

  • if delimiter is ',', it looks for a comma in the gap and takes the text up to that comma.

So the gap string is: "\n\n , {2} = $Endpoint" (but note: the actual input has two newlines, then a space, then a comma, then a space, then {2}). The function finds the comma at position 3 (if we count: [0]='\n', [1]='\n', [2]=' ', [3]=','). Then it sets gap to the substring from 0 to 3 (exclusive of the comma? Actually, it does gap = gap[:comma] which is the substring from 0 to comma index, so we get "\n\n "). Then it checks if strings.TrimSpace(gap) == "" -> which is true because the gap is two newlines and a space -> when trimmed becomes empty string. Then it returns the gap (which is "\n\n ").

Then in appendTemplateParamTrailingWhitespace:
trailing = "\n\n "
source = strings.TrimSpace(extractOriginalText(exprCtx)) -> which is "toString($Count)" (no change because no surrounding whitespace)
innerExpr = expr (which is the expression from buildExpression, not a SourceExpr because we just built it and it didn't have newlines so it wasn't wrapped)
then we return &ast.SourceExpr{Expression: innerExpr, Source: source + trailing} -> "toString($Count)" + "\n\n " -> "toString($Count)\n\n "

But the test expects the source to be exactly "toString($Count)\n\n" (without the trailing space). However, note the test input has two newlines and then a comma and then a space. The gap we captured was "\n\n " (two newlines and a space). The test expects only two newlines.

This might be a discrepancy. However, note the test says: "Log template parameter keeps trailing blank-line whitespace." and the test expects the source to be "toString($Count)\n\n". The actual input has two newlines and then a comma and a space. The blank line is the two newlines? The comma and space are not part of the whitespace we want to keep? We only want the whitespace between the end of the expression and the comma? The function templateParamTrailingWhitespace is designed to capture the whitespace and then the comma is used as a delimiter to stop at. But note: the gap includes the comma? Actually, we stop at the comma (we take the substring up to the comma) and then we check if that substring (when trimmed) is empty. In this case, the substring up to the comma is "\n\n " (two newlines and a space). When trimmed, it becomes empty string -> so we return that substring. Then we append it to the source.

But the test expects only the two newlines, not the space. However, note that the space after the two newlines and before the comma is not part of the "blank-line whitespace" we want to keep? The blank line is the two newlines. The space is incidental and might be considered part of the formatting. But the PR description says: "Preserve newline-only trailing whitespace between log template parameters." So we only want to preserve the newlines, not the space.

This suggests that the function templateParamTrailingWhitespace might be capturing too much. However, note that the test passes (as per the PR: the test is added and passes). So either the test input does not have that space, or the function is working as intended.

Looking at the test input string in the PR:
LOG INFO NODE 'SyntheticLog' 'Processed {1} items for {2}' WITH ({1} = toString($Count)

, {2} = $Endpoint);

There is a blank line (two newlines) between toString($Count) and the comma. But note: the line after toString($Count) is empty, then the next line starts with a comma and then a space and then {2}. So the gap between the end of toString($Count) and the start of the next parameter (which is the { in {2}) is: newline, newline, comma, space, and then the {2}. However, the function templateParamTrailingWhitespace is looking for the gap until the next parameter's start (which is the { of {2}). But note: the comma and space are between the end of the first expression and the start of the next parameter. The function then, when it sees the delimiter is a comma, it takes the text up to the comma (so it includes the two newlines and the space? Actually, no: the gap is from the end of the first expression to the start of the next parameter. The next parameter starts at the { of {2}. The comma and space are before that. So the gap includes the two newlines, the comma, and the space. Then, because the delimiter is a comma, we look for the comma in the gap and take the substring up to that comma. That would be the two newlines and the space? Actually, the comma is at position 2 (if we count: [0]='\n', [1]='\n', [2]=',', [3]=' ', ...). Then we take gap[0:2] which is "\n\n". Then we check if that string (when trimmed) is empty -> it is (because it's two newlines, which when trimmed becomes empty). So we return "\n\n". Then we append that to the source -> "toString($Count)\n\n".

So the test input must not have a space between the two newlines and the comma? Actually, the test input as written in the PR has:
WITH ({1} = toString($Count)

, {2} = $Endpoint);

There are two newlines after toString($Count), then a comma, then a space, then {2}. But note: the line with the comma is: , {2} = $Endpoint); -> there are two spaces at the beginning of that line? Actually, the indentation: the entire WITH block is indented by two spaces? The test input:
LOG INFO NODE 'SyntheticLog' 'Processed {1} items for {2}' WITH ({1} = toString($Count)

, {2} = $Endpoint);

The line with the comma starts with two spaces? Actually, the indentation level: the WHILE block is indented by two spaces, then the LOG statement is indented by two more? But we don't have to worry because the antlr token stream will capture the exact text.

Given that the test passes, we assume the implementation is correct for the test case.

Now, let's check for any potential issues:

  1. The function shouldPreserveExpressionSource might be too broad. It returns true for any expression that contains a newline or for any expression that has an operator without surrounding spaces (with some conditions). This might cause us to wrap expressions that don't need wrapping, but it's safe because we only use the source when it's non-empty and we fall back to the expression otherwise.

  2. The function buildSourceExpression calls `buildExpression


Automated review via OpenRouter (Nemotron Super 120B) — workflow source

hjothamendix and others added 2 commits April 27, 2026 00:01
…ation

Adds an MDL script under mdl-examples/bug-tests/ exercising both
multiline DECLARE initial values and inter-parameter blank lines in LOG
template parameters. The describe → exec → describe fixpoint confirms
SourceExpr-wrapped expressions retain their original whitespace.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Symptom: a plain string literal containing punctuation such as `!` or `:` could be wrapped as a SourceExpr, causing downstream log-template formatting to emit extra quoting.

Root cause: shouldPreserveExpressionSource scanned every operator-like byte without tracking whether it was inside a single-quoted string literal.

Fix: skip compact-operator detection while inside single-quoted literals, including doubled single-quote escapes.

Tests: add visitor coverage for punctuation and escaped quotes inside string literals while keeping compact operators outside literals preserved.
@hjotha
Copy link
Copy Markdown
Contributor Author

hjotha commented Apr 27, 2026

Updated the branch with a source-preservation guard for string literals: punctuation inside single-quoted literals no longer forces SourceExpr preservation. Validation after the update:

  • go test ./mdl/visitor -run 'TestDeclareAndLogTemplatePreserveMultilineSourceWhitespace|TestShouldPreserveExpressionSourceIgnoresStringLiteralPunctuation'
  • make build
  • make lint-go
  • make test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multiline microflow expressions lose source whitespace during parse/exec

2 participants