Skip to content

fix: quote fields containing a carriage return in CsvWriter#128

Open
stevehansen wants to merge 2 commits into
masterfrom
fix/csvwriter-carriage-return-quoting
Open

fix: quote fields containing a carriage return in CsvWriter#128
stevehansen wants to merge 2 commits into
masterfrom
fix/csvwriter-carriage-return-quoting

Conversation

@stevehansen
Copy link
Copy Markdown
Owner

What

CsvWriter now quotes fields that contain a bare carriage return (\r), matching CsvBufferWriter and RFC 4180.

Why

Per RFC 4180, a field must be quoted when it contains a quote, the separator, CR, or LF. CsvWriter only triggered quoting on ", the separator, ', and \n — it omitted \r. CsvBufferWriter already included \r (SearchValues.Create("''\n\r")), so the two writers disagreed.

Consequences of the bug:

  • A value like a\rb was written unquoted, which strict RFC-4180 parsers (Excel, etc.) treat as a record break → malformed output.
  • Round-tripped through this library it split into two records instead of one.

This was a pre-existing issue; it was surfaced by Gemini's review of #127 (the terminology PR), which made the two writers' now-identically-named trigger sets visibly disagree.

Changes

  • CsvWriter: add \r to the cached SearchValues (memory/buffer paths) and to the string paths (WriteLine / WriteLineAsync). All four write paths — sync string, async string, and the two ReadOnlyMemory<char> paths — now quote on \r.
  • Perf: removed the per-row char[] allocation in WriteLine/WriteLineAsync by caching the fixed trigger chars in a static array and checking the variable separator separately (mirrors what the memory paths already do).
  • Tests: regression coverage for the sync, async, and memory paths, plus an assertion that CsvWriter and CsvBufferWriter now agree on \r.
  • CHANGELOG updated under Unreleased.

Verification

  • Builds on netstandard2.0 / net8.0 / net9.0, 0 errors.
  • 183/183 tests pass.

Note: the timing-based Performance_CompareWriterMethods micro-benchmark is pre-existing-flaky under parallel test execution (no warmup, sub-5ms measurements). It passes consistently in isolation; the allocation removal here makes the traditional baseline marginally faster, which slightly tightens its ratio check. Not touched in this PR — flagging in case you want to robustify it separately.

Follow-up to the review on #127; that PR remains terminology-only / behavior-neutral.

🤖 Generated with Claude Code

Per RFC 4180 a field that contains CR, LF, the separator, or a quote must be
quoted. CsvWriter only triggered on '\n', the separator, the single quote, and
'"', so a value like `a\rb` was written unquoted — mis-parsed by strict readers
and split into two records when re-read. CsvBufferWriter already included '\r';
all CsvWriter paths (sync, async, and the ReadOnlyMemory<char> paths) now match it.

Also removes the per-row char[] allocation in WriteLine/WriteLineAsync by caching
the fixed quote-trigger characters in a static array and checking the variable
separator separately.

Surfaced by Gemini's review of #127.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request ensures CsvWriter correctly quotes fields containing a bare carriage return (\r) to comply with RFC 4180, and optimizes performance by caching fixed quote-trigger characters to eliminate per-row allocations. The review feedback suggests further optimizing these checks on .NET 8.0 and greater by utilizing the pre-defined SearchValues<char> and string.Contains instead of the char[] array on the hot path.

Comment thread Csv/CsvWriter.cs
Comment on lines +494 to 495
else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
escape = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

On .NET 8.0 and greater, we can leverage the pre-defined SearchValues<char> (FixedEscapeChars) and string.Contains(char) to avoid the overhead of searching with a char[] array. This improves performance on the hot path when writing rows.

#if NET8_0_OR_GREATER
                    else if (cell.Contains(separator) || cell.AsSpan().IndexOfAny(FixedEscapeChars) >= 0)
                        escape = true;
#else
                    else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
                        escape = true;
#endif

Comment thread Csv/CsvWriter.cs
await writer.WriteAsync('"').ConfigureAwait(false);
}
else if (cell.IndexOfAny(escapeChars) >= 0)
else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similarly to the synchronous WriteLine method, we can optimize this check on .NET 8.0 and greater by using SearchValues<char> (FixedEscapeChars) and string.Contains(char) instead of the char[] array.

#if NET8_0_OR_GREATER
                    else if (cell.Contains(separator) || cell.AsSpan().IndexOfAny(FixedEscapeChars) >= 0)
#else
                    else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
#endif

Address PR review: on NET8+, WriteLine/WriteLineAsync now reuse the cached
SearchValues<char> via cell.AsSpan().IndexOfAny + string.Contains(separator)
instead of char[] IndexOfAny, matching the existing memory write paths. The
char[] fallback is retained for netstandard2.0 and scoped under #if so it is
not flagged unused on NET8+.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@stevehansen
Copy link
Copy Markdown
Owner Author

Thanks @gemini-code-assist — good call, applied in 2e18932. On NET8+, WriteLine/WriteLineAsync now use the cached SearchValues<char> (cell.AsSpan().IndexOfAny(FixedEscapeChars) + cell.Contains(separator)), matching the existing WriteCellMemory pattern. The char[] (FixedEscapeCharsArray) is kept for the netstandard2.0 path and scoped under #if so it is not flagged unused on NET8+. Build clean on all three TFMs; 183/183 tests pass.

@gemini-code-assist
Copy link
Copy Markdown

Thanks for the update, @stevehansen. The implementation looks solid and correctly addresses the RFC 4180 compliance issue while optimizing the allocation path for both modern and legacy TFMs. The test coverage is also comprehensive. Nice work getting those tests passing across all targets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant