fix: quote fields containing a carriage return in CsvWriter #128
stevehansen wants to merge 2 commits into
Per RFC 4180, a field that contains CR, LF, the separator, or a quote must be quoted. CsvWriter only triggered quoting on '\n', the separator, the single quote, and '"', so a value like `a\rb` was written unquoted. Strict readers mis-parse it, and it splits into two records when re-read. CsvBufferWriter already included '\r', and all CsvWriter paths (sync, async, and the ReadOnlyMemory<char> paths) now match it.

Also removes the per-row char[] allocation in WriteLine/WriteLineAsync by caching the fixed quote-trigger characters in a static array and checking the variable separator separately.

Surfaced by Gemini's review of #127.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
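The check this commit describes can be sketched as follows. This is a minimal illustration under stated assumptions: `CsvQuoting` and `NeedsQuoting` are hypothetical names, and the trigger set (`"`, `'`, `\n`, `\r`) is taken from the description above, not copied from the library source. The fixed characters live in a static array allocated once, while the per-writer separator is checked on its own.

```csharp
using System;

// Hypothetical helper illustrating the quote-trigger check; not the library's actual code.
public static class CsvQuoting
{
    // Fixed quote-trigger characters, allocated once instead of once per row.
    // '\r' is the character this PR adds.
    private static readonly char[] FixedEscapeCharsArray = { '"', '\'', '\n', '\r' };

    // Returns true when the cell must be wrapped in quotes before it is written.
    public static bool NeedsQuoting(string cell, char separator)
    {
        if (string.IsNullOrEmpty(cell))
            return false;

        // The separator varies per writer, so it is checked separately
        // instead of being copied into a fresh char[] for every row.
        return cell.IndexOf(separator) >= 0
            || cell.IndexOfAny(FixedEscapeCharsArray) >= 0;
    }
}
```

With this sketch, `NeedsQuoting("a\rb", ',')` returns `true` (the case this PR fixes) and `NeedsQuoting("plain", ',')` returns `false`. Checking the separator separately is what allows one shared array to serve every row and every separator.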
Code Review
This pull request ensures CsvWriter correctly quotes fields containing a bare carriage return (\r) to comply with RFC 4180, and optimizes performance by caching fixed quote-trigger characters to eliminate per-row allocations. The review feedback suggests further optimizing these checks on .NET 8.0 and greater by utilizing the pre-defined SearchValues<char> and string.Contains instead of the char[] array on the hot path.
```csharp
else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
    escape = true;
```
On .NET 8.0 and greater, we can leverage the pre-defined SearchValues<char> (FixedEscapeChars) and string.Contains(char) to avoid the overhead of searching with a char[] array. This improves performance on the hot path when writing rows.
```csharp
#if NET8_0_OR_GREATER
else if (cell.Contains(separator) || cell.AsSpan().IndexOfAny(FixedEscapeChars) >= 0)
    escape = true;
#else
else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
    escape = true;
#endif
```

```diff
         await writer.WriteAsync('"').ConfigureAwait(false);
     }
-    else if (cell.IndexOfAny(escapeChars) >= 0)
+    else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
```
Similarly to the synchronous WriteLine method, we can optimize this check on .NET 8.0 and greater by using SearchValues<char> (FixedEscapeChars) and string.Contains(char) instead of the char[] array.
```csharp
#if NET8_0_OR_GREATER
else if (cell.Contains(separator) || cell.AsSpan().IndexOfAny(FixedEscapeChars) >= 0)
#else
else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedEscapeCharsArray) >= 0)
#endif
```

Address PR review: on NET8+, WriteLine/WriteLineAsync now reuse the cached SearchValues<char> via cell.AsSpan().IndexOfAny plus string.Contains(separator) instead of char[] IndexOfAny, matching the existing memory write paths. The char[] fallback is retained for netstandard2.0 and scoped under #if so it is not flagged as unused on NET8+.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
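The multi-target shape this commit describes can be sketched as below. It assumes a hypothetical `CsvQuotingMultiTarget` class; the field names `FixedEscapeChars` and `FixedEscapeCharsArray` mirror the review thread, but the body is an illustration rather than the library's code, and the trigger set is again taken from the PR text. On .NET 8+ it uses `SearchValues<char>` with `string.Contains(char)`; on older targets it falls back to the cached `char[]`, which is declared only in the `#else` branch so it never sits unused on .NET 8+.

```csharp
using System;
#if NET8_0_OR_GREATER
using System.Buffers;
#endif

// Hypothetical sketch; field names follow the review, the class is illustrative only.
public static class CsvQuotingMultiTarget
{
#if NET8_0_OR_GREATER
    // Built once and optimized for repeated searches for this fixed character set.
    private static readonly SearchValues<char> FixedEscapeChars =
        SearchValues.Create("\"'\n\r");
#else
    // netstandard2.0 fallback; declared only here so it is not flagged unused on .NET 8+.
    private static readonly char[] FixedEscapeCharsArray = { '"', '\'', '\n', '\r' };
#endif

    // Returns true when the cell must be quoted.
    public static bool NeedsQuoting(string cell, char separator)
    {
        if (string.IsNullOrEmpty(cell))
            return false;

#if NET8_0_OR_GREATER
        return cell.Contains(separator)
            || cell.AsSpan().IndexOfAny(FixedEscapeChars) >= 0;
#else
        // string.Contains(char) is not available on netstandard2.0, so use IndexOf.
        return cell.IndexOf(separator) >= 0
            || cell.IndexOfAny(FixedEscapeCharsArray) >= 0;
#endif
    }
}
```

Both branches give the same answer for the same input; only the lookup mechanism differs, which keeps the behavior identical across `netstandard2.0`, `net8.0`, and `net9.0`.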
Thanks @gemini-code-assist, good call; applied in 2e18932. On NET8+,

Thanks for the update, @stevehansen. The implementation looks solid and correctly addresses the RFC 4180 compliance issue while optimizing the allocation path for both modern and legacy TFMs. The test coverage is also comprehensive. Nice work getting those tests passing across all targets.
What

`CsvWriter` now quotes fields that contain a bare carriage return (`\r`), matching `CsvBufferWriter` and RFC 4180.

Why

Per RFC 4180, a field must be quoted when it contains a quote, the separator, CR, or LF. `CsvWriter` only triggered quoting on `"`, the separator, `'`, and `\n`; it omitted `\r`. `CsvBufferWriter` already included `\r` (`SearchValues.Create("''\n\r")`), so the two writers disagreed.

Consequences of the bug:

- `a\rb` was written unquoted, which strict RFC 4180 parsers (Excel, etc.) treat as a record break, producing malformed output.

This was a pre-existing issue. It was surfaced by Gemini's review of #127 (the terminology PR), which made the two writers' now identically named trigger sets visibly disagree.
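To see why a bare `\r` matters, here is a deliberately minimal, hypothetical reader (`StrictCsvSketch` is not part of any library) that treats CR, LF, or CRLF outside quotes as a record break, mirroring the behavior the PR attributes to strict parsers. `Quote` applies the RFC 4180 escaping rule: wrap the field in double quotes and double any embedded double quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, minimal reader used only to illustrate the failure mode.
public static class StrictCsvSketch
{
    // Wraps a field in double quotes and doubles embedded double quotes.
    public static string Quote(string field) =>
        "\"" + field.Replace("\"", "\"\"") + "\"";

    // Splits text into records; CR, LF, or CRLF inside a quoted field is kept as data.
    public static List<string> SplitRecords(string text)
    {
        var records = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '"')
            {
                // A doubled quote toggles twice, so inQuotes ends up unchanged.
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (!inQuotes && (c == '\r' || c == '\n'))
            {
                // Treat CRLF as a single record break.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
                    i++;
                records.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // The last record may lack a trailing line break.
        if (current.Length > 0)
            records.Add(current.ToString());
        return records;
    }
}
```

Re-reading the unquoted `"a\rb\r\n"` yields two records (`a` and `b`), while `Quote("a\rb") + "\r\n"` yields a single record, which is the difference this PR makes.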
Changes

- `CsvWriter`: add `\r` to the cached `SearchValues` (memory/buffer paths) and to the string paths (`WriteLine`/`WriteLineAsync`). All four write paths (sync string, async string, and the two `ReadOnlyMemory<char>` paths) now quote on `\r`.
- Remove the per-row `char[]` allocation in `WriteLine`/`WriteLineAsync` by caching the fixed trigger chars in a static array and checking the variable separator separately (mirroring what the memory paths already do).
- `CsvWriter` and `CsvBufferWriter` now agree on `\r`.

Verification

`netstandard2.0`/`net8.0`/`net9.0`, 0 errors.

Follow-up to the review on #127; that PR remains terminology-only / behavior-neutral.
🤖 Generated with Claude Code