rfc4180 - Standard way to write empty field to last column of CSV file - Stack Overflow

I'm writing data to CSV files for customers to consume. I've been asked to make the last column a field which will be empty sometimes. Sometimes every row in the file will have an empty string for the last column. RFC 4180 says that CSV files may not end with a comma, so I'm concerned about breaking parsers. I don't know exactly how different customers will consume the files (e.g. what kinds of parsers they might use).

Example file: header row followed by two data rows

field1,field2,field3
abcdef,ghijkl,
aaaaaa,bbbbbb,cccccc

Is there a standard way of doing this? RFC 4180 mentions double-quoting fields with troublesome characters, but I didn't see it mention empty strings specifically. I'm wondering if a solution like this is likely to be supported by every parser, or whether this isn't necessarily standard:

field1,field2,field3
abcdef,ghijkl,""
aaaaaa,bbbbbb,cccccc

Asked Mar 13 at 17:32 by echawkes
  • You can safely use the format in your first example – aborruso Commented Mar 14 at 7:27
  • Both solutions are correct with regard to the CSV file format spec, the first one probably being more widely supported – Fravadona Commented Mar 14 at 9:51

2 Answers

The spec doesn't actually say that a line can't end in a comma. It says:

The last field in the record must not be followed by a comma.

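So your example tells the parser there are still 3 fields; it's just that the last one is empty. The trailing comma separates the second field from that empty third field, so the last field itself is not followed by a comma. That being said, I've seen both styles (an empty field and an empty pair of double quotes), and unfortunately a parser has to handle both.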

Also worth mentioning: what isn't shown here are the hidden characters, such as CRLF (carriage return and line feed, respectively). So even your first example, if you open it in Notepad++ or the like and turn on "Show all characters", may actually look like this:

field1,field2,field3CRLF
abcdef,ghijkl,CRLF
aaaaaa,bbbbbb,cccccc

(NOTE: Linux is likely to use just LF, whereas Windows will use CRLF.)

So again, you're not technically ending the line in a comma: the CR and/or LF tell the parser that this record is done and that the next record starts on the next line.
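As a rough illustration of the point about line terminators, here is a minimal sketch using Python's standard csv module (Python is my choice here, not something the question specifies). The helper name write_rows is made up for this example. With the module's default settings, an empty last field is written as nothing at all, followed directly by CRLF; asking the writer to quote every field produces the "" style instead.

```python
import csv
import io

def write_rows(rows, quoting=csv.QUOTE_MINIMAL):
    """Serialize rows to CSV text (hypothetical helper for this example)."""
    buf = io.StringIO(newline="")  # newline="" leaves the writer's CRLF untouched
    writer = csv.writer(buf, quoting=quoting)
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    ["field1", "field2", "field3"],
    ["abcdef", "ghijkl", ""],  # last column is an empty string
    ["aaaaaa", "bbbbbb", "cccccc"],
]

# Default quoting: the empty field is written bare, so the record
# ends with a comma immediately followed by CRLF.
unquoted = write_rows(rows)

# QUOTE_ALL: every field is quoted, including the empty one ("").
all_quoted = write_rows(rows, quoting=csv.QUOTE_ALL)

# repr() makes the hidden \r\n terminators visible.
print(repr(unquoted))
print(repr(all_quoted))
```

Note that QUOTE_ALL quotes every field, not only the empty ones, so it is not an exact match for the second example in the question.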

Big picture, you cannot count on all CSV parsers to do even the same thing, let alone the right thing:

Due to lack of a single specification, there are considerable differences among implementations.

I think you can make assumptions about common, popular ones, though, and in my experience, for either of these two inputs:

-- no quote --      -- empty quote --
Col1,Col2,Col3      Col1,Col2,Col3
aaaa,bbbb,          aaaa,bbbb,""
zzzz,yyyy,xxxx      zzzz,yyyy,xxxx

you can expect a good parser to produce a data structure, like:

[
  [ Col1, Col2, Col3 ],
  [ aaaa, bbbb,      ],
  [ zzzz, yyyy, xxxx ],
]
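To check this claim against one concrete parser, here is a small sketch with Python's built-in csv module. It only shows what that one parser does; other parsers may differ. The names NO_QUOTE, EMPTY_QUOTE and parse are invented for the example.

```python
import csv
import io

# The two inputs from above, with explicit CRLF record terminators.
NO_QUOTE = "Col1,Col2,Col3\r\naaaa,bbbb,\r\nzzzz,yyyy,xxxx\r\n"
EMPTY_QUOTE = 'Col1,Col2,Col3\r\naaaa,bbbb,""\r\nzzzz,yyyy,xxxx\r\n'

def parse(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))

rows_no_quote = parse(NO_QUOTE)
rows_empty_quote = parse(EMPTY_QUOTE)

# Both styles give three fields per record, the last one an empty string.
print(rows_no_quote[1])                   # ['aaaa', 'bbbb', '']
print(rows_no_quote == rows_empty_quote)  # True
```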

you can also expect that if you leave the trailing comma off:

Col1,Col2,Col3
aaaa,bbbb
zzzz,yyyy,xxxx

then the parser will see only two fields for the first data record, which goes against RFC 4180's guidance:

Each line should contain the same number of fields throughout the file.

Some parsers reject this discrepancy by default (e.g., Go's encoding/csv); some parsers can be configured to reject it (e.g., Deno's jsr:@std/csv, npm:csv-parser). I couldn't find an option in Python's csv module for this.

If that input did pass parsing, the consumer would most likely see data like:

[
  [ Col1, Col2, Col3 ],
  [ aaaa, bbbb ],
  [ zzzz, yyyy, xxxx ],
]
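For Python, where I couldn't find a built-in option, a consumer could do the field-count check by hand. The following is only a sketch: read_strict is a hypothetical helper, and treating the first record as the reference width is an assumption of this example, not something the csv module does for you.

```python
import csv
import io

def read_strict(text):
    """Parse CSV text; raise ValueError if a record's field count
    differs from that of the first record (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = []
    expected = None
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if expected is None:
            expected = len(row)  # the first record sets the width
        elif len(row) != expected:
            raise ValueError(
                f"line {reader.line_num}: expected {expected} fields, got {len(row)}"
            )
        rows.append(row)
    return rows

good = "Col1,Col2,Col3\r\naaaa,bbbb,\r\nzzzz,yyyy,xxxx\r\n"
ragged = "Col1,Col2,Col3\r\naaaa,bbbb\r\nzzzz,yyyy,xxxx\r\n"

print(read_strict(good))  # three rows of three fields each

try:
    read_strict(ragged)
except ValueError as err:
    print("rejected:", err)
```

With the trailing comma present the file passes; with it left off, the second line is rejected because it has only two fields.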

