On Nov 25, 2013, at 1:05 PM, Jonathan Rochkind wrote:
> Ah, but what if the data itself has tabs! Doh!
> It can be a mess either way. There are standards (or conventions?) for escaping internal commas in CSV -- which doesn't mean the software that was used to produce the CSV, or the software you are using to read it, actually respects them.
You don't have to escape the commas; you just have to double-quote the string. If you want a literal double quote inside it, you put two in a row, e.g.:
"He said, ""hello"""
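For anyone who wants to check that quoting rule against a real library, here is a minimal sketch using Python's stdlib csv module with its default dialect. The helper names to_csv_line and from_csv_line are made up for illustration; only csv.writer and csv.reader are real.

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one row with the csv module's default (Excel-style) dialect."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    # The default line terminator is "\r\n"; strip it for display.
    return buf.getvalue().rstrip("\r\n")

def from_csv_line(line):
    """Parse one CSV line back into a list of fields."""
    return next(csv.reader([line]))

# The first field contains both a comma and double quotes, so the writer
# wraps it in quotes and doubles the embedded ones.
line = to_csv_line(['He said, "hello"', "plain"])
print(line)                  # "He said, ""hello""",plain
print(from_csv_line(line))   # ['He said, "hello"', 'plain']
```

Note that only the field that needs it gets quoted; "plain" is written bare.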
> But I'm not sure if there are even standards/conventions for escaping tabs in a tab-delimited text file?
No official ones that I'm aware of. I've seen some parsers that will consider a backslash before a delimiter to be an escape, but I don't know if there's an official spec for tab- / pipe- / whatever-delimited text.
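As a rough illustration of the backslash-before-delimiter convention, Python's csv module can be configured to behave that way. This is a sketch of one possible convention, not a standard: the TSV_OPTS settings and the helper names are my own choices, and it assumes the data itself contains no backslashes.

```python
import csv
import io

# Assumed convention: tab-delimited, no quoting, and a backslash in front of
# a tab means "this tab is data, not a delimiter".
TSV_OPTS = dict(delimiter="\t", quoting=csv.QUOTE_NONE, escapechar="\\")

def write_tsv_row(fields):
    """Write one row, escaping any tabs that appear inside a field."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **TSV_OPTS).writerow(fields)
    return buf.getvalue()

def read_tsv_row(line):
    """Read one row written with the same convention."""
    return next(csv.reader([line], **TSV_OPTS))

encoded = write_tsv_row(["a\tb", "c"])
print(repr(encoded))          # the embedded tab is preceded by a backslash
print(read_tsv_row(encoded))  # ['a\tb', 'c']
```

This only round-trips if the reader and the writer agree on the convention, which is exactly the problem with formats that have no spec.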
> Really, the lesson to me is that you should always consider using an existing well-tested library for both reading and writing these files, whether CSV or tab-delimited -- even if you think "Oh, it's so simple, why bother with that." There will be edge cases that you will discover only when they cause bugs, possibly after somewhat painful debugging. A well-used third-party library is less likely to have such edge case bugs.
Agreed, but in this case, it might be easier to bypass the library. (If you were using a library, you'd have to prepend an empty element to each row, then output it.)
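For comparison, the library route might look something like this in Python. It is only a sketch: prepend_empty_column is a made-up name, and I'm assuming tab-delimited input read from an open file-like object.

```python
import csv
import io

def prepend_empty_column(src, dst, delimiter="\t"):
    """Copy rows from src to dst, adding an empty first field to each row."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        # Put the empty element at the front, then output the row.
        writer.writerow([""] + row)

# In-memory demo; real use would pass files opened with newline="".
src = io.StringIO("a\tb\nc\td\n")
dst = io.StringIO()
prepend_empty_column(src, dst)
print(repr(dst.getvalue()))   # '\ta\tb\n\tc\td\n'
```

The upside over a plain text substitution is that fields containing the delimiter or quotes are re-serialized correctly; the downside is that it's more typing for a one-off edit.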
> I am more ruby than python; in ruby there is a library for reading and writing CSV in the stdlib. http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
And I'm more Perl, and generally lazy for an edit this simple:
perl -pi.bak -e 's/^/\t/' file_to_convert
(The '-p' tells it to apply the transformation to each line and print the result; '-i.bak' tells it to edit the file in place, keeping a copy of the original with '.bak' appended to its name; "-e 's/^/\t/'" puts a tab at the front of each line.)
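If you'd rather stay in Python, the stdlib fileinput module can do roughly the same in-place edit with a '.bak' backup. A minimal sketch, with prepend_tab_in_place being a name I made up:

```python
import fileinput
import sys

def prepend_tab_in_place(path, backup=".bak"):
    """Prefix every line of `path` with a tab; keep the original as path + backup."""
    # With inplace=True, fileinput redirects standard output into the file,
    # so whatever is written to sys.stdout inside the loop becomes the new content.
    with fileinput.input(files=[path], inplace=True, backup=backup) as lines:
        for line in lines:
            sys.stdout.write("\t" + line)
```

Calling prepend_tab_in_place("file_to_convert") should leave the modified file in place and the untouched original next to it as file_to_convert.bak.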