Also, just to be clear, the data file is a tab-delimited text file, not a
CSV (comma-separated values) file. Whenever you process data, it's
important to be clear about which format you are working with. I happen to
prefer tab-delimited text files over CSV myself because, in this case as in
many others, the data itself can contain quotes, which can play havoc with a
program expecting quotes only around fields.
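
To make the format point concrete, here is a minimal sketch, assuming
Python 3 and files shaped like the Refworks export described in the
original post. The function name prepend_blank_column is made up for
illustration. It tells csv.reader that the delimiter is a tab and that
quoting is switched off, so quotation marks in the data are treated as
ordinary characters. Rows are written back by joining the fields with tabs
rather than through csv.writer, a choice made here to avoid any quoting or
escaping on output. Note that the new first field is an empty string, not
'\t': the tab that separates it from the next field is what produces the
blank column.

```python
import csv


def prepend_blank_column(src_path, dst_path):
    """Copy a tab-delimited file, adding an empty first column to each row.

    Hypothetical helper for illustration; the paths are supplied by the caller.
    """
    # newline='' lets the csv module see the original line endings.
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        # QUOTE_NONE: quotation marks are ordinary data, never field quoting.
        reader = csv.reader(src, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row in reader:
            # An empty string as the first field yields a leading tab.
            dst.write('\t'.join([''] + row) + '\n')
```

Calling prepend_blank_column('noid_refworks.txt', 'withid.txt') would use
the file names from the original post. Under these assumptions, commas and
quotation marks inside fields pass through unchanged.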
Roy
On Mon, Nov 25, 2013 at 9:49 AM, Joshua Gomez <[log in to unmask]> wrote:
> If all you want to do is add a tab to the beginning of each line, then you
> don't need to bother using the csv library. Just open your file, read it
> line by line, prepend a tab to each line and write it out again.
>
> src = open('noid_refworks.txt','rU')
> tgt = open('withid.txt', 'w')
>
> for line in src.readlines():
>     line = '\t%s' % line
>     tgt.write(line)
>
> src.close()
> tgt.close()
>
> -Joshua
>
> ________________________________________
> From: Code for Libraries <[log in to unmask]> on behalf of Bohyun
> Kim <[log in to unmask]>
> Sent: Monday, November 25, 2013 9:10 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] Tab delimited file with Python CSV
>
> Hi all,
>
> I am new to Python and was wondering if I can get some help with my short
> script. What I would like the script to do is:
> (1) Read the tab delimited file generated by Refworks
> (2) Output exactly the same file but with a blank column added in front.
> (This is for prepping the exported tab delimited file from Refworks so
> that it can be imported into MySQL; so any suggestions along the lines of
> TIMTOWTDI would also be appreciated.)
>
> This is what I have so far. It works, but then in the output file, I end
> up getting some weird character in each line in the second column (first
> column in the original input file). I also don't really get what
> escapechar=' ' does or what I am supposed to put in there.
>
> import csv
> with open('noid_refworks.txt','rU') as csvinput:
>     with open('withid.txt', 'w') as csvoutput:
>         dialect = csv.Sniffer().sniff(csvinput.read(1024))
>         csvinput.seek(0)
>         reader = csv.reader(csvinput, dialect)
>         writer = csv.writer(csvoutput, dialect, escapechar='\'',
>                             quoting=csv.QUOTE_NONE)
>         for row in reader:
>             writer.writerow(['\t']+row)
>
> A row in the original file is like this (Tab delimited and no quotations,
> some fields have commas and quotation marks inside.):
>
> Reference Type Authors, Primary Title Primary Periodical Full
> Periodical Abbrev Pub Year Pub Date Free From Volume Issue
> Start Page Other Pages Keywords Abstract Notes Personal
> Notes Authors, Secondary Title Secondary Edition Publisher
> Place Of Publication Authors, Tertiary Authors, Quaternary
> Authors, Quinary Title, Tertiary ISSN/ISBN Availability
> Author/Address Accession Number Language Classification Sub
> file/Database Original Foreign Title Links DOI Call Number
> Database Data Source Identifying Phrase Retrieved Date
> Shortened Title User 1 User 2 User 3 User 4 User 5 User
> 6 User 7 User 8 User 9 User 10 User 11 User 12 User 13
> User 14 User 15
>
> A row in the output file is like this:
> (The tab is successfully inserted. But I don't get why an L is inserted
> after it, no matter what I put in escapechar.)
>
> LReference Type Authors, Primary Title Primary Periodical
> Full Periodical Abbrev Pub Year Pub Date Free From Volume
> Issue Start Page Other Pages Keywords Abstract Notes
> Personal Notes Authors, Secondary Title Secondary Edition
> Publisher Place Of Publication Authors, Tertiary Authors,
> Quaternary Authors, Quinary Title, Tertiary ISSN/ISBN
> Availability Author/Address Accession Number Language
> Classification Sub file/Database Original Foreign Title Links
> DOI Call Number Database Data Source Identifying Phrase
> Retrieved Date Shortened Title User 1 User 2 User 3 User 4
> User 5 User 6 User 7 User 8 User 9 User 10 User 11
> User 12 User 13 User 14 User 15
>
>
> Any help or pointers would be greatly appreciated!
> ~Bohyun
>