To engage in a little code golf, the following Python script uses only the
standard library. The `csv` module understands quoted fields and escaped
characters, so commas or line breaks inside a field are not miscounted. This
should help you find any lines with an "unusual" number of fields, meaning a
count that differs from the most common one.

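> #!/usr/bin/env python3
>
> from csv import reader
> from statistics import mode
>
> # newline='' lets the csv module handle line breaks inside quoted fields
> with open('file.csv', newline='') as f:
>     rows = reader(f)
>     # rows.line_num is the physical line on which each record ends
>     lengths = [(rows.line_num, len(row)) for row in rows]
>
> # the most common field count is taken as the "expected" one
> linemode = mode(n for _, n in lengths)
> variants = [x for x in lengths if x[1] != linemode]
>
> for lineno, n in variants:
>     print("line %d has %d elements" % (lineno, n))
>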
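
To show why I'd lean on the `csv` module rather than counting commas, here
is a rough sketch. The sample data and the two helper names (`naive_counts`,
`csv_counts`) are made up for illustration; the second record has a comma and
a line break inside quoted fields.

```python
import csv
import io

# Hypothetical sample: record 2 has a comma and a line break inside quotes
sample = 'id,title,note\n1,"Smith, John","two\nlines"\n2,plain,ok\n'


def naive_counts(text):
    """Field counts from splitting each physical line on commas."""
    return [len(line.split(',')) for line in text.splitlines()]


def csv_counts(text):
    """Field counts as seen by the csv module, which honors quoting."""
    return [len(row) for row in csv.reader(io.StringIO(text, newline=''))]


print(naive_counts(sample))  # [3, 4, 1, 3]
print(csv_counts(sample))    # [3, 3, 3]
```

Counting commas flags a perfectly good record as broken, while the parser
sees three fields everywhere. So if the script above reports nothing but the
import still fails, the quoting itself may be what's off.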

HTH!

On Mon, Jan 16, 2023 at 9:59 PM Joe Hourclé wrote:

> > On Jan 16, 2023, at 7:26 PM, Max wrote:
> >
> > Hi code4lib folks:
> >
> > Does anyone know a tool or hack to help fix a problem at a CSV that's
> > causing a "The rows are not all the same number of columns." error when
> > trying to import the CSV at a web application? I'm trying to use the CSV
> > Import module <https://omeka.org/s/docs/user-manual/modules/csvimport/> at
> > Omeka S. I've had success in the past with different CSV files. But some
> > kinda problem at the CSV I'm trying to import right now is causing this
> > error, & reviewing the CSV in Excel & as plain text (literally counting
> > commas to confirm rows are the same number of columns) isn't helping.
>
> I’ve been known to do a find/replace on commas, then set the tab width to
> something very large and then look for the rows that don’t line up.
>
> But CSV is tricky, as it’s more than just commas that are significant. You
> also have to consider quotation marks… which let you put commas or line
> returns within a string field.
>
> -Joe
>
> Sent from a mobile device with a crappy on screen keyboard and obnoxious
> "autocorrect"
>