Hi again folks:
Many thanks for everyone's replies this week!
We figured it out here with y'all's help, & so I wanna share back what we
learned. Like folks suggested, the problem was some combination of line
breaks, commas & double quotes inside values. Maintaining the
breaks/characters isn't important for us. So I used OpenRefine to "trim
leading & trailing whitespace" & "collapse consecutive whitespace" on our
very messy OCR transcript column to clean it up some. & then used find &
replace in Excel to replace the breaks, commas & double quotes. & success!
👍👍
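For anyone who'd rather script it, here's a rough Python sketch of the same cleanup. It is not what we actually ran (that was OpenRefine plus Excel), the names clean_cell and clean_csv_text are made up for illustration, & it assumes you're fine with line breaks, commas & double quotes inside values being swapped for spaces.

```python
import csv
import io
import re


def clean_cell(value: str) -> str:
    """Swap line breaks, commas and double quotes for spaces, then tidy whitespace."""
    value = re.sub(r'[\r\n,"]', " ", value)
    # Collapse consecutive whitespace and trim leading/trailing whitespace.
    return re.sub(r"\s+", " ", value).strip()


def clean_csv_text(text: str) -> str:
    """Parse CSV text, clean every cell, and write it back out as CSV text."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    for row in reader:
        writer.writerow([clean_cell(cell) for cell in row])
    return out.getvalue()


sample = 'id,transcript\r\n1,"  He said, ""hi""\nand   left  "\r\n'
cleaned = clean_csv_text(sample)
```

Note this cleans every column, header row included. To touch only one column you'd clean just that index in each row.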
Thanks again for everyone's replies & personal messages & help!
Y'all are great folks. 🙏🙏
Cheers all,
Max
Maxwell Gray
https://maxgray20.com
On Tue, Jan 17, 2023 at 10:49 AM Geoffrey Spear <[log in to unmask]>
wrote:
> Nitpick: RFC 4180 (the standard for CSV files in MIME, although, as noted
> by others in this thread, CSV in practice might as well not have a standard
> at all since different tools feel free to do whatever the heck they want)
> says to always use Windows-style line endings in CSV files (CRLF), whatever
> system you're on.
>
> I don't see any explicit line ending handling in the github repo you linked
> to, but other things will likely expect this ending (including, but not
> limited to, Python's csv library which also has some suggestions for usage
> to fix things.)
>
> CSVImport also seems to raise an entirely different error message if it
> thinks your file isn't UTF-8, but it doesn't hurt to verify in advance.
> It's certainly one of the easier encodings to tell a bunch of bytes
> definitely isn't using, but I didn't verify whether CSVImport is actually
> doing that in a foolproof way...
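A quick way to verify both points before uploading might look like the Python sketch below. The helper name check_bytes is hypothetical, & it says nothing about how CSVImport itself checks encoding. It only tries a strict UTF-8 decode & counts the line-ending styles in the raw bytes (including any line breaks that sit inside quoted values).

```python
def check_bytes(data: bytes) -> dict:
    """Report whether the bytes decode as UTF-8 and which line endings appear."""
    report = {}
    try:
        data.decode("utf-8")  # strict decoding raises on invalid byte sequences
        report["utf8"] = True
    except UnicodeDecodeError:
        report["utf8"] = False
    crlf = data.count(b"\r\n")
    report["crlf"] = crlf
    # Line feeds and carriage returns that are not part of a CRLF pair.
    report["bare_lf"] = data.count(b"\n") - crlf
    report["bare_cr"] = data.count(b"\r") - crlf
    return report


mixed = check_bytes(b"a,b\r\n1,2\n")
bad = check_bytes(b"\xff\xfe")
```

To run it on a real file, read the file in binary mode (open(path, "rb").read()) so Python doesn't translate the line endings first.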
>
> On Tue, Jan 17, 2023 at 10:08 AM Benjamin Armintor <[log in to unmask]>
> wrote:
>
> > Sometimes in a situation like this it's useful to look at the source of the
> > message. In the event that this is true here...
> >
> > This error is raised by the CSV import module here:
> >
> > https://github.com/omeka-s-modules/CSVImport/blob/30857ff5cbab31bb53713fc7c837b8c2c1247f6e/src/Source/AbstractSource.php#L89-L94
> > The checkNumberOfColumnsByRow function returning false (and triggering the
> > error) is here:
> >
> > https://github.com/omeka-s-modules/CSVImport/blob/30857ff5cbab31bb53713fc7c837b8c2c1247f6e/src/Source/CsvFile.php#L126-L144
> >
> > That function appears to verify two things:
> > 1. The iterator over the CSV is not empty
> > 2. The rows all have the same number of values as the header row had
> > headers
> >
> > The iterator in question is from an SplFileObject (
> > https://www.php.net/manual/en/class.splfileobject.php). So either your
> > uploaded file appears empty to the CSV reader, or some row has a number of
> > cells different from the header row. If I were you, before I started
> > digging into escaped values and what not, I would:
> > 1. Make sure I had the number of headers I intended
> > 2. Make sure my file is UTF8 encoded, and make sure I was using Unix-style
> > line endings (a single newline character)
> > 3. Make sure I didn't have empty lines (I notice that
> > https://github.com/omeka-s-modules/CSVImport/blob/30857ff5cbab31bb53713fc7c837b8c2c1247f6e/src/Source/CsvFile.php#L189-L190
> > sets the SKIP_EMPTY flag, but not DROP_NEW_LINE), including the end of the
> > file
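The header-count and empty-line checks above can be roughed out in Python like this. It's only a sketch: find_ragged_rows is a made-up name, & Python's csv module is standing in for the PHP SplFileObject reader, so the two may disagree on edge cases such as how blank lines are treated. Here a blank line shows up as a row with 0 cells.

```python
import csv
import io


def find_ragged_rows(text: str):
    """Return (header_width, [(line_number, cell_count), ...]) or None if empty."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return None  # the file looks empty to the reader
    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            # line_num is the physical line on which this row ended
            problems.append((reader.line_num, len(row)))
    return expected, problems


text = 'a,b,c\n1,2,3\n4,5\n\n"x,y",7,8\n'
result = find_ragged_rows(text)
```

Because the parser understands quoting, the last row ("x,y",7,8) counts as three cells & is not reported, while the short row & the blank line are.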
> >
> > If you've already done these things, I apologize for being so rudimentary -
> > but it's always good to verify the basic assumptions before you dive into
> > more elaborate data inspection.
> >
> > Good luck!
> > Ben
> >
> > PS: I know this stuff is frustrating - it might be worth opening an issue
> > on that CSVImport github repository to improve the error message!
> >
> > On Tue, Jan 17, 2023 at 9:15 AM Max <[log in to unmask]> wrote:
> >
> > > Hi again folks:
> > >
> > > Many thanks for everyone's replies yesterday evening! I retried fixing via
> > > OpenRefine, & no success. & I'm using double quotes for CSV comma
> > > enclosure. So I don't think commas inside values are the problem. (I've had
> > > success in the past with multiple different CSV files that included extra
> > > commas inside values & used double quotes for CSV comma enclosure, & so
> > > they weren't a problem.) RE: counting blanks in columns, Jackie Keith
> > > recommended using the COUNTBLANK formula at Excel/Google Sheets, which was
> > > easy to use, but still no success. (I didn't find any blanks in the data.)
> > > Anyway, I wanted to update folks, & say thanks again, everyone!
> > >
> > > Cheers all,
> > > Max
> > >
> > > Maxwell Gray
> > > https://maxgray20.com
> > >
> > >
> > > On Mon, Jan 16, 2023 at 8:59 PM Joe Hourclé <[log in to unmask]> wrote:
> > >
> > > > > On Jan 16, 2023, at 7:26 PM, Max <[log in to unmask]> wrote:
> > > > >
> > > > > Hi code4lib folks:
> > > > >
> > > > > Does anyone know a tool or hack to help fix a problem at a CSV that's
> > > > > causing a "The rows are not all the same number of columns." error when
> > > > > trying to import the CSV at a web application? I'm trying to use the CSV
> > > > > Import module <https://omeka.org/s/docs/user-manual/modules/csvimport/> at
> > > > > Omeka S. I've had success in the past with different CSV files. But some
> > > > > kinda problem at the CSV I'm trying to import right now is causing this
> > > > > error, & reviewing the CSV in Excel & as plain text (literally counting
> > > > > commas to confirm rows are the same number of columns) isn't helping.
> > > >
> > > > I’ve been known to do a find/replace on commas, then set the tab width to
> > > > something very large and then look for the rows that don’t line up.
> > > >
> > > > But CSV is tricky, as it’s more than just commas that are significant: you
> > > > also have to consider quotation marks… which make it possible to put
> > > > commas or line returns within a string field.
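A tiny made-up example of why counting commas by eye can mislead (Python sketch, invented sample data): splitting on line breaks & commas sees three lines with 2, 2 & 3 pieces, while a CSV parser that honours the double quotes sees two rows of two cells each.

```python
import csv
import io

# One header row plus one data row whose second value contains a line
# break and two commas, all wrapped in double quotes.
raw = 'id,note\r\n1,"first line\nsecond line, with, commas"\r\n'

# Naive view: split on line breaks, then on commas.
naive = [line.split(",") for line in raw.splitlines()]
naive_counts = [len(pieces) for pieces in naive]

# Quote-aware view: let the csv module decide where cells and rows end.
parsed = list(csv.reader(io.StringIO(raw, newline="")))
parsed_counts = [len(row) for row in parsed]
```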
> > > >
> > > > -Joe
> > > >
> > > > Sent from a mobile device with a crappy on screen keyboard and obnoxious
> > > > "autocorrect"
> > > >
> > >
> >
>