R would be ideal for the types of data manipulation you describe and would
allow you to automate the entire process. If you can share a sample of your
data and examples of the types of queries you're running, I'd be glad to
help you get started.
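For instance, here's a minimal sketch of the grouping step in base R. The
file name and column names (bib_record_num, checkouts) are just placeholders
for whatever your export actually contains; I've used a small inline data
frame so the example runs on its own.

```r
# Toy stand-in for a circ export: one row per item record.
# With a real file you'd start from something like:
#   circ <- read.delim("circ_export.txt", stringsAsFactors = FALSE)
circ <- data.frame(
  bib_record_num = c("b1001", "b1001", "b1002", "b1003"),
  checkouts      = c(3, 1, 0, 5),
  stringsAsFactors = FALSE
)

# Total checkouts per bib record (the equivalent of your Access "Group By")
totals <- aggregate(checkouts ~ bib_record_num, data = circ, sum)

# Keep only bibs with at least one checkout in the period
active <- subset(totals, checkouts > 0)
```

A 250,000-row table is well within what R handles comfortably in memory, so
the whole pipeline — import, group, filter, export — can live in one short
script you re-run each reporting period.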
If you'd like to keep a relational database in your workflow, check out
SQLite. It's a file format rather than a database server, so it won't be an
issue for your IT staff. There's a Firefox plug-in that provides basic
client functionality, and you can also easily access the tables from R
(directly) or, for that matter, from Access or Excel via ODBC (not what I'd
recommend, but it's possible!).
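To give a sense of the R-to-SQLite route: assuming the DBI and RSQLite
packages are installed, a query like your grouping step might look like the
sketch below. The table and column names are placeholders; I've used an
in-memory database with toy data so the example is self-contained — in
practice you'd open your file with dbConnect(SQLite(), "circ.sqlite").

```r
library(DBI)
library(RSQLite)

# In-memory database standing in for a real circ.sqlite file
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "items", data.frame(
  bib_record_num = c("b1001", "b1001", "b1002"),
  checkouts      = c(3, 1, 0)
))

# The same "Group By" query you'd build in Access, run against SQLite
totals <- dbGetQuery(con, "
  SELECT bib_record_num, SUM(checkouts) AS total_checkouts
  FROM items
  GROUP BY bib_record_num
")
dbDisconnect(con)
```

SQLite has no practical size limit at this scale, and the .sqlite file can
be copied, backed up, or emailed like any other file.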
Head, Library Data Lab
UC Berkeley Libraries
On Thu, Aug 6, 2015 at 6:05 AM, Harper, Cynthia <[log in to unmask]> wrote:
> I have compacted the database, and I'm using the Group By SQL query. I
> think I actually am hitting the 2GB limit, because of all the data I have
> for each row. I'm wondering whether the field I added for reserves history
> notes is treated as a fixed-length field for every record, rather than
> variable length, even though it only appears for the small number of
> records that have been put on reserve. I suppose if I exported my data in
> two tables - bib and item data - the database would be much more efficient
> than the flat-file approach I've been using. Time to turn the mind back on,
> rather than just taking the lazy approach every time...
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Kevin Ford
> Sent: Wednesday, August 05, 2015 5:16 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Processing Circ data
> On the surface, your difficulties suggest you may need to look at a few
> optimization tactics. Apologies if these are things you've already
> considered and addressed - just offering a suggestion.
> This page is for Access 2003, but the items under "Improve query
> performance" should apply - I think - to newer versions as well. I'll draw
> specific attention to 1) compacting the database; 2) making sure you have
> an index set up on the bib record number field and the number-of-circs
> field; and 3) making sure you are using the "Group By" SQL syntax.
> Now, I'm not terribly familiar with Access so I can't actually help you
> with point/click instructions, but the above are common 'gotchas' that
> could be a problem regardless of RDBMS.
>  https://support.microsoft.com/en-us/kb/209126
>  http://www.w3schools.com/sql/sql_groupby.asp
> On 8/5/15 4:01 PM, Harper, Cynthia wrote:
> > Well, I guess it could be bad data, but I don't know how to tell. I
> think I've done more than this before.
> > I have a "Find duplicates" query that groups by bib record number. That
> query seemed to take about 40 minutes to process. Then I added a criterion
> to limit it to only records that had >0 circs this year. That query
> displays the rotating cursor, then says "Not Responding", then the cursor
> again, and loops through that for hours. Maybe I can find the bad data in
> Access, but I'd be glad to find more modern data analysis software. My db
> is 136,256 kb, but adding that extra query will probably put it over the
> 2GB mark. I've tried extracting to a csv, and that didn't work. Maybe I'll
> try a Make Table query into a separate db.
> > Or the OpenRefine suggestion sounds good too.
> > Cindy Harper
> > -----Original Message-----
> > From: Code for Libraries [mailto:[log in to unmask]] On Behalf
> > Of Kevin Ford
> > Sent: Wednesday, August 05, 2015 4:23 PM
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] Processing Circ data
> > Hi Cindy,
> > This doesn't quite address your issue, but, unless you've hit the 2 GB
> Access size limit, Access can handle a good deal more than the 250,000 item
> records ("rows," yes?) you cited.
> > What makes you think you've hit the limit? Slowness, something else?
> > All the best,
> > Kevin
> > 
> > https://support.office.com/en-us/article/Access-2010-specifications-1e
> > 521481-7f9a-46f7-8ed9-ea9dff1fa854
> > On 8/5/15 3:07 PM, Harper, Cynthia wrote:
> >> Hi all. What are you using to process circ data for ad-hoc queries? I
> usually extract csv or tab-delimited files - one row per item record, with
> identifying bib record data, then total checkouts over the given time
> period(s). I have been importing these into Access then grouping them by
> bib record. I think that I've reached the limits of scalability for Access
> for this project now, with 250,000 item records. Does anyone do this in
> R? My other go-to software for data processing is the free version of
> RapidMiner. Or do you just use MySQL or another SQL database? I was looking
> into doing it in R with RSQLite (just read about this and sqldf
> http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because ... I'm
> rusty enough in R that if anyone will give me some start-off data import
> code, that would be great.
> >> Cindy Harper
> >> E-services and periodicals librarian
> >> Virginia Theological Seminary
> >> Bishop Payne Library
> >> 3737 Seminary Road
> >> Alexandria VA 22304
> >> [log in to unmask]<mailto:[log in to unmask]>
> >> 703-461-1794