On Tue, Aug 30, 2011 at 12:56 PM, Ken Irwin <[log in to unmask]> wrote:
> I have a feeling it may be time for me to learn some grown-up programming
> skills, and I hope someone here might be able to help.
>
> I'm basically building a big associative array encoding the name of the
> borrowing institution, the patron type (student, faculty, staff, etc.), the
> item barcode, and the various bits of data we want for each of these items.
>
> // the first time we see a barcode
> $stats[$inst][$patron_type][$item_barcode]['first'] = $date;
> $stats[$inst][$patron_type][$item_barcode]['min'] = $renewals;
> $stats[$inst][$patron_type][$item_barcode]['call'] = $call_no;
> // subsequent instances of the barcode
> $stats[$inst][$patron_type][$item_barcode]['max'] = $renewals;
>
> Once I've chewed over all 4 million records (40 MB), I spit it out into a
> new MySQL table that has the collated data I want. Unfortunately this
> system breaks down when I get to so many millions of records.
>
> Is there a more efficient way of doing this kind of data transformation? I
> *could* avoid keeping so much in the big stats array and instead make
> millions of UPDATE calls to the MySQL table, but that doesn't sound like a
> winning proposition to me. I imagine I could also increase the memory
> allotment, but it will eventually get too big as well. Or I suppose I could
> do it all in chunks - right now I'm parsing the whole raw-data SQL table at
> once; I could do one institution's data at a time and buy myself some
> wiggle room.

Sometimes it can make sense to let the database do the aggregation, e.g.:

    CREATE TABLE Summary AS
    SELECT inst, patron_type, item_barcode,
           MIN(date)     AS first,
           MIN(call_no)  AS call_no,
           MIN(renewals) AS min_renewals,
           MAX(renewals) AS max_renewals
    FROM Renewals
    GROUP BY inst, patron_type, item_barcode;
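
If for some reason the raw rows can't all live in one table for a single GROUP BY pass, a middle ground between the big in-memory array and millions of plain UPDATEs is MySQL's INSERT ... ON DUPLICATE KEY UPDATE: the database keeps the running min/max per key, so the script streams rows and never holds the whole array. A sketch, with illustrative table and column names (not taken from Ken's actual schema):

    -- Hypothetical summary table, keyed on the same triple as the PHP array.
    CREATE TABLE Summary (
      inst         VARCHAR(32)  NOT NULL,
      patron_type  VARCHAR(32)  NOT NULL,
      item_barcode VARCHAR(32)  NOT NULL,
      first        DATE,
      call_no      VARCHAR(64),
      min_renewals INT,
      max_renewals INT,
      PRIMARY KEY (inst, patron_type, item_barcode)
    );

    -- One statement per raw row (batch them for speed); pass the row's
    -- renewal count for both min_renewals and max_renewals. On the first
    -- sighting of a barcode the row is inserted; on later sightings MySQL
    -- folds the new values into the running min/max in place.
    INSERT INTO Summary
      (inst, patron_type, item_barcode, first, call_no, min_renewals, max_renewals)
    VALUES (?, ?, ?, ?, ?, ?, ?)
    ON DUPLICATE KEY UPDATE
      first        = LEAST(first, VALUES(first)),
      min_renewals = LEAST(min_renewals, VALUES(min_renewals)),
      max_renewals = GREATEST(max_renewals, VALUES(max_renewals));

That said, if the raw data is already in one MySQL table, the single GROUP BY above will almost always beat row-at-a-time upserts.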