On Aug 30, 2011, at 12:56 PM, Ken Irwin wrote:

> I have a feeling it may be time for me to learn some grown-up programming skills, and I hope someone here might be able to help.
> 
> I have a PHP script chewing over a large MySQL query. It's creating a handful of big associative arrays in the process, and punks out after the arrays get to 32MB.
> Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 242 bytes) in analysis.php on line 41
> 
> I'm basically building a big associative array encoding the name of the borrowing institution, the patron type (student, faculty, staff, etc.), the item barcode, and the various bits of data we want for each of these items.
> 
> // the first time we see a barcode
>    $stats[$inst][$patron_type][$item_barcode]['first'] = $date;
>    $stats[$inst][$patron_type][$item_barcode]['min'] = $renewals;
>    $stats[$inst][$patron_type][$item_barcode]['call'] = $call_no;
> // subsequent instances of the barcode
>      $stats[$inst][$patron_type][$item_barcode]['max'] = $renewals;
> 
> Once I've chewed over all 4 million records (40MB), I spit it out into a new MySQL table that has the collated data I want. Unfortunately this system breaks down when I get to so many millions of records.
> 
> Is there a more efficient way of doing this kind of data transformation? I *could* keep less in the big stats array and instead make millions of "UPDATE" calls to the MySQL table, but that doesn't sound like a winning proposition to me. I imagine I could also increase the memory allotment, but the array will eventually get too big anyway. Or I suppose I could do it all in chunks -- right now I'm parsing the whole raw-data SQL table at once; I could do one institution's data at a time and buy myself some wiggle room.
> 
> But fundamentally, it seems to me that asking PHP to hold lots of data in an array might not be the most programmerly system; it's just what I've always done.


I don't know how you're getting the data in, so I don't know how much you need to keep in memory during the operation.  I'm also not familiar with how memory-efficient PHP is in general, so I can't recommend changes in that regard, either.

What I can advise is two things -- 

You mentioned making millions of MySQL 'UPDATE' calls -- that's rarely efficient.

Instead, write it out to some sort of text file (tab delim, CSV, etc), and then use the 'LOAD DATA ... REPLACE' command:

	http://dev.mysql.com/doc/refman/5.1/en/load-data.html

For this to work, you need to have your primary keys set up correctly, as the duplicate-key check on the primary (or unique) key is what determines whether a row gets inserted fresh or replaces an existing one.
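
For a concrete picture, something like this -- just a sketch, with made-up table and column names (stats_summary, first_seen, etc.), assuming (inst, patron_type, barcode) is the table's primary key and that your script already has the $stats array and an open mysql connection:

	<?php
	// Dump the collated stats to a tab-delimited temp file...
	$tmp = '/tmp/stats_summary.tsv';
	$fh  = fopen($tmp, 'w');
	foreach ($stats as $inst => $types) {
	    foreach ($types as $ptype => $items) {
	        foreach ($items as $barcode => $d) {
	            fwrite($fh, implode("\t", array(
	                $inst, $ptype, $barcode,
	                $d['first'], $d['min'],
	                isset($d['max']) ? $d['max'] : $d['min'],  // max unset if seen once
	                $d['call']
	            )) . "\n");
	        }
	    }
	}
	fclose($fh);

	// ...then load the whole file in one statement.  REPLACE fires on
	// duplicate primary/unique keys, so repeated runs act like updates.
	// (LOCAL requires local-infile to be enabled on the server.)
	mysql_query("LOAD DATA LOCAL INFILE '$tmp'
	             REPLACE INTO TABLE stats_summary
	             FIELDS TERMINATED BY '\\t'
	             (inst, patron_type, barcode,
	              first_seen, min_renewals, max_renewals, call_no)");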

...

If you only need a little more than 32MB to deal with (and again, I don't know how memory efficient PHP is; it's possible that 40MB in mysql might be double the space in PHP, or even worse), you can adjust your php.ini file to change the memory limit:

	http://www.php.net/manual/en/ini.core.php#ini.memory-limit
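
For example (128M here is arbitrary -- pick whatever fits your box):

	; in php.ini
	memory_limit = 128M

	// or per-script, at the top of analysis.php, without touching php.ini:
	ini_set('memory_limit', '128M');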

Of course, this is only a stop-gap measure.

...

Personally, I'd go with a mix of these two, plus chunking it out by institution ... you should be able to get a list with a 'SELECT DISTINCT', and then just loop through 'em.
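
Roughly (again a sketch -- raw_circ and inst are placeholder names for your raw-data table and its institution column):

	<?php
	// One institution at a time keeps $stats to a fraction of the data.
	$insts = mysql_query("SELECT DISTINCT inst FROM raw_circ");
	while ($row = mysql_fetch_assoc($insts)) {
	    $stats = array();   // fresh array frees the previous chunk
	    $res = mysql_query("SELECT * FROM raw_circ WHERE inst = '"
	                       . mysql_real_escape_string($row['inst']) . "'");
	    while ($r = mysql_fetch_assoc($res)) {
	        // ... build $stats for this institution, same as before ...
	    }
	    // ... write this chunk to the tab-delim file / LOAD DATA as above ...
	}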


-Joe

ps. See, I made it all the way through the message without commenting that 'PHP' and 'grown-up programming' don't mix ... oh. crap.  Never mind.