Well, not being a MySQL whiz myself, I thought maybe the query was being needlessly slowed down by having DISTINCT specified in both parts of the query. In testing a similar query on one of my databases, the double DISTINCT query did run quite slowly. I tried modifying it to a form like this:
SELECT distinct institution from renewals
WHERE institution not in
(SELECT institution FROM `renewals` WHERE snap_date < '2011-07-01')
It ran much faster. But, so did rerunning the original query.
There is a page in the MySQL documentation about some limitations and some optimizer issues with subqueries:
http://dev.mysql.com/doc/refman/5.1/en/subquery-restrictions.html
Genny Engel
Sonoma County Library
[log in to unmask]
707 545-0831 x581
www.sonomalibrary.org
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Ken Irwin
Sent: Wednesday, September 28, 2011 9:41 AM
To: [log in to unmask]
Subject: [CODE4LIB] mysql subquery response time
Hi all,
I've not done much with MySQL subqueries, and I'm trying right now with what I find to be surprising results. I wonder if someone can help me understand.
I have a pile of data that with columns for "institution" and "date". Institution gets repeated a lot, with many different dates. I want to select all the institutions that *only* have dates after July 1 and don't appear in the table before that. My solution was to do a first query for all the institutions that DO have dates before July 1
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
And then to do a SELECT query on all the institutions:
SELECT distinct institution from renewals
And then try to do a NOT IN subquery subtracting the smaller query from the larger one:
SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')
...only it doesn't seem to work. Or rather, the query has been running for several minutes and never comes back with an answer. Each of these two queries takes just a few milliseconds to run on its own.
Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries like this just take forever, and/or (c) is there a better way to do this? (I don't really understand about JOIN queries, but from what I can tell they are only for mixing the results of two different tables so I think they might not apply here.)
Any advice would be most welcome.
Thanks
Ken
|