Print

Print


Ken,

   The list is locked at the moment, so this reply may take awhile to reach you, so sorry if this question has been answered already (or I'm too late).

   Anyway, here's how I'd format it (assuming I read the question right, that you want distinct institutions that have listing from July 1st 2011 on and have never been listed before then):

   SELECT distinct institution FROM renewals WHERE date >= '2011-07-01' AND institution NOT IN (SELECT distinct institution FROM renewals WHERE date < '2011-07-01');


David K. Uspal
Technology Development Specialist
Falvey Memorial Library
Phone: 610-519-8954
Email: [log in to unmask]





-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Ken Irwin
Sent: Wednesday, September 28, 2011 12:41 PM
To: [log in to unmask]
Subject: [CODE4LIB] mysql subquery response time

Hi all,

I've not done much with MySQL subqueries, and I'm trying right now with what I find to be surprising results. I wonder if someone can help me understand.

I have a pile of data that with columns for "institution" and "date". Institution gets repeated a lot, with many different dates. I want to select all the institutions that *only* have dates after July 1 and don't appear in the table before that. My solution was to do a first query for all the institutions that DO have dates before July 1
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'

And then to do a SELECT query on all the institutions:
SELECT distinct institution from renewals

And then try to do a NOT IN subquery subtracting the smaller query from the larger one:

SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')

...only it doesn't seem to work. Or rather, the query has been running for several minutes and never comes back with an answer. Each of these two queries takes just a few milliseconds to run on its own.

Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries like this just take forever, and/or (c) is there a better way to do this? (I don't really understand about JOIN queries, but from what I can tell they are only for mixing the results of two different tables so I think they might not apply here.)

Any advice would be most welcome.

Thanks
Ken