I may be wrong about this, but when you use a "not in" operator, you're
necessarily doing a table scan. Any indexes you have are not used in such
a query, so you'll definitely see a performance hit.
But if each independent query runs quickly, why not use a temporary table
to store the results of what you've got written as a subquery first, and
then use that presumably small, distinct list of institutions in the
subquery afterward? Something like this:
SELECT INTO #my_institutions
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
EXEC
SELECT distinct institution from renewals
WHERE institution not in
(SELECT institution from #my_institutions)
That assumes you're able to SELECT INTO, but if not, just create yourself
a my_institutions table first.
\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/
Scot Colford
Web Services Manager
Boston Public Library
[log in to unmask]
Phone 617.859.2399
Mobile 617.592.8669
Fax 617.536.7558
On 9/28/11 12:41 PM, "Ken Irwin" <[log in to unmask]> wrote:
>Hi all,
>
>I've not done much with MySQL subqueries, and I'm trying right now with
>what I find to be surprising results. I wonder if someone can help me
>understand.
>
>I have a pile of data that with columns for "institution" and "date".
>Institution gets repeated a lot, with many different dates. I want to
>select all the institutions that *only* have dates after July 1 and don't
>appear in the table before that. My solution was to do a first query for
>all the institutions that DO have dates before July 1
>SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
>And then to do a SELECT query on all the institutions:
>SELECT distinct institution from renewals
>
>And then try to do a NOT IN subquery subtracting the smaller query from
>the larger one:
>
>SELECT distinct institution from renewals
>WHERE institution not in
>(SELECT distinct institution FROM `renewals` WHERE snap_date <
>'2011-07-01')
>
>...only it doesn't seem to work. Or rather, the query has been running
>for several minutes and never comes back with an answer. Each of these
>two queries takes just a few milliseconds to run on its own.
>
>Can someone tell me (a) am I just formatting the query wrong, (b) do
>subqueries like this just take forever, and/or (c) is there a better way
>to do this? (I don't really understand about JOIN queries, but from what
>I can tell they are only for mixing the results of two different tables
>so I think they might not apply here.)
>
>Any advice would be most welcome.
>
>Thanks
>Ken
|