The righteous crimson king solved all the problems Your evil red king over taxes us.
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric Lease Morgan
Sent: Saturday, July 22, 2017 10:36 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] good kings & bad kings: an sql query [resolved]
On Jul 22, 2017, at 8:47 AM, Andreas Orphanides <[log in to unmask]> wrote:
>> How might I write an SQL query to find all the descriptions of good
>> kings and bad kings in my database?
>>
>> I have a table that looks something like this:
>>
>> key word part_of_speech position
>> --- --------- -------------- --------
>> w1 The article 1
>> w2 righteous adjective 2
>> w3 king noun 3
>> w4 solved verb 4
>> w5 all adjective 5
>> w6 the article 6
>> w7 problems noun 7
>> w8 Your noun 8
>> w9 evil adjective 9
>> w10 king noun 10
>> w11 over adverb 11
>> w12 taxes verb 12
>> w13 us pronoun 13
>
> In a perfect world, your parser would include the referents for the
> adjectives, which would solve the problem lickety split, but I assume
> that's not an option here. Are you ok with only discovering
> adjective-noun pairs?...
>
> Add a sequentially numbered autogen column that's pure integer.
> (Unless you have string manipulation -- in which case you could just work on the "key"
> column).
>
> Then something like:
>
> SELECT * FROM table WHERE autogen IN (SELECT (autogen-1) FROM table
> WHERE word = "king") AND part_of_speech=“adjective"
Resolved, and the secret sauce was the SQL qualifier IN.
To find all positions of the word “king”, I can do this:
SELECT position FROM words WHERE word=“king”;
To find all the positions prior to the word “king”, I can do this:
SELECT position-1 FROM words WHERE word=“king”;
I can then use the IN qualifier to use the results of the previous query as input for a subsequent query. Thus I can find all the words prior to the word “king”;
SELECT word
FROM words
WHERE position IN (SELECT position-1 FROM words WHERE word=“king”);
I can then further limit my query to only list adjectives:
SELECT word
FROM words
WHERE part_of_speach=“adjective”
AND position IN (SELECT position-1 FROM words WHERE word=“king”);
I can then count and group the results as an ordered list:
SELECT COUNT(word) AS c, word
FROM words
WHERE part_of_speach=“adjective”
AND position IN (SELECT position-1 FROM words WHERE word=“king”)
GROUP BY word
ORDER BY c DESC;
Since my database includes the words from my entire corpus (60,000,000 words), I need to further limit my query to document identifiers. Moreover, my parts-of-speech are not as simple as “noun”, “verb”, and “adjective”. Without going into too much detail, my cool query ultimately looks like this:
SELECT COUNT(word) AS c, word
FROM words
WHERE did='A05406' AND pos LIKE 'j%'
AND (position IN (SELECT position-1 FROM words WHERE (word='king' AND did='A05406')))
GROUP by word
ORDER by c DESC;
In this particular case, kings are described as both earthly and good.
I can then search my index for documents of interest, and then I can feed my SQL query an identifier as well as a few words. For example, a humorous document on the topic of love describes men and women: [1]
SELECT COUNT(word) AS c, word FROM words WHERE did='A13520' AND
pos LIKE 'j%' AND (position IN (SELECT position-1 FROM words
WHERE((LOWER(word)='man' OR LOWER(word)='men' OR
LOWER(word)='woman' OR LOWER(word)='women') AND did='A13520')))
GROUP BY word ORDER BY c DESC;
c word
= =========
5 old
5 young
4 poore
3 good
2 Old
1 Good
1 Great
1 Rich
1 Seruing
1 best
1 great
1 labouring
1 little
1 painfull
1 proper
1 singing
1 wise
Apparently, when it comes to love, age doesn’t matter, men & women are poor, and at the same time, they are not very wise. ;-)
Fun with SQL. Thank you. Code4lib++
[1] humorous document - http://cds.crc.nd.edu/freebo/A13/A13520/A13520.htm
—
Eric Morgan
|