Unfortunately, Eric's system will never learn of the glory of the King in
Yellow.
On Mon, Jul 24, 2017 at 7:26 AM, Timothy Cornwell <[log in to unmask]> wrote:
> The righteous crimson king solved all the problems Your evil red king over
> taxes us.
>
>
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Eric Lease Morgan
> Sent: Saturday, July 22, 2017 10:36 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] good kings & bad kings: an sql query [resolved]
>
> On Jul 22, 2017, at 8:47 AM, Andreas Orphanides <[log in to unmask]> wrote:
>
> >> How might I write an SQL query to find all the descriptions of good
> >> kings and bad kings in my database?
> >>
> >> I have a table that looks something like this:
> >>
> >> key word part_of_speech position
> >> --- --------- -------------- --------
> >> w1 The article 1
> >> w2 righteous adjective 2
> >> w3 king noun 3
> >> w4 solved verb 4
> >> w5 all adjective 5
> >> w6 the article 6
> >> w7 problems noun 7
> >> w8 Your noun 8
> >> w9 evil adjective 9
> >> w10 king noun 10
> >> w11 over adverb 11
> >> w12 taxes verb 12
> >> w13 us pronoun 13
> >
> > In a perfect world, your parser would include the referents for the
> > adjectives, which would solve the problem lickety split, but I assume
> > that's not an option here. Are you ok with only discovering
> > adjective-noun pairs?...
> >
> > Add a sequentially numbered autogen column that's pure integer.
> > (Unless you have string manipulation -- in which case you could just
> work on the "key"
> > column).
> >
> > Then something like:
> >
> > SELECT * FROM table WHERE autogen IN (SELECT (autogen-1) FROM table
> > WHERE word = "king") AND part_of_speech=“adjective"
>
>
> Resolved, and the secret sauce was the SQL qualifier IN.
>
> To find all positions of the word “king”, I can do this:
>
> SELECT position FROM words WHERE word=“king”;
>
> To find all the positions prior to the word “king”, I can do this:
>
> SELECT position-1 FROM words WHERE word=“king”;
>
> I can then use the IN qualifier to use the results of the previous query
> as input for a subsequent query. Thus I can find all the words prior to the
> word “king”;
>
> SELECT word
> FROM words
> WHERE position IN (SELECT position-1 FROM words WHERE word=“king”);
>
> I can then further limit my query to only list adjectives:
>
> SELECT word
> FROM words
> WHERE part_of_speach=“adjective”
> AND position IN (SELECT position-1 FROM words WHERE word=“king”);
>
> I can then count and group the results as an ordered list:
>
> SELECT COUNT(word) AS c, word
> FROM words
> WHERE part_of_speach=“adjective”
> AND position IN (SELECT position-1 FROM words WHERE word=“king”)
> GROUP BY word
> ORDER BY c DESC;
>
> Since my database includes the words from my entire corpus (60,000,000
> words), I need to further limit my query to document identifiers. Moreover,
> my parts-of-speech are not as simple as “noun”, “verb”, and “adjective”.
> Without going into too much detail, my cool query ultimately looks like
> this:
>
> SELECT COUNT(word) AS c, word
> FROM words
> WHERE did='A05406' AND pos LIKE 'j%'
> AND (position IN (SELECT position-1 FROM words WHERE (word='king' AND
> did='A05406')))
> GROUP by word
> ORDER by c DESC;
>
> In this particular case, kings are described as both earthly and good.
>
> I can then search my index for documents of interest, and then I can feed
> my SQL query an identifier as well as a few words. For example, a humorous
> document on the topic of love describes men and women: [1]
>
> SELECT COUNT(word) AS c, word FROM words WHERE did='A13520' AND
> pos LIKE 'j%' AND (position IN (SELECT position-1 FROM words
> WHERE((LOWER(word)='man' OR LOWER(word)='men' OR
> LOWER(word)='woman' OR LOWER(word)='women') AND did='A13520')))
> GROUP BY word ORDER BY c DESC;
>
> c word
> = =========
> 5 old
> 5 young
> 4 poore
> 3 good
> 2 Old
> 1 Good
> 1 Great
> 1 Rich
> 1 Seruing
> 1 best
> 1 great
> 1 labouring
> 1 little
> 1 painfull
> 1 proper
> 1 singing
> 1 wise
>
> Apparently, when it comes to love, age doesn’t matter, men & women are
> poor, and at the same time, they are not very wise. ;-)
>
> Fun with SQL. Thank you. Code4lib++
>
> [1] humorous document - http://cds.crc.nd.edu/freebo/A13/A13520/A13520.htm
>
> —
> Eric Morgan
>
|