Print

Print


Unfortunately, Eric's system will never learn of the glory of the King in
Yellow.

On Mon, Jul 24, 2017 at 7:26 AM, Timothy Cornwell <[log in to unmask]> wrote:

> The righteous crimson king solved all the problems Your evil red king over
> taxes us.
>
>
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Eric Lease Morgan
> Sent: Saturday, July 22, 2017 10:36 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] good kings & bad kings: an sql query [resolved]
>
> On Jul 22, 2017, at 8:47 AM, Andreas Orphanides <[log in to unmask]> wrote:
>
> >> How might I write an SQL query to find all the descriptions of good
> >> kings and bad kings in my database?
> >>
> >> I have a table that looks something like this:
> >>
> >>  key  word       part_of_speech  position
> >>  ---  ---------  --------------  --------
> >>  w1   The        article         1
> >>  w2   righteous  adjective       2
> >>  w3   king       noun            3
> >>  w4   solved     verb            4
> >>  w5   all        adjective       5
> >>  w6   the        article         6
> >>  w7   problems   noun            7
> >>  w8   Your       noun            8
> >>  w9   evil       adjective       9
> >>  w10  king       noun            10
> >>  w11  over       adverb          11
> >>  w12  taxes      verb            12
> >>  w13  us         pronoun         13
> >
> > In a perfect world, your parser would include the referents for the
> > adjectives, which would solve the problem lickety split, but I assume
> > that's not an option here. Are you ok with only discovering
> > adjective-noun pairs?...
> >
> > Add a sequentially numbered autogen column that's pure integer.
> > (Unless you have string manipulation -- in which case you could just
> work on the "key"
> > column).
> >
> > Then something like:
> >
> > SELECT * FROM table WHERE autogen IN (SELECT (autogen-1) FROM table
> > WHERE word = "king") AND part_of_speech=“adjective"
>
>
> Resolved, and the secret sauce was the SQL qualifier IN.
>
> To find all positions of the word “king”, I can do this:
>
>   SELECT position FROM words WHERE word=“king”;
>
> To find all the positions prior to the word “king”, I can do this:
>
>   SELECT position-1 FROM words WHERE word=“king”;
>
> I can then use the IN qualifier to use the results of the previous query
> as input for a subsequent query. Thus I can find all the words prior to the
> word “king”;
>
>   SELECT word
>   FROM words
>   WHERE position IN (SELECT position-1 FROM words WHERE word=“king”);
>
> I can then further limit my query to only list adjectives:
>
>   SELECT word
>   FROM words
>   WHERE part_of_speach=“adjective”
>   AND position IN (SELECT position-1 FROM words WHERE word=“king”);
>
> I can then count and group the results as an ordered list:
>
>   SELECT COUNT(word) AS c, word
>   FROM words
>   WHERE part_of_speach=“adjective”
>   AND position IN (SELECT position-1 FROM words WHERE word=“king”)
>   GROUP BY word
>   ORDER BY c DESC;
>
> Since my database includes the words from my entire corpus (60,000,000
> words), I need to further limit my query to document identifiers. Moreover,
> my parts-of-speech are not as simple as “noun”, “verb”, and “adjective”.
> Without going into too much detail, my cool query ultimately looks like
> this:
>
>   SELECT COUNT(word) AS c, word
>   FROM words
>   WHERE did='A05406' AND pos LIKE 'j%'
>   AND (position IN (SELECT position-1 FROM words WHERE (word='king' AND
> did='A05406')))
>   GROUP by word
>   ORDER by c DESC;
>
> In this particular case, kings are described as both earthly and good.
>
> I can then search my index for documents of interest, and then I can feed
> my SQL query an identifier as well as a few words. For example, a humorous
> document on the topic of love describes men and women: [1]
>
>   SELECT COUNT(word) AS c, word FROM words WHERE did='A13520' AND
>   pos LIKE 'j%' AND (position IN (SELECT position-1 FROM words
>   WHERE((LOWER(word)='man' OR LOWER(word)='men' OR
>   LOWER(word)='woman' OR LOWER(word)='women') AND did='A13520')))
>   GROUP BY word ORDER BY c DESC;
>
>   c  word
>   =  =========
>   5  old
>   5  young
>   4  poore
>   3  good
>   2  Old
>   1  Good
>   1  Great
>   1  Rich
>   1  Seruing
>   1  best
>   1  great
>   1  labouring
>   1  little
>   1  painfull
>   1  proper
>   1  singing
>   1  wise
>
> Apparently, when it comes to love, age doesn’t matter, men & women are
> poor, and at the same time, they are not very wise.  ;-)
>
> Fun with SQL. Thank you. Code4lib++
>
> [1] humorous document - http://cds.crc.nd.edu/freebo/A13/A13520/A13520.htm
>
> —
> Eric Morgan
>