Unfortunately, Eric's system will never learn of the glory of the King in Yellow. On Mon, Jul 24, 2017 at 7:26 AM, Timothy Cornwell <[log in to unmask]> wrote: > The righteous crimson king solved all the problems Your evil red king over > taxes us. > > > > -----Original Message----- > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of > Eric Lease Morgan > Sent: Saturday, July 22, 2017 10:36 AM > To: [log in to unmask] > Subject: Re: [CODE4LIB] good kings & bad kings: an sql query [resolved] > > On Jul 22, 2017, at 8:47 AM, Andreas Orphanides <[log in to unmask]> wrote: > > >> How might I write an SQL query to find all the descriptions of good > >> kings and bad kings in my database? > >> > >> I have a table that looks something like this: > >> > >> key word part_of_speech position > >> --- --------- -------------- -------- > >> w1 The article 1 > >> w2 righteous adjective 2 > >> w3 king noun 3 > >> w4 solved verb 4 > >> w5 all adjective 5 > >> w6 the article 6 > >> w7 problems noun 7 > >> w8 Your noun 8 > >> w9 evil adjective 9 > >> w10 king noun 10 > >> w11 over adverb 11 > >> w12 taxes verb 12 > >> w13 us pronoun 13 > > > > In a perfect world, your parser would include the referents for the > > adjectives, which would solve the problem lickety split, but I assume > > that's not an option here. Are you ok with only discovering > > adjective-noun pairs?... > > > > Add a sequentially numbered autogen column that's pure integer. > > (Unless you have string manipulation -- in which case you could just > work on the "key" > > column). > > > > Then something like: > > > > SELECT * FROM table WHERE autogen IN (SELECT (autogen-1) FROM table > > WHERE word = "king") AND part_of_speech=“adjective" > > > Resolved, and the secret sauce was the SQL qualifier IN. > > To find all positions of the word “king”, I can do this: > > SELECT position FROM words WHERE word=“king”; > > To find all the positions prior to the word “king”, I can do this: > > SELECT position-1 FROM words WHERE word=“king”; > > I can then use the IN qualifier to use the results of the previous query > as input for a subsequent query. Thus I can find all the words prior to the > word “king”; > > SELECT word > FROM words > WHERE position IN (SELECT position-1 FROM words WHERE word=“king”); > > I can then further limit my query to only list adjectives: > > SELECT word > FROM words > WHERE part_of_speach=“adjective” > AND position IN (SELECT position-1 FROM words WHERE word=“king”); > > I can then count and group the results as an ordered list: > > SELECT COUNT(word) AS c, word > FROM words > WHERE part_of_speach=“adjective” > AND position IN (SELECT position-1 FROM words WHERE word=“king”) > GROUP BY word > ORDER BY c DESC; > > Since my database includes the words from my entire corpus (60,000,000 > words), I need to further limit my query to document identifiers. Moreover, > my parts-of-speech are not as simple as “noun”, “verb”, and “adjective”. > Without going into too much detail, my cool query ultimately looks like > this: > > SELECT COUNT(word) AS c, word > FROM words > WHERE did='A05406' AND pos LIKE 'j%' > AND (position IN (SELECT position-1 FROM words WHERE (word='king' AND > did='A05406'))) > GROUP by word > ORDER by c DESC; > > In this particular case, kings are described as both earthly and good. > > I can then search my index for documents of interest, and then I can feed > my SQL query an identifier as well as a few words. For example, a humorous > document on the topic of love describes men and women: [1] > > SELECT COUNT(word) AS c, word FROM words WHERE did='A13520' AND > pos LIKE 'j%' AND (position IN (SELECT position-1 FROM words > WHERE((LOWER(word)='man' OR LOWER(word)='men' OR > LOWER(word)='woman' OR LOWER(word)='women') AND did='A13520'))) > GROUP BY word ORDER BY c DESC; > > c word > = ========= > 5 old > 5 young > 4 poore > 3 good > 2 Old > 1 Good > 1 Great > 1 Rich > 1 Seruing > 1 best > 1 great > 1 labouring > 1 little > 1 painfull > 1 proper > 1 singing > 1 wise > > Apparently, when it comes to love, age doesn’t matter, men & women are > poor, and at the same time, they are not very wise. ;-) > > Fun with SQL. Thank you. Code4lib++ > > [1] humorous document - http://cds.crc.nd.edu/freebo/A13/A13520/A13520.htm > > — > Eric Morgan >