On Jul 22, 2017, at 8:47 AM, Andreas Orphanides <[log in to unmask]> wrote: >> How might I write an SQL query to find all the descriptions of good kings >> and bad kings in my database? >> >> I have a table that looks something like this: >> >> key word part_of_speech position >> --- --------- -------------- -------- >> w1 The article 1 >> w2 righteous adjective 2 >> w3 king noun 3 >> w4 solved verb 4 >> w5 all adjective 5 >> w6 the article 6 >> w7 problems noun 7 >> w8 Your noun 8 >> w9 evil adjective 9 >> w10 king noun 10 >> w11 over adverb 11 >> w12 taxes verb 12 >> w13 us pronoun 13 > > In a perfect world, your parser would include the referents for the > adjectives, which would solve the problem lickety split, but I assume > that's not an option here. Are you ok with only discovering adjective-noun > pairs?... > > Add a sequentially numbered autogen column that's pure integer. (Unless you > have string manipulation -- in which case you could just work on the "key" > column). > > Then something like: > > SELECT * FROM table WHERE autogen IN (SELECT (autogen-1) FROM table WHERE > word = "king") AND part_of_speech=“adjective" Resolved, and the secret sauce was the SQL qualifier IN. To find all positions of the word “king”, I can do this: SELECT position FROM words WHERE word=“king”; To find all the positions prior to the word “king”, I can do this: SELECT position-1 FROM words WHERE word=“king”; I can then use the IN qualifier to use the results of the previous query as input for a subsequent query. Thus I can find all the words prior to the word “king”; SELECT word FROM words WHERE position IN (SELECT position-1 FROM words WHERE word=“king”); I can then further limit my query to only list adjectives: SELECT word FROM words WHERE part_of_speach=“adjective” AND position IN (SELECT position-1 FROM words WHERE word=“king”); I can then count and group the results as an ordered list: SELECT COUNT(word) AS c, word FROM words WHERE part_of_speach=“adjective” AND position IN (SELECT position-1 FROM words WHERE word=“king”) GROUP BY word ORDER BY c DESC; Since my database includes the words from my entire corpus (60,000,000 words), I need to further limit my query to document identifiers. Moreover, my parts-of-speech are not as simple as “noun”, “verb”, and “adjective”. Without going into too much detail, my cool query ultimately looks like this: SELECT COUNT(word) AS c, word FROM words WHERE did='A05406' AND pos LIKE 'j%' AND (position IN (SELECT position-1 FROM words WHERE (word='king' AND did='A05406'))) GROUP by word ORDER by c DESC; In this particular case, kings are described as both earthly and good. I can then search my index for documents of interest, and then I can feed my SQL query an identifier as well as a few words. For example, a humorous document on the topic of love describes men and women: [1] SELECT COUNT(word) AS c, word FROM words WHERE did='A13520' AND pos LIKE 'j%' AND (position IN (SELECT position-1 FROM words WHERE((LOWER(word)='man' OR LOWER(word)='men' OR LOWER(word)='woman' OR LOWER(word)='women') AND did='A13520'))) GROUP BY word ORDER BY c DESC; c word = ========= 5 old 5 young 4 poore 3 good 2 Old 1 Good 1 Great 1 Rich 1 Seruing 1 best 1 great 1 labouring 1 little 1 painfull 1 proper 1 singing 1 wise Apparently, when it comes to love, age doesn’t matter, men & women are poor, and at the same time, they are not very wise. ;-) Fun with SQL. Thank you. Code4lib++ [1] humorous document - http://cds.crc.nd.edu/freebo/A13/A13520/A13520.htm — Eric Morgan