Just taking a stab in the dark:
-- set up a "copy field" in Solr. This basically takes the content from an existing field and creates a mirror of it.
-- apply some extra string processing to your copy field so that it splits and tokenizes the content on the "-" (e.g., "haverford - enemy of islam" becomes two tokens on the field, "haverford" and "enemy of islam")
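To make the splitting step concrete, here is a rough sketch in plain Python (not Solr configuration). The function name split_term is made up for illustration, and it assumes the separator is a hyphen with a space on each side, as in "media - enemy of islam", so that hyphenated keywords such as "al-qaeda" are not broken apart.

```python
def split_term(term: str) -> tuple[str, str]:
    """Split a two-part term 'keyword - context' into (keyword, context).

    Assumption: the two parts are joined by ' - ' (space, hyphen, space).
    Only the first such separator is used as the split point.
    """
    keyword, sep, context = term.partition(" - ")
    if not sep:
        # No separator found, so this is not a two-part term.
        raise ValueError(f"no ' - ' separator in term: {term!r}")
    return keyword.strip(), context.strip()


keyword, context = split_term("media - enemy of islam")
```

Splitting on a bare "-" instead would also cut keywords that contain hyphens, which is why the sketch looks for the surrounding spaces.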
Seriously, though, I'm not sure what you would do after you've tokenized it. You could set up some sort of faceted browse interface to show co-occurring terms, or something else. Maybe some other Solr folks out there have some better ideas.
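For what it's worth, the co-occurrence idea might look something like this in plain Python. It is only a sketch of the grouping a facet would give you, not Solr code; the record IDs, the sample terms, and the build_context_index name are all hypothetical.

```python
from collections import defaultdict


def build_context_index(records):
    """Map context -> keyword -> set of record IDs.

    `records` maps a record ID to its list of 'keyword - context' terms.
    Assumption: every term contains the ' - ' separator.
    """
    index = defaultdict(lambda: defaultdict(set))
    for record_id, terms in records.items():
        for term in terms:
            keyword, _, context = term.partition(" - ")
            index[context.strip()][keyword.strip()].add(record_id)
    return index


# Hypothetical sample data, for illustration only.
records = {
    "statement-1": ["media - enemy of islam", "Afghanistan - area of jihad"],
    "statement-2": ["Haverford - enemy of islam", "media - enemy of islam"],
}

index = build_context_index(records)

# Every keyword that appears with the context "enemy of islam".
keywords = sorted(index["enemy of islam"])

# The records in which "media" is tagged with that context.
media_records = index["enemy of islam"]["media"]
```

The point is that the context becomes the first lookup key, so nothing has to be stored twice as "enemy of islam - Haverford".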
On 2012-07-11, at 11:32 AM, Laurie Allen wrote:
> I'm working on a drupal site with a very complicated taxonomy.
> Backstory: A polisci professor and team of students designed this
> project first as a theoretical exercise as part of a senior thesis
> double major in political science and computer science, and then as
> the project of a very devoted and smart student using drupal. It's
> both amazingly cool and technically complex. At this point, we are
> trying to help rein it in to the library servers and help support it
> so that new crops of students can maintain it without needing to be CS
> majors, and also to help them address a few issues and problems that
> have been discovered over the past year or so. My colleague and I are
> totally new to Drupal, and to this database. While he's working on the
> solr indexing, I'm trying to help figure out the taxonomy issue.
> See here: http://gtrp.haverford.edu/aqsi/aqsi/statements/mustafa-abu-al-yazids-interview-al-jazeera
> Basically, the site indexes the public statements of al-qaeda. Each
> statement is assigned a bunch of terms by students who have studied
> jihad and al-qaeda.
> Each term is composed of two parts.
> First part: a keyword from a controlled list of keywords - there are
> many of these and they include places, people, theories, and other
> things. So, "Afghanistan", "Barack Obama", and "media" are all keywords.
> Second part: a context from a much smaller (around 20) collection of
> contexts, including I guess how the keyword figures in this statement.
> Examples include "area of jihad, enemy of islam, religious relations"
> and others.
> So, the full term would be "media - enemy of islam" for example. And
> each record includes a large number of these.
> Going forward, we'd ideally like to allow users of the site to find
> all three of the following:
> 1. Records that contain a particular two part term. (easy - that's
> what taxonomy is for)
> 2. A list of terms that begin with the first part so that they can
> select the modifier for it (also easy, if we make the second term a
> subterm or child of the first, this will work fine)
> 3. A list of terms that have the second part as a qualifier. So, for
> example, show me all terms in which anything is called an "enemy of
> islam" and then let me choose which keyword is referred to as an enemy
> of jihad and show me that record.
> It's that third one that we can't figure out. The only way we can
> think to accomplish this is to basically duplicate each entry so that
> we'd say "Haverford - enemy of islam" and "enemy of islam - Haverford"
> I think that will work, but since there are many statements, and each
> statement has many terms, this solution doesn't seem ideal. Do any of
> you have ideas?
> Thanks very much.
> Coordinator for Digital Scholarship and Services
> Haverford College Library
> 370 Lancaster Ave
> Haverford, PA 19041