Just taking a stab in the dark:
-- set up a "copy field" in Solr. This basically takes the content from an existing field and creates a mirror of it.
-- apply some extra string processing to your copy field so that it splits and tokenizes the content on the " - " separator (e.g., "enemy of islam" and "haverford" become two separate tokens in the field)
-- ???
-- Profit.
Seriously, though, I'm not sure what you would do after you've tokenized it. You could set up some sort of faceted browse interface to show co-occurring terms, or something else. Maybe some other Solr folks out there have some better ideas.
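To make the tokenizing idea a bit more concrete, here is a rough Python sketch of the logic only. It is not Solr configuration, and the record IDs, sample terms, and function names are all made up for illustration. It assumes every term is stored as "keyword - context" with a spaced hyphen as the separator, then groups keywords under each context, which is roughly what a facet on the tokenized copy field would give you.

```python
from collections import defaultdict


def split_term(term, sep=" - "):
    """Split a two-part term such as 'media - enemy of islam'.

    Assumes the context is whatever follows the LAST ' - ' separator,
    so hyphenated keywords without surrounding spaces are left intact.
    """
    keyword, context = term.rsplit(sep, 1)
    return keyword.strip(), context.strip()


def index_by_context(records):
    """Build a context -> keyword -> set of record IDs lookup."""
    by_context = defaultdict(lambda: defaultdict(set))
    for record_id, terms in records.items():
        for term in terms:
            keyword, context = split_term(term)
            by_context[context][keyword].add(record_id)
    return by_context


# Hypothetical sample data; the real site would pull these from Drupal.
records = {
    "statement-1": ["media - enemy of islam", "Afghanistan - area of jihad"],
    "statement-2": ["Haverford - enemy of islam", "media - enemy of islam"],
}

index = index_by_context(records)

# Every keyword that is tagged with the context "enemy of islam".
print(sorted(index["enemy of islam"]))

# Records in which "media" appears with that context.
print(sorted(index["enemy of islam"]["media"]))
```

With a lookup like this, nothing has to be entered twice: the reversed "context - keyword" view is derived from the terms that already exist.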
-Andrew
On 2012-07-11, at 11:32 AM, Laurie Allen wrote:
> Hi,
> I'm working on a drupal site with a very complicated taxonomy.
> Backstory: A polisci professor and team of students designed this
> project first as a theoretical exercise as part of a senior thesis
> double major in political science and computer science, and then as
> the project of a very devoted and smart student using drupal. It's
> both amazingly cool and technically complex. At this point, we are
> trying to help rein it in to the library servers and help support it
> so that new crops of students can maintain it without needing to be CS
> majors, and also to help them address a few issues and problems that
> have been discovered over the past year or so. My colleague and I are
> totally new to Drupal, and to this database. While he's working on the
> solr indexing, I'm trying to help figure out the taxonomy issue.
>
> See here: http://gtrp.haverford.edu/aqsi/aqsi/statements/mustafa-abu-al-yazids-interview-al-jazeera
> Basically, the site indexes the public statements of al-qaeda. Each
> statement is assigned a bunch of terms by students who have studied
> jihad and al-qaeda.
>
> Each term is composed of two parts.
> First part: a keyword from a controlled list of keywords - there are
> many of these and they include places, people, theories, and other
> things. So, "Afghanistan", "Barack Obama", and "media" are all
> keywords.
> Second part: a context from a much smaller (around 20) collection of
> contexts, including I guess how the keyword figures in this statement.
> Examples include "area of jihad", "enemy of islam", "religious relations"
> and others.
>
> So, the full term would be "media - enemy of islam" for example. And
> each record includes a large number of these.
>
> Going forward, we'd ideally like to allow users of the site to find
> all three of the following:
> 1. Records that contain a particular two part term. (easy - that's
> what taxonomy is for)
> 2. A list of terms that begin with the first part so that they can
> select the modifier for it (also easy, if we make the second term a
> subterm or child of the first, this will work fine)
> 3. A list of terms that have the second part as a qualifier. So, for
> example, show me all terms in which anything is called an "enemy of
> islam" and then let me choose which keyword is referred to as an enemy
> of jihad and show me that record.
>
> It's that third one that we can't figure out. The only way we can
> think to accomplish this is to basically duplicate each entry so that
> we'd say "Haverford - enemy of islam" and "enemy of islam - Haverford"
> I think that will work, but since there are many statements, and each
> statement has many terms, this solution doesn't seem ideal. Do any of
> you have ideas?
> Thanks very much.
> Laurie
> --
> Coordinator for Digital Scholarship and Services
> Haverford College Library
> 370 Lancaster Ave
> Haverford, PA 19041
> 610-896-4226