Solr _can_ use stemming, but doing it with POS would be flaky, I'd think.  Is "work" a verb or a noun?

Some of the (Solr-using) customers that I work with have done POS tagging (using tools like BasisTech Solr plugins for entity tagging).  Payloads can be assigned to terms during indexing and then used to weight the score when query terms match.  Lucene supports payloads and scoring based on them natively, but it requires some code to wire together.  Solr supports a little in terms of payloads, but to really use them effectively custom coding is needed.  See <https://issues.apache.org/jira/browse/SOLR-1485> for example.
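
To make the payload idea concrete, here is a toy sketch in plain Python.  It is not Lucene or Solr code and uses none of their APIs; it only illustrates storing a per-occurrence weight at index time and summing those weights at query time.  The POS_WEIGHTS values, the tag names, the sample documents, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-POS weights; the numbers are illustrative, not tuned.
POS_WEIGHTS = {"NOUN": 2.0, "ADJ": 1.5, "VERB": 1.0}


def index_documents(docs):
    """Build an inverted index: term -> {doc_id: [payload, ...]}.

    docs maps doc_id -> list of (term, pos_tag) pairs, as an external
    POS tagger might produce. The payload stored for each occurrence
    is the weight for its tag (1.0 for tags not in POS_WEIGHTS).
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tagged_tokens in docs.items():
        for term, pos in tagged_tokens:
            index[term.lower()][doc_id].append(POS_WEIGHTS.get(pos, 1.0))
    return index


def search(index, query):
    """Score each document by summing the payloads of matching terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, payloads in index.get(term, {}).items():
            scores[doc_id] += sum(payloads)
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data: the same term tagged as ADJ in one doc, VERB in another.
docs = {
    "manners": [("correct", "ADJ"), ("behavior", "NOUN")],
    "govdoc": [("act", "NOUN"), ("to", "PART"), ("correct", "VERB")],
}
results = search(index_documents(docs), "correct")
print(results)
```

With these made-up weights the adjective occurrence outranks the verb occurrence.  In a real Lucene/Solr setup the payload would be attached by an analysis filter and read back by a payload-aware query or similarity, which is the part that needs custom wiring.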

	Erik

On Feb 22, 2011, at 09:02 , Cindy Harper wrote:

> It's not ironic - my post was musing inspired by your work.  I guess I
> wasn't sure if I understood your results. You were looking at the overall
> POS usage in the entire texts as a possible way of ranking the texts. I was
> wondering about POS of particular search terms - those that could take on
> several POS. A related question - does SOLR use stemming to widen the search
> to various POS?  Then would it be meaningful to rank the given texts by the
> POS of the actual search terms?  And has anyone looked at samples of user
> search terms - are they almost always noun phrases?  Just wanting to
> understand what you have explored.  And I probably should have added to your
> thread on NGC4LIB, rather than Code4lib - I tend to conflate them.
> 
> Cindy Harper, Systems Librarian
> Colgate University Libraries
> [log in to unmask]
> 315-228-7363
> 
> 
> 
> On Sat, Feb 19, 2011 at 5:42 PM, Eric Lease Morgan <[log in to unmask]> wrote:
> 
>> On Feb 19, 2011, at 11:26 AM, Cindy Harper wrote:
>> 
>>> I just was testing our discovery engine for any technical issues after a
>>> reboot. I was just using random single words, and one word I used was
>>> "correct".  Looking at the first ranked items, I wondered if there's some
>>> role for parts-of-speech in ranking hits - are nouns and, in this case,
>>> adjectives more indicative of aboutness than verbs?  The first items were
>>> "Miss Manners ...  excruciatingly correct behavior", then a bunch of govdocs
>>> on "an act to correct....".  I don't think there's any reason to prefer
>>> nouns over verbs, but I thought I'd throw the thought at you anyway.
>> 
>> 
>> 
>> Ironically, I was playing with parts-of-speech (POS) analysis the other
>> day. [1]
>> 
>> Using a pseudo-random sample of texts, I found there to be surprisingly
>> similar POS usage between texts. With such similarity, I thought it would be
>> difficult to use general POS as a means for ranking or sorting. On the other
>> hand, specific POS may be useful. For example, Thoreau was dominated by
>> first-person male pronouns but Austen was dominated by second person female
>> pronouns.
>> 
>> I think there is something to be explored here.
>> 
>> [1] POS - http://bit.ly/hsxD2i
>> 
>> --
>> Eric "Still Counting Tweets and Chats" Morgan
>>