LISTSERV 16.5 - CODE4LIB Archives

Modern Information Retrieval is certainly an authoritative text on IR. You might do well to search for ACM SIGIR references on TFxIDF or BM25 by Stephen Robertson and Karen Spärck Jones. The underlying term-weighting work dates from the 70s; BM25 itself came later, in the 90s. These were retrieval algorithms that significantly improved the precision and recall of keyword search systems.

These algorithms, or modern evolutions of them, are now embedded as standard in things like Solr/Lucene. (Apologies if you know all this.)
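
In case a concrete picture helps, here is a minimal sketch of TFxIDF-style scoring. It is only an illustration under my own assumptions: the whitespace tokenizer, the raw-count tf, the log(N/df) idf, and the function names are all made up for the example, and it is not what Solr/Lucene actually runs (those use more refined weighting and length normalisation).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def tfidf_scores(query, docs):
    """Score each document against the query with a basic TFxIDF sum.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    (both are illustrative choices; many variants exist)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if df[term]:  # skip query terms that appear in no document
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


# Toy corpus, invented for the example.
docs = [
    "precision and recall in bibliographic databases",
    "recall of keyword search systems",
    "cooking with garlic and butter",
]
scores = tfidf_scores("precision recall", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The rarer query term ("precision", in one document) contributes more than the commoner one ("recall", in two), which is the whole point of the idf weight.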

This would really only apply if you were doing full-text indexing of databases. Retrieving by specific metadata in a field is a different ballgame.

You can also look at average precision, which assesses the top N results of a single query, or mean average precision, which takes the mean of that over M queries. In general it's assumed that if you increase recall (get more related results) you'll reduce precision (get more unrelated ones too), and that increasing precision (removing unrelated results) may reduce recall (removing some good ones by mistake).
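
To make those measures concrete, here is a rough sketch of how they can be computed. The function names and the document IDs are invented for the example, it assumes each ranked list contains no duplicates, and it uses one common convention for average precision (dividing by the total number of relevant documents), so treat it as illustrative rather than definitive.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result list.

    precision = relevant results retrieved / all results retrieved
    recall    = relevant results retrieved / all relevant documents
    """
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant doc appears.

    Divides by the total number of relevant documents, so relevant
    documents that were never retrieved pull the score down.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over a list of (ranked, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with hand-made relevance judgements.
q1 = (["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
q2 = (["d7", "d8"], {"d8"})

p, r = precision_recall(*q1)
ap1 = average_precision(*q1)
ap2 = average_precision(*q2)
map_score = mean_average_precision([q1, q2])
print(p, r, ap1, ap2, map_score)
```

For the first query, two of the four results are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3), which shows how the two numbers can differ for the same result list.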

Hope some of that is useful. And not patronising (I'm just sort of sitting on the fence of this email list).

Max

On 3 Jun 2011, at 20:57, marijane white <[log in to unmask]> wrote:

> I think that's quite possible.
> 
> Here are a couple references I am familiar with.
> 
> Walker/Janes/Tenopir's Online Retrieval is a bit dated but it does discuss
> the subject of precision and recall in bibliographic database searching.
> http://books.google.com/books?id=Srn3Jg7O4XoC&lpg=PP1&pg=PP1#v=onepage&q&f=false
> 
> Beyond bibliographic databases, Baeza-Yates/Ribeiro-Neto's Modern
> Information Retrieval discusses the subject in a broader context.
> http://www.amazon.com/Modern-Information-Retrieval-Concepts-Technology/dp/0321416910
> 
> 
> -marijane
> 
> On Fri, Jun 3, 2011 at 10:53 AM, Fleming, Declan <[log in to unmask]> wrote:
> 
>> Hi - I'm wondering if she is using a definition of "database" that seems to
>> be common in libraries, that means "a resource on the web that we pay for".
>> 
>> D
>> 
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> Alain Borel
>> Sent: Friday, June 03, 2011 10:24 AM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] Precision and Recall
>> 
>> Dave Caroline <[log in to unmask]> wrote:
>>> The questions seem related to search engines; perhaps you should be
>>> googling for full-text indexes or, to use the more correct name,
>>> inverted indexes, because in the normal scheme of events databases
>>> return exactly what you ask for.
>> 
>> One could argue that the same thing happens with search engines. After all,
>> both databases and search engines are deterministic programs that provide a
>> set of records in response to a query.
>> 
>> Precision and recall are not determined by what you ask - what defines them
>> is how relevant the output records are with respect to a real-life question.
>> It isn't tied to a technology. Of course, it can be more or less difficult
>> to translate this question into a query, and the program might be more or
>> less "smart" while processing the query.
>> Both aspects affect precision and recall, in my opinion.
>> 
>> Anybody who has ever searched a bibliographic database with Google-like
>> queries can testify that a database can have extremely poor precision and
>> recall in some use cases ;-)
>> 
>> Best regards,
>> Alain Borel
>> EPFL Bibliothèque
>> Rolex Learning Center
>> 1015 Lausanne (Switzerland)
>>