Ideally we'd take word proximity into account by default, even for
non-quoted input, and rank a proximity match above a naive AND hit.
I suspect most people do not quote their Google input.
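
A rough sketch of that idea, in Python rather than SQL so it is easy to
try out. Everything here is an assumption made for illustration: the
function names, the sample records, and the scoring itself (count the
distinct query terms matched, then break ties by the smallest window of
words that contains all of them). It tries every combination of term
positions, so it only suits short queries and short fields.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def smallest_span(tokens, terms):
    """Return the width (in words) of the smallest window containing
    every term, or None if any term is absent from the tokens."""
    positions = [[i for i, tok in enumerate(tokens) if tok == term]
                 for term in terms]
    if not all(positions):
        return None
    # Brute force: try one position per term and keep the tightest window.
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))


def rank(records, query):
    """Return record ids ordered by matched-term count, then proximity."""
    terms = list(dict.fromkeys(tokenize(query)))  # distinct, order kept
    scored = []
    for recordid, text in records:
        tokens = tokenize(text)
        matched = sum(1 for term in terms if term in tokens)
        if matched == 0:
            continue  # OR semantics: at least one term must match
        span = smallest_span(tokens, terms)
        # Records missing a term have no span; sort them after full matches.
        scored.append((-matched,
                       span if span is not None else float("inf"),
                       recordid))
    return [recordid for _, _, recordid in sorted(scored)]


# Made-up sample data, for illustration only.
records = [
    (1, "Institute of Paper Science and Technology"),
    (2, "Science fair projects: a paper from the institute"),
    (3, "Paper airplanes"),
    (4, "History of Ireland"),
]
print(rank(records, "institute paper science"))  # prints [1, 2, 3]
```

Records 1 and 2 both contain all three words, but record 1 comes first
because its words sit within a four-word window rather than eight.
Record 3 matches one word only, and record 4 is dropped.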
Cool, Mike. I just pasted your valuable SQL into MySQL, changed the
field and table names... and got great recall!
--
Peter Corrigan
Head of Library Systems
National University of Ireland, Galway
Office: +353-91-492497
Mob: +353-87-2798505
[log in to unmask]
On Wed, 24 Nov 2004 10:06:59 -0500, Mike Rylander <[log in to unmask]> wrote:
> On Wed, 24 Nov 2004 09:21:31 -0500, Ross Singer
> <[log in to unmask]> wrote:
> > What do you think is more appropriate (and intuitive) for a search
> > engine if the user gives no boolean, "and" or "or"?
> >
> > I guess my question is, assuming it's a keyword search, and the user
> > types in "institute paper science", would it be more appropriate to
> > default to "institute AND paper AND science" or "institute OR paper OR
> > science".
>
> IMHO, the logical thing would be to OR the terms together, then count
> the keyword matches for each item and use that as the first component
> in the sort. Of course, that's assuming that the search itself wasn't
> quoted. If the actual string was quoted, then the terms should be
> ANDed. Here is some (inefficient) SQL to show what I mean:
>
> Search string: paper science
> SQL:
>   SELECT recordid, text,
>          ( CASE WHEN POSITION('paper' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          + CASE WHEN POSITION('science' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          ) AS rank
>     FROM keyword_table
>    WHERE LOWER(text) LIKE '%paper%' OR LOWER(text) LIKE '%science%'
>    ORDER BY 3 DESC, 2 ASC;
>
> Search String: institute "paper science"
> SQL:
>   SELECT recordid, text,
>          ( CASE WHEN POSITION('paper' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          + CASE WHEN POSITION('science' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          + CASE WHEN POSITION('institute' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          ) AS rank
>     FROM keyword_table
>    WHERE (LOWER(text) LIKE '%paper%' AND LOWER(text) LIKE '%science%')
>       OR LOWER(text) LIKE '%institute%'
>    ORDER BY 3 DESC, 2 ASC;
>
> Now, that only counts each matching word once per searched string, but
> you get the idea.
>
> >
> > I'm just sort of curious what other people's take on this might be.
>
> I am too. This is just my take on it, and I'm a programmer, not a
> librarian, so perhaps I'm not the best person to answer the question
> ;)
>
> >
> > Thanks,
> > -Ross.
> >
>
> --
> Mike Rylander
> [log in to unmask]
> GPLS -- PINES Development
> Database Developer
>
>