Ideally we'd need to take account of word proximity by default, even
for non-quoted input, and have proximity rank above a naive AND hit. I
suspect most people do not quote their Google input.

Cool Mike. Just pasted your valuable SQL into MySQL, changed the field
and table names... and got great recall!

--
Peter Corrigan
Head of Library Systems
National University of Ireland, Galway
Office: +353-91-492497
Mob: +353-87-2798505
[log in to unmask]

On Wed, 24 Nov 2004 10:06:59 -0500, Mike Rylander <[log in to unmask]> wrote:
> On Wed, 24 Nov 2004 09:21:31 -0500, Ross Singer
> <[log in to unmask]> wrote:
> > What do you think is more appropriate (and intuitive) for a search
> > engine if the user gives no boolean, "and" or "or"?
> >
> > I guess my question is, assuming it's a keyword search, and the user
> > types in "institute paper science", would it be more appropriate to
> > default to "institute AND paper AND science" or "institute OR paper OR
> > science".
>
> IMHO, the logical thing would be to OR the terms together, then count
> the keyword matches for each item and use that as the first component
> in the sort. Of course, that's assuming that the search itself wasn't
> quoted. If the actual string was quoted, then the terms should be
> ANDed.
> Here is some (inefficient) SQL to show what I mean:
>
> Search string: paper science
> SQL:
>   -- POSITION returns 0 when the substring is absent, so test > 0
>   SELECT recordid, text,
>          ( CASE WHEN POSITION('paper' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          + CASE WHEN POSITION('science' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          ) AS rank
>     FROM keyword_table
>    WHERE LOWER(text) LIKE '%paper%'
>       OR LOWER(text) LIKE '%science%'
>    ORDER BY 3 DESC, 2 ASC;
>
> Search string: institute "paper science"
> SQL:
>   SELECT recordid, text,
>          ( CASE WHEN POSITION('paper' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          + CASE WHEN POSITION('science' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          + CASE WHEN POSITION('institute' IN LOWER(text)) > 0 THEN 1 ELSE 0 END
>          ) AS rank
>     FROM keyword_table
>    WHERE (LOWER(text) LIKE '%paper%' AND LOWER(text) LIKE '%science%')
>       OR LOWER(text) LIKE '%institute%'
>    ORDER BY 3 DESC, 2 ASC;
>
> Now, that only counts each matching word once per searched string, but
> you get the idea.
>
> >
> > I'm just sort of curious what other people's take on this might be.
>
> I am too. This is just my take on it, and I'm a programmer, not a
> librarian, so perhaps I'm not the best person to answer the question
> ;)
>
> >
> > Thanks,
> > -Ross.
> >
>
> --
> Mike Rylander
> [log in to unmask]
> GPLS -- PINES Development
> Database Developer
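
The ranking discussed in this thread (OR the terms together, sort by how many terms match, then prefer records where the matched terms sit close together) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (tokenize, min_span, score, rank_records) and the sample records are hypothetical, it matches whole words rather than substrings as the LIKE queries do, and the proximity bonus is one arbitrary formula, not something proposed in the thread.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_span(positions_per_term):
    """Smallest distance between the first and last token of a window
    that contains one occurrence of every matched term (brute force)."""
    return min(max(combo) - min(combo) for combo in product(*positions_per_term))


def score(text, terms):
    """Return (matched_term_count, proximity_bonus) for one record.

    `terms` must be distinct, lowercase words. The bonus is 1.0 when the
    matched terms are adjacent and shrinks as they drift apart
    (hypothetical formula, chosen only for illustration).
    """
    tokens = tokenize(text)
    positions = [[i for i, tok in enumerate(tokens) if tok == t] for t in terms]
    found = [p for p in positions if p]
    matched = len(found)
    if matched < 2:
        return (matched, 0.0)  # proximity is meaningless for a single term
    return (matched, (matched - 1) / min_span(found))


def rank_records(records, query):
    """OR semantics: keep every record matching at least one term, then
    sort by match count, then proximity bonus, then record id."""
    terms = list(dict.fromkeys(tokenize(query)))  # de-duplicate, keep order
    scored = []
    for recordid, text in records:
        matched, bonus = score(text, terms)
        if matched > 0:
            scored.append((-matched, -bonus, recordid))
    return [recordid for _, _, recordid in sorted(scored)]


# Hypothetical sample data, standing in for keyword_table
records = [
    (1, "Institute of Paper Science and Technology"),
    (2, "Science news printed on paper"),
    (3, "A paper on the history of the institute"),
    (4, "Gardening weekly"),
]
ranked = rank_records(records, "institute paper science")
```

With the sample data, record 1 matches all three terms and sorts first; records 2 and 3 both match two terms, and record 2 wins the tie because its terms are closer together. Record 4 matches nothing and is dropped. A quoted phrase could be handled by requiring a bonus of 1.0 for the quoted terms, though that is beyond this sketch.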