Yep - It's the same with MySQL - where the POSITION function is a synonym for the LOCATE function e.g. LOCATE(substr,str,pos) where pos is an optional offset in str. If not specified, pos is one. It returns 0 when substr is not present in str. Thanks again, Peter On Wed, 24 Nov 2004 13:36:19 -0500, Mike Rylander <[log in to unmask]> wrote: > On Wed, 24 Nov 2004 18:27:50 +0000, Peter Corrigan > <[log in to unmask]> wrote: > > Ideally we'd need to take account of word proximity by default, even for > > non-quoted input, and have proximity rank above a naive AND hit. > > I suspect most people do not quote their Google input. > > > > Cool Mike. Just pasted your valuable SQL into mysql, changed the > > field and table names... and got great recall! > > Glad you liked it! After testing it against my data (with Postgresql, > it may be different in MySQL) I realized that POSITION returns a > 1-based positions, so you'll want to change the '>=' tests to just > '>'. > > > > > -- > > > > Peter Corrigan > > Head of Library Systems > > National University of Ireland, Galway > > > > Office: +353-91-492497 > > Mob: +353-87-2798505 > > [log in to unmask] > > > > > > > > > > On Wed, 24 Nov 2004 10:06:59 -0500, Mike Rylander <[log in to unmask]> wrote: > > > On Wed, 24 Nov 2004 09:21:31 -0500, Ross Singer > > > <[log in to unmask]> wrote: > > > > What do you think is more appropriate (and intuitive) for a search > > > > engine if the user gives no boolean, "and" or "or"? > > > > > > > > I guess my question is, assuming it's a keyword search, and the user > > > > types in "institute paper science", would it be more appropriate to > > > > default to "institute AND paper AND science" or "institute OR paper OR > > > > science". > > > > > > IMHO, the logical thing would be to OR the terms together, then count > > > the keyword matches for each item and use that as the first component > > > in the sort. Of course, that's assuming that the search itself wasn't > > > quoted. if the actual string was quoted, then the terms should be > > > ANDed. Here is some (inefficient) SQL to show what I mean: > > > > > > Search string: paper science > > > SQL: SELECT recordid, text, ( CASE WHEN POSITION('paper' IN text) >= 0 > > > THEN 1 ELSE 0 END + CASE WHEN POSITION('science' IN text) >= 0 THEN 1 > > > ELSE 0 END ) AS rank FROM keyword_table WHERE LOWER(text) LIKE > > > '%paper%' OR LOWER(text) LIKE '%science%' ORDER BY 3 DESC, 2 ASC; > > > > > > Search String: institute "paper science" > > > SQL: SELECT recordid, text, ( CASE WHEN POSITION('paper' IN text) >= 0 > > > THEN 1 ELSE 0 END + CASE WHEN POSITION('science' IN text) >= 0 THEN 1 > > > ELSE 0 END + CASE WHEN POSITION('institute' IN text) >= 0 THEN 1 ELSE > > > 0 END) AS rank FROM keyword_table WHERE (LOWER(text) LIKE '%paper%' > > > AND LOWER(text) LIKE '%science%') OR LOWER(text) LIKE '%institute%' > > > ORDER BY 3 DESC, 2 ASC; > > > > > > Now, that only counts each matching word once per searched string, but > > > you get the idea. > > > > > > > > > > > I'm just sort of curious what other people's take on this might be. > > > > > > I am to. This is just my take on it, and I'm a programmer, not a > > > librarian, so perhaps I'm not the best person to answer the question > > > ;) > > > > > > > > > > > Thanks, > > > > -Ross. > > > > > > > > > > -- > > > Mike Rylander > > > [log in to unmask] > > > GPLS -- PINES Development > > > Database Developer > > > > > > > > > > > > -- > > Peter Corrigan > > Head of Library Systems > > National University of Ireland, Galway > > > > Office: +353-91-492497 > > Mob: +353-87-2798505 > > [log in to unmask] > > > > > -- > > > Mike Rylander > [log in to unmask] > GPLS -- PINES Development > Database Developer > -- Peter Corrigan Head of Library Systems National University of Ireland, Galway Office: +353-91-492497 Mob: +353-87-2798505 [log in to unmask]