Print

Print


Yep - It's the same with MySQL - where the POSITION function is a
synonym for the LOCATE function e.g. LOCATE(substr,str,pos) where pos
is an optional offset in str. If not specified, pos is one. It returns
0 when substr is not present in str.

Thanks again,
Peter

On Wed, 24 Nov 2004 13:36:19 -0500, Mike Rylander <[log in to unmask]> wrote:
> On Wed, 24 Nov 2004 18:27:50 +0000, Peter Corrigan
> <[log in to unmask]> wrote:
> > Ideally  we'd need to take account of word proximity by default, even for
> > non-quoted input, and have proximity rank above a naive AND hit.
> > I suspect most people do not quote their Google input.
> >
> > Cool Mike.  Just pasted your valuable SQL into mysql, changed the
> > field and table names... and got great recall!
>
> Glad you liked it!  After testing it against my data (with Postgresql,
> it may be different in MySQL) I realized that POSITION returns a
> 1-based positions, so you'll want to change the '>=' tests to just
> '>'.
>
> >
> > --
>
>
> > Peter Corrigan
> > Head of Library Systems
> > National University of Ireland, Galway
> >
> > Office: +353-91-492497
> > Mob:   +353-87-2798505
> > [log in to unmask]
> >
> >
> >
> >
> > On Wed, 24 Nov 2004 10:06:59 -0500, Mike Rylander <[log in to unmask]> wrote:
> > > On Wed, 24 Nov 2004 09:21:31 -0500, Ross Singer
> > > <[log in to unmask]> wrote:
> > > > What do you think is more appropriate (and intuitive) for a search
> > > > engine if the user gives no boolean, "and" or "or"?
> > > >
> > > > I guess my question is, assuming it's a keyword search, and the user
> > > > types in "institute paper science", would it be more appropriate to
> > > > default to "institute AND paper AND science" or "institute OR paper OR
> > > > science".
> > >
> > > IMHO, the logical thing would be to OR the terms together, then count
> > > the keyword matches for each item and use that as the first component
> > > in the sort.  Of course, that's assuming that the search itself wasn't
> > > quoted.  if the actual string was quoted, then the terms should be
> > > ANDed.  Here is some (inefficient) SQL to show what I mean:
> > >
> > > Search string: paper science
> > > SQL: SELECT recordid, text, ( CASE WHEN POSITION('paper' IN text) >= 0
> > > THEN 1 ELSE 0 END + CASE WHEN POSITION('science' IN text) >= 0 THEN 1
> > > ELSE 0 END ) AS rank FROM keyword_table WHERE LOWER(text) LIKE
> > > '%paper%' OR LOWER(text) LIKE '%science%' ORDER BY 3 DESC, 2 ASC;
> > >
> > > Search String: institute "paper science"
> > > SQL: SELECT recordid, text, ( CASE WHEN POSITION('paper' IN text) >= 0
> > > THEN 1 ELSE 0 END + CASE WHEN POSITION('science' IN text) >= 0 THEN 1
> > > ELSE 0 END + CASE WHEN POSITION('institute' IN text) >= 0 THEN 1 ELSE
> > > 0 END) AS rank FROM keyword_table WHERE (LOWER(text) LIKE '%paper%'
> > > AND LOWER(text) LIKE '%science%') OR LOWER(text) LIKE '%institute%'
> > > ORDER BY 3 DESC, 2 ASC;
> > >
> > > Now, that only counts each matching word once per searched string, but
> > > you get the idea.
> > >
> > > >
> > > > I'm just sort of curious what other people's take on this might be.
> > >
> > > I am to.  This is just my take on it, and I'm a programmer, not a
> > > librarian, so perhaps I'm not the best person to answer the question
> > > ;)
> > >
> > > >
> > > > Thanks,
> > > > -Ross.
> > > >
> > >
> > > --
> > > Mike Rylander
> > > [log in to unmask]
> > > GPLS -- PINES Development
> > > Database Developer
> > >
> > >
> >
> >
> > --
> > Peter Corrigan
> > Head of Library Systems
> > National University of Ireland, Galway
> >
> > Office: +353-91-492497
> > Mob:   +353-87-2798505
> > [log in to unmask]
> >
>
>
> --
>
>
> Mike Rylander
> [log in to unmask]
> GPLS -- PINES Development
> Database Developer
>


--
Peter Corrigan
Head of Library Systems
National University of Ireland, Galway

Office: +353-91-492497
Mob:   +353-87-2798505
[log in to unmask]