Bibliocommons has done a lot of this with OPAC search logs - maybe someone from there could give a lightning talk.

Peter


________________________________

From: Code for Libraries on behalf of K.G. Schneider
Sent: Sat 2/3/2007 1:08 PM
To: [log in to unmask]
Subject: [CODE4LIB] search analytics, part deux



Someone wrote to ask me what I mean by search analytics. Fair question.

The blurb for Lou Rosenfeld and Rich Wiggins' forthcoming book does a good
job of describing what I mean:

http://www.rosenfeldmedia.com/books/searchanalytics/

"Any organization that has a searchable web site or intranet is sitting on
top of hugely valuable and usually under-exploited data: logs that capture
what users are searching for, how often each query was searched, and how
many results each query retrieved. Search queries are gold: they are real
data that show us exactly what users are searching for in their own words.
This book shows you how to use search analytics to carry on a conversation
with your customers: listen to and understand their needs, and improve your
content, navigation and search performance to meet those needs."

By "roll-your-own" analytics, I'm talking about using techniques such as
this one:

http://www.onlamp.com/pub/a/onlamp/2003/08/21/better_search_engine.html

Or, following an in-house recipe we used last year, produce logs this way:

Ingredients

Timestamp, original query, normalized query, parameters, number of
results, referring page, IP or session ID

Procedure

Timestamp: best format is year-month-day:hour:minute:second

Original query: as entered by user

Normalized query: after lowercasing, stemming, removal of field names, etc.

Parameters: any field names, languages, character sets, etc. Nice to
put the results page number in here

Number of results: specific to the search engine; 0 hits is very important

Referring page: the HTTP Referer field, useful for locating
confusing spots within the site, external links, etc.

IP or session ID: allows us to follow the progress of a multi-part
query. A session ID is far better for privacy.
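
Here's a rough Python sketch of what logging one search along these lines
might look like; the log path, the tab-delimited field order, and the
normalize() rules are just placeholders, and real stemming would need a
stemmer library:

    import re
    import time

    LOG_PATH = 'search.log'            # placeholder location for the log
    FIELD_RE = re.compile(r'\b\w+:')   # strips "title:"-style field prefixes

    def normalize(query):
        # Lowercase and drop field names; stemming is left to a real stemmer
        return ' '.join(FIELD_RE.sub('', query.lower()).split())

    def log_search(query, params, num_results, referrer, session_id):
        # Timestamp in the year-month-day:hour:minute:second shape noted above
        ts = time.strftime('%Y-%m-%d:%H:%M:%S')
        row = [ts, query, normalize(query), params, str(num_results),
               referrer, session_id]
        with open(LOG_PATH, 'a') as f:
            f.write('\t'.join(row) + '\n')

    # e.g. log_search('Title:global warming', 'field=title;page=2', 0,
    #                 'http://example.org/catalog', 'abc123')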

Mix. Produce (at minimum) these reports:

Top 1% of query terms (often 10-15% of all queries)

Top no-match queries (0 results)

Top referring pages for search, both internal and external

Number and sources of empty queries
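
And an equally rough sketch of pulling the reports out of a tab-delimited
log like the one above (the column order and report sizes are just
assumptions):

    import csv
    from collections import Counter

    queries, zero_hits, referrers, empties = (Counter(), Counter(),
                                              Counter(), Counter())

    with open('search.log') as f:
        for ts, orig, norm, params, hits, referrer, session in \
                csv.reader(f, delimiter='\t'):
            if not norm.strip():
                empties[referrer] += 1    # number and sources of empty queries
                continue
            queries[norm] += 1
            referrers[referrer] += 1
            if hits == '0':
                zero_hits[norm] += 1      # no-match queries

    top_n = max(1, len(queries) // 100)   # top 1% of distinct query terms
    print('Top 1% of queries:', queries.most_common(top_n))
    print('Top no-match queries:', zero_hits.most_common(20))
    print('Top referring pages:', referrers.most_common(20))
    print('Empty queries by source:', empties.most_common(20))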

---

Note that you don't have to run these queries continuously to get useful
information; a strong sample can be invaluable. For that matter, if you're
doing iterative evaluation (say, across vendor products), using the same
terms each time is almost essential. I was turning into Jack from The
Shining by the end of our search engine implementation at my Former Place
Of Work, but the consistency was important.


Karen G. Schneider
Acting Associate Director of Libraries for Technology & Research
Florida State University
Email/AIM: [log in to unmask]
Blog: http://quodvide.wordpress.com
Phone: 850-644-5214
Cell: 850-590-3370