Bibliocommons has done a lot of this with OPAC search logs -- maybe
someone from there could give a lightning talk.

Peter

________________________________
From: Code for Libraries on behalf of K.G. Schneider
Sent: Sat 2/3/2007 1:08 PM
To: [log in to unmask]
Subject: [CODE4LIB] search analytics, part deux

Someone wrote to ask me what I mean by search analytics. Fair question.
The blurb for Lou Rosenfeld and Rich Wiggins' forthcoming book does a
good job of describing what I mean:

http://www.rosenfeldmedia.com/books/searchanalytics/

"Any organization that has a searchable web site or intranet is sitting
on top of hugely valuable and usually under-exploited data: logs that
capture what users are searching for, how often each query was searched,
and how many results each query retrieved. Search queries are gold: they
are real data that show us exactly what users are searching for in their
own words. This book shows you how to use search analytics to carry on a
conversation with your customers: listen to and understand their needs,
and improve your content, navigation and search performance to meet
those needs."

By "roll-your-own" analytics, I'm talking about taking techniques such
as this:

http://www.onlamp.com/pub/a/onlamp/2003/08/21/better_search_engine.html

Or, from an in-house recipe we used last year, produce logs this way:

Ingredients

Timestamp, original query, normalized query, parameters, number of
results, referring page, IP or session ID

Procedure

Timestamp: best format is year-month-day:hour:minute:second
Original query: as entered by the user
Normalized query: after lowercasing, stemming, removal of field names, etc.
Parameters: any field names, languages, character sets, etc. It's nice
  to record the results page number here.
Number of results: specific to each search engine; 0 hits is very important
Referring page: the HTTP referer field; useful for locating confusing
  spots within the site, external links, etc.
IP or session ID: lets us follow the progress of a multi-part query. A
  session ID is far better for privacy considerations.

Mix. Produce (at minimum) these reports:

- Top 1% of query terms (often 10-15% of all queries)
- Top no-matches queries (0 results)
- Top referring pages for search, both internal and external
- Number and sources of empty queries

---

Note that you don't have to run these queries continuously to get useful
information. A strong sample can be invaluable. For that matter, if
you're doing iterative evaluation (say, across vendor products), using
the same terms is almost essential; I was turning into Jack from The
Shining by the end of our search engine implementation at my Former
Place Of Work, but the consistency was important.

Karen G. Schneider
Acting Associate Director of Libraries for Technology & Research
Florida State University
Email/AIM: [log in to unmask]
Blog: http://quodvide.wordpress.com
Phone: 850-644-5214
Cell: 850-590-3370
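A minimal sketch of the recipe above, in Python. It assumes the log is
one tab-separated record per line in the field order listed (timestamp,
original query, normalized query, parameters, number of results,
referring page, session ID); the field order, file layout, and
internal-host check are illustrative assumptions, not part of the
original recipe.

    #!/usr/bin/env python
    """Summarize a search log and print the four reports described above."""
    import csv
    import sys
    from collections import Counter

    FIELDS = ["timestamp", "original", "normalized", "params",
              "num_results", "referrer", "session"]

    def records(path):
        # One tab-separated record per line, fields in the order above
        # (an assumption; adjust to however your logger writes them).
        with open(path, newline="") as fh:
            for row in csv.reader(fh, delimiter="\t"):
                if len(row) == len(FIELDS):
                    yield dict(zip(FIELDS, row))

    def report(path, our_host="search.example.edu"):  # hypothetical host
        queries = Counter()        # all non-empty normalized queries
        no_match = Counter()       # queries that returned 0 results
        referrers = Counter()      # pages that sent users to search
        empty_sources = Counter()  # where empty queries came from

        for rec in records(path):
            q = rec["normalized"].strip().lower()
            if not q:                              # empty query
                empty_sources[rec["referrer"]] += 1
                continue
            queries[q] += 1
            if rec["num_results"] == "0":          # 0 hits: very important
                no_match[q] += 1
            referrers[rec["referrer"]] += 1

        top_n = max(1, len(queries) // 100)        # top 1% of query terms
        print("Top 1% of queries:")
        for q, n in queries.most_common(top_n):
            print("  %6d  %s" % (n, q))

        print("\nTop no-matches queries (0 results):")
        for q, n in no_match.most_common(20):
            print("  %6d  %s" % (n, q))

        print("\nTop referring pages (* = internal):")
        for ref, n in referrers.most_common(20):
            mark = "*" if our_host in ref else " "
            print("  %6d %s %s" % (n, mark, ref))

        print("\nEmpty queries: %d, by source:" % sum(empty_sources.values()))
        for ref, n in empty_sources.most_common(10):
            print("  %6d  %s" % (n, ref))

    if __name__ == "__main__":
        report(sys.argv[1])

Run it over a full log or, per the note above about sampling, over a
strong sample of one; nothing here requires continuous collection.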