Ethan,

In PHP5 you can use XPath on non-XHTML HTML. You just need to set the parser to ignore the errors. I discovered this when trying to do some screen scraping. The page I found helpful is at
http://blogoscoped.com/archive/2004_06_23_index.html#108802750834787821

Not sure it is helpful to you, but I thought I'd mention it.

Karen

On 12/7/09 10:48 AM, "Ethan Gruber" <[log in to unmask]> wrote:

> Thanks for the input so far.
>
> Ben, another problem with digestibility of the search results is that it's
> not XHTML, and therefore not well-formed XML, making it impossible to
> process with XPath.
>
> I think I'll experiment with the Solr solution, but like the AutoSuggester
> being developed at OCLC, the index would be fairly static unless there was a
> way to pull updates from LOC into it.
>
> Ethan
>
> On Mon, Dec 7, 2009 at 11:43 AM, LeVan,Ralph <[log in to unmask]> wrote:
>
>> Here in OCLC Research we've been experimenting with AutoSuggester
>> services. The folks in charge of our copy of LCSH are considering
>> putting up an AutoSuggester for that. I'll let you know in the next
>> couple of days how that goes.
>>
>> To make the service work at keystroke speeds, we've had to precalculate
>> the responses and load them into a database of their own. We walk
>> through the database that we're building the AutoSuggester for, pulling
>> out 4-tuples of data: the term to be suggested, the recordID associated
>> with the term (in case multiple terms might be suggested for the same
>> record), a weight for the term and a string of other arbitrary data to
>> send along with the recommendation (in the case of our VIAF
>> AutoSuggester, that's a list of authority control numbers). Those
>> tuples are then evaluated to generate a list of the 10 best terms that
>> match each keystroke combination and that list is turned into a record
>> and the keystroke combination is the key to that record. We then load
>> those records into a simple text database indexing on the keystroke
>> combination. We front-end that database with a simple service and we're
>> done.
>>
>> The only downside to this scheme is that the AutoSuggester database is
>> relatively static.
>>
>> Ralph
>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> Ethan Gruber
>>> Sent: Monday, December 07, 2009 11:14 AM
>>> To: [log in to unmask]
>>> Subject: Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
>>>
>>> It doesn't seem very efficient. It is taking me at least 30 seconds to load
>>> a page of 'a*' in http://id.loc.gov/authorities/search/
>>>
>>> On Mon, Dec 7, 2009 at 11:05 AM, Houghton,Andrew <[log in to unmask]>
>>> wrote:
>>>
>>>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>>>> Winona Salesky
>>>>> Sent: Monday, December 07, 2009 11:00 AM
>>>>> To: [log in to unmask]
>>>>> Subject: Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web
>>>>> service
>>>>>
>>>>> Quoting Ethan Gruber <[log in to unmask]>:
>>>>>
>>>>>> I have a need to integrate the LCSH terms into a web form that uses
>>>>>> auto-suggest to control the vocabulary. Is this technically possible with
>>>>>> the id.loc.gov service?
>>>>
>>>> Why can't you just add a "*" to the end of the data in your search form
>>>> and send the request to the id.loc.gov search, per:
>>>>
>>>> <http://id.loc.gov/authorities/techcenter/searching.html>
>>>>
>>>> then parse the response?
>>>>
>>>>
>>>> Andy.
>>>>
>>

--
Karen A. Coombs
Head of Libraries' Web Services
University of Houston
114 University Libraries
Houston, TX 77204-2000
Phone: (713) 743-3713
Fax: (713) 743-9811
Email: [log in to unmask]
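
P.S. For anyone who wants to try the precalculation scheme Ralph describes in the quoted message, here is a rough Python sketch of how I read it. This is not OCLC's code. The function names, the prefix-length cap, the rule of keeping only the heaviest term per recordID, and the sample tuples are all my own assumptions for illustration.

```python
from collections import defaultdict


def build_suggest_index(tuples, max_prefix_len=10, top_n=10):
    """Precompute the top_n suggestions for every keystroke prefix.

    Each input tuple is (term, record_id, weight, extra), following the
    4-tuple described in the thread. Keeping only the heaviest term per
    record_id is an assumption about what the recordID is used for.
    """
    # prefix -> {record_id: (weight, term, extra)}
    best = defaultdict(dict)
    for term, record_id, weight, extra in tuples:
        key = term.lower()
        for end in range(1, min(len(key), max_prefix_len) + 1):
            prefix = key[:end]
            current = best[prefix].get(record_id)
            if current is None or weight > current[0]:
                best[prefix][record_id] = (weight, term, extra)

    index = {}
    for prefix, by_record in best.items():
        # Highest weight first; ties broken alphabetically by term.
        ranked = sorted(by_record.values(), key=lambda c: (-c[0], c[1]))
        index[prefix] = [(term, extra) for _, term, extra in ranked[:top_n]]
    return index


def suggest(index, keystrokes):
    """Look up the precomputed record for what the user has typed so far."""
    return index.get(keystrokes.lower(), [])


# Made-up sample data, not real LCSH records or identifiers.
sample = [
    ("Cats", "rec1", 5, "id-1"),
    ("Catalogs", "rec2", 9, "id-2"),
    ("Cataloging", "rec2", 7, "id-2"),
    ("Dogs", "rec3", 3, "id-3"),
]
index = build_suggest_index(sample)
print(suggest(index, "cat"))  # [('Catalogs', 'id-2'), ('Cats', 'id-1')]
```

The dictionary stands in for the "simple text database" keyed on the keystroke combination. As Ralph notes, the index is static: picking up new terms means rebuilding it from the source records.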