In PHP5 you can run XPath queries against HTML that is not XHTML. You just need
to tell the parser to ignore the errors. I discovered this when trying to do
some screen scraping.
The HH I found helpful is at
Not sure it is helpful to you, but I thought I'd mention it.
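
To make the idea concrete: in PHP this usually means calling
libxml_use_internal_errors(true) before DOMDocument::loadHTML() and then
querying the result with DOMXPath, although the message above does not say
which calls were used, so treat that as an assumption. The same approach
(parse leniently, then query) can be sketched with only the Python standard
library. The names SoupTreeBuilder and parse_tag_soup and the sample markup
are made up for illustration, and ElementTree supports only a subset of XPath.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SoupTreeBuilder(HTMLParser):
    """Builds an ElementTree from HTML, tolerating unclosed and stray tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # synthetic wrapper element
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        attrib = {name: (value or "") for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_tag_soup(html):
    """Parse possibly malformed HTML and return the synthetic root element."""
    builder = SoupTreeBuilder()
    builder.feed(html)
    builder.close()  # flush any buffered trailing text
    return builder.root


# Unclosed <li> and <p> elements: a strict XML parser rejects this input.
html = '<ul><li><a href="/a">Alpha</a><li><a href="/b">Beta</a></ul><p>unclosed'
root = parse_tag_soup(html)
links = [(a.get("href"), a.text) for a in root.findall(".//li/a")]
print(links)  # [('/a', 'Alpha'), ('/b', 'Beta')]
```

This sketch does not apply HTML's implied-end-tag rules, so the second <li>
ends up nested inside the first. The descendant query (".//li/a") still finds
both links, which is usually enough for screen scraping.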
On 12/7/09 10:48 AM, "Ethan Gruber" <[log in to unmask]> wrote:
> Thanks for the input so far.
> Ben, another problem with digestibility of the search results is that it's
> not XHTML, and therefore not well-formed XML, making it impossible to
> process with XPath.
> I think I'll experiment with the Solr solution, but like the AutoSuggester
> being developed at OCLC, the index would be fairly static unless there was a
> way to pull updates from LOC into it.
> On Mon, Dec 7, 2009 at 11:43 AM, LeVan,Ralph <[log in to unmask]> wrote:
>> Here in OCLC Research we've been experimenting with AutoSuggester
>> services. The folks in charge of our copy of LCSH are considering
>> putting up an AutoSuggester for that. I'll let you know in the next
>> couple of days how that goes.
>> To make the service work at keystroke speeds, we've had to precalculate
>> the responses and load them into a database of their own. We walk
>> through the database that we're building the AutoSuggester for, pulling
>> out 4-tuples of data: the term to be suggested, the recordID associated
>> with the term (in case multiple terms might be suggested for the same
>> record), a weight for the term and a string of other arbitrary data to
>> send along with the recommendation (in the case of our VIAF
>> AutoSuggester, that's a list of authority control numbers). Those
>> tuples are then evaluated to generate a list of the 10 best terms that
>> match each keystroke combination and that list is turned into a record
>> and the keystroke combination is the key to that record. We then load
>> those records into a simple text database indexing on the keystroke
>> combination. We front-end that database with a simple service and we're
>> The only downside to this scheme is that the AutoSuggester database is
>> relatively static.
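
The precalculation described above could be sketched roughly as follows.
This is not OCLC's code. The function name, the max_prefix_len cutoff, the
choice to keep only the highest-weighted term per record, and the sample rows
are all assumptions made for illustration, and an in-memory dict stands in
for the "simple text database" keyed on the keystroke combination.

```python
from collections import defaultdict
import heapq


def build_suggestion_index(tuples, max_prefix_len=10, top_n=10):
    """Map each keystroke prefix to its top_n suggestions by weight.

    tuples: iterable of (term, record_id, weight, payload) 4-tuples.
    """
    # prefix -> {record_id: best 4-tuple seen for that record}
    candidates = defaultdict(dict)
    for term, record_id, weight, payload in tuples:
        key = term.lower()
        for end in range(1, min(len(key), max_prefix_len) + 1):
            prefix = key[:end]
            best = candidates[prefix].get(record_id)
            # Assumption: suggest at most one term per record per prefix.
            if best is None or weight > best[2]:
                candidates[prefix][record_id] = (term, record_id, weight, payload)

    # One "record" per prefix: the top_n candidates, heaviest first.
    return {
        prefix: heapq.nlargest(top_n, by_record.values(), key=lambda t: t[2])
        for prefix, by_record in candidates.items()
    }


# Hypothetical sample data: (term, record ID, weight, extra data).
rows = [
    ("Cat", "r1", 4.0, "n001"),
    ("Cats", "r1", 5.0, "n001"),
    ("Cattle", "r2", 9.0, "n002"),
    ("Catalogs", "r3", 7.0, "n003"),
]
index = build_suggestion_index(rows, top_n=2)
print([t[0] for t in index["cat"]])  # ['Cattle', 'Catalogs']
```

At request time the service only needs one lookup per keystroke, e.g.
index.get(typed.lower(), []), which is what makes keystroke-speed responses
possible. The cost, as noted above, is that the index has to be rebuilt when
the underlying data changes.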
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> Ethan Gruber
>>> Sent: Monday, December 07, 2009 11:14 AM
>>> To: [log in to unmask]
>>> Subject: Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web
>>> It doesn't seem very efficient. It is taking me at least 30 seconds to load
>>> a page of 'a*' in http://id.loc.gov/authorities/search/
>>> On Mon, Dec 7, 2009 at 11:05 AM, Houghton,Andrew <[log in to unmask]>
>>>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>>>> Winona Salesky
>>>>> Sent: Monday, December 07, 2009 11:00 AM
>>>>> To: [log in to unmask]
>>>>> Subject: Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web
>>>>> Quoting Ethan Gruber <[log in to unmask]>:
>>>>>> I have a need to integrate the LCSH terms into a web form that
>>>>>> auto-suggest to control the vocabulary. Is this technically
>>>>>> the id.loc.gov service?
>>>> Why can't you just add a "*" to the end of the data in your search
>>>> and send the request to the id.loc.gov search, per:
>>>> then parse the response?
Karen A. Coombs
Head of Libraries' Web Services
University of Houston
114 University Libraries
Houston, TX 77204-2000
Phone: (713) 743-3713
Fax: (713) 743-9811
Email: [log in to unmask]