Ethan,

In PHP 5 you can run XPath against HTML that isn't XHTML. You just need to
set the parser to ignore the errors. I discovered this when trying to do some
screen scraping. The write-up I found helpful is at
http://blogoscoped.com/archive/2004_06_23_index.html#108802750834787821
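
For anyone not on PHP, here is a rough Python sketch of the same idea: parse
the markup leniently instead of as XML, then query the resulting tree. It is
not the PHP code from the link above. The class and function names are made
up for illustration, the synthetic "document" root is an assumption, and the
standard library's ElementTree only supports a subset of XPath.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LenientTreeParser(HTMLParser):
    """Builds an ElementTree from tag soup, skipping over markup errors."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # synthetic root (assumption)
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        elem = ET.SubElement(self.stack[-1], tag,
                             {name: value or "" for name, value in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, along with anything left
        # unclosed inside it. A stray closing tag is simply ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_html(text):
    """Return the root element of a leniently parsed HTML string."""
    parser = LenientTreeParser()
    parser.feed(text)
    parser.close()
    return parser.root


# Unclosed <li>, <a> and <p>, an unquoted attribute, no </html>:
# an XML parser would reject this outright.
page = ('<html><body><ul><li><a href="/a">Alpha<li><a href=/b>Beta</a>'
        '</ul><br><p>unclosed</body>')
root = parse_html(page)
links = [a.get("href") for a in root.findall(".//a[@href]")]
```

The tree this produces is not necessarily the one a browser would build
(the second list item ends up nested inside the first link, for example),
but descendant queries such as ".//a[@href]" still find what you are after.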

Not sure it is helpful to you, but I thought I'd mention it.

Karen




On 12/7/09 10:48 AM, "Ethan Gruber" <[log in to unmask]> wrote:

> Thanks for the input so far.
> 
> Ben, another problem with digestibility of the search results is that it's
> not XHTML, and therefore not well-formed XML, making it impossible to
> process with XPath.
> 
> I think I'll experiment with the Solr solution, but like the AutoSuggester
> being developed at OCLC, the index would be fairly static unless there was a
> way to pull updates from LOC into it.
> 
> Ethan
> 
> On Mon, Dec 7, 2009 at 11:43 AM, LeVan,Ralph <[log in to unmask]> wrote:
> 
>> Here in OCLC Research we've been experimenting with AutoSuggester
>> services.  The folks in charge of our copy of LCSH are considering
>> putting up an AutoSuggester for that.  I'll let you know in the next
>> couple of days how that goes.
>> 
>> To make the service work at keystroke speeds, we've had to precalculate
>> the responses and load them into a database of their own.  We walk
>> through the database that we're building the AutoSuggester for, pulling
>> out 4-tuples of data: the term to be suggested, the recordID associated
>> with the term (in case multiple terms might be suggested for the same
>> record), a weight for the term and a string of other arbitrary data to
>> send along with the recommendation (in the case of our VIAF
>> AutoSuggester, that's a list of authority control numbers).  Those
>> tuples are then evaluated to generate a list of the 10 best terms that
>> match each keystroke combination and that list is turned into a record
>> and the keystroke combination is the key to that record.  We then load
>> those records into a simple text database indexing on the keystroke
>> combination.  We front-end that database with a simple service and we're
>> done.
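
The precalculation described above might be sketched roughly like this in
Python. This is not OCLC's code: the function names are hypothetical, a plain
dict stands in for the "simple text database", lowercasing the keystroke key
is an assumption, and keeping only the heaviest term per record for each key
is just one reading of why the recordID is carried along.

```python
from collections import defaultdict


def build_suggestions(tuples, max_terms=10):
    """Precompute the best terms for every keystroke combination.

    tuples: iterable of (term, record_id, weight, extra) 4-tuples.
    Returns a dict mapping each prefix to a list of up to max_terms tuples,
    heaviest first.
    """
    # prefix -> {record_id: best tuple seen so far for that record}
    by_prefix = defaultdict(dict)
    for row in tuples:
        term, record_id, weight, _extra = row
        key_source = term.lower()  # assumption: keys are case-insensitive
        for end in range(1, len(key_source) + 1):
            best_per_record = by_prefix[key_source[:end]]
            current = best_per_record.get(record_id)
            if current is None or weight > current[2]:
                best_per_record[record_id] = row

    table = {}
    for prefix, best_per_record in by_prefix.items():
        # Heaviest first; ties broken alphabetically so output is stable.
        ranked = sorted(best_per_record.values(),
                        key=lambda r: (-r[2], r[0]))
        table[prefix] = ranked[:max_terms]
    return table


def suggest(table, typed):
    """Look up the precomputed list for what the user has typed so far."""
    return table.get(typed.lower(), [])


# Made-up sample data, not real LCSH or VIAF records.
sample = [
    ("Cats", "r1", 5.0, "x"),
    ("Cat", "r1", 3.0, "x"),
    ("Catalogs", "r2", 9.0, "y"),
    ("Cattle", "r3", 1.0, "z"),
]
table = build_suggestions(sample, max_terms=2)
top_for_cat = [row[0] for row in suggest(table, "Cat")]
```

All of the work happens at build time, so each keystroke costs a single key
lookup. The table has to be rebuilt whenever the source data changes.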
>> 
>> The only downside to this scheme is that the AutoSuggester database is
>> relatively static.
>> 
>> Ralph
>> 
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> Ethan Gruber
>>> Sent: Monday, December 07, 2009 11:14 AM
>>> To: [log in to unmask]
>>> Subject: Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
>>> 
>>> It doesn't seem very efficient.  It is taking me at least 30 seconds to
>>> load a page of 'a*' in http://id.loc.gov/authorities/search/
>>> 
>>> On Mon, Dec 7, 2009 at 11:05 AM, Houghton,Andrew <[log in to unmask]>
>>> wrote:
>>> 
>>>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>>>> Winona Salesky
>>>>> Sent: Monday, December 07, 2009 11:00 AM
>>>>> To: [log in to unmask]
>>>>> Subject: Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web
>>>>> service
>>>>> 
>>>>> Quoting Ethan Gruber <[log in to unmask]>:
>>>>> 
>>>>>> I have a need to integrate the LCSH terms into a web form that uses
>>>>>> auto-suggest to control the vocabulary.  Is this technically possible
>>>>>> with the id.loc.gov service?
>>>> 
>>>> Why can't you just add a "*" to the end of the data in your search form
>>>> and send the request to the id.loc.gov search, per:
>>>> 
>>>> <http://id.loc.gov/authorities/techcenter/searching.html>
>>>> 
>>>> then parse the response?
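
The request-building half of that suggestion could look something like the
Python sketch below. The base URL is the search page mentioned elsewhere in
this thread. The query parameter name is not given anywhere in the thread, so
the caller has to supply it, and the "q" used in the example is purely a
placeholder. Check the searching documentation linked above for the real
name. The function name is hypothetical too.

```python
from urllib.parse import urlencode

# Search page mentioned in this thread.
SEARCH_URL = "http://id.loc.gov/authorities/search/"


def wildcard_search_url(typed, param, base=SEARCH_URL):
    """Build a search URL for what the user has typed, with "*" appended.

    param is the query parameter name, which must come from the service's
    documentation. It is deliberately not defaulted here.
    """
    term = typed.strip()
    if not term:
        raise ValueError("nothing typed yet")
    if not term.endswith("*"):
        term += "*"
    # urlencode percent-encodes the term, including the trailing "*".
    return base + "?" + urlencode({param: term})


# "q" is a placeholder parameter name, not a documented one.
url = wildcard_search_url("art", "q")
```

Fetching the URL and parsing the response are left out, since the response
format is not described in this thread.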
>>>> 
>>>> 
>>>> Andy.
>>>> 
>> 

-- 
Karen A. Coombs
Head of Libraries' Web Services
University of Houston
114 University Libraries
Houston, TX  77204-2000
Phone: (713) 743-3713
Fax: (713) 743-9811
Email: [log in to unmask]