
Arash,
Yes, we have made WorldCat available to researchers under a special
license agreement. I suggest contacting Thom Hickey <[log in to unmask]>
about such an arrangement. Thanks,
Roy

On Fri, May 18, 2012 at 3:46 AM, Arash.Joorabchi <[log in to unmask]> wrote:
> Dear Karen,
>
> I am conducting a research experiment on automatic text classification, and I am trying to retrieve the top-matching bib records (which include DDC fields) for a set of keyphrases extracted from a given document. So I suppose this is a rather exceptional use case. In fact, the right approach for this experiment would be to process a full dump of the WorldCat database directly rather than sending a limited number of queries via the API.
>
> I read here:
> http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/
> that WorldCat might become available as open linked data in the future, which would solve my problem and help similar text-mining projects. However, I wonder whether it is currently available to researchers under a research/non-commercial use license agreement.
>
> Regards,
> Arash
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Karen Coombs
> Sent: 17 May 2012 08:37
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>
> I forwarded this thread to the Product Manager for the WorldCat Search
> API. She responded that, unfortunately, this query is not possible
> using the API at this time.
>
> FYI, the SRU interface to the WorldCat Search API doesn't currently
> support any scan-type searches either.
>
> Is there a particular use case you're trying to support? Knowing that
> would help us document this as a possible enhancement.
>
> Karen
>
> Karen Coombs
> Senior Product Analyst
> Web Services
> OCLC
> [log in to unmask]
>
> On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi <[log in to unmask]> wrote:
>> Hi Andy,
>>
>> I am an SRU newbie myself, so I don't know how this could be achieved
>> using scan operations, and I could not find much info on the SRU website
>> (http://www.loc.gov/standards/sru/).
>>
>> As for the wildcards, according to this guide:
>> http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
>> the symbols should be preceded by at least 3 characters, and therefore clauses like:
>>
>> ... AND srw.dd=*
>> ... AND srw.dd=?.*
>> ... AND srw.dd=###.*
>> ... AND srw.dd=?3.*
>>
>> do not work and result in the following error:
>>
>> Diagnostics
>> Identifier: info:srw/diagnostic/1/9
>> Meaning:
>> Details:
>> Message: Not enough chars in truncated term:Truncated words too short(9)
>>
>> Thanks,
>> Arash
>>
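The 3-character rule quoted above can be checked on the client before a query is ever sent. The Java sketch below is hypothetical: the class name, the set of wildcard symbols (`*`, `?`, `#`, taken from the failing clauses) and the way preceding characters are counted (punctuation such as `.` included) are assumptions, not OCLC's documented behaviour.

```java
class TruncationCheck {

    // Wildcard symbols used in the failing clauses; treating all three as subject
    // to the "at least 3 characters first" rule is an assumption.
    private static final String WILDCARDS = "*?#";

    // Minimum prefix length, taken from the quick-reference rule quoted in the thread.
    private static final int MIN_PREFIX = 3;

    /** True if the term has no wildcard, or if at least MIN_PREFIX characters precede the first one. */
    static boolean hasLongEnoughPrefix(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (WILDCARDS.indexOf(term.charAt(i)) >= 0) {
                return i >= MIN_PREFIX; // i = number of characters before the wildcard
            }
        }
        return true; // no wildcard, nothing to check
    }

    public static void main(String[] args) {
        String[] terms = {"*", "?.*", "###.*", "?3.*", "005.1*", "005.133"};
        for (String t : terms) {
            System.out.println(t + " -> " + (hasLongEnoughPrefix(t) ? "ok" : "too short"));
        }
    }
}
```

All four clauses from the message are rejected, while a term such as `005.1*` passes; whether WorldCat then actually finds anything for it is a separate question.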
>> ________________________________
>>
>> From: Houghton,Andrew [mailto:[log in to unmask]]
>> Sent: 16 May 2012 11:58
>> To: Arash.Joorabchi
>> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>>
>> I'm not an SRU guru, but is it possible to do a scan and look for a
>> posting count of zero?
>>
>> Andy.
>>
>> On May 16, 2012, at 6:39, "Arash.Joorabchi" <[log in to unmask]> wrote:
>>
>>        Hi Mike,
>>
>>        srw.dd=* does not work either:
>>
>>        Identifier:     info:srw/diagnostic/1/27
>>        Meaning:
>>        Details:        srw.dd
>>        Message:        The index [srw.dd] did not include a searchable value
>>
>>        I suppose the only option left is to retrieve everything and
>>        filter the results on the client side.
>>
>>        Thanks for your quick reply.
>>        Arash
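Filtering on the client side, as suggested above, could look roughly like the following Java sketch. It assumes the SRU response embeds MARCXML records (namespace `http://www.loc.gov/MARC21/slim`) and that the DDC number sits in MARC field 082, subfield $a; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DdcFilter {

    // MARCXML namespace; assumes the SRU response embeds MARC 21 XML records.
    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    /** Returns the Dewey number of each record that has a non-empty 082 $a; other records are skipped. */
    static List<String> deweyNumbers(String sruResponseXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sruResponseXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse SRU response", e);
        }

        List<String> result = new ArrayList<>();
        // getElementsByTagNameNS searches all descendants, so the SRU wrapper elements do not matter.
        NodeList records = doc.getElementsByTagNameNS(MARC_NS, "record");
        for (int i = 0; i < records.getLength(); i++) {
            String ddc = firstDewey((Element) records.item(i));
            if (ddc != null) {
                result.add(ddc);
            }
        }
        return result;
    }

    /** Returns the first non-empty subfield a of a datafield tagged 082, or null if there is none. */
    static String firstDewey(Element record) {
        NodeList fields = record.getElementsByTagNameNS(MARC_NS, "datafield");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!"082".equals(field.getAttribute("tag"))) {
                continue;
            }
            NodeList subfields = field.getElementsByTagNameNS(MARC_NS, "subfield");
            for (int j = 0; j < subfields.getLength(); j++) {
                Element sub = (Element) subfields.item(j);
                String value = sub.getTextContent().trim();
                if ("a".equals(sub.getAttribute("code")) && !value.isEmpty()) {
                    return value;
                }
            }
        }
        return null;
    }
}
```

This does not reduce the number of requests: records without an 082 field are still downloaded and only then discarded.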
>>
>>
>>        -----Original Message-----
>>        From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Mike Taylor
>>        Sent: 16 May 2012 10:43
>>        To: [log in to unmask]
>>        Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>>
>>        There is no standard way in CQL to express "field X is not empty".
>>        Depending on the implementation, NOT srw.dd="" might work (but
>>        evidently doesn't in this case). Another possibility is srw.dd=*,
>>        but again that may or may not work, and might be appallingly
>>        inefficient if it does. NOT srw.dd=null will definitely not work:
>>        "null" is not a special word in CQL.
>>
>>        -- Mike.
>>
>>
>>        On 16 May 2012 10:32, Arash.Joorabchi <[log in to unmask]> wrote:
>>        >  Hi all,
>>        >
>>        > I am sending SRU queries to the WorldCat in the following
>> form:
>>        >
>>        >
>>        >     String host = "http://worldcat.org/webservices/catalog/search/";
>>        >     String query = "sru?query=srw.kw=\"" + keyword + "\""
>>        >                  + " AND srw.ln exact \"eng\""
>>        >                  + " AND srw.mt all \"bks\""
>>        >                  + " AND srw.nt=\"" + keyword + "\""
>>        >                  + "&servicelevel=full"
>>        >                  + "&maximumRecords=100"
>>        >                  + "&sortKeys=relevance,,0"
>>        >                  + "&wskey=[wskey]";
>>        >
>>        > It is working fine; however, I'd like to limit the results to those
>>        > records that have a DDC number assigned to them, and I don't know
>>        > the right way to specify this limit in the query.
>>        >
>>        >  NOT srw.dd=""
>>        >  NOT srw.dd=null
>>        >
>>        > Neither of the above works.
>>        >
>>        >
>>        > Thanks,
>>        > Arash
>>        >
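The query in the message above concatenates the keyword straight into the URL. A slightly more defensive Java sketch escapes backslashes and double quotes inside the CQL term and URL-encodes the whole CQL query. It also adds `startRecord`, the standard SRU paging parameter; that this endpoint honours it is an assumption worth checking against the WorldCat Search API documentation. The base URL and index names are copied from the original, `buildUrl` and `cqlQuote` are hypothetical helpers, and `[wskey]` remains a placeholder. `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class WorldCatSruRequest {

    // Endpoint copied from the original snippet ("host" + "sru").
    static final String BASE = "http://worldcat.org/webservices/catalog/search/sru";

    /** Wraps a term in double quotes, escaping backslashes first and then double quotes (CQL quoted-string rules). */
    static String cqlQuote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the request URL; startRecord is 1-based and allows paging through the result set. */
    static String buildUrl(String keyword, int startRecord, String wskey) {
        String cql = "srw.kw=" + cqlQuote(keyword)
                + " AND srw.ln exact \"eng\""
                + " AND srw.mt all \"bks\""
                + " AND srw.nt=" + cqlQuote(keyword);
        return BASE
                + "?query=" + URLEncoder.encode(cql, StandardCharsets.UTF_8)
                + "&servicelevel=full"
                + "&maximumRecords=100"
                + "&startRecord=" + startRecord
                + "&sortKeys=" + URLEncoder.encode("relevance,,0", StandardCharsets.UTF_8)
                + "&wskey=" + URLEncoder.encode(wskey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical keyphrase; "[wskey]" must be replaced with a real WorldCat API key.
        System.out.println(buildUrl("text classification", 1, "[wskey]"));
    }
}
```

Masking characters (`*`, `?`, `^`) are deliberately left untouched here, so a keyphrase containing them would still be treated as a truncated term.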
>>
>> ________________________________
>>
>> No virus found in this message.
>> Checked by AVG - www.avg.com
>> Version: 2012.0.2176 / Virus Database: 2425/5001 - Release Date:
>> 05/15/12
>