Dear Karen,

I am conducting a research experiment on automatic text classification, and I am trying to retrieve the top matching bib records (which include DDC fields) for a set of keyphrases extracted from a given document. So I suppose this is a rather exceptional use case. In fact, the right approach for this experiment would be to process a full dump of the WorldCat database directly rather than sending a limited number of queries via the API.

I read here: 
http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/
that WorldCat might become available as open linked data in future, which would solve my problem and help similar text mining projects. However, I wonder if it is currently available to researchers under a research/non-commercial use license agreement.

Regards,
Arash

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Karen Coombs
Sent: 17 May 2012 08:37
To: [log in to unmask]
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

I forwarded this thread to the Product Manager for the WorldCat Search
API. She responded back that unfortunately this query is not possible
using the API at this time.

FYI, the SRU interface to WorldCat Search API doesn't currently
support any scan type searches either.

Is there a particular use case you're trying to support? Knowing that
would help us document this as a possible enhancement.

Karen

Karen Coombs
Senior Product Analyst
Web Services
OCLC
[log in to unmask]

On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi <[log in to unmask]> wrote:
> Hi Andy,
>
> I am an SRU newbie myself, so I don't know how this could be achieved
> using scan operations, and I could not find much info on the SRU website
> (http://www.loc.gov/standards/sru/).
>
> As for the wildcards, according to this guide:
> http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
> the symbols should be preceded by at least 3 characters, and therefore
> clauses like:
>
> ... AND srw.dd=*
> ... AND srw.dd=?.*
> ... AND srw.dd=###.*
> ... AND srw.dd=?3.*
>
> do not work and result in the following error:
>
> Diagnostics
>
> Identifier:     info:srw/diagnostic/1/9
> Meaning:
> Details:
> Message:        Not enough chars in truncated term:Truncated words too short(9)
>
> Thanks,
>
> Arash
>
>
>
> ________________________________
>
> From: Houghton,Andrew [mailto:[log in to unmask]]
> Sent: 16 May 2012 11:58
> To: Arash.Joorabchi
> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>
>
>
> I'm not an SRU guru, but is it possible to do a scan and look for a
> postings count of zero?
>
>
>
> Andy.
>
> On May 16, 2012, at 6:39, "Arash.Joorabchi" <[log in to unmask]> wrote:
>
>        Hi Mike,
>
>        srw.dd=* does not work either:
>
>        Identifier:     info:srw/diagnostic/1/27
>        Meaning:
>        Details:        srw.dd
>        Message:        The index [srw.dd] did not include a searchable value
>
>        I suppose the only option left is to retrieve everything and
>        filter the results on the client side.
>
>        Thanks for your quick reply.
>        Arash
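The client-side fallback mentioned above could be sketched roughly as follows in Java. This is only an illustration under stated assumptions: that the SRU response carries MARCXML records (namespace http://www.loc.gov/MARC21/slim) and that the DDC number sits in MARC field 082, subfield a. The class and method names (DdcFilter, firstDdc, ddcNumbers) are hypothetical and not part of any OCLC library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DdcFilter {
    // Assumption: records are returned as MARCXML in this namespace.
    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    /** Returns the first non-empty 082 $a value of a MARCXML record, or null if there is none. */
    static String firstDdc(Element record) {
        NodeList fields = record.getElementsByTagNameNS(MARC_NS, "datafield");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!"082".equals(field.getAttribute("tag"))) {
                continue; // not a Dewey Decimal Classification field
            }
            NodeList subfields = field.getElementsByTagNameNS(MARC_NS, "subfield");
            for (int j = 0; j < subfields.getLength(); j++) {
                Element subfield = (Element) subfields.item(j);
                String value = subfield.getTextContent().trim();
                if ("a".equals(subfield.getAttribute("code")) && !value.isEmpty()) {
                    return value;
                }
            }
        }
        return null;
    }

    /** Parses a response and returns the DDC numbers of the records that have one. */
    static List<String> ddcNumbers(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the namespace-based lookups
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = doc.getElementsByTagNameNS(MARC_NS, "record");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                String ddc = firstDdc((Element) records.item(i));
                if (ddc != null) {
                    result.add(ddc); // records without a DDC number are skipped
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response XML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample, not real WorldCat output.
        String sample = "<records xmlns=\"http://www.loc.gov/MARC21/slim\">"
                + "<record><datafield tag=\"082\" ind1=\"0\" ind2=\"4\">"
                + "<subfield code=\"a\">006.31</subfield></datafield></record>"
                + "<record><datafield tag=\"245\" ind1=\"1\" ind2=\"0\">"
                + "<subfield code=\"a\">No DDC here</subfield></datafield></record>"
                + "</records>";
        System.out.println(ddcNumbers(sample)); // prints [006.31]
    }
}
```

Since records without a DDC number still count against maximumRecords, a filter like this may return far fewer than 100 usable records per request.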
>
>
>        -----Original Message-----
>        From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Mike Taylor
>        Sent: 16 May 2012 10:43
>        To: [log in to unmask]
>        Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>
>        There is no standard way in CQL to express "field X is not empty".
>        Depending on implementations, NOT srw.dd="" might work (but evidently
>        doesn't in this case).  Another possibility is srw.dd=*, but again
>        that may or may not work, and might be appallingly inefficient if it
>        does.  NOT srw.dd=null will definitely not work: "null" is not a
>        special word in CQL.
>
>        -- Mike.
>
>
>        On 16 May 2012 10:32, Arash.Joorabchi <[log in to unmask]> wrote:
>        > Hi all,
>        >
>        > I am sending SRU queries to WorldCat in the following form:
>        >
>        >
>        > String host = "http://worldcat.org/webservices/catalog/search/";
>        > String query = "sru?query=srw.kw=\"" + keyword + "\""
>        >         + " AND srw.ln exact \"eng\""
>        >         + " AND srw.mt all \"bks\""
>        >         + " AND srw.nt=\"" + keyword + "\""
>        >         + "&servicelevel=full"
>        >         + "&maximumRecords=100"
>        >         + "&sortKeys=relevance,,0"
>        >         + "&wskey=[wskey]";
>        >
>        > And it is working fine; however, I'd like to limit the results to those
>        > records that have a DDC number assigned to them, but I don't know the
>        > right way to specify this limit in the query.
>        >
>        >  NOT srw.dd=""
>        >  NOT srw.dd=null
>        >
>        > Neither of the above works.
>        >
>        >
>        > Thanks,
>        > Arash
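One practical note on the query string above: it contains raw spaces and double quotes, which are not valid in a URL unless the HTTP client encodes them. A minimal sketch of building the same request with explicit encoding might look like this, assuming Java 10 or later for URLEncoder.encode(String, Charset). The class and method names (SruQueryBuilder, buildCql, buildUrl) are hypothetical, and stripping double quotes from the keyword is a simplification rather than proper CQL escaping.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SruQueryBuilder {
    // Endpoint as given in the message above.
    static final String HOST = "http://worldcat.org/webservices/catalog/search/";

    /** Builds the CQL query from the message above for one keyword. */
    static String buildCql(String keyword) {
        // Simplification: drop double quotes so the keyword cannot end the quoted term early.
        String safe = keyword.replace("\"", "");
        return "srw.kw=\"" + safe + "\""
                + " AND srw.ln exact \"eng\""
                + " AND srw.mt all \"bks\""
                + " AND srw.nt=\"" + safe + "\"";
    }

    /** Assembles the full request URL, encoding every parameter value. */
    static String buildUrl(String keyword, String wskey) {
        return HOST + "sru?query=" + encode(buildCql(keyword))
                + "&servicelevel=full"
                + "&maximumRecords=100"
                + "&sortKeys=" + encode("relevance,,0")
                + "&wskey=" + encode(wskey);
    }

    private static String encode(String value) {
        // Form-style encoding: spaces become '+', quotes become %22, '=' becomes %3D.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "YOUR_WSKEY" is a placeholder, not a real key.
        System.out.println(buildUrl("text mining", "YOUR_WSKEY"));
    }
}
```

Keeping the CQL construction separate from the URL assembly makes it easier to log or test the query on its own before sending it.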
>        >
>
> ________________________________
>
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 2012.0.2176 / Virus Database: 2425/5001 - Release Date:
> 05/15/12
