On May 18, 2012, at 6:46 AM, Arash.Joorabchi wrote:
> Dear Karen,
>
> I am conducting a research experiment on automatic text classification, and I am trying to retrieve the top-matching bib records (which include DDC fields) for a set of keyphrases extracted from a given document. So I suppose this is a rather exceptional use case. In fact, the right approach for this experiment is to process a full dump of the WorldCat database directly rather than sending a limited number of queries via the API.
>
> I read here:
> http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/
> that WorldCat might become available as open linked data in the future, which would solve my problem and help similar text mining projects. However, I wonder whether it is currently available to researchers under a research/non-commercial use license agreement.
Why not use Open Library's dataset (which is freely available with no restrictions)?
http://openlibrary.org/developers/dumps
-Ross.
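A minimal Java sketch of Ross's suggestion, assuming the Open Library dump layout of one record per line with five tab-separated columns (type, key, revision, last_modified, JSON), and assuming that edition records store Dewey numbers in a `dewey_decimal_class` array. It streams a gzipped dump and prints the key of each edition whose JSON contains at least one DDC value. The class name `OpenLibraryDdcFilter` is hypothetical, and the regular expression is a crude stand-in for a real JSON parser, since the JDK does not ship one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

class OpenLibraryDdcFilter {
    // Assumption: a non-empty DDC list looks like "dewey_decimal_class": ["025.04", ...]
    private static final Pattern HAS_DDC =
        Pattern.compile("\"dewey_decimal_class\"\\s*:\\s*\\[\\s*\"");

    /** True if the dump line is an edition whose JSON has at least one DDC string. */
    static boolean hasDdc(String line) {
        // Assumed columns: type, key, revision, last_modified, JSON
        String[] cols = line.split("\t", 5);
        if (cols.length < 5 || !cols[0].equals("/type/edition")) {
            return false;
        }
        // Crude text match on the JSON column instead of a full JSON parse.
        return HAS_DDC.matcher(cols[4]).find();
    }

    /** Streams a gzipped dump and prints the key of every edition that has a DDC number. */
    static void printKeysWithDdc(Path gzDump) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzDump)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (hasDdc(line)) {
                    System.out.println(line.split("\t", 5)[1]);
                }
            }
        }
    }
}
```

Streaming line by line keeps memory flat, which matters because the full editions dump is far too large to load at once.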
>
> Regards,
> Arash
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Karen Coombs
> Sent: 17 May 2012 08:37
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>
> I forwarded this thread to the Product Manager for the WorldCat Search
> API. She responded that, unfortunately, this query is not possible
> using the API at this time.
>
> FYI, the SRU interface to the WorldCat Search API doesn't currently
> support any scan-type searches either.
>
> Is there a particular use case you're trying to support? Knowing that
> would help us document this as a possible enhancement.
>
> Karen
>
> Karen Coombs
> Senior Product Analyst
> Web Services
> OCLC
> [log in to unmask]
>
> On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi <[log in to unmask]> wrote:
>> Hi Andy,
>>
>> I am an SRU newbie myself, so I don't know how this could be achieved
>> using scan operations, and I could not find much information on the SRU
>> website (http://www.loc.gov/standards/sru/).
>>
>> As for the wildcards, according to this guide:
>> http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
>> the symbols should be preceded by at least 3 characters, and therefore
>> clauses like:
>>
>> ... AND srw.dd=*
>> ... AND srw.dd=?.*
>> ... AND srw.dd=###.*
>> ... AND srw.dd=?3.*
>>
>> do not work and result in the following error:
>>
>> Diagnostics
>> Identifier: info:srw/diagnostic/1/9
>> Meaning:
>> Details:
>> Message: Not enough chars in truncated term:Truncated words too short(9)
>>
>> Thanks,
>>
>> Arash
>>
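The truncation rule from the OCLC quick reference can be checked on the client before a query is sent, which avoids a round trip that ends in diagnostic 9. This is a sketch under stated assumptions: it treats `*`, `?` and `#` as the wildcard symbols and counts every character before the first wildcard, whereas the server may count characters differently. `TruncationCheck` is a hypothetical name.

```java
class TruncationCheck {
    // Assumption: these are the wildcard symbols covered by the quick-reference rule.
    private static final String WILDCARDS = "*?#";
    // Minimum number of characters required before the first wildcard, per the guide.
    static final int MIN_PREFIX = 3;

    /** True if the term has no wildcard, or at least MIN_PREFIX characters before the first one. */
    static boolean isAcceptable(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (WILDCARDS.indexOf(term.charAt(i)) >= 0) {
                // i is the number of characters preceding this first wildcard.
                return i >= MIN_PREFIX;
            }
        }
        return true; // no wildcard at all
    }
}
```

Under this check, all four clauses listed above are rejected locally, while a term such as `025*` passes.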
>> ________________________________
>>
>> From: Houghton,Andrew [mailto:[log in to unmask]]
>> Sent: 16 May 2012 11:58
>> To: Arash.Joorabchi
>> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>>
>>
>>
>> I'm not an SRU guru, but is it possible to do a scan and look for a
>> posting count of zero?
>>
>>
>>
>> Andy.
>>
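For reference, an SRU 1.1 scan request is an ordinary GET with `operation=scan`, a CQL `scanClause`, and optional `responsePosition` and `maximumTerms` parameters; each term in the response carries a `numberOfRecords` count, which is the postings figure Andy refers to. The sketch below only builds such a URL against a hypothetical base URL (`SruScanUrl` is also a hypothetical name). As Karen notes in this thread, the WorldCat Search API's SRU interface does not support scan, so this would not work against WorldCat.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SruScanUrl {
    /** Builds an SRU 1.1 scan URL; the CQL scan clause is percent-encoded. */
    static String build(String baseUrl, String scanClause, int responsePosition, int maximumTerms) {
        return baseUrl
            + "?operation=scan&version=1.1"
            + "&scanClause=" + URLEncoder.encode(scanClause, StandardCharsets.UTF_8)
            // Position of the scan term within the returned list (1 = first).
            + "&responsePosition=" + responsePosition
            // Upper bound on the number of index terms returned.
            + "&maximumTerms=" + maximumTerms;
    }
}
```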
>> On May 16, 2012, at 6:39, "Arash.Joorabchi" <[log in to unmask]> wrote:
>>
>> Hi Mike,
>>
>> srw.dd=* does not work either:
>>
>> Identifier: info:srw/diagnostic/1/27
>> Meaning:
>> Details: srw.dd
>> Message: The index [srw.dd] did not include a searchable value
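When queries are sent from code, diagnostic fields like these can be read out of the XML response instead of being copied by hand. A minimal sketch using the JDK's DOM parser, assuming the response uses the SRU 1.x diagnostic namespace `http://www.loc.gov/zing/srw/diagnostic/` with `uri`, `details` and `message` child elements; `SruDiagnostics` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SruDiagnostics {
    // Assumed SRU 1.x diagnostic namespace.
    static final String DIAG_NS = "http://www.loc.gov/zing/srw/diagnostic/";

    /** Returns one "uri | details | message" string per diagnostic in an SRU response. */
    static List<String> parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        List<String> result = new ArrayList<>();
        NodeList diagnostics = doc.getElementsByTagNameNS(DIAG_NS, "diagnostic");
        for (int i = 0; i < diagnostics.getLength(); i++) {
            Element d = (Element) diagnostics.item(i);
            result.add(text(d, "uri") + " | " + text(d, "details") + " | " + text(d, "message"));
        }
        return result;
    }

    // Trimmed text of the first descendant with this local name, or "" if it is missing.
    private static String text(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS(DIAG_NS, localName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```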
>>
>> I suppose the only option left is to retrieve everything and
>> filter the results on the client side.
>>
>> Thanks for your quick reply.
>> Arash
>>
>>
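A sketch of the client-side filtering Arash describes: parse the returned records and keep only those that carry a Dewey number. It assumes the records arrive as MARCXML (namespace `http://www.loc.gov/MARC21/slim`) with the DDC number in field 082, subfield `a`; `DdcClientFilter` is a hypothetical name. Because only MARC-namespace `record` elements are selected, the SRU envelope's own `record` wrappers are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DdcClientFilter {
    // Assumption: records are MARCXML and the DDC number lives in 082 $a.
    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    /** Returns the first 082 $a of each MARC record that has one; other records are dropped. */
    static List<String> ddcNumbers(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        List<String> kept = new ArrayList<>();
        NodeList records = doc.getElementsByTagNameNS(MARC_NS, "record");
        for (int i = 0; i < records.getLength(); i++) {
            String ddc = firstDdc((Element) records.item(i));
            if (ddc != null) {
                kept.add(ddc);
            }
        }
        return kept;
    }

    // First non-empty 082 $a in a record, or null if the record has no DDC number.
    private static String firstDdc(Element record) {
        NodeList fields = record.getElementsByTagNameNS(MARC_NS, "datafield");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!"082".equals(field.getAttribute("tag"))) {
                continue;
            }
            NodeList subs = field.getElementsByTagNameNS(MARC_NS, "subfield");
            for (int j = 0; j < subs.getLength(); j++) {
                Element sub = (Element) subs.item(j);
                String value = sub.getTextContent().trim();
                if ("a".equals(sub.getAttribute("code")) && !value.isEmpty()) {
                    return value;
                }
            }
        }
        return null;
    }
}
```

The trade-off is the one Arash points out: the server still returns (and counts against `maximumRecords`) records without a DDC number, so more pages may need to be fetched to collect enough matches.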
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Mike Taylor
>> Sent: 16 May 2012 10:43
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
>>
>> There is no standard way in CQL to express "field X is not empty".
>> Depending on implementations, NOT srw.dd="" might work (but evidently
>> doesn't in this case). Another possibility is srw.dd=*, but again that
>> may or may not work, and might be appallingly inefficient if it does.
>> NOT srw.dd=null will definitely not work: "null" is not a special word
>> in CQL.
>>
>> -- Mike.
>>
>>
>> On 16 May 2012 10:32, Arash.Joorabchi <[log in to unmask]> wrote:
>> > Hi all,
>> >
>> > I am sending SRU queries to WorldCat in the following form:
>> >
>> >     String host = "http://worldcat.org/webservices/catalog/search/";
>> >     String query = "sru?query=srw.kw=\"" + keyword + "\""
>> >             + " AND srw.ln exact \"eng\""
>> >             + " AND srw.mt all \"bks\""
>> >             + " AND srw.nt=\"" + keyword + "\""
>> >             + "&servicelevel=full"
>> >             + "&maximumRecords=100"
>> >             + "&sortKeys=relevance,,0"
>> >             + "&wskey=[wskey]";
>> >
>> > It is working fine; however, I'd like to limit the results to those
>> > records that have a DDC number assigned to them, and I don't know the
>> > right way to specify this limit in the query.
>> >
>> > NOT srw.dd=""
>> > NOT srw.dd=null
>> >
>> > Neither of the above works.
>> >
>> >
>> > Thanks,
>> > Arash
>> >
>>
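The snippet above concatenates `keyword` straight into both the CQL and the URL, so a keyphrase containing a double quote breaks the CQL, and the spaces and quotes go out unencoded unless the HTTP client encodes them. A hedged sketch of the same request with both issues addressed, assuming CQL's rule that a double quote or backslash inside a quoted term is escaped with a preceding backslash. `WorldCatSruQuery` is a hypothetical name, and `wskey` is still a placeholder for a real key. It does not add the DDC filter, which the thread concludes the API cannot express.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class WorldCatSruQuery {
    // Same endpoint as the original snippet (host + "sru").
    static final String BASE = "http://worldcat.org/webservices/catalog/search/sru";

    /** Wraps a term in double quotes for CQL, escaping backslashes and embedded quotes. */
    static String cqlQuote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the original request with the CQL properly quoted and percent-encoded. */
    static String build(String keyword, String wskey) {
        String cql = "srw.kw=" + cqlQuote(keyword)
            + " AND srw.ln exact \"eng\""
            + " AND srw.mt all \"bks\""
            + " AND srw.nt=" + cqlQuote(keyword);
        return BASE
            + "?query=" + URLEncoder.encode(cql, StandardCharsets.UTF_8)
            + "&servicelevel=full"
            + "&maximumRecords=100"
            + "&sortKeys=" + URLEncoder.encode("relevance,,0", StandardCharsets.UTF_8)
            + "&wskey=" + URLEncoder.encode(wskey, StandardCharsets.UTF_8);
    }
}
```

`URLEncoder` produces form encoding (spaces become `+`), which SRU servers accept for GET query strings.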