Yes, it would be ok to check this in with a copy of the license
alongside it - or even just a readme with a link to the license would be
fine.

The approach is formulaic: there are 10 records in there, each built up
using the same algorithm. You can add these to a store of legitimate
MARC data with very little chance of them appearing in search results
unless searched for specifically.

Taking just one field out of one record (in MarcEdit syntax)...

=027  \\$ara0271a1r ra0271a2r ra0271a3r$zra0271z1r ra0271z2r
ra0271z3r$6ra027161r ra027162r ra027163r$8ra027181r ra027182r ra027183r

Each subfield contains 3 tokens (words). For subfield a:

ra0271a1r ra0271a2r ra0271a3r

Reading a token from left to right: 'r' at the start and end is a
padding character for testing truncation; 'a' is the record type; '027'
the field; '1' the occurrence of that field; 'a' the subfield code; and
'1', '2', '3' the occurrence of the token in the subfield. This allows
all of the truncation, word, phrase, completeness and position
combinations to be tested separately - with just one record coming back
for each.
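
To make the token layout concrete, here is a rough Python sketch that
rebuilds the 027 example from the parts described above. It is only an
illustration, not the generator actually used to produce the records;
the function names (make_token, make_field) and the default of 3 tokens
per subfield are my assumptions for the example.

```python
# Illustrative sketch only: names and defaults are assumptions,
# not the original generator.

def make_token(record_type, tag, field_occ, subfield, token_occ, pad="r"):
    """Build one token: pad + record type + tag + field occurrence
    + subfield code + token occurrence + pad."""
    return f"{pad}{record_type}{tag}{field_occ}{subfield}{token_occ}{pad}"


def make_field(record_type, tag, field_occ, subfields,
               tokens_per_subfield=3, indicators="\\\\"):
    """Build one field in MarcEdit mnemonic syntax.

    `subfields` is a string of subfield codes, e.g. "az68".
    `indicators` defaults to two backslashes (two blank indicators).
    """
    parts = []
    for code in subfields:
        tokens = " ".join(
            make_token(record_type, tag, field_occ, code, n)
            for n in range(1, tokens_per_subfield + 1)
        )
        parts.append(f"${code}{tokens}")
    return f"={tag}  {indicators}" + "".join(parts)


if __name__ == "__main__":
    # Reproduces the 027 field shown earlier (on a single line).
    print(make_field("a", "027", 1, "az68"))
```

Because every token encodes its own position, a search for any single
token (or a truncated form of it, with the 'r' padding dropped from one
end) should identify exactly one record.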


Rob Styles
Programme Manager, Data Services, Talis
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
direct: +44 (0)870 400 5004
mobile: +44 (0)7971 475 257

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Erik Hatcher
> Sent: 09 February 2007 10:18
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Radioactive records for Solr
> On Feb 9, 2007, at 3:58 AM, Rob Styles wrote:
> > Here's the set that I generated a while ago - it's quite big as it
> > covers the full Marc21 field and subfield set for bibliographic
> > records.
> > I'm releasing these under the terms of our Talis Community License.
> > (
> IANAL, so to clarify this license, would it be ok for me to check
> this into Solr's repository at Apache (keeping the license file
> alongside)?
> I'm not quite sure what we'd do with this data just yet, as it looked
> like gibberish at first blush, but looking at the document Peter
> linked to it is by definition supposed to be this way and not overlap
> with real data.
> > Would people be interested in a write-up of how we've used
> > RadioactiveMarc and automated tests to validate Bath and US National
> > Profile compliance?
> Absolutely.  This would certainly factor into my Solr efforts in
> crafting more automated tests.
>         Erik

