Yes, it would be OK to check this in with a copy of the license alongside it. Even just a README with a link to the license would be fine.

The approach is formulaic: there are 10 records in there, each built up using the same algorithm. You can add these to a store of legitimate MARC data with very little chance of them appearing in search results unless they are searched for specifically.

Taking just one field out of one record (in MarcEdit syntax)...

=027 \\$ara0271a1r ra0271a2r ra0271a3r$zra0271z1r ra0271z2r ra0271z3r$6ra027161r ra027162r ra027163r$8ra027181r ra027182r ra027183r

Each subfield contains 3 tokens (words). For subfield a:

ra0271a1r ra0271a2r ra0271a3r

Reading a token from left to right:

'r' at the start and end of each token is a padding character for testing truncation,
'a' is the record type,
'027' is the field,
'1' is the occurrence of that field,
'a' is the subfield code,
'1', '2', '3' is the occurrence of the token in the subfield.

This allows all of the truncation, word, phrase, completeness and position combinations to be tested separately, with just one record coming back for each.

rob

Rob Styles
Programme Manager, Data Services, Talis
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
direct: +44 (0)870 400 5004
mobile: +44 (0)7971 475 257
msn: [log in to unmask]
irc: irc.freenode.net/mmmmmrob,isnick

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Erik Hatcher
> Sent: 09 February 2007 10:18
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Radioactive records for Solr
>
> On Feb 9, 2007, at 3:58 AM, Rob Styles wrote:
> > Here's the set that I generated a while ago - it's quite big as it
> > covers the full Marc21 field and subfield set for bibliographic
> > records.
> > I'm releasing these under the terms of our Talis Community License.
> > (http://www.talis.com/tdn/tcl)
>
> IANAL, so to clarify this license, would it be ok for me to check
> this into Solr's repository at Apache (keeping the license file
> alongside)?
>
> I'm not quite sure what we'd do with this data just yet, as it looked
> like gibberish at first blush, but looking at the document Peter
> linked to it is by definition supposed to be this way and not overlap
> with real data.
>
> > Would people be interested in a write-up of how we've used
> > RadioactiveMarc and automated tests to validate Bath and US National
> > Profile compliance?
>
> Absolutely. This would certainly factor into my Solr efforts in
> crafting more automated tests.
>
> Erik
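
For anyone who wants to regenerate or check individual fields, the token scheme described in the reply above can be sketched roughly as follows. This is only an illustration in Python, not the generator actually used to build the records: the function names (token, subfield_value, field_line) are made up here, the fixed count of 3 tokens per subfield is taken from the 027 example, and the spacing and blank-indicator notation in the field line are copied from that example as it appears in the message.

```python
PAD = "r"            # padding character at both ends of every token
INDICATORS = r"\\"   # two backslashes, as written in the 027 example


def token(record, tag, occurrence, subfield, position):
    """Build one token: pad + record type + field + field occurrence
    + subfield code + token position + pad."""
    return f"{PAD}{record}{tag}{occurrence}{subfield}{position}{PAD}"


def subfield_value(record, tag, occurrence, subfield, tokens=3):
    """Join the tokens for one subfield with single spaces."""
    return " ".join(
        token(record, tag, occurrence, subfield, n)
        for n in range(1, tokens + 1)
    )


def field_line(record, tag, occurrence, subfields):
    """Render one field in the same layout as the 027 example."""
    body = "".join(
        f"${code}{subfield_value(record, tag, occurrence, code)}"
        for code in subfields
    )
    return f"={tag} {INDICATORS}{body}"


if __name__ == "__main__":
    # Record type 'a', field 027, first occurrence, subfields a, z, 6, 8.
    print(field_line("a", "027", 1, "az68"))
```

Because every token encodes its own record, field, subfield and position, a search for a single token (or a truncated form of it with the padding stripped) should identify exactly one place in the data set, which is what lets each search combination be tested on its own.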