You could also try to use the code I put in SolrMarc utilities classes ha ha ha.

- Naomi

On Mar 31, 2011, at 10:25 AM, Keith Jenkins wrote:

> The Google Code regex looks like it will accept any 1-3 letters at the
> start of the call number. But LCC has no I, O, W, X, or Y
> classifications.
>
> So you might want to use something more like ^[A-HJ-NP-VZ] at the
> start of the regex.
>
> Also, there are only a few major classifications that use three
> letters. Like DJK, and several in the Ks. I'm not sure, but there
> might be others.
>
> Keith
>
>
> On Thu, Mar 31, 2011 at 1:11 PM, Jonathan Rochkind
> <[log in to unmask]> wrote:
>> Except now I wonder if those annoying MLCS call numbers might actually be
>> properly MATCHED by this regex, when I need em excluded. They are annoying
>> _similar_ to a classified call number. Well, one way to find out.
>>
>> And the reason this matters is to try and use an LCC to map to a
>> 'discipline' or other broad category, either directly from the LCC schedule
>> labels, or using a mapping like umich's:
>> http://www.lib.umich.edu/browse/categories/
>>
>> But if it's not really an LCC at all, and you try to map it, you'll get bad
>> postings.
>>
>> On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:
>>>
>>> Thanks, that looks good!
>>>
>>> It's hosted on Google Code, but I don't think that code is anything
>>> "Google uses", it looks like it's from our very own Bill Dueber.
>>>
>>> On 3/31/2011 12:38 PM, Tod Olson wrote:
>>>>
>>>> Check the regexp that Google uses in their call number
>>>> normalization:
>>>>
>>>> http://code.google.com/p/library-callnumber-lc/wiki/Home
>>>>
>>>> You may want to remove the prefix part, and allow for a fourth
>>>> cutter.
>>>>
>>>> The folks at UNC pointed me to this a few months ago.
>>>>
>>>> -Tod
>>>>
>>>> On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:
>>>>
>>>>> Does anyone have a good regular expression that will match all legal LC
>>>>> Call Numbers from the LC Classified Schedule, but will generally not
>>>>> match things that could not possibly be an LC Call Number from the LC
>>>>> Classified Schedule?
>>>>>
>>>>> In particular, I need it to NOT match an "MLC" call number, which is an
>>>>> LC assigned call number that shows up in an 050 with no way to
>>>>> distinguish based on indicators, but isn't actually from the LC
>>>>> Schedules. Here's an example of an "MLC" call number:
>>>>>
>>>>> "MLCS 83/5180 (P)"
>>>>>
>>>>> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
>>>>> exclude them just like that. But it looks like there are also OTHER
>>>>> things that can show up in the 050 but aren't actually from the
>>>>> classified schedule, the OCLC documentation even contains an example of
>>>>> "Microfilm 19072 E".
>>>>>
>>>>> What a mess, huh? So, yeah, regex anyone?
>>>>>
>>>>> [You can probably guess why I care if it's from the LC Classified
>>>>> Schedule or not].
>>>>
>>>> Tod Olson<[log in to unmask]>
>>>> Systems Librarian
>>>> University of Chicago Library
>>>>
>>
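
Keith's leading-letter suggestion and the MLC exclusion Jonathan asks about could be sketched in Python roughly as below. This is only a rough filter under stated assumptions, not the regex from library-callnumber-lc and not the SolrMarc utility code. The names LCC_CANDIDATE and looks_like_lcc are made up for illustration, and the pattern assumes one to three class letters followed directly by a class number of one to four digits with an optional decimal part. It does not check that the letter combination or number range actually exists in the schedules.

```python
import re

# Hypothetical pattern, for illustration only.
# First letter is restricted to [A-HJ-NP-VZ] because LCC has no
# I, O, W, X, or Y classes (per Keith's note in the thread).
LCC_CANDIDATE = re.compile(
    r"""
    ^
    [A-HJ-NP-VZ][A-Z]{0,2}   # 1-3 class letters
    \s*
    \d{1,4}(?!\d)            # class number: assumed 1-4 digits, no more
    (?:\.\d+)?               # optional decimal extension
    .*                       # cutters, dates, etc. are not validated here
    $
    """,
    re.VERBOSE,
)


def looks_like_lcc(call_number: str) -> bool:
    """Return True if the string is shaped like a classified LC call number."""
    s = call_number.strip().upper()
    # Explicitly reject MLC/MLCS shelf numbers, as discussed in the thread.
    if s.startswith("MLC"):
        return False
    return LCC_CANDIDATE.match(s) is not None


if __name__ == "__main__":
    for example in ["QA76.73.P98 L88 2011", "DJK4 .S93",
                    "MLCS 83/5180 (P)", "Microfilm 19072 E"]:
        print(example, "->", looks_like_lcc(example))
```

The two problem strings from the thread, "MLCS 83/5180 (P)" and "Microfilm 19072 E", are both rejected by this sketch, while ordinary class-letter-plus-number call numbers pass. Anything that passes would still need a lookup against the schedules before mapping it to a discipline.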