Thanks, that looks good! It's hosted on Google Code, but I don't think that code is anything "Google uses", it looks like it's from our very own Bill Dueber.

On 3/31/2011 12:38 PM, Tod Olson wrote:
> Check the regexp that Google uses in their call number normalization:
>
> http://code.google.com/p/library-callnumber-lc/wiki/Home
>
> You may want to remove the prefix part, and allow for a fourth cutter.
>
> The folks at UNC pointed me to this a few months ago.
>
> -Tod
>
> On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:
>
>> Does anyone have a good regular expression that will match all legal LC
>> Call Numbers from the LC Classified Schedule, but will generally not
>> match things that could not possibly be an LC Call Number from the LC
>> Classified Schedule?
>>
>> In particular, I need it to NOT match an "MLC" call number, which is an
>> LC assigned call number that shows up in an 050 with no way to
>> distinguish based on indicators, but isn't actually from the LC
>> Schedules. Here's an example of an "MLC" call number:
>>
>> "MLCS 83/5180 (P)"
>>
>> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
>> exclude them just like that. But it looks like there are also OTHER
>> things that can show up in the 050 but aren't actually from the
>> classified schedule, the OCLC documentation even contains an example of
>> "Microfilm 19072 E".
>>
>> What a mess, huh? So, yeah, regex anyone?
>>
>> [You can probably guess why I care if it's from the LC Classified
>> Schedule or not].
>
> Tod Olson
> Systems Librarian
> University of Chicago Library
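
For anyone skimming the thread, here is a rough sketch in Python of the general shape such a regex might take. To be clear, this is NOT the regexp from the library-callnumber-lc project linked above; it is a simplified illustration under my own assumptions: one to three capital class letters, a class number of up to four digits with an optional decimal, any number of cutters (so a fourth cutter is allowed), and an optional trailing year. The names LC_CALL_NUMBER and looks_like_lc are made up for this example. Real 050 data has more variation (dates between cutters, volume numbers, and so on), so expect to extend it.

```python
import re

# Hypothetical, simplified pattern; not taken from library-callnumber-lc.
LC_CALL_NUMBER = re.compile(
    r"""
    ^\s*
    (?!MLC)                            # reject anything starting with "MLC"
    (?P<letters>[A-Z]{1,3})            # class letters, e.g. "QA" or "PS"
    \s*
    (?P<number>\d{1,4}(?:\.\d+)?)      # class number with optional decimal
    (?P<cutters>
        (?:\s*\.?\s*[A-Z]\d+[a-z]*)*   # zero or more cutters, e.g. ".P98 L88"
    )
    (?:\s+(?P<year>\d{4}[a-z]?))?      # optional year, e.g. "2011"
    \s*$
    """,
    re.VERBOSE,
)


def looks_like_lc(call_number: str) -> bool:
    """Return True if the string has the rough shape of a classified LC call number."""
    return LC_CALL_NUMBER.match(call_number) is not None


if __name__ == "__main__":
    samples = [
        "QA76.73.P98 L88 2011",
        "PS3545.I5365 A6 1995",
        "E184.A1 G78",
        "MLCS 83/5180 (P)",
        "Microfilm 19072 E",
    ]
    for s in samples:
        print(f"{s!r}: {looks_like_lc(s)}")
```

The two problem examples from the original question fail for different reasons: "MLCS 83/5180 (P)" is stopped by the explicit MLC lookahead, and "Microfilm 19072 E" fails because lowercase letters follow the first capital where a class number is expected.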