Check the regexp that Google uses in their call number normalization:

http://code.google.com/p/library-callnumber-lc/wiki/Home

You may want to remove the prefix part, and allow for a fourth cutter. The folks at UNC pointed me to this a few months ago.

-Tod

On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:

> Does anyone have a good regular expression that will match all legal LC
> Call Numbers from the LC Classified Schedule, but will generally not
> match things that could not possibly be an LC Call Number from the LC
> Classified Schedule?
>
> In particular, I need it to NOT match an "MLC" call number, which is an
> LC assigned call number that shows up in an 050 with no way to
> distinguish based on indicators, but isn't actually from the LC
> Schedules. Here's an example of an "MLC" call number:
>
> "MLCS 83/5180 (P)"
>
> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
> exclude them just like that. But it looks like there are also OTHER
> things that can show up in the 050 but aren't actually from the
> classified schedule, the OCLC documentation even contains an example of
> "Microfilm 19072 E".
>
> What a mess, huh? So, yeah, regex anyone?
>
> [You can probably guess why I care if it's from the LC Classified
> Schedule or not].

Tod Olson
Systems Librarian
University of Chicago Library
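
For anyone following the thread, here is a rough Python sketch of the kind of pattern being discussed. It is NOT the regexp from library-callnumber-lc; it is a simplified illustration under my own assumptions: one to three capital class letters, a class number of up to four digits with an optional decimal, up to four cutters, and free-form trailing text. The names LC_CALL_NUMBER and looks_like_lc_classified are made up for this example. It drops the prefix part, allows a fourth cutter, and rejects anything starting with "MLC" up front, as suggested above.

```python
import re

# Hypothetical, simplified pattern; not the library-callnumber-lc regexp.
LC_CALL_NUMBER = re.compile(
    r"""
    ^\s*
    (?!MLC)                                  # assumption: MLC numbers start with "MLC"
    (?P<letters>[A-Z]{1,3})                  # class letters, e.g. QA
    \s*
    (?P<number>\d{1,4})                      # class number, e.g. 76
    (?:\.(?P<decimal>\d+))?                  # optional decimal part, e.g. .73
    (?P<cutters>(?:\s*\.?\s*[A-Z]\d+){0,4})  # zero to four cutters, e.g. .P98 L88
    (?P<rest>\s+.+)?                         # trailing year, volume, and so on
    \s*$
    """,
    re.VERBOSE,
)


def looks_like_lc_classified(call_number: str) -> bool:
    """Return True if the string has the general shape of an LC class number."""
    return LC_CALL_NUMBER.match(call_number) is not None


if __name__ == "__main__":
    for candidate in (
        "QA76.73.P98 L88 2011",
        "PS3545.I5365 Z5",
        "MLCS 83/5180 (P)",
        "Microfilm 19072 E",
    ):
        print(candidate, looks_like_lc_classified(candidate))
```

This only checks the general shape. It does not verify that the class letters are actually used in the schedules, so treat a match as "plausible" rather than "valid", and test it against real 050 data before relying on it.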