Check the regexp that Google uses in their call number normalization:
You may want to remove the prefix part, and allow for a fourth cutter.
The folks at UNC pointed me to this a few months ago.
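
For anyone who wants a starting point without the prefix part, here is a rough sketch in Python. To be clear, this is not the Google regexp mentioned above; it is a simplified pattern of my own that assumes a call number is one to three capital class letters, a class number with an optional decimal, up to four cutters, and an optional year. The names LC_CALL_NUMBER and is_lc_classified are made up for illustration, and real 050 data will certainly contain shapes this does not cover (dates inside the class number, volume designations, and so on).

```python
import re

# Simplified, assumed shape of an LC Classification call number.
# Not the Google pattern; a sketch only.
LC_CALL_NUMBER = re.compile(
    r"""
    ^\s*
    (?!MLC)                                  # reject "MLC..." shelving numbers outright
    (?P<letters>[A-Z]{1,3})                  # class letters, e.g. "QA"
    \s*
    (?P<number>\d{1,4}(?:\.\d+)?)            # class number with optional decimal
    (?P<cutters>(?:\s*\.?\s*[A-Z]\d+){0,4})  # zero to four cutters, e.g. ".P98 L88"
    (?:\s+(?P<year>\d{4}[a-z]?))?            # optional year, e.g. "2010"
    \s*$
    """,
    re.VERBOSE,
)


def is_lc_classified(call_number: str) -> bool:
    """Return True if the string fits the simplified pattern above."""
    return LC_CALL_NUMBER.match(call_number) is not None


if __name__ == "__main__":
    for sample in ("QA76.73.P98 L88 2010", "MLCS 83/5180 (P)", "Microfilm 19072 E"):
        print(sample, "->", is_lc_classified(sample))
```

Because the class letters must be capitals followed directly by digits, "Microfilm 19072 E" fails on its own, and the explicit MLC check is there as a belt-and-braces guard for the "MLCS 83/5180 (P)" style.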
On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:
> Does anyone have a good regular expression that will match all legal LC
> Call Numbers from the LC Classified Schedule, but will generally not
> match things that could not possibly be an LC Call Number from the LC
> Classified Schedule?
> In particular, I need it to NOT match an "MLC" call number, which is an
> LC assigned call number that shows up in an 050 with no way to
> distinguish based on indicators, but isn't actually from the LC
> Schedules. Here's an example of an "MLC" call number:
> "MLCS 83/5180 (P)"
> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
> exclude them just like that. But it looks like there are also OTHER
> things that can show up in the 050 but aren't actually from the
> classified schedule, the OCLC documentation even contains an example of
> "Microfilm 19072 E".
> What a mess, huh? So, yeah, regex anyone?
> [You can probably guess why I care if it's from the LC Classified
> Schedule or not].
Tod Olson
University of Chicago Library