Hi Jonathan,
Although it was designed for a different purpose, you might want to take a look at the regex in the LC call number sorting utilities on this page: http://rocky.uta.edu/doran/sortlc/
Note that unparsable call numbers are printed to STDERR with an error message. So you could run it against a list containing both valid and "MLC" call numbers, see which ones end up where, refine the regexp, retry, rinse, and repeat. If you make significant (or any) improvements to the regexp being used, I'd be delighted to incorporate them back into those LC sort utilities.
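To illustrate the refine-and-retry loop described above, here is a minimal Python sketch. The pattern is a deliberately simplified assumption (one to three class letters, a class number, up to three Cutters, an optional year) and is not the regex from the sort utilities; the names LC_PATTERN and split_call_numbers are made up for this example.

```python
import re
import sys

# Simplified, assumed shape of an LC call number. Real call numbers have
# more variants (volume, copy, "suppl." and so on), so expect to refine this.
LC_PATTERN = re.compile(
    r"""
    [A-Z]{1,3}                  # class letters, e.g. QA
    \s*
    \d{1,4}(?:\.\d+)?           # class number with optional decimal part
    (?:\s*\.?[A-Z]\d+){0,3}     # up to three Cutter numbers
    (?:\s+\d{4}[a-z]?)?         # optional year, e.g. 2011 or 1998b
    """,
    re.VERBOSE,
)


def split_call_numbers(lines):
    """Return (parsed, unparsed) lists for an iterable of call numbers."""
    parsed, unparsed = [], []
    for raw in lines:
        call_number = raw.strip()
        if not call_number:
            continue  # skip blank lines
        if LC_PATTERN.fullmatch(call_number):
            parsed.append(call_number)
        else:
            unparsed.append(call_number)
    return parsed, unparsed


if __name__ == "__main__":
    sample = [
        "QA76.73.P98 D67 2011",
        "MLCS 83/5180 (P)",
        "Microfilm 19072 E",
    ]
    ok, bad = split_call_numbers(sample)
    for call_number in ok:
        print(call_number)
    for call_number in bad:
        # Mirror the utilities' behaviour: rejects go to STDERR.
        print(f"unparsable call number: {call_number}", file=sys.stderr)
```

Redirecting STDERR to a file gives you the reject list to inspect before adjusting the pattern and running it again.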
-- Michael
# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [log in to unmask]
# http://rocky.uta.edu/doran/
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, March 31, 2011 11:29 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] regexp for LCC?
>
> Does anyone have a good regular expression that will match all legal LC
> Call Numbers from the LC Classified Schedule, but will generally not
> match things that could not possibly be an LC Call Number from the LC
> Classified Schedule?
>
> In particular, I need it to NOT match an "MLC" call number, which is an
> LC assigned call number that shows up in an 050 with no way to
> distinguish based on indicators, but isn't actually from the LC
> Schedules. Here's an example of an "MLC" call number:
>
> "MLCS 83/5180 (P)"
>
> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
> exclude them just like that. But it looks like there are also OTHER
> things that can show up in the 050 but aren't actually from the
> classified schedule, the OCLC documentation even contains an example of
> "Microfilm 19072 E".
>
> What a mess, huh? So, yeah, regex anyone?
>
> [You can probably guess why I care if it's from the LC Classified
> Schedule or not].