Except now I wonder if those annoying MLCS call numbers might actually be
properly MATCHED by this regex, when I need 'em excluded. They are
annoyingly _similar_ to a classified call number. Well, one way to find out.

The reason this matters is that I'm trying to use an LCC to map to a
'discipline' or other broad category, either directly from the LCC schedule
labels, or using a mapping like umich's:

http://www.lib.umich.edu/browse/categories/

But if it's not really an LCC at all, and you try to map it, you'll get bad
postings.

On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:
> Thanks, that looks good!
>
> It's hosted on Google Code, but I don't think that code is anything
> "Google uses", it looks like it's from our very own Bill Dueber.
>
> On 3/31/2011 12:38 PM, Tod Olson wrote:
>> Check the regexp that Google uses in their call number normalization:
>>
>> http://code.google.com/p/library-callnumber-lc/wiki/Home
>>
>> You may want to remove the prefix part, and allow for a fourth cutter.
>>
>> The folks at UNC pointed me to this a few months ago.
>>
>> -Tod
>>
>> On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:
>>
>>> Does anyone have a good regular expression that will match all legal LC
>>> Call Numbers from the LC Classified Schedule, but will generally not
>>> match things that could not possibly be an LC Call Number from the LC
>>> Classified Schedule?
>>>
>>> In particular, I need it to NOT match an "MLC" call number, which is an
>>> LC assigned call number that shows up in an 050 with no way to
>>> distinguish based on indicators, but isn't actually from the LC
>>> Schedules. Here's an example of an "MLC" call number:
>>>
>>> "MLCS 83/5180 (P)"
>>>
>>> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
>>> exclude them just like that. But it looks like there are also OTHER
>>> things that can show up in the 050 but aren't actually from the
>>> classified schedule, the OCLC documentation even contains an example of
>>> "Microfilm 19072 E".
>>>
>>> What a mess, huh? So, yeah, regex anyone?
>>>
>>> [You can probably guess why I care if it's from the LC Classified
>>> Schedule or not].
>> Tod Olson
>> Systems Librarian
>> University of Chicago Library
>>
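
For anyone following the thread, here is a rough sketch of the kind of check being discussed: accept strings shaped like a classified LC call number, and refuse the MLC and "Microfilm" examples quoted above. The pattern and the names `LCC_SKETCH` and `looks_like_lcc` are made up for illustration. This is NOT the regex from library-callnumber-lc, and it assumes one to three uppercase class letters followed by a class number, with everything after that (cutters, dates) left unvalidated.

```python
import re

# Hypothetical pattern for illustration only; not the library-callnumber-lc
# regex. Assumes 1-3 uppercase class letters, then a class number with an
# optional decimal part. Cutters, dates, etc. are accepted but not validated.
LCC_SKETCH = re.compile(
    r"""
    ^\s*
    (?!MLC)                        # explicitly refuse MLC, MLCS, MLCM, ...
    (?P<letters>[A-Z]{1,3})        # class letters, e.g. QA, PS, E
    \s*
    (?P<number>\d{1,4}(?:\.\d+)?)  # class number, e.g. 76.73
    (?P<rest>[\s.].*)?             # anything after must start with space or '.'
    $
    """,
    re.VERBOSE,
)


def looks_like_lcc(call_number: str) -> bool:
    """Return True if the string is shaped like a classified LC call number."""
    return LCC_SKETCH.match(call_number) is not None


if __name__ == "__main__":
    for candidate in [
        "QA76.73.P98 2011",
        "PS3545.I5 A6",
        "MLCS 83/5180 (P)",
        "Microfilm 19072 E",
    ]:
        print(candidate, "->", looks_like_lcc(candidate))
```

The match is case-sensitive on purpose, which is what keeps "Microfilm" out: only the leading "M" can count as a class letter, and it is not followed by a digit. The `(?!MLC)` lookahead is a belt-and-braces guard, on the assumption that no real schedule class begins with those three letters. A test like this only checks shape, so it is worth running it against real 050 data before trusting it for any mapping.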