The Google Code regex looks like it will accept any 1-3 letters at the
start of the call number. But LCC has no I, O, W, X, or Y
classifications, so you might want to use something more like
^[A-HJ-NP-VZ] at the start of the regex.

Also, only a few major classifications use three letters, such as DJK
and several in the Ks. I'm not sure, but there might be others.

Keith

On Thu, Mar 31, 2011 at 1:11 PM, Jonathan Rochkind <[log in to unmask]> wrote:
> Except now I wonder if those annoying MLCS call numbers might actually be
> properly MATCHED by this regex, when I need em excluded. They are annoyingly
> _similar_ to a classified call number. Well, one way to find out.
>
> And the reason this matters is to try and use an LCC to map to a
> 'discipline' or other broad category, either directly from the LCC schedule
> labels, or using a mapping like umich's:
> http://www.lib.umich.edu/browse/categories/
>
> But if it's not really an LCC at all, and you try to map it, you'll get bad
> postings.
>
> On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:
>>
>> Thanks, that looks good!
>>
>> It's hosted on Google Code, but I don't think that code is anything
>> "Google uses", it looks like it's from our very own Bill Dueber.
>>
>> On 3/31/2011 12:38 PM, Tod Olson wrote:
>>>
>>> Check the regexp that Google uses in their call number normalization:
>>>
>>> http://code.google.com/p/library-callnumber-lc/wiki/Home
>>>
>>> You may want to remove the prefix part, and allow for a fourth cutter.
>>>
>>> The folks at UNC pointed me to this a few months ago.
>>>
>>> -Tod
>>>
>>> On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:
>>>
>>>> Does anyone have a good regular expression that will match all legal LC
>>>> Call Numbers from the LC Classified Schedule, but will generally not
>>>> match things that could not possibly be an LC Call Number from the LC
>>>> Classified Schedule?
>>>>
>>>> In particular, I need it to NOT match an "MLC" call number, which is an
>>>> LC assigned call number that shows up in an 050 with no way to
>>>> distinguish based on indicators, but isn't actually from the LC
>>>> Schedules. Here's an example of an "MLC" call number:
>>>>
>>>> "MLCS 83/5180 (P)"
>>>>
>>>> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
>>>> exclude them just like that. But it looks like there are also OTHER
>>>> things that can show up in the 050 but aren't actually from the
>>>> classified schedule, the OCLC documentation even contains an example of
>>>> "Microfilm 19072 E".
>>>>
>>>> What a mess, huh? So, yeah, regex anyone?
>>>>
>>>> [You can probably guess why I care if it's from the LC Classified
>>>> Schedule or not].
>>>
>>> Tod Olson<[log in to unmask]>
>>> Systems Librarian
>>> University of Chicago Library
>>>
>
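
For anyone who wants to try the restricted first letter, here is a rough Python sketch of how it might be combined with an explicit MLC exclusion. It is not the regex from the library-callnumber-lc project; the pattern, the name LCC_SKETCH, and the helper looks_like_lcc are all hypothetical, and the shape it assumes (1-3 class letters, a class number, optional cutters, optional year) is a simplification of real call numbers. It also does not check which two- and three-letter combinations actually exist in the schedules.

```python
import re

# Hypothetical sketch, not the library-callnumber-lc regex.
# Assumed shape: 1-3 class letters, class number, optional cutters, optional year.
LCC_SKETCH = re.compile(
    r"""
    ^(?!MLC)                  # reject MLC-style numbers (e.g. "MLCS 83/5180 (P)")
    [A-HJ-NP-VZ][A-Z]{0,2}    # 1-3 letters; first letter is never I, O, W, X, or Y
    \s*\d{1,4}(?:\.\d+)?      # class number with optional decimal part
    (?:\s*\.?[A-Z]\d+)*       # zero or more cutters such as ".P98" or " L88"
    (?:\s+\d{4}[a-z]?)?       # optional trailing year
    \s*$
    """,
    re.VERBOSE,
)


def looks_like_lcc(call_number: str) -> bool:
    """Return True if the string has the simplified LCC shape assumed above."""
    return LCC_SKETCH.match(call_number.strip()) is not None


if __name__ == "__main__":
    for candidate in [
        "QA76.73.P98 L88 2011",
        "DJK 7",
        "MLCS 83/5180 (P)",
        "Microfilm 19072 E",
        "W 100",
    ]:
        print(candidate, "->", looks_like_lcc(candidate))
```

The match is deliberately case-sensitive, which is what keeps "Microfilm 19072 E" out; if your 050 data has lowercase class letters you would need to normalize first and handle that case some other way.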