The Google Code regex looks like it will accept any 1-3 letters at the
start of the call number. But LCC has no I, O, W, X, or Y
classifications.
So you might want to use something more like ^[A-HJ-NP-VZ] at the
start of the regex.
Also, only a few classes use three letters, such as DJK and several
in the Ks. I'm not sure, but there might be others.
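
To make that concrete, here is a rough Python sketch of the idea. It is
not the regex from the Google Code project. The name looks_like_lcc, the
explicit MLC check, and the 1-4 digit class number are my own
assumptions, and it only looks at the start of the call number, not the
cutters or date.

```python
import re

# Assumed shape: 1-3 class letters whose first letter is one LCC actually
# uses (no I, O, W, X, or Y), then a 1-4 digit class number with an
# optional decimal part. Cutters and dates are deliberately not checked.
LCC_START = re.compile(r"^[A-HJ-NP-VZ][A-Z]{0,2}\s*\d{1,4}(?:\.\d+)?")


def looks_like_lcc(call_number: str) -> bool:
    """Return True if the string starts like a classified LC call number."""
    s = call_number.strip().upper()
    # Hypothetical extra guard: reject MLC-style numbers outright.
    if s.startswith("MLC"):
        return False
    return LCC_START.match(s) is not None


if __name__ == "__main__":
    samples = ["QA76.73 .P98", "DJK 4 .S93", "MLCS 83/5180 (P)",
               "Microfilm 19072 E", "W 100"]
    for sample in samples:
        print(sample, looks_like_lcc(sample))
```

This still lets through any second and third letters, so a stricter
version would need a list of the valid two- and three-letter classes.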
Keith
On Thu, Mar 31, 2011 at 1:11 PM, Jonathan Rochkind wrote:
> Except now I wonder if those annoying MLCS call numbers might actually be
> properly MATCHED by this regex, when I need em excluded. They are annoyingly
> _similar_ to a classified call number. Well, one way to find out.
>
> And the reason this matters is to try and use an LCC to map to a
> 'discipline' or other broad category, either directly from the LCC schedule
> labels, or using a mapping like umich's:
> http://www.lib.umich.edu/browse/categories/
>
> But if it's not really an LCC at all, and you try to map it, you'll get bad
> postings.
>
> On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:
>>
>> Thanks, that looks good!
>>
>> It's hosted on Google Code, but I don't think that code is anything
>> "Google uses", it looks like it's from our very own Bill Dueber.
>>
>> On 3/31/2011 12:38 PM, Tod Olson wrote:
>>>
>>> Check the regexp that Google uses in their call number normalization:
>>>
>>> http://code.google.com/p/library-callnumber-lc/wiki/Home
>>>
>>> You may want to remove the prefix part, and allow for a fourth cutter.
>>>
>>> The folks at UNC pointed me to this a few months ago.
>>>
>>> -Tod
>>>
>>> On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:
>>>
>>>> Does anyone have a good regular expression that will match all legal LC
>>>> Call Numbers from the LC Classified Schedule, but will generally not
>>>> match things that could not possibly be an LC Call Number from the LC
>>>> Classified Schedule?
>>>>
>>>> In particular, I need it to NOT match an "MLC" call number, which is an
>>>> LC assigned call number that shows up in an 050 with no way to
>>>> distinguish based on indicators, but isn't actually from the LC
>>>> Schedules. Here's an example of an "MLC" call number:
>>>>
>>>> "MLCS 83/5180 (P)"
>>>>
>>>> Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
>>>> exclude them just like that. But it looks like there are also OTHER
>>>> things that can show up in the 050 but aren't actually from the
>>>> classified schedule, the OCLC documentation even contains an example of
>>>> "Microfilm 19072 E".
>>>>
>>>> What a mess, huh? So, yeah, regex anyone?
>>>>
>>>> [You can probably guess why I care if it's from the LC Classified
>>>> Schedule or not].
>>>
>>> Tod Olson
>>> Systems Librarian
>>> University of Chicago Library
>>>
>