You could also try to use the code I put in SolrMarc utilities classes
ha ha ha.
- Naomi
On Mar 31, 2011, at 10:25 AM, Keith Jenkins wrote:
> The Google Code regex looks like it will accept any 1-3 letters at the
> start of the call number. But LCC has no I, O, W, X, or Y
> classifications.
>
> So you might want to use something more like ^[A-HJ-NP-VZ] at the
> start of the regex.
>
> Also, there are only a few major classifications that use three
> letters. Like DJK, and several in the Ks. I'm not sure, but there
> might be others.
>
> Keith
>
>
> On Thu, Mar 31, 2011 at 1:11 PM, Jonathan Rochkind wrote:
>> Except now I wonder if those annoying MLCS call numbers might actually be
>> properly MATCHED by this regex, when I need them excluded. They are
>> annoyingly _similar_ to a classified call number. Well, one way to find out.
>>
>> And the reason this matters is to try and use an LCC to map to a
>> 'discipline' or other broad category, either directly from the LCC
>> schedule labels, or using a mapping like umich's:
>> http://www.lib.umich.edu/browse/categories/
>>
>> But if it's not really an LCC at all, and you try to map it, you'll get
>> bad postings.
>>
>> On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:
>>>
>>> Thanks, that looks good!
>>>
>>> It's hosted on Google Code, but I don't think that code is anything
>>> "Google uses", it looks like it's from our very own Bill Dueber.
>>>
>>> On 3/31/2011 12:38 PM, Tod Olson wrote:
>>>>
>>>> Check the regexp that Google uses in their call number normalization:
>>>>
>>>> http://code.google.com/p/library-callnumber-lc/wiki/Home
>>>>
>>>> You may want to remove the prefix part, and allow for a fourth cutter.
>>>>
>>>> The folks at UNC pointed me to this a few months ago.
>>>>
>>>> -Tod
>>>>
>>>> On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:
>>>>
>>>>> Does anyone have a good regular expression that will match all legal LC
>>>>> Call Numbers from the LC Classified Schedule, but will generally not
>>>>> match things that could not possibly be an LC Call Number from the LC
>>>>> Classified Schedule?
>>>>>
>>>>> In particular, I need it to NOT match an "MLC" call number, which is an
>>>>> LC-assigned call number that shows up in an 050 with no way to
>>>>> distinguish based on indicators, but isn't actually from the LC
>>>>> Schedules. Here's an example of an "MLC" call number:
>>>>>
>>>>> "MLCS 83/5180 (P)"
>>>>>
>>>>> Hmm, maybe all MLC call numbers begin with MLC; okay, I guess I can
>>>>> exclude them just like that. But it looks like there are also OTHER
>>>>> things that can show up in the 050 but aren't actually from the
>>>>> classified schedule. The OCLC documentation even contains an example of
>>>>> "Microfilm 19072 E".
>>>>>
>>>>> What a mess, huh? So, yeah, regex anyone?
>>>>>
>>>>> [You can probably guess why I care if it's from the LC Classified
>>>>> Schedule or not].
>>>>
>>>> Tod Olson
>>>> Systems Librarian
>>>> University of Chicago Library
>>>>
>>
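
For anyone who wants to try the ideas from this thread, here is a rough
sketch in Python. It is not the regex from library-callnumber-lc and not the
SolrMarc code. It only combines Keith's suggested first-letter class
(^[A-HJ-NP-VZ]) with an explicit guard against the MLC prefix. The names
LCC_START and looks_like_lcc are made up for illustration, and the limit of
four digits on the class number is an assumption. It checks only the start
of the call number (class letters plus class number) and does not validate
cutters or anything after them.

```python
import re

# Hypothetical pattern: checks only the class letters and class number.
LCC_START = re.compile(
    r"""
    ^\s*
    (?!MLC)                        # explicit guard: reject MLC, MLCS, MLCM, ...
    (?P<letters>[A-HJ-NP-VZ]       # first letter: no I, O, W, X, or Y
                [A-Z]{0,2})        # up to two more letters (e.g. DJK)
    \s*
    (?P<number>\d{1,4}(?:\.\d+)?)  # class number (assumed at most 4 digits),
                                   # with an optional decimal part
    (?!\d)                         # do not stop in the middle of a longer number
    """,
    re.VERBOSE,
)


def looks_like_lcc(call_number: str) -> bool:
    """Return True if the string starts like a classified LC call number."""
    return LCC_START.match(call_number) is not None


if __name__ == "__main__":
    samples = [
        "QA76.73.P98 L88 2011",
        "DJK 30",
        "MLCS 83/5180 (P)",
        "Microfilm 19072 E",
    ]
    for sample in samples:
        print(sample, "->", looks_like_lcc(sample))
```

The pattern is deliberately case-sensitive, so "Microfilm 19072 E" fails
because "ic" is not uppercase. If your 050 data can contain lowercase class
letters, you would need to normalize case first and handle words like
"Microfilm" some other way.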