A few belated ramblings from a cataloger:
1) GEOGRAPHICAL SUBDIVISION
It used to be that geographical subdivision was much more flexible and was supposed to convey different meanings depending on where it occurred in the string. Then there was some research showing that not only did users not know how to interpret this, but catalogers did not understand these rules and were constructing inconsistent headings. This led to a movement for simplification. From LC's Subject Heading Manual:
"The Subject Subdivisions Conference that took place at Airlie, Virginia, in 1991 recommended that the standard order of subdivisions be [topic]–[place]–[chronology]–[form]. In 1992, it was decided to adopt that order where it could be applied."
This leaves a standard order of $a, $b [rare], $x, $z, $y, $v with some exceptions.
As was pointed out earlier, the current rule is to "put the geographic subdivision ($$z) as near the end as is legal." This can be mechanically determined based on a fixed field in the authority record. Although fixed fields in bib records are often unreliable, those in authority records are probably as accurate as they can reasonably be made to be, allowing for human error. This is both because LC coordinates training and reviews records and because the fixed fields are used as decision points so there are short-term consequences for later catalogers if they're not done right.
The fixed field (008/06) in LCSH authority records tells you whether a geographic subdivision can come after the heading (http://www.loc.gov/marc/authority/ad008.html). Id.loc.gov doesn't seem to give you that info, but it might be nice if it did.
650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided geographically-indirect] $z England [n 82068148] $x Finance [sh2002007885, Geo Subd = # = Not subdivided geographically]
650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided geographically-indirect] $x Economic aspects [sh 99005484 Geo Subd = i = Subdivided geographically-indirect] $z England [n 82068148].
One reason not to rely on found order is that LC has been moving in the direction of the Airlie House recommendation, so in addition to the usual mistakes, you'll probably come across a lot of older forms if you take data from the wild. For example, until somewhat recently, the "economic aspects" record above looked like the "finance" one, so you'll probably still see records like
650 _0 $a Education $z England $x Economic aspects.
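The "as near the end as is legal" rule can be sketched mechanically. This is only a rough Python sketch, not LC's procedure: it assumes you have already looked up each $a/$x element in its authority record and attached that record's 008/06 value, and it ignores $y and $v (which would follow $z in the standard order). The function names and data layout are made up for illustration.

```python
# Hypothetical sketch: place $z after the last element whose authority
# record (008/06) says it may be subdivided geographically.
# 008/06 values used here: 'd' = direct, 'i' = indirect,
# '#' (blank in the real record) = not subdivided geographically.

SUBDIVIDABLE = {"d", "i"}

def place_geographic(elements, places):
    """elements: list of (subfield_code, text, geo_subd) for $a/$x parts.
    places: place names to add as $z subfields, already in indirect order.
    Returns the heading as a list of (subfield_code, text) pairs."""
    # Positions of the elements that legally allow a $z right after them.
    legal = [i for i, (_, _, geo) in enumerate(elements) if geo in SUBDIVIDABLE]
    if not legal:
        raise ValueError("no element may be subdivided geographically")
    last = legal[-1]
    heading = []
    for i, (code, text, _) in enumerate(elements):
        heading.append((code, text))
        if i == last:
            heading.extend(("z", place) for place in places)
    return heading

def to_string(heading):
    """Render (code, text) pairs in the $-delimited style used in this post."""
    return " ".join(f"${code} {text}" for code, text in heading)

finance = [("a", "Education", "i"), ("x", "Finance", "#")]
economic = [("a", "Education", "i"), ("x", "Economic aspects", "i")]

print(to_string(place_geographic(finance, ["England"])))
print(to_string(place_geographic(economic, ["England"])))
```

With the two sets of 008/06 values quoted above, this reproduces the two Education strings: $z lands before $x Finance but after $x Economic aspects.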
A) Indirect Subdivision
In general, when a heading string starts with a geographic name, it is in direct order:
651 _0 $a London (England) [n 79005665] $x Economic conditions [sh 99005736].
If a geographic name is modifying a topical heading, it is given in indirect order:
650 _0 $a Education [sh 85040989] $z England $z London [n 79005665; covers both $z subfields].
Thanks to a project that OCLC did for FAST (which uses only the indirect style), in most cases both of these can be extracted from the authority record, which will have a 781 with the indirect form added:
151 $a London (England)
451 $a Londinium (England)
781 0 $z England $z London
Some records (usually for geographic areas within cities) cannot be used to modify topical headings, but can be used in 651$a as the main term in a heading string. These are identified by a note and the lack of a 781.
151 $a Hackney (London, England)
667 $a SUBJECT USAGE: This heading is not valid for use as a geographic subdivision.
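Pulling the direct and indirect forms out of an authority record might look something like the sketch below. The record layout (a list of tag/subfield tuples) and the function names are my own assumptions for illustration, not any particular MARC library's API; the data is just the London and Hackney examples above.

```python
# Hypothetical record layout: a list of (tag, [(subfield_code, value), ...]).

london = [
    ("151", [("a", "London (England)")]),
    ("451", [("a", "Londinium (England)")]),
    ("781", [("z", "England"), ("z", "London")]),
]
hackney = [
    ("151", [("a", "Hackney (London, England)")]),
    ("667", [("a", "SUBJECT USAGE: This heading is not valid for use "
                   "as a geographic subdivision.")]),
]

def direct_form(record):
    """Return the 151 $a, the form used when the place leads the string (651 $a)."""
    for tag, subfields in record:
        if tag == "151":
            for code, value in subfields:
                if code == "a":
                    return value
    return None

def indirect_form(record):
    """Return the 781 $z values in order, or None when the record has no 781
    (i.e. the place cannot be used as a geographic subdivision)."""
    for tag, subfields in record:
        if tag == "781":
            return [value for code, value in subfields if code == "z"]
    return None

print(direct_form(london), indirect_form(london))
print(direct_form(hackney), indirect_form(hackney))
```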
B) Geographic Entities and Name vs. Subject Headings
Notice that in the above example, the control number/identifier for Education starts with sh while the one for London starts with n. This is an important distinction. Heading identifiers that start with sh are LCSH terms found in the subject authority file and are available from id.loc.gov. I think these all fall into FRBR's group 3 bib entities. Heading identifiers that start with n are stored in the LC NAF (Name Authority File) and are not available as linked data. These are the FRBR group 1 and 2 entities and maybe some from group 3. Most of these can also be used as subjects in LCSH. So you can't actually get at all the building blocks of LCSH strings nor use linked data for all subjects.
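If you are processing identifiers in bulk, the sh/n distinction is easy to test for. A minimal sketch, assuming only what is said above (sh means the subject authority file, anything starting with n means the name authority file); the function name is hypothetical:

```python
def authority_file(lccn):
    """Guess which authority file an identifier belongs to from its prefix."""
    # Identifiers appear both with and without a space ("sh 85040989",
    # "sh2002007885"), so drop spaces before looking at the prefix.
    normalized = lccn.replace(" ", "").lower()
    if normalized.startswith("sh"):
        return "subject"
    if normalized.startswith("n"):
        return "name"
    return "unknown"

for lccn in ("sh 85040989", "n 79005665", "sh2002007885"):
    print(lccn, "->", authority_file(lccn))
```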
Named geographic features (e.g., mountains, lakes, continents) are established in the subject authority file using the rules in the Subject Cataloging Manual for LCSH. The headings are tagged 151 and can be found at id.loc.gov.
151 $a McKinley, Mount (Alaska)
151 $a Erie, Lake
151 $a Asia
Geographic features appear in bib records only as 651 or 650+ $z subject terms.
Jurisdiction names (e.g., cities, states, countries) are established in the name authority file using descriptive cataloging rules (e.g., AACR2 ch 23 and the NACO Participants' Manual). They are tagged 151 and cannot be found at id.loc.gov.
151 $a Paris (France)
151 $a Oregon
151 $a Mexico
Depending on their function, jurisdiction names can be tagged both as geographic subject headings (651, 650+ $z) and as corporate names (110, 710, and as subjects 610) in bibliographic records.
For an example, go to http://lccn.loc.gov/92643471 and click on “View LC holdings for this title in the: LC Online Catalog” and then MARC view.
110 1_ $a New York (N.Y.)
245 10 $a New York City Charter and Administrative Code. $p Amendments, complete with indices.
650 _0 $a Delegated legislation $z New York (State) $z New York.
There is a famous part of the Subject Headings Manual (H405) known as the “division of the world” that addresses ambiguous entities (such as building and park names) and
1. whether they are established in the name authority file or the subject authority file
2. whether they are established using the descriptive cataloging rules or subject cataloging rules
3. how they are tagged in MARC (usually 110 corporate name or 151 geographic name)
The current list of these categories can be found at http://www.loc.gov/catdir/pcc/saco/alpha405.html.
Because the rules and processes for naming things are different for descriptive rules and subject rules, there is some fallout from this decision.
Differences in what the name is because of different rules
Differences in the use of related broader terms
Subject authority records can have broader terms, such as
151 $a Fuji, Mount (Japan)
550 $a Mountains $z Japan $w g [broader term]
550 $a Volcanoes $z Japan $w g [broader term]
781 $z Japan $z Fuji, Mount
This means that if you browse the subject authority file for Mountains--Japan, you'll get a list of the 92 mountains in Japan that have been established in LCSH. You can't do this in the name authority file.
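The browse described here amounts to inverting the 550 broader-term links. A rough sketch, using only the Mount Fuji record above and a made-up record layout (list of tag/subfield tuples); the function name is hypothetical:

```python
from collections import defaultdict

fuji = [
    ("151", [("a", "Fuji, Mount (Japan)")]),
    ("550", [("a", "Mountains"), ("z", "Japan"), ("w", "g")]),
    ("550", [("a", "Volcanoes"), ("z", "Japan"), ("w", "g")]),
    ("781", [("z", "Japan"), ("z", "Fuji, Mount")]),
]

def broader_index(records):
    """Map each broader term (e.g. 'Mountains--Japan') to the 151 headings
    that cite it in a 550 with $w g."""
    index = defaultdict(list)
    for record in records:
        heading = next(value for tag, subs in record if tag == "151"
                       for code, value in subs if code == "a")
        for tag, subfields in record:
            # Only 550s flagged as broader terms count.
            if tag == "550" and ("w", "g") in subfields:
                key = "--".join(v for c, v in subfields if c != "w")
                index[key].append(heading)
    return index

index = broader_index([fuji])
print(index["Mountains--Japan"])
print(index["Volcanoes--Japan"])
```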
Differences in how name changes are treated
Descriptive rules establish a separate, linked record for every name variation. As 110/710 access points (authors), the heading used is the one for the name at the time of writing:
See also later heading Istanbul (Turkey)
Valid as a name heading for the period 330-1453
See also earlier heading Constantinople
Valid as a name heading after 1453
Subject rules use the latest name for more or less co-extensive territories. Thus only Istanbul (Turkey) is used as a subject and the historical subdivisions are all listed there. For example:
Istanbul (Turkey) $x History $y Siege, 1203-1204
Istanbul (Turkey) $x History $y 20th century
The Constantinople record has the following 667 note:
SUBJECT USAGE: This heading is not valid for use as a subject. Works about this place are entered under Istanbul (Turkey).
Lack of co-extensivity results in separate subject authority records as is the case for Russia, the Soviet Union and Russia (Federation).
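If you wanted to follow these redirects automatically, one option is to read the 667 note. This is only a sketch: it assumes the note always uses the wording quoted above ("are entered under ..."), which I have not verified across the file, and the function name is hypothetical.

```python
import re

NOTE = ("SUBJECT USAGE: This heading is not valid for use as a subject. "
        "Works about this place are entered under Istanbul (Turkey).")

# Assumes the note ends with "are entered under <heading>."
ENTERED_UNDER = re.compile(r"are entered under (.+?)\.\s*$")

def subject_form(heading, notes):
    """Return the heading to use as a subject: the heading itself, the one
    named in a 667 redirect, or None if it is invalid with no target found."""
    for note in notes:
        if "not valid for use as a subject" in note:
            match = ENTERED_UNDER.search(note)
            return match.group(1) if match else None
    return heading

print(subject_form("Constantinople", [NOTE]))
print(subject_form("Istanbul (Turkey)", []))
```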
Differences in who creates the record and how easily and quickly it can be done
Creation of name authority records is widely distributed through LC's cooperative cataloging program (NACO). Name authority records exist largely independently. It is only necessary to uniquely identify the name within the existing name authority file (at least ideally), provide sufficient cross-references that someone can reasonably find the name, and link the name to any immediately preceding or following names if those are present in any bibliographic records.
Creation of subject authority records is centralized at LC. Although the cataloging community can make suggestions through a formal process (SACO), all proposals are vetted by LC, who make the final decision. Although LCSH has many problems in practice, it is intended to function as a coherent, interdependent web. This requires a big picture perspective and more attention to how the individual headings fit into the whole.
One more twist
Occasionally, geographic names can be used in 6xx $x if they are being used as topics. For example
100 $a Shakespeare, William, $d 1564-1616 $x Knowledge $x Greece [Shakespeare’s knowledge about Greece]
150 $a Information storage and retrieval systems $x United States [Information storage and retrieval systems about the U.S.]
For more info on how this works, see the ALCTS/PCC manual for basic LCSH training at http://www.loc.gov/catworkshop/courses/basicsubject/pdf/lcsh-trnee-manual.pdf. In particular, see session 10 (Names as Subjects) and the part beginning on 11-11 about MARC coding of geographic names.
2) HEADINGS WHERE NOT ALL COMPONENTS ARE ESTABLISHED EXPLICITLY
The example in the discussion was
650 _0 $a English language $v Dictionaries $x Albanian.
This could potentially be mapped to
sh 85043413 150 English language + sh 99001605 185 $v Dictionaries $x French, [Italian, etc.]
And perhaps even expanded to $v Dictionaries + $a Albanian language. In the end you might produce a useful list of languages in LCSH by this means. For everything in $x, there ought to be a corresponding 150 with "language" appended. There are a number of these pattern subdivisions. The other common ones are for religious topics, such as the following, and some related to wars or nationalities.
180 $x Religious aspects $x Baptists, [Catholic Church, etc.]
180 $x Religious aspects $x Buddhism, [Christianity, etc.]
The authority record headings can be identified by their use of etc. and brackets.
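Spotting these pattern records, and guessing the 150 that should correspond to a language in $x, could be sketched like this. The regular expression and both function names are my own assumptions; the "append language" step is just the rule of thumb described above and would need checking against the authority file.

```python
import re

# A pattern heading ends a bracketed list with "etc.", e.g. "[Italian, etc.]".
PATTERN_HEADING = re.compile(r"\[[^\]]*\betc\.\]")

def is_pattern_heading(text):
    """True if an authority heading looks like a free-floating pattern."""
    return bool(PATTERN_HEADING.search(text))

def language_heading(subdivision):
    """Guess the 150 heading for a language named in $x (rule of thumb only)."""
    return f"{subdivision} language"

print(is_pattern_heading("$v Dictionaries $x French, [Italian, etc.]"))
print(is_pattern_heading("$x Finance"))
print(language_heading("Albanian"))
```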
Music headings are also often not constructed explicitly but consist of headings built up from components like names of instruments and using certain rules. You could theoretically deconstruct most of these, but I don't know that it would be worthwhile. When I was at Ball State, we did something like this (over a bounded set of records, though) where we mapped chamber music subject headings to coded data to drive a search form that allowed users to search by instrumentation (http://www.bsu.edu/libraries/viewpage.asp?src=./librarycatalogs/chambermusic.html). Although that did require some manual follow-up (for pieces where the instrumentation and genre are spread across more than one heading), it was mostly an automated process.
I also am not sure that I agree with the suggestion that
“The second issue using your example is that you want to find the ‘longest’ matching heading. While the pieces parts are there, so is the enumerated authority heading:
150 __ $a Education $z England
as LCCN sh2008102746. So your heading is actually composed of the enumerated headings:
sh2008102746 150 __ $a Education $z England
sh2002007885 180 __ $x Finance”
For a couple of reasons:
1) A lot of things are currently arbitrarily enumerated. It used to be that things were only enumerated when they couldn’t be constructed from the rules, but more recently LC has begun an attempt to explicitly establish all the common combinations in an attempt to appease ILS’s supposed need for strings for validation (what I think is a misguided game of whack-a-mole). These records can be identified by the 667 note “Record generated for validation purposes,” but not, so far as I can tell, at id.loc.gov.
I tried to describe an alternate vision of encoding the information needed for creating the combinations rather than the exponentially large number of combinations themselves in the section called Use of these Categories to Improve Consistency and Authority Control in part 9 (http://journal.code4lib.org/articles/23#problem9) of my Code4Lib Journal article on LCSH and faceting. LC is actually starting some experiments in this direction.
The point is that there is often no logical difference between what is enumerated and what’s not; it’s just what LC happens to have gotten to. So I think this will leave you with inconsistent and less useful data.
2) I also think you would get more flexibility and bang for your buck if you broke the headings down into the smallest possible parts.
Education + England + Finance
Because this gives you more pieces to offer up to people separately or potentially in any combination.
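Breaking a string down into its smallest parts is straightforward if the data uses the $-delimited notation from this post. A naive sketch (the function name is hypothetical, and real MARC data would normally come through a proper MARC parser rather than a regular expression):

```python
import re

# Matches "$x value" pairs; assumes "$" never occurs inside a value.
SUBFIELD = re.compile(r"\$(\w)\s+([^$]+)")

def split_heading(field):
    """Split '650 _0 $a Education $z England $x Finance.' into
    (subfield_code, value) pairs, dropping the field's final period."""
    parts = [(code, value.strip()) for code, value in SUBFIELD.findall(field)]
    if parts:
        code, value = parts[-1]
        # Naive: also strips a period that belongs to a trailing abbreviation.
        if value.endswith("."):
            parts[-1] = (code, value[:-1])
    return parts

parts = split_heading("650 _0 $a Education $z England $x Finance.")
print(parts)
print(" + ".join(value for _, value in parts))
```

Keeping the subfield code with each piece means you can still tell the topic, place, and subdivision apart after they have been separated.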
FWIW, there is also
150 $a Education $x Finance
So if the $x Finance is ever made geographically subdividable (which is the trend), your enumerated bit (at least if you go in order) may change.
I’m sure this is way too much info for most (or all) on this list, but in case it is helpful, I thought I’d throw it out there.