Have you considered the LOCAH work in mapping EAD into Linked Data?

http://archiveshub.ac.uk/locah/
and
http://data.archiveshub.ac.uk/

Rob




On Sun, Jan 19, 2014 at 5:10 PM, Ben Companjen
<[log in to unmask]> wrote:

> Hi Eric,
>
> While I'm no archivist by training (information systems engineer I am),
> I've learned a thing or two from having to work with EAD and its basis for
> use, ISAD(G) (all citations below are from ISAD(G), 2nd edition). As with
> all information modelling, either inside or outside the Linked Data
> domain, you should take a step back to look at the goal of the
> description. When you have a list of what you want to describe, you can
> start looking for ontologies.
>
> You probably know this, but the statement "Because many archival
> descriptions are rooted in MARC records, and MODS is easily mapped from
> MARC." prompted me to respond. IMO archival descriptions are rooted in
> rules for description, not a specific file format.
>
> So, when I think of (some of) the essences of archival description, I
> think of:
>
> - "The purpose of archival description is to identify and explain the
> context and content of archival material in order to promote its
> accessibility. This is achieved by creating accurate and appropriate
> representations and by organizing them in accordance with predetermined
> models." (§I.2)
> - "… seven areas of descriptive information:
>   1. Identity Statement Area
>      (where essential information is conveyed to identify the unit of
> description)
>   2. Context Area
>      (where information is conveyed about the origin and custody of the
> unit of description)
>   3. Content and Structure Area
>      (where information is conveyed about the subject matter and
> arrangement of the unit of description)
>   4. Condition of Access and Use Area
>      (where information is conveyed about the availability of the unit of
> description)
>   5. Allied Materials Area
>      (where information is conveyed about materials having an important
> relationship to the unit of description)
>   6. Note Area
>      (where specialized information and information that cannot be
> accommodated in any of the other areas may be conveyed).
>   7. Description Control Area
>      (where information is conveyed on how, when and by whom the archival
> description was prepared)." (§I.11)
>
>
>
> There is a distinction between the thing being described, and the
> description itself, and both have an important role within the archival
> description. (If anything so far causes confusion with anyone here, I
> have misunderstood and am happy to be corrected :))
> NB: this is one way of thinking of descriptions. Incorporating the
> PROV-ontology would make sense for expressing more/other aspects of the
> provenance of archival entities, but I haven't got round to becoming an
> expert in PROV yet ;)
>
>
> ISAD(G) lists 26 "elements that may be combined to constitute the
> description of an archival entity".
>
> Trying to translate these 'elements', I'd end up with possibly a lot more
> than 26 RDFS/OWL properties.
> *Depending on the type of archival entity you can/should of course use
> more specific ontologies.*
>
>
>
> Let me list some properties and related ontologies.
>
>
>
>
>
> # Identity statement area
>
> ## Identifiers
> The URI, naturally, and other IDs. Could be linked using
> dc(terms):identifier, or mods:identifier, or other ontologies. Ideally
> there is some way of linking the domain of the ID to the ID itself,
> because "box 101" is likely not unique in the universe. Perhaps you want
> to publish a URI strategy separately to explain how the URI was
> assembled/derived.
>
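One way to link an ID to its domain is to scope it by the holding repository and any parent units when the URI is minted. The following is a minimal Python sketch of that idea only; the base URI, the repository code "XX-Demo", and the `mint_uri` helper are hypothetical and not taken from any published URI strategy.

```python
from urllib.parse import quote

# Hypothetical base URI; a real URI strategy would document its own.
BASE = "http://example.org/archives"


def mint_uri(repository: str, *local_ids: str) -> str:
    """Build a URI that scopes each local ID by its repository and parents."""
    # Percent-encode every segment so spaces and slashes in local IDs are safe.
    segments = [quote(part, safe="") for part in (repository, *local_ids)]
    return "/".join([BASE, *segments])


# "box 101" is only unique once the repository and parent unit are included.
print(mint_uri("XX-Demo", "fonds 7", "box 101"))
```

Two repositories that both hold a "box 101" end up with different URIs, and the path itself shows how the URI was assembled.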
> ## Title
> Again DC(terms), MODS, RDA
>
> ## Date(s)
> You want properties that have a clear meaning. For example,
> dcterms:created and mods:dateCreated assume it is clear what "when the
> resource was created" means. DC terms are vague, I mean general, on
> purpose. You could create some properties that are `rdfs:subPropertyOf`
> the dcterms date properties for this.
> I'd look into EDTF for encoding uncertain dates and ranges and BCE dates
> (MODS doesn't support BCE dates).
>
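To illustrate the subproperty idea: a more specific date property declared below dcterms:created can still be found by a consumer that only asks for dcterms:created. This is a rough Python sketch using plain tuples as triples, not a real RDF library; the `ex:` namespace, the `dateOfAccumulation` property, and the example resources are all assumptions made up for illustration. The date range is written as a simple start/end interval.

```python
DCTERMS = "http://purl.org/dc/terms/"
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
EX = "http://example.org/vocab#"  # hypothetical namespace

# Triples as (subject, predicate, object) tuples.
triples = {
    (EX + "dateOfAccumulation", RDFS_SUBPROP, DCTERMS + "created"),
    ("http://example.org/fonds/1", EX + "dateOfAccumulation", "1914/1918"),
    ("http://example.org/item/9", DCTERMS + "created", "1916-03-02"),
}


def subproperties(prop, graph):
    """Return prop plus every property declared (transitively) below it."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == RDFS_SUBPROP and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def values(subject, prop, graph):
    """Values of prop for subject, including values stated via subproperties."""
    props = subproperties(prop, graph)
    return sorted(o for s, p, o in graph if s == subject and p in props)


# The fonds only has the specific property, yet answers the general query.
print(values("http://example.org/fonds/1", DCTERMS + "created", triples))
```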
> ## Level of description
> What kind of 'documentary unit' does the description describe? A whole
> building's content or one piece of paper? I don't know of any ontology
> with terms "fonds", …, "file", "item", but you could say `<http URI>
> rdf:type <fonds class URI>`.
>
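A small Python sketch of the rdf:type approach, writing the statement as one N-Triples line. The `levels#` namespace is hypothetical (as said, no existing ontology is assumed), and the list of level names follows the ISAD(G) hierarchy.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LEVELS = "http://example.org/levels#"  # hypothetical class namespace
KNOWN_LEVELS = ("fonds", "sub-fonds", "series", "sub-series", "file", "item")


def level_triple(unit_uri: str, level: str) -> str:
    """Return an N-Triples line typing the unit with its level class."""
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unknown level of description: {level!r}")
    return f"<{unit_uri}> <{RDF_TYPE}> <{LEVELS}{level}> ."


print(level_triple("http://example.org/archives/fonds/1", "fonds"))
```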
> ## Extent and medium
> Saying anything about extent and medium should possibly only happen on the
> lowest level of description. Any higher level extent and medium should be
> calculated by aggregating lower level descriptions.
> On the lowest level, refer to class URIs. A combination of dimensions and
> material {c|sh}ould be a class, e.g. A4 paper 80 grams/square meter.
>
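The aggregation idea can be sketched in a few lines of Python: only the lowest-level units carry an extent, and every higher level computes its own by summing its children. The `Unit` class and the choice of "sheets" as the unit of extent are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """A unit of description; only the lowest level records its own extent."""
    label: str
    sheets: int = 0
    children: list = field(default_factory=list)


def total_sheets(unit: Unit) -> int:
    """Extent of a unit: its own if it has no children, else the sum below it."""
    if not unit.children:
        return unit.sheets
    return sum(total_sheets(child) for child in unit.children)


fonds = Unit("fonds", children=[
    Unit("file A", children=[Unit("item 1", sheets=3), Unit("item 2", sheets=5)]),
    Unit("file B", children=[Unit("item 3", sheets=2)]),
])

print(total_sheets(fonds))  # 3 + 5 + 2 sheets
```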
> # Context area
>
> ## Creator(s) and administrative/biographical history
> As ISAD(G) refers to ISAAR(CPF) for description of corporate bodies,
> people, and families, this is a perfect example of where existing
> ontologies for describing people and organisations, like FOAF, BIO, ORG,
> and others, are useful for separate descriptions of the people and
> organisations involved.
> You want specific properties to describe the roles of these 'agents' in
> the history of the archival entity…
>
> ## Archival history and Immediate source of acquisition or transfer
> … and you would want them 'here' (of course there is no particular order
> in which these properties are used). PREMIS and PROV come to mind first
> for recording who did what to what, (where and?) when and with what
> result. There are probably some ontologies describing possible "events" as
> RDFS/OWL classes, so you could link to those.
> The immediate source of acquisition or transfer may be just another event.
>
> # Content and structure area
>
> ## Scope and content
> Descriptions, keywords, terms from authority files about "scope (such as,
> time periods, geography) and content, (such as documentary forms, subject
> matter, administrative processes) … appropriate to the level of
> description.": pretty natural fit for links to SKOS thesauri and other
> ontologies of real-world 'things'.
> One might think of dcterms:subject, dcterms:description,
> dcterms:temporal etc., but describing *how* exactly such terms
> relate to the archival entity needs more specific properties than
> "subject" et al.
>
> ## Appraisal, destruction and scheduling information
> Reasons for including things and (possibly) removal of archival entities
> should go very well in rules, and some types of rules go very well in
> ontologies. Making this up as I type: <class of letters written by the
> head of state> rdfs:subClassOf <class of 'things to be kept'>. The actual
> selection and destruction actions could be modelled in the same way as
> other actions are described for provenance.
>
> ## Accruals
> Whether more content can be expected probably depends on other properties
> of the archival entity, like its type(s) and creator(s). I don't know
> about specific properties to record this, but <class of living heads of
> state archival entities> rdfs:subClassOf <class of 'living' archival
> entities>? There are ways of modelling rules for this, like the Rule
> Interchange Format, but the rules may be defined by the archives and
> archivists.
>
> ## System of arrangement
> Thinking about this, I tend to think of a collection of keywords to
> describe the arrangement of a low-level archival entity like a folder or
> box: alphabetical, as found on deceased's desk. But there is more, of
> course. Perhaps using the Collection Ontology for low levels could help
> generate higher level 'systems of arrangement'.
>
> # Conditions of access and use area
>
> ## Conditions governing access and Conditions governing reproduction
> You can describe rights with the Creative Commons Rights Expression
> Language.
>
> ## Language of material
> mods:language maybe? Preferably used on sub-document level and generated
> for higher-level descriptions.
>
> ## Physical characteristics / technical requirements
> Conditions should follow from their respective properties: <class of
> PDF/A-1b files> ..:requiresForReading <class of PDF/A-1b readers>.
> Likewise, a rule saying that documents in <class A> are embargoed for 20
> years after creation, plus a creation date, gives an agent enough
> information to determine dcterms:available.
>
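The embargo example can be worked through in Python as a sketch: given a rule table and a creation date, an agent can derive the availability date itself. The class URI, the `EMBARGO_YEARS` table, and the `available_from` helper are hypothetical.

```python
from datetime import date

# Hypothetical rule table: class URI -> embargo length in years.
EMBARGO_YEARS = {"http://example.org/class/A": 20}


def available_from(created: date, class_uri: str) -> date:
    """Date on which a document becomes available, given its class's embargo."""
    years = EMBARGO_YEARS.get(class_uri, 0)
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 1 March.
        return date(created.year + years, 3, 1)


print(available_from(date(1994, 1, 19), "http://example.org/class/A"))
```

A class without an entry in the table is treated as having no embargo, which is a design choice of this sketch rather than a recommendation.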
>
> ## Finding aids
> As a non-archivist I had some trouble understanding the difference between
> descriptions and finding aids and what the exact use of a finding aid was.
> Also, having grown up with search engines and indexes, I think the concept
> may eventually become extinct. I guess you could use foaf:page to link a
> document-like finding aid to the archival entity and rdfs:seeAlso to point
> to machine-actionable related things.
>
> # Allied materials area
>
> ## Existence and location of originals/copies
> PROV can be used to link a copy to an original (and how the copy was
> created etc.). `<X> prov:wasDerivedFrom <Y>. <Y> :isAt <AnotherArchive>.`
>
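As a sketch of how an agent might use such statements, the Python below follows prov:wasDerivedFrom links from a copy back to the original and then looks up where the original is held. Triples are plain tuples rather than a real RDF store; the `ex:` resources and the `isAt` property URI (standing in for `:isAt` above) are hypothetical.

```python
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"
IS_AT = "http://example.org/vocab#isAt"  # hypothetical, stands in for :isAt

triples = [
    ("ex:scan", PROV_DERIVED, "ex:microfilm"),
    ("ex:microfilm", PROV_DERIVED, "ex:letter"),
    ("ex:letter", IS_AT, "ex:AnotherArchive"),
]


def original_of(copy, graph):
    """Follow prov:wasDerivedFrom links until reaching something underived."""
    seen = set()
    current = copy
    while current not in seen:
        seen.add(current)
        sources = [o for s, p, o in graph if s == current and p == PROV_DERIVED]
        if not sources:
            return current
        # This sketch follows only the first source if several are stated.
        current = sources[0]
    raise ValueError("derivation cycle detected")


def location_of(entity, graph):
    """Return where the entity is held, or None if no location is stated."""
    return next((o for s, p, o in graph if s == entity and p == IS_AT), None)


original = original_of("ex:scan", triples)
print(original, location_of(original, triples))
```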
> ## Related units of description / Publication note
> Use properties that describe the specific relations among archival
> entities. DC Terms has some useful ones, like for citations. Related items
> can be derived from all or selected properties automatically too.
>
> # Notes area
> ## Notes
> dcterms:description? Unlike a document containing rules that needs to be
> finished at some time, Linked Data has no such rule. You can always create
> a property with a well-defined meaning to use for specific information.
>
> # Description control area
>
> ## Archivist's note / dates of description
> Who did what when, where, why and how to the description itself. Same as
> for the unit of description itself.
> This may be a good time to draw a bit more attention to the question:
> *what is a description?*
> I don't have a (/ there is no) final answer, but as The One True Written
> Paper Description from long ago is becoming a set of triples, you want to
> think about it. You could link versions of RDF documents using PROV to
> record this information.
>
> ## Rules and conventions
> A link to the rules and conventions for description. Could also fit with
> the PROV provenance.
>
>
>
> No, this is not a list of ontologies to use/explore right away, but I hope
> you (and others) find it helpful, or perhaps even food for discussion.
> Also, have a look at CIDOC-CRM. It has lots of properties.
>
> Regards,
>
> Ben
>
> On 19-01-14 03:39, "Eric Lease Morgan" <[log in to unmask]> wrote:
>
> >If you were to select a set of RDF ontologies intended to be used in the
> >linked data of archival descriptions, then what ontologies would you
> >select?
> >
> >
> >  * Dublin Core Terms - This ontology is rather bibliographic in
> >    nature, and provides a decent framework for describing much of
> >    the content of archival descriptions.
> >
> >  * FOAF - Archival collections often originate from individual
> >    people. Such is the scope of FOAF, and FOAF is used by a number
> >    of other sets of linked data.
> >
> >  * MODS - Because many archival descriptions are rooted in MARC
> >    records, and MODS is easily mapped from MARC.
> >
> >  * Schema.org - This is an up-and-coming ontology heralded by the
> >    600-pound gorillas in the room -- Google, Microsoft, Yahoo, etc.
> >    While the ontology has not been put into practice for very long,
> >    it is growing and wide ranging.
> >
> >  * RDF - This ontology is necessary because linked data is
> >    manifested as... RDF
> >
> >  * RDFS - This ontology may be necessary because the archival
> >    community may be creating some of its own ontologies.
> >
> >  * OWL and SKOS - Both of these ontologies seem to be used to
> >    denote relationships between terms in other ontologies. In this
> >    way they are used to create classification schemes and thesauri.
> >    For example, they allow the implementor to denote "creator" in one
> >    ontology is the same as "author" in another ontology. Or they
> >    allow "country" in one ontology to be denoted as a parent
> >    geographic term for "city" in another ontology.
> >
> >While some or all of these ontologies may be useful for linked data of
> >archival descriptions, what might some other ontologies include?
> >(Remember, it is often "better" to select existing ontologies rather than
> >inventing new ones, unless there is something distinctly unique about a
> >particular domain.) For example, how about an ontology denoting times? Or
> >how about one for places? FOAF is good for people, but what about
> >organizations or institutions?
> >
> >Inquiring minds would like to know.
> >
> >—
> >Eric Morgan
>