LISTSERV 16.5 - CODE4LIB Archives

Ed,

Thank you for the detailed response. That was very helpful. Yes, it
seems like good Web architecture is the API. Sounds like it would be
easy enough to start somewhere and add features over time.

I could see how exposing this data in a crawlable way could provide
some nice indexed landing pages to help improve discoverability of
related collections. I wonder, though, if this raises the question of who
other than my own institution would use such local authorities? Would
there really be other consumers? What's the likelihood that other
institutions will need to reuse my local name authorities?

Is the idea that if enough of us publish our local data in this way
that there could be aggregators or other means to make it easier to
reuse from a single source?

I can see the use case for a local authorities app. While I think it
would be cool to expose our local data to the world in this way, I'm
still trying to grasp the larger value proposition.

Jason

On Thu, Jan 31, 2013 at 5:59 AM, Ed Summers <[log in to unmask]> wrote:
> Hi Jason,
>
> Heh, sorry for the long response below. You always ask interesting questions :-D
>
> I would highly recommend that vocabulary management apps like this
> assign an identifier to each entity, that can be expressed as a URL.
> If there is any kind of database backing the app you will get the
> identifier for free (primary key, etc). So for example let's say you
> have a record for John Chapman, who is on the faculty at OSU, which
> has a primary key of 123 in the database, you would have a
> corresponding URL for that record:
>
>   http://id.library.osu.edu/person/123
>
> When someone points their browser at that URL they get back a nice
> HTML page describing John Chapman. I would strongly recommend that
> schema.org microdata and/or Open Graph protocol RDFa be layered into
> the page for SEO purposes, as well as anyone who happens to be doing
> scraping.  I would also highly recommend adding a sitemap to enable
> discovery, and synchronization.
>
> Having that URL is handy because you could add different machine
> readable formats that hang off of it, which you can express as <link>s
> in your HTML. For example, let's say you want to have JSON, RDF and XML
> representations:
>
>   http://id.library.osu.edu/person/123.json
>   http://id.library.osu.edu/person/123.xml
>   http://id.library.osu.edu/person/123.rdf
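
A minimal sketch of how those <link> elements might be generated, assuming
the three extensions above map to the media types shown. The FORMATS table
and the alternate_links name are illustrative, not taken from any existing
app:

```python
from html import escape

# Assumed mapping of URL extensions to media types; adjust to what you serve.
FORMATS = {
    "json": "application/json",
    "xml": "application/xml",
    "rdf": "application/rdf+xml",
}


def alternate_links(entity_url: str) -> list[str]:
    """Return one <link rel="alternate"> element per machine-readable format."""
    return [
        f'<link rel="alternate" type="{escape(media_type)}" '
        f'href="{escape(entity_url)}.{ext}">'
        for ext, media_type in FORMATS.items()
    ]


if __name__ == "__main__":
    for link in alternate_links("http://id.library.osu.edu/person/123"):
        print(link)
```

The output would go in the <head> of the HTML page for the record, so a
crawler that lands on the page can find the other representations.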
>
> If you want to get fancy you can content negotiate between the generic
> url and the format specific URLs, e.g.
>
>   curl -i --header "Accept: application/json" http://id.library.osu.edu/person/123
>   HTTP/1.1 303 See Other
>   date: Thu, 31 Jan 2013 10:47:44 GMT
>   server: Apache/2.2.14 (Ubuntu)
>   location: http://id.library.osu.edu/person/123.json
>   vary: Accept-Encoding
>
> But that's gravy.
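
A rough sketch of that negotiation step, assuming the generic URL serves
HTML directly and redirects with a 303 to the format-specific URL for
everything else. The SUFFIXES table and the function names are made up for
illustration, and the Accept parsing is deliberately simple (it ignores
wildcards like application/*):

```python
# Assumed media types and the URL suffix each one redirects to.
SUFFIXES = {
    "application/json": ".json",
    "application/xml": ".xml",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header: str) -> list[str]:
    """Return the media types in an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(entity_url: str, accept: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request to the generic entity URL."""
    for media_type in parse_accept(accept or "*/*"):
        if media_type in ("text/html", "*/*"):
            return 200, {"Content-Type": "text/html", "Vary": "Accept"}
        if media_type in SUFFIXES:
            location = entity_url + SUFFIXES[media_type]
            return 303, {"Location": location, "Vary": "Accept"}
    return 406, {"Vary": "Accept"}


if __name__ == "__main__":
    url = "http://id.library.osu.edu/person/123"
    print(negotiate(url, "application/json"))
    print(negotiate(url, "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

Sending "Vary: Accept" tells caches that the response depends on the
request's Accept header.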
>
> What exactly you put in these representations is a somewhat open
> question I think. I'm a bit biased towards SKOS for the RDF because
> it's lightweight, this is exactly its use case, it is flexible (you
> can layer other assertions in easily), and (full disclosure) I helped
> with the standardization of it. If you did do this you could use
> JSON-LD for the JSON, or just come up with something that works.
> Likewise for the XML. You might want to consider supporting JSON-P for
> the JSON representation, so that it can be used from JavaScript in
> other people's applications.
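
One possible shape for the JSON, sketched as JSON-LD with SKOS terms plus a
JSON-P wrapper. Modelling a person as a skos:Concept, the choice of
properties, the nickname, and the example.org link are all assumptions for
illustration:

```python
import json
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"


def person_jsonld(entity_url, pref_label, alt_labels=(), close_matches=()):
    """Build a small JSON-LD document describing one authority record."""
    return {
        "@context": {"skos": SKOS},
        "@id": entity_url,
        "@type": "skos:Concept",
        "skos:prefLabel": pref_label,
        "skos:altLabel": list(alt_labels),
        "skos:closeMatch": [{"@id": url} for url in close_matches],
    }


def jsonp(callback, payload):
    """Wrap a JSON payload in a callback, refusing unsafe callback names."""
    if not re.fullmatch(r"[A-Za-z_$][A-Za-z0-9_$]*", callback):
        raise ValueError("invalid callback name")
    return f"{callback}({json.dumps(payload)});"


if __name__ == "__main__":
    record = person_jsonld(
        "http://id.library.osu.edu/person/123",
        "Chapman, John",
        alt_labels=["Jack Chapman"],  # hypothetical nickname
        close_matches=["http://example.org/other-authority/123"],  # placeholder
    )
    print(json.dumps(record, indent=2))
    print(jsonp("handleRecord", record))
```

Checking the callback name matters because the callback comes straight from
the query string and is echoed back as executable JavaScript.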
>
> It might be interesting to come up with some norms here for
> interoperability on a Wiki somewhere, or maybe a prototype of some
> kind. But the focus should be on what you need to actually use it in
> some app that needs vocabulary management. Focusing on reusing work
> that has already been done helps a lot too. I think that helps ground
> things significantly. I would be happy to discuss this further if you
> want.
>
> Whatever the format, I highly recommend you try to have the data link
> out to other places on the Web that are useful. So for example the
> record for John Chapman could link to his department page, blog, VIAF,
> Wikipedia, Google Scholar Profile, etc. This work tends to require
> human eyes, even if helped by a tool (Autosuggest, etc), so what you
> do may have to be limited, or at least an ongoing effort. Managing
> them (link scrubbing) is an ongoing effort too. But fitting your stuff
> into the larger context of the Web will mean that other people will
> want to use your identifiers. It's the dream of Linked Data I guess.
>
> Lastly I recommend you have an OpenSearch [1] API, which is pretty easy,
> almost trivial, to put together. This would allow people to write
> software to search for "John Chapman" and get back results (there
> might be more than one) in Atom, RSS or JSON.  OpenSearch also has a
> handy AutoSuggest format, which some JavaScript libraries work with.
> The nice thing about OpenSearch is that browser search boxes support
> it too.
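
A sketch of the two pieces this involves, assuming a search endpoint at
/search with a q parameter. The endpoint paths, parameter names, and
function names are invented for illustration; the namespace, the
{searchTerms} placeholder, and the [query, [completions]] suggestion
layout follow the OpenSearch conventions:

```python
import json
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def description_document(short_name, description, search_base):
    """Return an OpenSearch description document as an XML string."""
    ET.register_namespace("", OPENSEARCH_NS)

    def tag(name):
        return f"{{{OPENSEARCH_NS}}}{name}"

    root = ET.Element(tag("OpenSearchDescription"))
    ET.SubElement(root, tag("ShortName")).text = short_name
    ET.SubElement(root, tag("Description")).text = description
    # One Url element per result format the (assumed) endpoint can return.
    for media_type, fmt in (("application/atom+xml", "atom"),
                            ("application/json", "json")):
        ET.SubElement(root, tag("Url"), {
            "type": media_type,
            "template": f"{search_base}?q={{searchTerms}}&format={fmt}",
        })
    # Autosuggest endpoint, advertised with the suggestions media type.
    ET.SubElement(root, tag("Url"), {
        "type": "application/x-suggestions+json",
        "template": f"{search_base}/suggest?q={{searchTerms}}",
    })
    return ET.tostring(root, encoding="unicode")


def suggestions(prefix, names):
    """Return a suggestions response: the query, then matching completions."""
    matches = [name for name in names if name.lower().startswith(prefix.lower())]
    return json.dumps([prefix, matches])


if __name__ == "__main__":
    print(description_document(
        "OSU Names",
        "Search local name authorities",
        "http://id.library.osu.edu/search",
    ))
    print(suggestions("jo", ["John Chapman", "Joan Chapman", "Kyle Banerjee"]))
```

The description document is what a browser reads when it offers to add the
site as a search engine; the HTML pages would point to it with a
<link rel="search"> element.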
>
> I guess this might sound like an information architecture more than an
> API. Hopefully it makes sense. Having a page that documents all this,
> with "API" written across the top, that hopefully includes terms of
> service, can help a lot with use by others.
>
> //Ed
>
> PS. I should mention that Jon Phipps and Diane Hillmann's work on the
> Metadata Registry [2] did a lot to inform my thinking about the use of
> URLs to identify these things. The metadata registry is used for
> making the RDA and IFLA's FRBR vocabulary. It handles lots of stuff
> like versioning, etc ... which might be nice to have. Personally I
> would probably start small before jumping to installing the Metadata
> Registry, but it might be an option for you.
>
> [1] http://www.opensearch.org
> [2] http://trac.metadataregistry.org/
>
> On Wed, Jan 30, 2013 at 3:47 PM, Jason Ronallo <[log in to unmask]> wrote:
>> Ed,
>>
>> Any suggestions or recommendations on what such an API would look
>> like, what response format(s) would be best, and how to advertise the
>> availability of a local name authority API? Who should we expect would
>> use our local name authority API? Are any of the examples from the big
>> authority databases like VIAF ones that would be good to follow for
>> API design and response formats?
>>
>> Jason
>>
>> On Wed, Jan 30, 2013 at 3:15 PM, Ed Summers <[log in to unmask]> wrote:
>>> On Tue, Jan 29, 2013 at 5:19 PM, Kyle Banerjee <[log in to unmask]> wrote:
>>>> This would certainly be a possibility for other projects, but the use case
>>>> we're immediately concerned with requires an authority file that's
>>>> maintained by our local archives. It contains all kinds of information
>>>> about people (degrees, nicknames, etc) as well as terminology which is not
>>>> technically kosher but which we know people use.
>>>
>>> Just as an aside really, I think there's a real opportunity for
>>> libraries and archives to make their local thesauri and name indexes
>>> available for integration into other applications both inside and
>>> outside their institutional walls. Wikipedia, Freebase, VIAF are
>>> great, but their notability guidelines aren't always the greatest match
>>> for cultural heritage organizations. So seriously consider putting a
>>> little web app around the information you have, using it for
>>> maintaining the data, making it available programmatically (API), and
>>> linking it out to other databases (VIAF, etc) as needed.
>>>
>>> A briefer/pithier way of saying this is to quote Mark Matienzo [1]
>>>
>>>   Sooner or later, everyone needs a vocabulary management app.
>>>
>>> :-)
>>>
>>> //Ed
>>>
>>> PS. I think Mark Phillips has done some interesting work in this area
>>> at UNT. But I don't have anything to point you at, maybe Mark is tuned
>>> in, and can chime in.
>>>
>>> [1] https://twitter.com/anarchivist/status/269654403701682176

