On 2/23/2012 3:53 PM, Karen Coyle wrote:
> Jonathan, while having these thoughts your Umlaut service did come to
> mind.  If you ever have time to expand on how it could work in a wide
> open web environment, I'd love to hear it. (I know you explain below,
> but I don't know enough about link resolvers to understand what it
> really means from a short explanation. Diagrams are always welcome!)

I'm not entirely sure what is meant by 'wide open web environment.'

I mean, part of the current environment is that there's lots of stuff on 
the web that is NOT free/open access: it's only available to certain 
licensed people. AND that libraries license a lot of this stuff on 
behalf of their user group. (Not just content, but sometimes services 
too.)  It's really that environment Umlaut is focused on; if that 
changed, what would be required would have little to do with Umlaut as 
it is now, I think.

But I don't think anyone anticipates that changing anytime soon, so I 
don't think that's what Karen means by 'wide open web environment.'

So if that continues to be the case... I think Umlaut has a role 
working pretty much as it does now; it would work how it works. (Maybe 
I'm not sufficiently forward-thinking.)

I will admit that, while I come across lots of barriers in implementing 
Umlaut, I have yet to come across anything that makes me think "this 
would be a lot easier if only there were more RDF."  Maybe it's a failure 
of imagination on my part.  More downloadable data, sure. More HTTP 
APIs, even more so.  And Umlaut already takes advantage of such things, 
especially the APIs more than the downloadable data (it turns out it's 
a lot more 'expensive' to try to download data and do something with it 
yourself, compared to using an API someone else provides to do the heavy 
lifting for you).  But has it ever been much of a problem that the data 
is in some format other than RDF, such that it would be easier in RDF? 
Not from my perspective, not really. (In some ways, RDF is harder to 
deal with than other formats, from where I'm coming from. If a service 
does offer data in RDF triples as well as something else, I'm likely to 
choose the something else).

This may be ironic, because Umlaut is very concerned with 'linking data', 
in the sense of figuring out whether this record from the local catalog 
represents 'the same thing' as this record from Amazon, as this record 
from Google Books, or HathiTrust. Or whether this citation that came in 
as an OpenURL represents the 'same thing' as a record in a vendor 
database, or Mendeley, or whatever.

There are real barriers in making this determination; they wouldn't be 
solved if everything was just in RDF, but they _would_ be solved if 
there were more consistent use of identifiers, for sure. I DO think 
"this would be easier if only there were more consistent use of 
identifiers" all the time.

That experience with Umlaut is also what leads me to believe that the 
WEMI ontology is not only not contradictory to "linked data 
applications", but _crucial_ for them. Without it, it's 
very hard to tell when something is "the same thing". There are lots of 
times Umlaut ends up saying "Okay, I found something that I _think_ is 
at least an edition of the same thing you care about, but I really can't 
tell you if it's the _same_ edition you are interested in or not."

So, yeah, Umlaut would work _better_ with more widespread use of 
identifiers, and even better with consistent use of common identifiers. 
I guess that's maybe where RDF could come in, in expressing 
determinations people have made of "this identifier in system X 
represents the same 'thing' as this other identifier in system Y" 
(someone would still have to MAKE those determinations, RDF would just 
be one way to then convey that determination, and I wouldn't 
particularly care if it was conveyed in RDF or something else). So 
anyway, it would work better with some of that stuff, but would it work 
substantially _differently_? Not so much.
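For what it's worth, here's roughly what conveying such a determination 
in RDF might look like, sketched in Python with the rdflib library and 
owl:sameAs. Both URIs are invented for the example, and again: somebody 
still had to MAKE the determination before anyone could express it.

# Illustration only: conveying "this identifier in system X names the
# same thing as that identifier in system Y" as an RDF triple.
# Both URIs below are invented for the example.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()
g.add((
    URIRef("http://www.worldcat.org/oclc/155131850"),     # hypothetical: a WorldCat record
    OWL.sameAs,                                           # the equivalence assertion
    URIRef("https://openlibrary.org/books/OL24364628M"),  # hypothetical: an Open Library record
))
print(g.serialize(format="turtle"))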

Ah, if web pages started having more embedded machine-readable data with 
citations and identifiers of "what is being looked at" (microdata, RDFa, 
whatever), that would make it easier to get a user from some random web 
page _to_ an institution's Umlaut; that's one thing that would be nice.
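COinS is one existing convention along these lines: an OpenURL context 
object tucked into the title attribute of a <span class="Z3988">. A 
sketch of pulling a citation out of a page that uses it, with a page I 
made up for the example:

# Sketch: extracting an embedded machine-readable citation from HTML.
# COinS is one real convention for this: an OpenURL context object
# carried in the title attribute of a <span class="Z3988">.
# The page below is invented for the example.
from urllib.parse import parse_qs
from bs4 import BeautifulSoup

html = """
<p>Smith, J. (2010). An example article. Journal of Examples, 12(1), 34-56.
<span class="Z3988"
      title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=An+example+article&amp;rft_id=info%3Adoi%2F10.1000%2Fexample"></span></p>
"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="Z3988")
citation = parse_qs(span["title"])
print(citation["rft_id"])  # -> ['info:doi/10.1000/example']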

You may (or may not) find the "What is Umlaut, Anyway?" article on the 
Umlaut wiki helpful.

https://github.com/team-umlaut/umlaut/wiki/What-is-Umlaut-anyway


And there's really not much to understand about 'link resolvers' for 
these purposes, except that there's this thing called OpenURL (really 
bad name), which is really just a way for one website to hyperlink to 
another website and pass a machine-readable citation to it. The 
application receiving the machine-readable citation then tries to get 
the user to appropriate access or services for it, with regard to 
institutional entitlements. That's about it; if you understand that, you 
understand enough.  Except that most commercially available 'link 
resolvers' do a so-so job with scholarly article citations and full 
text, and don't even try much with anything else. (In part because it's 
_hard_, especially to provide an out-of-the-box solution, given 
libraries' diverse, messed-up infrastructures.)
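To make "pass a machine-readable citation" concrete, here's roughly what 
constructing an OpenURL looks like. The resolver hostname is invented; 
the rft.* keys are standard Z39.88-2004 KEV citation fields for a 
journal article.

# Sketch: building an OpenURL (Z39.88-2004, key/encoded-value format).
# The resolver base URL is invented; the rft.* keys are standard
# citation fields for the journal-article format.
from urllib.parse import urlencode

BASE_URL = "https://resolver.example.edu/umlaut"  # hypothetical Umlaut install

citation = {
    "ctx_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.genre": "article",
    "rft.atitle": "An example article",
    "rft.jtitle": "Journal of Examples",
    "rft.issn": "1234-5679",
    "rft.volume": "12",
    "rft.spage": "34",
    "rft.date": "2010",
}

link = BASE_URL + "?" + urlencode(citation)
print(link)  # one website hyperlinking to another, citation in the URL

The receiving application's whole job is to turn a query string like 
that into appropriate access or services for the user.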