I highly recommend Chapter 6 of the Linked Data book, which details different design approaches for Linked Data applications. Section 6.3 summarises the approaches as:

1. Crawling Pattern
2. On-the-fly dereferencing pattern
3. Query federation pattern

My view is that (1) and (2) are viable approaches for different applications, but that (3) is generally a bad idea (having been through federated search before!)
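
To make (2) a bit more concrete, here is a rough Python sketch of on-the-fly dereferencing combined with the local caching and parallel text labels suggested further down the thread. It is only an illustration under some assumptions: the names `fetch_rdf`, `DereferenceCache`, `remember_label` and `get_or_label` are made up for this example, the cache is just an in-memory dictionary, and it assumes the remote server will return Turtle when asked via the Accept header.

```python
import urllib.request
from typing import Callable, Dict


def fetch_rdf(uri: str, timeout: float = 10.0) -> str:
    """Dereference a URI, asking for Turtle via content negotiation.

    Assumption: the remote server honours the Accept header.
    """
    request = urllib.request.Request(uri, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


class DereferenceCache:
    """Hypothetical helper: fetch each URI at most once, keep a label as fallback."""

    def __init__(self, fetch: Callable[[str], str] = fetch_rdf) -> None:
        self._fetch = fetch                   # injectable, so it can be faked offline
        self._documents: Dict[str, str] = {}  # URI -> retrieved RDF text
        self._labels: Dict[str, str] = {}     # URI -> locally stored text label

    def remember_label(self, uri: str, label: str) -> None:
        """Store a text label in parallel to the URI."""
        self._labels[uri] = label

    def get(self, uri: str) -> str:
        """Return the cached document, retrieving it on first use only."""
        if uri not in self._documents:
            self._documents[uri] = self._fetch(uri)
        return self._documents[uri]

    def get_or_label(self, uri: str) -> str:
        """Return the document, or the stored label (else the URI) if retrieval fails."""
        try:
            return self.get(uri)
        except OSError:  # urllib's URLError and socket timeouts are OSError subclasses
            return self._labels.get(uri, uri)
```

The point of the fallback is that the application can still display something when the remote site is unreachable. A real system would also need cache expiry and persistent storage, which this sketch leaves out.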


Owen Stephens
Owen Stephens Consulting
Telephone: 0121 288 6936

> On 26 Feb 2015, at 14:40, Eric Lease Morgan wrote:
> On Feb 25, 2015, at 2:48 PM, Esmé Cowles wrote:
>>> In the non-techie library world, linked data is being talked about (perhaps only in listserv traffic) as if the data (bibliographic data, for instance) will reside on remote sites (as a SPARQL endpoint??? We don't know the technical implications of that), and be displayed by <your local catalog/the centralized inter-national catalog> by calling data from that remote site. But the original question was how the data on those remote sites would be <access points> - how can I start my search by searching for that remote content?  I assume there has to be a database implementation that visits that data and pre-indexes it for it to be searchable, and therefore the index has to be local (or global a la Google or OCLC or its bibliographic-linked-data equivalent). 
>> I think there are several options for how this works, and different applications may take different approaches.  The most basic approach would be to just include the URIs in your local system and retrieve them any time you wanted to work with them.  But the performance of that would be terrible, and your application would stop working if it couldn't retrieve the URIs.
>> So there are lots of different approaches (which could be combined):
>> - Retrieve the URIs the first time, and then cache them locally.
>> - Download an entire data dump of the remote vocabulary and host it locally.
>> - Add text fields in parallel to the URIs, so you at least have a label for it.
>> - Index the data in Solr, Elasticsearch, etc. and use that most of the time, esp. for read-only operations.
> Yes, exactly. I believe Esmé has articulated the possible solutions well. escowles++  —ELM