On Dec 19, 2014, at 12:28 PM, Kyle Banerjee wrote:

> On Fri, Dec 19, 2014 at 7:57 AM, Joe Hourcle <[log in to unmask]>
> wrote:
> 
>> 
>> I can't comment on the linked data side of things so much, but in
>> following all of the comments from the US's push for opening up access to
>> federally funded research, I'd have to say that capitalism and
>> protectionist attitudes from 'publishers' seem to be a major factor in the
>> fight against open access.
>> 
> 
> That definitely doesn't help. But quite a few players own this problem.
> 
> Pockets where there is a culture of openness can be found but at least in
> my neck of the woods, researchers as a group fear being scooped and face
> incentive structures that discourage openness. You get brownie points for
> driving your metrics up as well as being first and novel, not for investing
> huge amounts of time structuring your data so that everyone else can look
> great using what you created.

There's been a lot of discussion of this problem over the last 5 years or
so.  The general consensus is that:

1. We need better ways for people to acknowledge data being re-used.

	a. We need standards for citation so that we can use
	   bibliometric tools to extract the relationships.
	b. We need citations specifically to the data, and not to
	   a proxy (eg, the first results or instrument papers), to show
	   that maintaining the data is still important.
	c. We need to shift the work of determining how to acknowledge
	   the data from the re-user back to the distributor of the data.

2. We need standards to make it easier for researchers to re-use data.

	Findability, accessibility of the file formats, documentation of
	data, etc.

3. We need institutions to change their culture to acknowledge that 
   producing really good data is as important for the research ecosystem
   as writing papers.  This includes decisions regarding awarding grants,
   tenure & promotion, etc.


Much of this is covered by the Joint Declaration of Data Citation
Principles:

	https://force11.org/datacitation

There are currently two sub-groups; one working on dissemination, to
make groups aware of the issues & the principles, and another (that I'm
on) working on issues of implementation.  We actually just submitted
something to PeerJ this week, on how to deal with 'machine actionable'
landing pages:

	https://peerj.com/preprints/697/

(I've been pushing for one of the sections to be clarified, so feel
free to comment ... if enough other people agree w/ me, maybe I can
get my changes into the final paper)


> Libraries face their own challenges in this regard. Even if we ignore that
> many libraries and library organizations are pretty tight with what they
> consider their intellectual property, there is still the issue that most of
> us are also under pressure to demonstrate impact, originality, etc. As a
> practical matter, this means we are rewarded for contributing to churn,
> imposing branding, keeping things siloed and local, etc. so that we can
> generate metrics that show how relevant we are to those who pay our bills
> even if we could do much more good by contributing to community initiatives.

But ... one of the other things that libraries do is make stuff available
to the public.  Since most aren't dealing with data yet, getting it into
their IRs means that they've then got more stuff that they can serve,
which may help push up their metrics.

(not that I think those metrics are good ... I'd rather *not* transfer
data that people aren't going to use, but the bean counters like those
graphs of data transfer going up ... we just don't mention that it's
groups in China attempting to mirror our entire holdings)



> With regards to our local data initiatives, we don't push the open data
> aspect because this has practically no traction with researchers. What does
> interest them is meeting funder and publisher requirements as well as being
> able to transport their own research from one environment to another so
> that they can use it. The takeaway from this is that leadership from the
> top does matter.

The current strategy is to push for the scientific societies to implement
policies requiring the data be opened if it's to be used as evidence in
a journal article.  There are some exceptions*, but the recommendations
so far are to still set up the landing page to make the data citable,
but instead of linking directly to the data, provide an explanation of
what the procedures are to request access.
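
As a rough illustration of that recommendation, here's a minimal sketch
of a landing-page record that stays citable whether or not the data can
be linked directly.  The field names, the placeholder identifier, and the
landing_page_record function are all hypothetical; they are not taken
from the Force11 principles or the PeerJ preprint.

```python
import json

def landing_page_record(identifier, title, data_url=None, access_procedure=None):
    """Build a hypothetical machine-readable landing-page record.

    Exactly one of data_url / access_procedure must be given: open data
    links straight to the files, restricted data explains how to ask.
    """
    if (data_url is None) == (access_procedure is None):
        raise ValueError("give exactly one of data_url or access_procedure")
    record = {"identifier": identifier, "title": title}
    if data_url is not None:
        record["access"] = "open"
        record["data_url"] = data_url
    else:
        record["access"] = "restricted"
        record["access_procedure"] = access_procedure
    return record

# Restricted example: citable, but no direct link to the data.
restricted = landing_page_record(
    "example:dataset-1",  # placeholder identifier, not a real DOI
    "Example survey responses",
    access_procedure="Contact the data steward with an approved IRB protocol.",
)
print(json.dumps(restricted, indent=2))
```

Either way the record keeps the same identifier and title, so the
citation doesn't change if the access rules do.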

This way, the requirement becomes: if the researcher wants
to publish their paper ... they have to provide the data, too.

We've run into a few interesting snags, though.  For instance, some are
only requiring the data that directly supports the paper to be published;
this means that we have no way of knowing if they cherry-picked their
data and the larger collection might have evidence to refute their
findings.

The 'publishers' seem to be for it, as some of them look at it as a way
to get people to publish 'data papers' (and of course, charge author
fees at the same time).  I personally don't like the concept for the
most part, as I think our understanding of the data changes with time,
and we need dynamic descriptions of the data, not static ones.  

I can see the 'data paper' concept working for adding to the documentation
of the data, particularly in cases where you might be describing how
some other community's data can be re-used, but I don't think it should
be the primary citation for data.



> The good news is that things seem to be moving in the right direction, even
> if it is at the speed of goo.

It's slower than I'd like, but at least from the data side of things,
it's accelerating.  There are a lot of groups working on issues,
but I'd say that the big player these days is Research Data Alliance:

	https://rd-alliance.org

They have various interest & working groups, some around
cross-discipline issues and others focused on specific disciplines /
types of data.


-Joe

* There are some cases where it's not possible to publicly release data:
  almost any IRB-encumbered research, individual health records,
  locations of endangered species, or export-restricted information (eg,
  nuclear devices or launch systems).