Having done research, and now working in a very varied metadata role, I don't quite understand this discussion about which data is or isn't metadata. Scientific data is a great example of structured data, and it can still be distinguished from metadata that purely describes a dataset.

However, if you have scientific research data created during the experiments, then even if it's "operational", it's clearly part of "the" data. This doesn't mean there can't be metadata describing *that data*. Just because it's not glamorous data doesn't mean it's not essential to the scientific process. Similarly, being about mundane or procedural things doesn't turn data into metadata!

You're absolutely right: the contextual information is certainly part of the experimental outcome in this example. Without it, the measurements would be abstract data, such as one might use in a textbook example.

Metadata would describe the dataset itself, not the scientific research. There's always a certain ambiguity involved in identifying "the data" as distinct from the metadata, and it's a false dichotomy to suggest metadata is not useful at all for the domain expert. The distinction is contextual, and the definition is always at least partly based on your use case for the data and its description.

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Nate Vack
Sent: 14 February 2012 14:45
To: [log in to unmask]
Subject: Re: [CODE4LIB] Metadata

On Tue, Feb 14, 2012 at 1:22 AM, Graham Triggs <[log in to unmask]> wrote:

> That's an interesting distinction though. Do you need all that data in 
> order to make sense of the results? You don't [necessarily] need to 
> know who conducted some research, or when they conducted it in order 
> to analyse and make sense of the data. In the context of having the 
> data, this other information becomes irrelevant in terms of 
> understanding what that data says.

It is *essential* to understanding what the data says. Perhaps you find out your sensor was on the fritz during a time period -- you need to be able to know which datasets are suspect. Maybe the blood pressure effect you're looking at is mediated by circadian rhythms, and hence by time of day.

Not all of your data is necessary in every analysis, but a bunch of blood pressure measurements in the absence of contextual information is universally useless.

The metadata is part of the data.