On Wed, Mar 2, 2011 at 11:12 AM, Roy Tennant wrote:
> Godmar,
> I'm surprised you're asking this. Most of the questions you want
> answered could be answered by a basic programming construct: an
> if-then-else statement and a simple decision about what you want to
> use in your specific application (for example, do you prefer "text"
> with the period, or not?). About the only question that such a
> solution wouldn't deal with is "which fields are derived from which
> others", which strikes me as superfluous to your application if you
> know a hierarchy of preference. But perhaps I'm missing something
> here.
I'm not asking how to code it, I'm asking for the algorithm I should
use, given the fact that I'm not familiar with the provenance and
status of the data Summon returns (which, I understand, is a mixture
of original, harvested data, and "cleaned-up", processed data.)
Can you suggest such an algorithm, given the fact that each of the 8
elements I showed in the example (PublicationDateYear,
PublicationDateDecade, PublicationDate, PublicationDateCentury,
PublicationDate_xml.text, PublicationDate_xml.day,
PublicationDate_xml.month, PublicationDate_xml.year) is optional? But
wait ---- I think I've also seen records where there is a
PublicationDateMonth, and records where some values have arrays of
length > 1.
Can you suggest, or at least outline, such an algorithm?
It would be helpful to know, for instance, if the presence of a
PublicationDate_xml field supplants any other PublicationDate* fields
(does it?) If a PublicationDate_xml field is absent, which field
would I want to look at next? Is PublicationDate more reliable than a
combination of PublicationDateYear and PublicationDateMonth (and
perhaps PublicationDateDay if it exists?)?
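To make the fallback question concrete, here is a minimal sketch of a
preference chain. The order in ASSUMED_ORDER is purely a placeholder (the
correct ranking is exactly what is being asked), and the record layout (a
plain mapping from field name to value) is likewise an assumption; the
function name first_present is hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple

# Placeholder ranking only: which field deserves priority is the open question.
ASSUMED_ORDER = ("PublicationDate_xml", "PublicationDate", "PublicationDateYear")


def first_present(
    record: Mapping[str, Any],
    order: Sequence[str] = ASSUMED_ORDER,
) -> Optional[Tuple[str, Any]]:
    """Return (field_name, value) for the first field in `order` that is
    present and non-empty, or None if none of them is usable."""
    for name in order:
        value = record.get(name)
        if value:  # skips missing keys, None, empty strings and empty lists
            return name, value
    return None
```

Keeping the order as a parameter means the ranking can be changed in one
place once the provenance of each field is known.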
If the PublicationDate_xml is present, then: should I prefer the .text
option? What's the significance of that dot? Is it spurious, like the
identifier you mentioned you find in raw MARC records? If not, what,
if anything, is known about the presence of the other fields? What if
multiple fields are given in an array? Is the ordering significant
(e.g., the first one is more trustworthy?) Or should I sort them based
on a heuristic? (e.g., if "20100523" and "201005" are given, prefer
the former?) What if the data is contradictory?
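The "prefer the more specific value" heuristic, and one possible test for
contradiction, could be sketched as follows. This assumes the values are
compact digit strings such as "20100523"; the function names are
hypothetical, and nothing here reflects documented Summon behavior.

```python
from typing import Optional, Sequence


def most_specific(values: Sequence[str]) -> Optional[str]:
    """Pick the value with the most digits; on a tie, keep the earliest."""
    if not values:
        return None
    return max(values, key=lambda v: sum(ch.isdigit() for ch in v))


def is_consistent(values: Sequence[str]) -> bool:
    """True when every value is a prefix of the most specific one,
    i.e. the shorter dates merely omit detail rather than disagree."""
    best = most_specific(values)
    if best is None:
        return True  # nothing to contradict
    return all(best.startswith(v) for v in values)
```

Under this sketch, "201005" and "20100523" agree (one is a prefix of the
other), while "201005" and "20110523" would be flagged as contradictory.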
These are the questions I'm seeking answers to; I know that those of
you who have coded their own Summon front-ends must have faced the
same questions when implementing their record displays.
- Godmar