> Well, the problem is when the original Marc4J author took the spec at its
> word, and actually _acted upon_ the '4' and the '5', changing file semantics
> if they were different, and throwing an exception if it was a non-digit.
>
At least the author actually used the values rather than just checking that
a 4 or a 5 was there. I still don't see what the point of looking for a 0 in
an undefined position would be. I'm wondering what kind of nut job would
write that into the standard, but that's not the author's problem.
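For concreteness, the "acting upon" being discussed can be sketched. Marc4J itself is Java; this is a minimal, language-neutral sketch in Python, and the function name and sample leader are mine, not Marc4J's actual code. Leader positions 20 and 21 give the widths of the length-of-field and starting-character-position parts of each directory entry (MARC21 fixes these at '4' and '5'), and position 23 is the undefined position that MARC21 says contains '0'.

```python
def parse_entry_map(leader: str):
    """Take leader bytes 20-21 at their word instead of assuming '4500'."""
    if len(leader) != 24:
        raise ValueError("MARC leader must be exactly 24 characters")
    len_of_length = leader[20]   # MARC21 fixes this at '4'
    len_of_start = leader[21]    # MARC21 fixes this at '5'
    if not (len_of_length.isdigit() and len_of_start.isdigit()):
        # the exception the quoted text is complaining about
        raise ValueError("non-digit entry map value")
    return int(len_of_length), int(len_of_start)

# A typical MARC21 leader; each directory entry is 3 (tag) plus the
# two widths read from the entry map -- 12 bytes under MARC21.
leader = "00714cam a2200205 a 4500"
l, s = parse_entry_map(leader)
entry_width = 3 + l + s
```

A spec-literal reader like this changes its directory parsing when the widths differ, which is exactly the behavior at issue here.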
> Do you think he got it wrong? How was he supposed to know he got it wrong,
> when he wrote to the spec and took it at its word? Are you SURE there
> aren't any Marc formats other than Marc21 out there that actually do use
> these bytes with their intended meaning, instead of fixing them?
I wouldn't call it wrong -- the spec is a logical point of departure. MARC21
derives from an ISO standard that does not use those character positions and
which otherwise requires the same data layout, but the author wouldn't
necessarily know that.
Standards have something in common with laws: how they are used in the real
world matters as much as, or more than, what is actually defined -- what's
written and what's done in practice can be very different.
Everyone here who has parsed catalog data or done an ILS migration knows
better than to assume, even for a second, that fields are used as defined,
except for the very basic stuff.
> How was the Marc4J author supposed to be sure of that, or even guess it
> might be the case, and know he'd be serving users better by ignoring the
> spec here instead of following it?
There might not have been a good way to know. With data, one thing you
always want to do is ask a bunch of people who work with it all the time
about anomalies in the wild. Many great works of fiction masquerade as
documents which supposedly describe reality.
> Ie: I _thought_ I was writing only for Marc21, but then it turns out I've
> got to accept records from Outer Weirdistan that are a kind of legal Marc
> that actually uses those bytes for their intended meaning....
Any such MARC would be noncompliant with the ISO standard from which MARC21
hails. If you work from the MARC21 standard and weird records are in
question, there is a greater chance of choking on nonnumeric tags, since
those are allowed by the ISO standard.
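To illustrate that point (a hypothetical sketch, not Marc4J code -- I'm taking the claim that the ISO standard permits nonnumeric tags from the paragraph above, and the helper names are mine):

```python
def tag_ok_marc21(tag: str) -> bool:
    # A MARC21-literal check: tags are three ASCII digits.
    return len(tag) == 3 and tag.isascii() and tag.isdigit()

def tag_ok_iso(tag: str) -> bool:
    # A more tolerant check for ISO-style records, which (per the
    # discussion above) can carry nonnumeric tags such as 'CAT'.
    return len(tag) == 3 and tag.isascii() and tag.isalnum()
```

A parser built on the strict check would choke on records that the looser one reads without complaint, which is the "greater chance of choking" above.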
Even ignoring that MARC21 would need to be redefined to take on other
values, one can safely conclude that such a redefinition could only be
written by totally deranged individuals. Values lower than 4 and 5
respectively would limit field lengths and offsets to the point that little
or no data could be stored, and greater values would be completely
nonsensical: the MARC record length limit means the extra digits could only
ever contain zeros.
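The arithmetic behind that claim is easy to check. A back-of-envelope sketch (helper names are mine) using the fact that leader positions 0-4 hold the record length as five digits:

```python
# A record length stored in 5 digits can never exceed 99,999 bytes.
MAX_RECORD_LEN = 10**5 - 1

def max_field_len(width):
    # width = entry map byte 20: digits for each field's length
    return 10**width - 1

def max_start_pos(width):
    # width = entry map byte 21: digits for each field's starting offset
    return 10**width - 1

# MARC21's '4' and '5' already cover everything a record can hold:
# fields up to 9,999 bytes, offsets up to exactly the record ceiling.
marc21_field_cap = max_field_len(4)   # 9999
marc21_offset_cap = max_start_pos(5)  # 99999 == MAX_RECORD_LEN
```

A '6' in position 21, say, would allow offsets up to 999,999 -- but no offset in a record capped at 99,999 bytes ever needs that sixth digit, so it would always be a zero. Going the other way, a '2' would cap fields at 99 bytes.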
In any case, MARC is a legacy standard from the '60s. The chances of new
flavors emerging are dismal at best.
> Again, I realize in the actual environment we've got, this is not a luxury
> we have. But it's a fault, not a benefit, to have lots of software
> everywhere behaving in non-compliant ways and creating invalid (according to
> the spec!) data.
>
Creating is another matter entirely. Since we can control what we create
ourselves, we make things a little better every time we make them
conformant. However, we can't control what others do, and being able to read
everything is useful, including stuff created with tools and processes that
aren't up to scratch.
kyle