On Mon, Apr 15, 2019 at 11:20 AM Thomas Dunbar wrote:

> Hello everyone,
> I'm working on a proof of concept web application for common library data
> conversions with support for large files.
> The application is built using a serverless architecture, which allows me
> to do this at scale and at low cost.

Love the concept -- I tried a few conversions, including some north of
200MB. Overall, it worked impressively. Not having to download software is
cool because you don't always have the ability to download software or
might need to do something using a cell phone.

For me personally, the chief needs driving conversions are: 1) performing
fixes in a format that's easier to work with (e.g. no one fixes in binary
MARC) and converting back; 2) analysis -- i.e. identifying records/elements
that have or don't have X; and 3) migrations (which have required further
manipulation in every single case). In other words, manipulations and
partial extractions. In the context of these use cases, delimited text,
plain text, XML, MARC, and JSON (to a lesser extent) dominate conversions.

Regarding the MARC to text conversion, delimited text conversions need a
subdelimiter for repeated fields, as the output often must be loaded into
another system, presented in a table to someone, etc. -- the current method,
which adds more lines, will cause trouble for anyone without coding skills.
On a related note, considering the indicators part of the field makes
philosophical sense but it creates practical problems (especially with
nonrepeatable fields). For example, it scatters the 245 titles across as many
fields as there are indicator variations, making the simple task of
generating a list of titles trickier than it should be. MARC already has a
huge number of
fields, so when the indicator permutations are combined with separate
fielding for repeated fields, it takes no time at all to get many hundreds
of fields with files that aren't that big -- something headache inducing
even for those with mad skilz.
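
To make the suggestion concrete, here is a rough Python sketch of the output
shape I have in mind: one row per record, one column per tag, repeated fields
joined with a subdelimiter, and indicators kept out of the column key. The
record layout (lists of (tag, indicators, value) tuples), the function names,
and the "|" subdelimiter are all my own assumptions for illustration, not
anything the tool currently does.

```python
import csv
import io

# Hypothetical in-memory shape for parsed records: each record is a list of
# (tag, indicators, value) tuples. A real tool would get these from a parser.
SAMPLE_RECORDS = [
    [
        ("245", "10", "First title"),
        ("650", " 0", "Libraries"),
        ("650", " 0", "Metadata"),
    ],
    [
        ("245", "14", "The second title"),
        ("650", " 0", "Cataloging"),
    ],
]


def flatten_record(fields, subdelimiter="|"):
    """Collapse one record to {tag: value}, joining repeated tags."""
    columns = {}
    for tag, _indicators, value in fields:
        # Indicators are deliberately left out of the column key, so every
        # 245 lands in the same column whatever its indicator values are.
        columns.setdefault(tag, []).append(value)
    return {tag: subdelimiter.join(values) for tag, values in columns.items()}


def to_delimited(records, delimiter="\t", subdelimiter="|"):
    """Return delimited text: one row per record, one column per tag."""
    rows = [flatten_record(fields, subdelimiter) for fields in records]
    tags = sorted({tag for row in rows for tag in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=tags, delimiter=delimiter,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_delimited(SAMPLE_RECORDS), end="")
```

With the sample data this gives two columns (245 and 650) and two rows, with
the first record's subjects in one cell as "Libraries|Metadata". The sketch
does not escape a subdelimiter that happens to occur inside a value, so a
real implementation would need to pick a safe character or let the user
choose one.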

One thing you'll want to think about as you develop the tool is what
people will use it to accomplish. In my experience, conversions set you up
for what you are really trying to do rather than being objectives in their
own right.

But again, very cool.