I would also add somewhere links to the definitions / standards for each of
these file types. Not everyone who encounters MARC can be expected to know
all the other acronyms-as-file-formats.

cheers
stuart
--
...let us be heard from red core to black sky

On Tue, 16 Apr 2019 at 08:58, Kyle Banerjee wrote:
>
> On Mon, Apr 15, 2019 at 11:20 AM Thomas Dunbar wrote:
>
> > Hello everyone,
> >
> > I'm working on a proof of concept web application for common library data
> > conversions with support for large files.
> > The application is built using a serverless architecture, which allows me
> > to do this at scale and at low cost.
>
> Love the concept -- I tried a few conversions, including some north of
> 200MB. Overall, it worked impressively. Not having to download software is
> cool because you don't always have the ability to download software or
> might need to do something using a cell phone.
>
> For me personally, the chief needs driving conversions are: 1) to perform
> fixes in a format that's easier to work with (e.g. no one fixes in binary
> MARC) and convert back; 2) analysis -- i.e. identify records/elements that
> have or don't have X; and 3) migrations (which have required further
> manipulation in every single case). In other words, manipulations and
> partial extractions. In the context of these use cases, delimited text,
> plain text, XML, MARC, and JSON (to a lesser extent) dominate conversion
> needs.
>
> Regarding the MARC to text conversion, delimited text conversions need a
> subdelimiter for repeated fields as this is what often must be loaded into
> another system, presented in a table to someone, etc. -- the current method,
> which adds more lines, will cause trouble for anyone without coding skills.
> On a related note, considering the indicators part of the field makes
> philosophical sense but it creates practical problems (especially with
> nonrepeatable fields). For example, it scatters the 245 titles across as
> many indicator variations as exist, making the simple task of generating a
> list of titles trickier than it should be. MARC already has a huge number of
> fields, so when the indicator permutations are combined with separate
> fielding for repeated fields, it takes no time at all to get many hundreds
> of fields with files that aren't that big -- something headache inducing
> even for those with mad skilz.
>
> One thing you'll want to think about as you develop the tool is what
> people use it to accomplish. In my experience, conversions set you up for
> what you were really doing rather than being objectives in their own right.
>
> But again, very cool.
>
> kyle
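
To make Kyle's point about subdelimiters and indicators concrete, here is a
minimal sketch in Python. It is not how the tool under discussion works; it
only illustrates one way a MARC to delimited text conversion could keep one
row per record. The (tag, indicators, value) tuples, the function names, the
"|" subdelimiter and the sample titles are all assumptions made up for the
example, standing in for whatever a real MARC parser would return.

```python
import csv
import io

SUBDELIMITER = "|"  # assumed choice; pick a character absent from the data


def records_to_rows(records, subdelimiter=SUBDELIMITER):
    """Flatten records into one row per record, one column per tag.

    Each record is a list of (tag, indicators, value) tuples, a simplified
    stand-in for parsed MARC fields. Indicators are deliberately left out of
    the column key, and repeated tags are joined with the subdelimiter.
    """
    rows = []
    for record in records:
        row = {}
        for tag, _indicators, value in record:
            if tag in row:
                # Repeated field: append to the same cell, not a new line.
                row[tag] = row[tag] + subdelimiter + value
            else:
                row[tag] = value
        rows.append(row)
    return rows


def rows_to_delimited(rows, delimiter="\t"):
    """Write rows as delimited text with a header of sorted tags."""
    tags = sorted({tag for row in rows for tag in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=tags,
        delimiter=delimiter,
        restval="",  # records lacking a tag get an empty cell
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data: two records whose 245 indicators differ.
sample_records = [
    [("245", "10", "Moby Dick"),
     ("650", " 0", "Whaling"),
     ("650", " 0", "Sea stories")],
    [("245", "14", "The Hobbit"),
     ("650", " 0", "Fantasy fiction")],
]

rows = records_to_rows(sample_records)
text = rows_to_delimited(rows)
print(text)
```

Because the column key is the tag alone, both titles land in a single 245
column regardless of their indicators, and the two 650 values in the first
record share one cell. If the indicators are needed, they could go into
their own column rather than into the field name.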