@John - Thanks, I'd be interested to learn more about the supportable pattern you mentioned if there are any readings you'd recommend.

@Joe - Cheers, Andreas Rauber's presentation sounds particularly relevant. Do you have a link?

@Colin - Thanks for the feedback, I do plan to take a closer look at JIRA.

Dave

On Fri, Mar 13, 2015 at 11:49 PM, Joe Hourcle <[log in to unmask]> wrote:

> On Wed, 11 Mar 2015, davesgonechina wrote:
>
>> Hi John,
>>
>> Good question - we're taking in XLS, CSV, JSON, XML, and on a bad day PDF
>> of varying file sizes, each requiring different transformation and audit
>> strategies, on both regular and irregular schedules. New batches often
>> feature schema changes requiring modification to ingest procedures, which
>> we're trying to automate as much as possible but obviously require a human
>> chaperone.
>>
>> MediaWiki is our default choice at the moment, but then I would still be
>> looking for a good workflow management model for the structure of the
>> wiki, especially since in my experience wikis are often a graveyard for
>> the best intentions.
>
> A few places that you might try asking this question again, to see if you
> can find a solution that better answers your question:
>
> The American Society for Information Science & Technology's Research Data
> Access & Preservation group. It has a lot of librarians & archivists in
> it, as well as people from various research disciplines:
>
> http://mail.asis.org/mailman/listinfo/rdap
> http://www.asis.org/rdap/
>
> ...
>
> The Research Data Alliance has a number of groups that might be relevant.
> Here are a few that I suspect are the best fit:
>
> Libraries for Research Data IG
> https://rd-alliance.org/groups/libraries-research-data.html
>
> Reproducibility IG
> https://rd-alliance.org/groups/reproducibility-ig.html
>
> Research Data Provenance IG
> https://rd-alliance.org/groups/research-data-provenance.html
>
> Data Citation WG
> (as this fits into their 'dynamic data' problem)
> https://rd-alliance.org/groups/data-citation-wg.html
>
> ('IG' is 'Interest Group', which are long-lived. 'WG' is 'Working Group'
> which are formed to solve a specific problem and then disband)
>
> The group 'Publishing Data Workflows' might seem to be appropriate but
> it's actually 'Workflows for Publishing Data' not 'Publishing of Data
> Workflows' (which falls under 'Data Provenance' and 'Data Citation')
>
> There was a presentation at the meeting earlier this week by Andreas
> Rauber in the Data Citation group on workflows using git or SQL databases
> to be able to track appending or modification for CSV and similar ASCII
> files.
>
> ...
>
> Also, I would consider this to be on-topic for Stack Exchange's "Open
> Data" site (and I'm one of the moderators for the site):
>
> http://opendata.stackexchange.com/
>
> -Joe
>
>> On Tue, Mar 10, 2015 at 8:10 PM, Scancella, John <[log in to unmask]> wrote:
>>
>>> Dave,
>>>
>>> How are you getting the metadata streams? Are they actual stream objects,
>>> or files, or database dumps, etc?
>>>
>>> As for the tools, I have used a number of the ones you listed below. I
>>> personally prefer JIRA (and it is free for non-profits). If you are OK with
>>> editing in wiki syntax I would recommend MediaWiki (it is what powers
>>> Wikipedia). You could also take a look at continuous deployment
>>> technologies like virtual machines (VirtualBox), Linux containers
>>> (Docker), and rapid deployment tools (Ansible, Salt). Of course if you are
>>> doing lots of code changes you will want to test all of this continually
>>> (Jenkins).
>>>
>>> John Scancella
>>> Library of Congress, OSI
>>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> davesgonechina
>>> Sent: Tuesday, March 10, 2015 6:05 AM
>>> To: [log in to unmask]
>>> Subject: [CODE4LIB] Data Lifecycle Tracking & Documentation Tools
>>>
>>> Hi all,
>>>
>>> One of my projects involves harvesting, cleaning and transforming steady
>>> streams of metadata from numerous publishers. It's an infinite loop but
>>> every cycle can be a little bit or significantly different. Many issue
>>> tracking tools are designed for a linear progression that ends in
>>> deployment, not a circular workflow, and I've not hit upon a tool or use
>>> strategy that really fits.
>>>
>>> The best illustration I've found so far of the type of workflow I'm
>>> talking about is the DCC Curation Lifecycle Model
>>> <http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf>.
>>>
>>> Here are some things I've tried or thought about trying:
>>>
>>> - Git comments
>>> - GitHub Issues
>>> - MySQL comments
>>> - Bash script logs
>>> - JIRA
>>> - Trac
>>> - Trello
>>> - Wiki
>>> - Unfuddle
>>> - Redmine
>>> - Zendesk
>>> - Request Tracker
>>> - Basecamp
>>> - Asana
>>>
>>> Thoughts?
>>>
>>> Dave