On Wed, 11 Mar 2015, davesgonechina wrote:

> Hi John,
>
> Good question - we're taking in XLS, CSV, JSON, XML, and on a bad day PDF
> of varying file sizes, each requiring different transformation and audit
> strategies, on both regular and irregular schedules. New batches often
> feature schema changes requiring modification to ingest procedures, which
> we're trying to automate as much as possible but obviously require a human
> chaperone.
>
> Mediawiki is our default choice at the moment, but then I would still be
> looking for a good workflow management model for the structure of the wiki,
> especially since in my experience wikis are often a graveyard for the best
> intentions.


Here are a few other places where you might ask this question again, to 
see if you can find a better answer:


The American Society for Information Science & Technology's Research Data 
Access & Preservation group has a lot of librarians & archivists in it, as 
well as people from various research disciplines:

 	http://mail.asis.org/mailman/listinfo/rdap
 	http://www.asis.org/rdap/

...

The Research Data Alliance has a number of groups that might be relevant. 
Here are a few that I suspect are the best fit:

 	Libraries for Research Data IG
 	https://rd-alliance.org/groups/libraries-research-data.html

 	Reproducibility IG
 	https://rd-alliance.org/groups/reproducibility-ig.html

 	Research Data Provenance IG
 	https://rd-alliance.org/groups/research-data-provenance.html

 	Data Citation WG
 	(as this fits into their 'dynamic data' problem)
 	https://rd-alliance.org/groups/data-citation-wg.html

('IG' is 'Interest Group'; these are long-lived.  'WG' is 'Working Group'; 
these are formed to solve a specific problem and then disband.)

The group 'Publishing Data Workflows' might seem to be appropriate but 
it's actually 'Workflows for Publishing Data' not 'Publishing of Data 
Workflows' (which falls under 'Data Provenance' and 'Data Citation')

There was a presentation at the meeting earlier this week by Andreas 
Rauber in the Data Citation group on workflows using git or SQL databases 
to be able to track appending or modification for CSV and similar ASCII 
files.
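
To make the idea concrete, here is a minimal sketch of tracking appended and 
modified rows between two CSV batches. It is not the method from that 
presentation, just an illustration under stated assumptions: each row has a 
stable identifier column (called 'id' here), and the names row_fingerprints 
and diff_batches are hypothetical helpers invented for this example.

```python
import csv
import hashlib
import io


def row_fingerprints(csv_text, key_field):
    """Map each row's key to a SHA-256 digest of the row's contents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fingerprints = {}
    for row in reader:
        # Serialize fields in sorted order so that reordering columns
        # between batches is not reported as a modification.
        canonical = "\x1f".join(f"{name}={row[name]}" for name in sorted(row))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        fingerprints[row[key_field]] = digest
    return fingerprints


def diff_batches(old_text, new_text, key_field):
    """Report which keys were appended, removed, or modified."""
    old = row_fingerprints(old_text, key_field)
    new = row_fingerprints(new_text, key_field)
    return {
        "appended": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "modified": sorted(k for k in new if k in old and new[k] != old[k]),
    }


# Hypothetical sample batches; 'id' is the assumed stable key column.
old_batch = "id,title\n1,Alpha\n2,Beta\n"
new_batch = "id,title\n1,Alpha\n2,Beta (rev.)\n3,Gamma\n"

changes = diff_batches(old_batch, new_batch, "id")
print(changes)
```

Storing the per-row digests alongside each batch (in git or in a database 
table) would let you answer "what changed in this delivery?" without keeping 
a full copy of every file, though it assumes well-formed rows and a key that 
the publisher does not reuse.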

...

Also, I would consider this to be on-topic for Stack Exchange's "Open 
Data" site  (and I'm one of the moderators for the site):

 	http://opendata.stackexchange.com/

-Joe





> On Tue, Mar 10, 2015 at 8:10 PM, Scancella, John <[log in to unmask]> wrote:
>
>> Dave,
>>
>> How are you getting the metadata streams? Are they actual stream objects,
>> or files, or database dumps, etc?
>>
>> As for the tools, I have used a number of the ones you listed below. I
>> personally prefer JIRA (and it is free for non-profit). If you are ok with
>> editing in wiki syntax I would recommend MediaWiki (it is what powers
>> Wikipedia). You could also take a look at continuous deployment
>> technologies like Virtual Machines (virtualbox), linux containers (docker),
>> and rapid deployment tools (ansible, salt). Of course if you are doing lots
>> of code changes you will want to test all of this continually (Jenkins).
>>
>> John Scancella
>> Library of Congress, OSI
>>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> davesgonechina
>> Sent: Tuesday, March 10, 2015 6:05 AM
>> To: [log in to unmask]
>> Subject: [CODE4LIB] Data Lifecycle Tracking & Documentation Tools
>>
>> Hi all,
>>
>> One of my projects involves harvesting, cleaning and transforming steady
>> streams of metadata from numerous publishers. It's an infinite loop but
>> every cycle can be a little bit or significantly different. Many issue
>> tracking tools are designed for a linear progression that ends in
>> deployment, not a circular workflow, and I've not hit upon a tool or use
>> strategy that really fits.
>>
>> The best illustration I've found so far of the type of workflow I'm
>> talking about is the DCC Curation Lifecycle Model
>> <http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf>.
>>
>> Here are some things I've tried or thought about trying:
>>
>>    - Git comments
>>    - Github Issues
>>    - MySQL comments
>>    - Bash script logs
>>    - JIRA
>>    - Trac
>>    - Trello
>>    - Wiki
>>    - Unfuddle
>>    - Redmine
>>    - Zendesk
>>    - Request Tracker
>>    - Basecamp
>>    - Asana
>>
>> Thoughts?
>>
>> Dave
>>
>