@John - Thanks, I'd be interested to learn more about the supportable
pattern you mentioned if there are any readings you'd recommend.
@Joe - Cheers, Andreas Rauber's presentation sounds particularly relevant.
Do you have a link?
@Colin - Thanks for the feedback, I do plan to take a closer look at JIRA.
Dave
On Fri, Mar 13, 2015 at 11:49 PM, Joe Hourcle <[log in to unmask]> wrote:
>
>
> On Wed, 11 Mar 2015, davesgonechina wrote:
>
>> Hi John,
>>
>> Good question - we're taking in XLS, CSV, JSON, XML, and on a bad day PDF
>> of varying file sizes, each requiring different transformation and audit
>> strategies, on both regular and irregular schedules. New batches often
>> feature schema changes requiring modification to ingest procedures, which
>> we're trying to automate as much as possible but obviously require a human
>> chaperone.
>>
>> Mediawiki is our default choice at the moment, but then I would still be
>> looking for a good workflow management model for the structure of the wiki,
>> especially since in my experience wikis are often a graveyard for the best
>> intentions.
>>
>
>
> A few places that you might try asking this question again, to see if you
> can find a solution that better answers your question:
>
>
> The American Society for Information Science & Technology's Research Data
> Access & Preservation group. It has a lot of librarians & archivists in
> it, as well as people from various research disciplines:
>
> http://mail.asis.org/mailman/listinfo/rdap
> http://www.asis.org/rdap/
>
> ...
>
> The Research Data Alliance has a number of groups that might be relevant.
> Here are a few that I suspect are the best fit:
>
> Libraries for Research Data IG
> https://rd-alliance.org/groups/libraries-research-data.html
>
> Reproducibility IG
> https://rd-alliance.org/groups/reproducibility-ig.html
>
> Research Data Provenance IG
> https://rd-alliance.org/groups/research-data-provenance.html
>
> Data Citation WG
> (as this fits into their 'dynamic data' problem)
> https://rd-alliance.org/groups/data-citation-wg.html
>
> ('IG' is 'Interest Group', which are long-lived. 'WG' is 'Working Group'
> which are formed to solve a specific problem and then disband)
>
> The group 'Publishing Data Workflows' might seem to be appropriate but
> it's actually 'Workflows for Publishing Data' not 'Publishing of Data
> Workflows' (which falls under 'Data Provenance' and 'Data Citation')
>
> There was a presentation at the meeting earlier this week by Andreas
> Rauber in the Data Citation group on workflows using git or SQL databases
> to be able to track appending or modification for CSV and similar ASCII
> files.
>
> ...
>
> Also, I would consider this to be on-topic for Stack Exchange's "Open
> Data" site (and I'm one of the moderators for the site):
>
> http://opendata.stackexchange.com/
>
> -Joe
>
> On Tue, Mar 10, 2015 at 8:10 PM, Scancella, John <[log in to unmask]> wrote:
>>
>>> Dave,
>>>
>>> How are you getting the metadata streams? Are they actual stream objects,
>>> or files, or database dumps, etc?
>>>
>>> As for the tools, I have used a number of the ones you listed below. I
>>> personally prefer JIRA (and it is free for non-profits). If you are OK with
>>> editing in wiki syntax, I would recommend MediaWiki (it is what powers
>>> Wikipedia). You could also take a look at continuous deployment
>>> technologies like virtual machines (VirtualBox), Linux containers (Docker),
>>> and rapid deployment tools (Ansible, Salt). Of course, if you are doing lots
>>> of code changes you will want to test all of this continually (Jenkins).
>>>
>>> John Scancella
>>> Library of Congress, OSI
>>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> davesgonechina
>>> Sent: Tuesday, March 10, 2015 6:05 AM
>>> To: [log in to unmask]
>>> Subject: [CODE4LIB] Data Lifecycle Tracking & Documentation Tools
>>>
>>> Hi all,
>>>
>>> One of my projects involves harvesting, cleaning and transforming steady
>>> streams of metadata from numerous publishers. It's an infinite loop but
>>> every cycle can be a little bit or significantly different. Many issue
>>> tracking tools are designed for a linear progression that ends in
>>> deployment, not a circular workflow, and I've not hit upon a tool or use
>>> strategy that really fits.
>>>
>>> The best illustration I've found so far of the type of workflow I'm
>>> talking about is the DCC Curation Lifecycle Model
>>> <http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf>.
>>>
>>> Here are some things I've tried or thought about trying:
>>>
>>> - Git comments
>>> - Github Issues
>>> - MySQL comments
>>> - Bash script logs
>>> - JIRA
>>> - Trac
>>> - Trello
>>> - Wiki
>>> - Unfuddle
>>> - Redmine
>>> - Zendesk
>>> - Request Tracker
>>> - Basecamp
>>> - Asana
>>>
>>> Thoughts?
>>>
>>> Dave
>>>
>>>
>>