@John - Thanks, I'd be interested to learn more about the supportable
pattern you mentioned if there are any readings you'd recommend.

@Joe - Cheers, Andreas Rauber's presentation sounds particularly relevant.
Do you have a link?

@Colin - Thanks for the feedback, I do plan to take a closer look at JIRA.

Dave

On Fri, Mar 13, 2015 at 11:49 PM, Joe Hourcle <[log in to unmask]> wrote:

>
>
> On Wed, 11 Mar 2015, davesgonechina wrote:
>
>> Hi John,
>>
>> Good question - we're taking in XLS, CSV, JSON, XML, and on a bad day PDF
>> of varying file sizes, each requiring different transformation and audit
>> strategies, on both regular and irregular schedules. New batches often
>> feature schema changes requiring modification to ingest procedures, which
>> we're trying to automate as much as possible but obviously require a human
>> chaperone.
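
A minimal sketch of how a schema change in a new batch might be flagged before ingest, so the human chaperone knows where to look. It assumes CSV input only; the function names (`read_header`, `diff_schema`) and the sample columns are hypothetical and do not come from the thread.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of a CSV document."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def diff_schema(expected, actual):
    """Compare two column lists and report added, removed, and reordered columns."""
    added = [c for c in actual if c not in expected]
    removed = [c for c in expected if c not in actual]
    # Columns present in both lists, in the order each list uses.
    common_expected = [c for c in expected if c in actual]
    common_actual = [c for c in actual if c in expected]
    return {
        "added": added,
        "removed": removed,
        "reordered": common_expected != common_actual,
    }


# Hypothetical sample batches from the same publisher.
previous_batch = "isbn,title,author\n9780000000001,Example,A. Writer\n"
new_batch = "isbn,title,publisher,author_name\n9780000000002,Other,Some Press,B. Writer\n"

report = diff_schema(read_header(previous_batch), read_header(new_batch))
print(report)
```

A non-empty `added` or `removed` list would be the cue to stop the automated ingest and review the procedure by hand.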
>>
>> Mediawiki is our default choice at the moment, but then I would still be
>> looking for a good workflow management model for the structure of the
>> wiki, especially since in my experience wikis are often a graveyard for
>> the best intentions.
>>
>
>
> A few places that you might try asking this question again, to see if you
> can find a solution that better answers your question:
>
>
> The American Society for Information Science & Technology's Research Data
> Access & Preservation group.  It has a lot of librarians & archivists in
> it, as well as people from various research disciplines:
>
>         http://mail.asis.org/mailman/listinfo/rdap
>         http://www.asis.org/rdap/
>
> ...
>
> The Research Data Alliance has a number of groups that might be relevant.
> Here are a few that I suspect are the best fit:
>
>         Libraries for Research Data IG
>         https://rd-alliance.org/groups/libraries-research-data.html
>
>         Reproducibility IG
>         https://rd-alliance.org/groups/reproducibility-ig.html
>
>         Research Data Provenance IG
>         https://rd-alliance.org/groups/research-data-provenance.html
>
>         Data Citation WG
>         (as this fits into their 'dynamic data' problem)
>         https://rd-alliance.org/groups/data-citation-wg.html
>
> ('IG' is 'Interest Group', which are long-lived.  'WG' is 'Working Group'
> which are formed to solve a specific problem and then disband)
>
> The group 'Publishing Data Workflows' might seem to be appropriate but
> it's actually 'Workflows for Publishing Data' not 'Publishing of Data
> Workflows' (which falls under 'Data Provenance' and 'Data Citation')
>
> There was a presentation at the meeting earlier this week by Andreas
> Rauber in the Data Citation group on workflows using git or SQL databases
> to be able to track appending or modification for CSV and similar ASCII
> files.
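
A minimal sketch of the general idea of tracking appends and modifications between two versions of a CSV file. This is not the method from the presentation, which is not described here; it assumes each row carries a stable identifier column, and the names `load_rows` and `diff_versions` and the sample data are hypothetical.

```python
import csv
import io


def load_rows(csv_text, key):
    """Index the rows of a CSV document by the value of the `key` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_versions(old_text, new_text, key):
    """Classify row identifiers as appended, deleted, or modified between versions."""
    old = load_rows(old_text, key)
    new = load_rows(new_text, key)
    return {
        "appended": sorted(k for k in new if k not in old),
        "deleted": sorted(k for k in old if k not in new),
        "modified": sorted(k for k in new if k in old and new[k] != old[k]),
    }


# Hypothetical sample versions of the same file.
v1 = "id,title\n1,Alpha\n2,Beta\n"
v2 = "id,title\n1,Alpha\n2,Beta (2nd ed.)\n3,Gamma\n"

changes = diff_versions(v1, v2, key="id")
print(changes)
```

Committing each version to git, or storing rows in a SQL table with a timestamp, would add the history that a one-off comparison like this lacks.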
>
> ...
>
> Also, I would consider this to be on-topic for Stack Exchange's "Open
> Data" site  (and I'm one of the moderators for the site):
>
>         http://opendata.stackexchange.com/
>
> -Joe
>
>> On Tue, Mar 10, 2015 at 8:10 PM, Scancella, John <[log in to unmask]> wrote:
>>
>>> Dave,
>>>
>>> How are you getting the metadata streams? Are they actual stream objects,
>>> or files, or database dumps, etc?
>>>
>>> As for the tools, I have used a number of the ones you listed below. I
>>> personally prefer JIRA (and it is free for non-profits). If you are OK with
>>> editing in wiki syntax I would recommend MediaWiki (it is what powers
>>> Wikipedia). You could also take a look at continuous deployment
>>> technologies like virtual machines (VirtualBox), Linux containers
>>> (Docker), and rapid deployment tools (Ansible, Salt). Of course if you are
>>> doing lots of code changes you will want to test all of this continually
>>> (Jenkins).
>>>
>>> John Scancella
>>> Library of Congress, OSI
>>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> davesgonechina
>>> Sent: Tuesday, March 10, 2015 6:05 AM
>>> To: [log in to unmask]
>>> Subject: [CODE4LIB] Data Lifecycle Tracking & Documentation Tools
>>>
>>> Hi all,
>>>
>>> One of my projects involves harvesting, cleaning and transforming steady
>>> streams of metadata from numerous publishers. It's an infinite loop but
>>> every cycle can be a little bit or significantly different. Many issue
>>> tracking tools are designed for a linear progression that ends in
>>> deployment, not a circular workflow, and I've not hit upon a tool or use
>>> strategy that really fits.
>>>
>>> The best illustration I've found so far of the type of workflow I'm
>>> talking about is the DCC Curation Lifecycle Model
>>> <http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf>.
>>>
>>> Here are some things I've tried or thought about trying:
>>>
>>>    - Git comments
>>>    - Github Issues
>>>    - MySQL comments
>>>    - Bash script logs
>>>    - JIRA
>>>    - Trac
>>>    - Trello
>>>    - Wiki
>>>    - Unfuddle
>>>    - Redmine
>>>    - Zendesk
>>>    - Request Tracker
>>>    - Basecamp
>>>    - Asana
>>>
>>> Thoughts?
>>>
>>> Dave
>>>
>>>
>>