Thanks again, all,
I love OpenRefine - I've been working on the GOKb project (http://gokb.org), where K-Int (a UK-based company) have developed an extension for OpenRefine that helps with cleaning data about electronic resources (esp. journals) from publishers and then pushes it into the GOKb database. The extension is fully integrated with the GOKb database, but if anyone wants a look, the code is at https://github.com/k-int/gokb-phase1/tree/dev/refine. The extension checks the data and reports errors, as well as offering ways of fixing common issues - there's more on the wiki at https://wiki.kuali.org/display/OLE/OpenRefine+How-Tos
I did pitch an OpenRefine workshop for the same event as a 'data wrangling/cleaning' tool, but the 'automation' session got the vote in the end - although there is definitely overlap. However, I am delivering an OpenRefine workshop at the British Library next week, and it's great to see it being used across libraries.
The Google Docs Spreadsheets suggestion is also a great tip - I've run a course at the British Library that uses it to introduce the concept of APIs to non-techies. I blogged the original tutorial at http://www.meanboyfriend.com/overdue_ideas/2013/02/introduction-to-apis/ but a change to the BL open data platform means it no longer works :(
Thanks all again - I'll be trying to put stuff from the automation workshop online at some point and I'll post here when there is something up.
Best wishes,
Owen
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: [log in to unmask]
Telephone: 0121 288 6936
On 8 Jul 2014, at 03:52, davesgonechina <[log in to unmask]> wrote:
> +1 to OpenRefine. Some extensions, like RDF Refine <http://refine.deri.ie/>,
> currently only work with the old Google Refine (still available here
> <https://code.google.com/p/google-refine/>). There are a good many
> interesting projects for OpenRefine on GitHub and GitHub Gist.
>
> Google Docs Spreadsheets also has a surprising amount of functionality,
> such as importXML, if you're willing to get your hands dirty with XPath
> queries and regular expressions.
>
> Dave
>
>
> On Tue, Jul 8, 2014 at 3:12 AM, Tillman, Ruth K. (GSFC-272.0)[CADENCE GROUP
> ASSOC] <[log in to unmask]> wrote:
>
>> Definite co-sign on OpenRefine. It's intuitive and spreadsheet-like enough
>> that a lot of people can understand it. You can do anything from
>> standardizing state names you get from a patron form to normalizing
>> metadata keywords for a database, so I think it'd be useful even for
>> non-techies.
>>
>> Ruth Kitchin Tillman
>> Metadata Librarian, Cadence Group
>> NASA Goddard Space Flight Center Library, Code 272
>> Greenbelt, MD 20771
>> Goddard Library Repository: http://gsfcir.gsfc.nasa.gov/
>> 301.286.6246
>>
>>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> Terry Brady
>> Sent: Monday, July 07, 2014 1:35 PM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] 'automation' tools
>>
>> I learned about OpenRefine <http://openrefine.org/> at the Code4Lib
>> conference, and it looks like it would be a great tool for normalizing
>> data. I have worked on a few projects in the past where it would have been
>> very helpful.
>>