In re GitHub facets - some projects do use labels specifically to flag
issues that may be beginner-friendly. The exact terms vary from project to
project, but you can find a searchable/filterable aggregation at
http://up-for-grabs.net/ .

Also, OpenHatch is a friendly community aimed at getting people involved,
and they have a place where you can search for issues that might be
relevant (e.g. https://openhatch.org/search/?language=Python&q= ).

On Fri, Nov 3, 2017 at 9:29 AM, Julie Swierczek wrote:

> Dan,
>
> This. Is. Awesome. And your interpretation of the date string "1933,
> 1937-1938, 1941" is correct - I meant to say it should be 1933/1941. This
> sort of error is exactly why I wanted to approach this programmatically,
> and not type the dates by hand. I used student employees to copy the data
> from the HTML pages into spreadsheets, and to check for spelling errors.
> However, I didn't want to use students to type the dates. I feel like that
> would be risking the creation of too much metacrap. I can't even type them
> correctly myself, so I can't expect students to have 100% accuracy, either.
>
> Also, for anyone else following from home, I have to say why I love this
> solution compared to all the others.
>
> 1) I have over 400 spreadsheets, some with over 1000 lines. While I
> *could* use OpenRefine or Excel for a certain amount of date cleaning, that
> assumes I am interested in - and have the time for - opening each file
> individually and working on the dates one spreadsheet at a time. I can set
> this script up to run through a bunch of CSV files. I don't need to look at
> them. (And, yes, I know how to set up a task in OpenRefine and save it and
> use it again later - and I was working on building one of those - but that
> is more time-consuming than I want this task to be.)
>
> 2) This doesn't use Ruby or Perl or other tools that I don't know and
> don't have time to use now. I said I can handle basic Python, and that's
> what this is.
>
> 3) This is written simply and clearly, and doesn't do too much of 'let's
> prove how awesome I am by using as few lines of code as possible', which is
> really hard for newbies to interpret and change. (You know what I'm
> talking about - something that a newbie would write in 200 lines and
> someone else says, "Yeah, you idiot, I can do that in two lines". Cf. ALL
> OF STACK OVERFLOW.)
>
> 4) Building on point number 3, this is written simply and clearly enough
> that I can figure out how to modify it further if I come across any other
> date cases that I haven't discovered so far. I would even feel confident
> enough to submit a pull request if I do develop solutions for other date
> formats for this.
>
> 5) Further, this is written simply and clearly enough that I can use this
> as a model for figuring out how to write other Python stuff to handle other
> similar tasks. This is now my favorite thing in all of GitHub. (I wish
> GitHub had a special facet for 'newbie friendly' stuff. I know that is
> somewhat subjective, but I can't tell you how many 'easy' tools have been
> recommended to me that would take me roughly a week to figure out how
> to run once, and possibly another month of trying to troubleshoot error
> messages to get them to actually work. Cf.
> http://tpverso.com/an-open-letter-to-open-source-projects-for-lams/)
>
> I again want to thank Dan for this code and I also want to commend it to
> everyone else's attention as the sort of code that is really friendly to
> newbies. If you are thinking of writing a tool and you want to be able to
> share it with institutions of all sizes, with a really low barrier to entry
> (e.g., the knowledge of how to put a .py file in a directory, change the
> filename in the .py file, and then run 'python test.py'), then this is a
> good model of how code should be written. Also, while I am on my soapbox,
> here's a great model for documentation:
> https://github.com/CarletonArchives/BagBatch.
>
> Thus Endeth the Lecture.
>
> Dan, thanks again. This just made my semester.
>
> Julie Swierczek
> Transformer of Dates
>

--
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>
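
For anyone following the thread without Dan's script in front of them: the date fix Julie describes (turning "1933, 1937-1938, 1941" into 1933/1941, across a whole folder of CSV files) can be sketched roughly as below. This is not Dan's code, just a minimal Python illustration of the idea. The function names, the "date" column name, and the "normalized" output folder are all assumptions made up for the example, and it only handles strings whose years appear as plain four-digit numbers.

```python
import csv
import re
from pathlib import Path

# Any standalone four-digit number is treated as a year (an assumption;
# real data may need stricter rules).
YEAR = re.compile(r"\b\d{4}\b")


def collapse_years(date_string):
    """Collapse a list of years/ranges into 'earliest/latest'.

    Strings with a single year come back as that year; strings with no
    four-digit year come back unchanged.
    """
    years = [int(y) for y in YEAR.findall(date_string)]
    if not years:
        return date_string
    first, last = min(years), max(years)
    return str(first) if first == last else f"{first}/{last}"


def normalize_folder(folder, column="date"):
    """Rewrite the (hypothetical) date column of every CSV in a folder.

    Cleaned copies go into a 'normalized' subfolder; originals are not
    modified. Files without the named column are skipped.
    """
    folder = Path(folder)
    out_dir = folder / "normalized"
    out_dir.mkdir(exist_ok=True)
    for path in sorted(folder.glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as src:
            reader = csv.DictReader(src)
            fieldnames = reader.fieldnames
            rows = list(reader)
        if not fieldnames or column not in fieldnames:
            continue
        for row in rows:
            # Short rows yield None for missing cells; treat those as empty.
            row[column] = collapse_years(row[column] or "")
        with open(out_dir / path.name, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```

Calling normalize_folder("spreadsheets") would process every .csv file in that directory in one pass. Because the originals are left alone, any date the pattern guesses wrong can be caught by comparing the two copies.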