Hi all,

I've done a couple of projects mining the data from the code4lib listserv (e.g. https://ejournals.bc.edu/index.php/ital/article/view/5893 ). Both times the fastest route was finding helpful folks involved in it to provide me with a data dump, rather than spending time on a scraper. The most recent work I did was in 2018 - I have a tarball of all the message log files for the listserv (some are job posts and others are not), covering 2003 through 2018. I believe I asked about this on the c4l Slack at the time and Wayne Graham from CLIR kindly helped me out with the data! This data is not anonymized (as it was/is publicly available with names and emails associated) but I did anonymize the findings for reporting.

Ellen - I'd be happy to chat sometime about how I mined the data for job titles and related skills/technologies, feel free to reach out to me directly!

Best,
Monica Maceli, Ph.D.
Associate Professor
Pratt Institute | School of Information
144 W 14th St, 6th Floor, New York, NY, 10011-7301
www.monicamaceli.com

On Fri, Jan 22, 2021 at 1:18 PM Andromeda Yelton wrote:

> The initial commit in https://github.com/code4lib/shortimer/ was November
> 2011, which is ten years for some values of ten. Taking a quick and
> noncomprehensive glance around, I see postings as old as 2005. I don't see
> an obvious API, but maybe a maintainer could weigh in about data dump
> possibilities?
>
> On Fri, Jan 22, 2021 at 11:28 AM Eric Lease Morgan wrote:
>
> > On Jan 22, 2021, at 11:11 AM, Jill Ellern wrote:
> >
> > > I'm doing some research into systems librarian duties and wondering if
> > > there is an easy way to get a dump of the code4lib jobs from the last 10
> > > years? In excel format?
> >
> > Easy? I'd be surprised.
> >
> > There are two or three sources of the Code4Lib jobs data:
> >
> >   1. the underlying data from the jobs.code4lib.org site
> >
> >   2. any one of a number of different Code4Lib mailing list Web archives
> >
> >   3. the archived mailbox (mbox) files from the mailing list
> >
> > I don't think the jobs site has been around for ten years. Has it? Nor do
> > I know whether or not the data is archived. If it is, then I'd bet you will
> > be able to get it in some sort of structured format like JSON or a
> > delimited format like Excel.
> >
> > Scraping different Web archives would require... scraping which,
> > personally, I run away from.
> >
> > Finally, the archived mbox files would be the most comprehensive, but a
> > programmer would have to parse the mbox (email) files, which is a
> > specialized task in and of itself. If you want to know where the mbox files
> > are located, then drop me a line and I'll let you know. Easy.
> >
> > Finally, what are the questions you would like to answer? How many systems
> > librarian jobs have been posted? Where were the jobs? What are the
> > characteristics of systems librarianship and how have they changed over
> > time? How much do they pay? Extracting some of this information from the
> > postings may be difficult, if not heroic in nature.
> >
> > --
> > Eric Morgan
> > University of Notre Dame
>
> --
> Andromeda Yelton
> Humanistic Machine Learning for Library Data
> Lecturer, San José State University iSchool
> https://andromedayelton.com
> @ThatAndromeda <http://twitter.com/ThatAndromeda>
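
On the mbox route mentioned in the quoted thread: below is a minimal sketch of one way to turn an mbox file into a CSV that Excel can open, using Python's standard-library mailbox and csv modules. It is not the method anyone in this thread describes using. The function names, the output columns, and the "subject contains 'job'" filter are all assumptions made for illustration.

```python
import csv
import mailbox


def looks_like_job(subject):
    """Crude, hypothetical filter: treat any subject mentioning 'job' as a posting."""
    return "job" in subject.lower()


def mbox_to_csv(mbox_path, csv_path):
    """Write one CSV row (date, from, subject) per job-like message.

    Returns the number of rows written, not counting the header.
    """
    count = 0
    box = mailbox.mbox(mbox_path)
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(["date", "from", "subject"])
            for message in box:
                # Header values can be missing, so fall back to an empty string.
                subject = str(message.get("Subject", ""))
                if not looks_like_job(subject):
                    continue
                writer.writerow([
                    str(message.get("Date", "")),
                    str(message.get("From", "")),
                    subject,
                ])
                count += 1
    finally:
        box.close()
    return count
```

A call such as mbox_to_csv("code4lib.mbox", "jobs.csv"), where both file names are placeholders, would produce a spreadsheet-ready file. A subject keyword is only a starting point, since real postings would likely need a better filter, and pulling titles, locations, or salaries out of the message bodies is a separate and harder step.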