I've done a couple projects mining the data from the code4lib listserv
(e.g. https://ejournals.bc.edu/index.php/ital/article/view/5893 ). Both
times the fastest route was finding helpful folks involved in it to provide
me with a data dump vs. spending time on a scraper.
The most recent work I did was in 2018. I have a tarball of all the
message log files for the listserv (some are job posts and others are not),
covering 2003 through 2018. I believe I asked about this on the c4l Slack
at the time and Wayne Graham from CLIR kindly helped me out with the data!
This data is not anonymized (as it was/is publicly available with names
and emails associated) but I did anonymize the findings for reporting.
Ellen - I'd be happy to chat sometime about how I mined the data for job
titles and related skills/technologies. Feel free to reach out to me.
Monica Maceli, Ph.D.
Pratt Institute | School of Information
144 W 14th St, 6th Floor, New York, NY, 10011-7301
www.monicamaceli.com | [log in to unmask]
On Fri, Jan 22, 2021 at 1:18 PM Andromeda Yelton <[log in to unmask]> wrote:
> The initial commit in https://github.com/code4lib/shortimer/ was November
> 2011, which is ten years for some values of ten. Taking a quick and
> noncomprehensive glance around, I see postings as old as 2005. I don't see
> an obvious API, but maybe a maintainer could weigh in about data dump
> On Fri, Jan 22, 2021 at 11:28 AM Eric Lease Morgan <[log in to unmask]> wrote:
> > On Jan 22, 2021, at 11:11 AM, Jill Ellern <[log in to unmask]> wrote:
> > > I'm doing some research into systems librarian duties and wondering if
> > > there is an easy way to get a dump of the code4lib jobs from the last 10
> > > years? In Excel format?
> > Easy? I'd be surprised.
> > There are two or three sources of the Code4Lib jobs data:
> > 1. the underlying data from the jobs.code4lib.org site
> > 2. any one of a number of different Code4Lib mailing list Web archives
> > 3. the archived mailbox (mbox) files from the mailing list
> > I don't think the jobs site has been around for ten years. Has it? Nor do
> > I know whether or not the data is archived. If it is, then I'd bet you
> > would be able to get it in some sort of structured format such as JSON
> > or a delimited format that Excel can open.
> > Scraping different Web archives would require... scraping which,
> > personally, I run away from.
> > Finally, the archived mbox files would be the most comprehensive, but a
> > programmer would have to parse the mbox (email) files, which is a
> > specialized task in and of itself. If you want to know where the mbox
> > files are located, then drop me a line and I'll let you know. Easy.
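
As a rough illustration of the mbox parsing described above, here is a
minimal sketch using Python's standard-library mailbox module. The function
names, the "job" subject filter, and the three output columns are
assumptions made for illustration; nothing here is taken from the actual
Code4Lib archive, whose layout and subject conventions may differ.

```python
import csv
import mailbox
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime


def decode_value(value):
    """Decode an RFC 2047 encoded header, falling back to the raw text."""
    try:
        return str(make_header(decode_header(value)))
    except (HeaderParseError, LookupError, UnicodeError):
        return str(value)


def mbox_to_rows(mbox_path, subject_marker="job"):
    """Yield (date, sender, subject) for messages whose subject contains
    subject_marker (case-insensitive). The marker is a hypothetical filter."""
    # create=False raises an error instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            subject = decode_value(message.get("Subject", ""))
            if subject_marker.lower() not in subject.lower():
                continue
            try:
                parsed = parsedate_to_datetime(str(message.get("Date", "")))
                date = parsed.date().isoformat()
            except (TypeError, ValueError):
                date = ""  # missing or unparseable Date header
            yield date, decode_value(message.get("From", "")), subject
    finally:
        box.close()


def write_csv(rows, csv_path):
    """Write rows to a CSV file; the utf-8-sig BOM helps Excel detect UTF-8."""
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(["date", "from", "subject"])
        writer.writerows(rows)
```

Calling write_csv(mbox_to_rows("archive.mbox"), "jobs.csv") with a real
path would produce a file that opens in Excel. Filtering on the subject
line alone is crude and will both miss postings and catch replies, so
treat the result as a starting point for manual review.
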
> > Lastly, what are the questions you would like to answer? How many systems
> > librarian jobs have been posted? Where were the jobs? What are the
> > characteristics of systems librarianship and how have they changed over
> > time? How much do they pay? Extracting some of this information from the
> > postings may be difficult, if not heroic in nature.
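
The simplest of these questions, how many systems librarian jobs were
posted per year, can be approximated once the postings are reduced to
(ISO date, subject) pairs. The sketch below assumes that reduction has
already happened; the function name and the regular expression are
illustrative guesses, and matching on subject lines will undercount
postings that name the role only in the message body.

```python
import re
from collections import Counter


def count_by_year(rows, pattern=r"systems?\s+librarian"):
    """Count postings per year whose subject matches pattern.

    rows is an iterable of (iso_date, subject) pairs, e.g.
    ("2019-03-01", "Job: Systems Librarian"). Rows with an empty
    date are skipped because they cannot be assigned to a year.
    """
    matcher = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for iso_date, subject in rows:
        if iso_date and matcher.search(subject):
            counts[iso_date[:4]] += 1  # the first four characters are the year
    return dict(sorted(counts.items()))


sample = [
    ("2019-03-01", "Job: Systems Librarian, Example University"),
    ("2019-06-10", "Job: Metadata Specialist"),
    ("2020-01-05", "JOB: system librarian"),
    ("", "Job: Systems Librarian (no date header)"),
]
print(count_by_year(sample))  # {'2019': 1, '2020': 1}
```

Questions about location, pay, or how duties changed over time need the
message bodies and a good deal more judgment than a pattern match.
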
> > --
> > Eric Morgan
> > University of Notre Dame
> Andromeda Yelton
> Humanistic Machine Learning for Library Data
> Lecturer, San José State University iSchool