I would see if you can just get an SQL or CSV dump of the tables. Maybe the schema isn’t heavily normalized and you can get most of what you need from a table or two. Or perhaps the provider would be so kind as to write a join for the data you need and dump the result to a CSV file, which you can then import into Excel and peruse and analyze to your heart’s content. That seems by far the easiest approach, to me anyway.
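If the provider does send a dump, the "write a join and dump it to CSV" step could look roughly like the Python sketch below. It is only a sketch under stated assumptions: the tables `job` and `employer`, their columns, and the function name `export_join_to_csv` are hypothetical (the real schema behind jobs.code4lib.org is not known here), and it assumes the dump has been loaded into a database you can reach through a Python DB-API connection, such as a local SQLite copy.

```python
import csv

# Placeholder query against a HYPOTHETICAL schema (tables `job` and
# `employer`); the real jobs database will almost certainly differ.
# Explicit aliases keep the CSV header names predictable.
JOIN_QUERY = """
    SELECT job.id        AS job_id,
           job.title     AS title,
           job.post_date AS post_date,
           employer.name AS employer
    FROM job
    JOIN employer ON employer.id = job.employer_id
    ORDER BY job.post_date
"""


def export_join_to_csv(conn, csv_path):
    """Run JOIN_QUERY on a DB-API connection and write the result to csv_path.

    The first CSV row holds the column names; returns the number of data rows.
    """
    cursor = conn.cursor()
    cursor.execute(JOIN_QUERY)
    # cursor.description has one entry per result column; item 0 is its name.
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    # newline="" is what the csv module expects; the utf-8-sig byte-order
    # mark helps Excel recognize UTF-8 when the file is opened directly.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Example with a local SQLite copy of the (hypothetical) dump:
# import sqlite3
# conn = sqlite3.connect("jobs.sqlite3")
# print(export_join_to_csv(conn, "jobs.csv"))
```

Because the function sticks to plain DB-API cursor calls, pointing it at a different database should mostly mean passing in a different connection object; the query itself would still need adjusting to the real tables.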
> On Jan 22, 2021, at 12:17 PM, Andromeda Yelton <[log in to unmask]> wrote:
>
> The initial commit in https://github.com/code4lib/shortimer/ was November
> 2011, which is ten years for some values of ten. Taking a quick and
> noncomprehensive glance around, I see postings as old as 2005. I don't see
> an obvious API, but maybe a maintainer could weigh in about data dump
> possibilities?
>
> On Fri, Jan 22, 2021 at 11:28 AM Eric Lease Morgan <[log in to unmask]> wrote:
>
>> On Jan 22, 2021, at 11:11 AM, Jill Ellern <[log in to unmask]> wrote:
>>
>>> I'm doing some research into systems librarian duties and wondering if
>>> there is an easy way to get a dump of the code4lib jobs from the last 10
>>> years? In Excel format?
>>
>>
>> Easy? I'd be surprised.
>>
>> There are two or three sources of the Code4Lib jobs data:
>>
>> 1. the underlying data from the jobs.code4lib.org site
>>
>> 2. any one of a number of different Code4Lib mailing list Web archives
>>
>> 3. the archived mailbox (mbox) files from the mailing list
>>
>> I don't think the jobs site has been around for ten years. Has it? Nor do
>> I know whether or not the data is archived. If it is, then I'd bet you will
>> be able to get it in some sort of structured format like JSON or a
>> delimited format that Excel can open.
>>
>> Scraping different Web archives would require... scraping, which,
>> personally, I run away from.
>>
>> The archived mbox files would be the most comprehensive, but a
>> programmer would have to parse the mbox (email) files, which is a
>> specialized task in and of itself. If you want to know where the mbox files
>> are located, then drop me a line and I'll let you know. Easy.
>>
>> Finally, what are the questions you would like to answer? How many systems
>> librarian jobs have been posted? Where were the jobs? What are the
>> characteristics of systems librarianship and how have they changed over
>> time? How much do they pay? Extracting some of this information from the
>> postings may be difficult, if not heroic in nature.
>>
>> --
>> Eric Morgan
>> University of Notre Dame
>
>
>
> --
> Andromeda Yelton
> Humanistic Machine Learning for Library Data
> Lecturer, San José State University iSchool
> https://andromedayelton.com/
> @ThatAndromeda
> <http://twitter.com/ThatAndromeda>
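For the third source Eric mentions, the archived mbox files, Python's standard-library `mailbox` module handles most of the parsing. The sketch below is a minimal illustration, not a finished tool: it assumes a local copy of an mbox file, and it ASSUMES that a systems job can be spotted by the words "systems librarian" in the Subject line. The function names and file names are hypothetical. It writes one CSV row per message (date, sender, subject) for Excel and returns a per-year count of matching subjects, which answers only the crudest form of "how many systems librarian jobs have been posted?"

```python
import csv
import mailbox
import re
from collections import Counter
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime

# ASSUMPTION: a systems job can be spotted by these words in the Subject
# line. Real postings vary a lot, so treat the pattern as a starting point.
SYSTEMS_PATTERN = re.compile(r"systems\s+librarian", re.IGNORECASE)


def decode_value(value):
    """Turn a raw header (possibly RFC 2047 encoded) into plain text."""
    if value is None:
        return ""
    try:
        return str(make_header(decode_header(value)))
    except (HeaderParseError, LookupError, UnicodeError):
        return str(value)  # fall back to the undecoded header text


def parse_date(value):
    """Return a datetime for a Date header, or None if missing or malformed."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(str(value))
    except (TypeError, ValueError, IndexError):
        return None


def summarize_mbox(mbox_path, csv_path):
    """Write one CSV row per message and count matching subjects per year."""
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
            writer = csv.writer(handle)
            writer.writerow(["date", "from", "subject"])
            for message in box:
                when = parse_date(message.get("Date"))
                subject = decode_value(message.get("Subject"))
                writer.writerow([when.isoformat() if when else "",
                                 decode_value(message.get("From")), subject])
                # Messages without a usable date are written but not counted.
                if when is not None and SYSTEMS_PATTERN.search(subject):
                    counts[when.year] += 1
    finally:
        box.close()
    return counts


# Hypothetical usage:
# counts = summarize_mbox("code4lib.mbox", "code4lib-messages.csv")
# for year in sorted(counts):
#     print(year, counts[year])
```

Questions about location, duties, or pay would mean parsing the message bodies as well, which is where the work starts to become, as Eric puts it, heroic.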