Data Engineer
DocumentCloud
Columbia

We're looking for a data engineer to join the growing team at DocumentCloud!
If you'd enjoy a chance to help develop the next generation of our service --
an open-source civic platform that more than 1,000 news organizations use to
analyze, annotate and publish documents for the public good -- we'd love to
hear from you.

  
This is a full-time, two-year position, funded by a grant from the Knight
Foundation, with full University of Missouri benefits. We're a nimble, tightly
knit team that works remotely -- we stay connected via Slack and video chats
-- so you can live where you'd like and work flexible hours.

  
You'll work on DocumentCloud's processing pipeline, which makes document
collections searchable and analyzable for journalists, with the goal of
improving our extraction and analysis capabilities. The pipeline consists of
several open-source tools wrapped in our Ruby-based infrastructure (a
Rails-driven API and our CloudCrowd parallel processing toolkit). You'll also
play a key role in developing our production API, focusing on what information
we extract from documents for users and how best to do so.

  
Our ideal candidate would have the following skills and qualities:

  
-- Independent problem-solver who values learning, keeps current on trends, and knows how to pick the right set of tools for a problem.  
-- Able to write clean, well-documented code; you know your way around Git, and your GitHub account shows activity.  
-- Strong ability to collaborate and communicate with a distributed team.  
-- Experience with Ruby and Rails.  
-- Experience with Unix-based systems.  
-- Some knowledge of data science, linguistics, information extraction or search. Solr experience is a bonus.  
-- An interest in language and data processing.  
-- Knowledge of SQL (Postgres preferred).  
  
You'll join DocumentCloud at a significant time. We're enjoying widespread use
of our platform, and our tools have been used to investigate and publish
stories ranging from the grand jury decision in Ferguson, Missouri, to the Guardian's
NSA spying leaks. We collaborate with organizations such as the Washington
Post, The Associated Press and Mozilla's OpenNews fellows to build better ways
to present the news, and you'll have the chance to be part of the community
exploring this intersection of news, data and technology.

  
To apply, please contact us at [log in to unmask]



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/19970/
To post a new job please visit http://jobs.code4lib.org/