Begin forwarded message:
CERN Fellowships: text mining scientific documents; author
disambiguation in INSPIRE
The CERN Scientific Information Service is looking for two
enthusiastic and motivated developers with experience in text-mining
or digital libraries, to join a dynamic international collaboration
which is building, enhancing and operating the INSPIRE information
service, a digital library which is a key working tool used by 50’000
scientists worldwide in their cutting-edge research in High-Energy
Physics. We have two fellowships: the first for the text mining of
scientific documents, the second for author disambiguation and
management.
What you will do (text mining fellowship):
- Develop and expand our current text-mining of documents to
extract all possible metadata: authors, affiliations, references and
additional scientific content (figures, tables and more). Build
infrastructure to mine in real time, leveraging user feedback, as
scientists share documents, or for bulk mining of large collections of
scanned/OCR’ed historical material.
- Integrate, harmonize and expand all steps in the treatment of
documents upon ingestion in INSPIRE from multiple sources, from
extracting metadata to grabbing figures, from detecting similarities
to spotting duplication.
- Explore opportunities in the extraction of the contextual
information provided by the location of references, figures and tables
in scientific texts.
What you will do (author disambiguation and management fellowship):
- Expand and develop our author disambiguation and
profile-claiming production infrastructure, with the aim to
automatically associate every newly accessed document to the correct
- Extend our author-article algorithmic and crowd-sourced tools
to provide assertions about the academic affiliation of scientists
- Assure seamless interoperability and bulk-data exchange with
other relevant partners such as NASA-ADS, arXiv.org, ORCID and leading
publishers in Physics.
Other things you will do (for both fellowships):
- According to your inclination and abilities, help out on other
projects, such as crowdsourcing aspects of digital library curation,
integrating our services with other data sources via linked open data,
UI/UX design, operations of production and mining of usage data.
- We require limited participation in stand-by duty for
hot-fixes in the operation of the INSPIRE web service on evenings,
weekends and public holidays.
Who you are:
- You are a citizen of one of the CERN Member states: Austria,
Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France,
Germany, Greece, Hungary, Italy, Netherlands, Norway, Poland,
Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom.
Citizens from Romania can now also apply.
- You hold a BSc, MSc or PhD in Computer Science and have less
than 10 years of professional experience after your highest diploma.
- You understand how scientists communicate and have either a
proven track record in handling or mining technical or academic
documents, or experience in author disambiguation in a large-scale
- You have solid experience in developing in a LAMP (Linux,
Apache, MySQL, Python) stack, preferably in open source projects,
using git or similar DVCS, and desirably in a production environment.
- Familiarity with issues and standards in information systems
is an asset: XML, XSLT, RSS, OAI-PMH.
Who we are:
CERN is the world's leading laboratory in High-Energy Physics, home to
the record-smashing LHC accelerator. Together with partners at
SLAC/Stanford, Fermilab and DESY/Hamburg, the CERN Scientific
Information Service and IT teams are building INSPIRE: a digital
library serving 1 million records to 50’000 scientists in the field
worldwide, which is in beta at http://inspirebeta.net. We collaborate
closely with sister infrastructures arXiv at Cornell and the NASA/ADS
at Harvard, as well as leading publishers in the field. We are
founding members of the ORCID initiative, and stalwarts of Open Access
through a myriad of projects and initiatives.
What we offer:
- Contract duration: One year, which might be extended for a
second year, conditional on performance. Further extension up to a
maximum of three years can be granted under some circumstances.
- Financial conditions: Fellows' stipends are competitive and
calculated individually according to age and qualifications, in the
range 55’000-85’000 CHF per annum, net. Fellows are entitled to
additional family and child allowances. International civil servants
in the area are allowed to purchase discounted tax-free vehicles.
- Leave: Fellows are entitled to 2.5 days of paid leave per month,
plus two weeks at Christmas and a few other local holidays.
- Insurance: Fellows are covered by CERN’s comprehensive health
scheme for themselves and their dependents.
- Travel expenses: Fellows are entitled to travel expenses for
themselves and their family and may be entitled to an installation
grant. We also offer help with finding suitable accommodation.
How to apply:
Create an account and submit a complete electronic application form at
http://bit.ly/oDhSRq , containing your Curriculum Vitae, a photocopy of
your last (highest) qualification, a short (half page) description of
your motivation for coming to CERN and working with INSPIRE, and the
names of three referees who will provide us with letters of
recommendation. It is your responsibility to arrange for these
letters. Please indicate “INSPIRE” in the field “Miscellaneous
information: Please give details of the work you are interested in
doing at CERN”. NOTE that we will not be able to process your
In parallel, it is indispensable that you also send us a copy of your
CV at [log in to unmask]
Irrespective of deadlines indicated on the application web page, the
application and ALL supporting documents should reach us BEFORE August
10th, 2011. Retained candidates will be interviewed remotely in the
second half of August. The two successful candidates will start on
Built on the CERN Open Source Invenio digital library software, INSPIRE
serves 1 million records, hand-curated over 40 years by partners at
SLAC/Stanford, Fermilab and DESY/Hamburg, to 50’000 High-Energy Physics
researchers worldwide. INSPIRE,
in beta at http://inspirebeta.net, provides fast metadata and
full-text searches, author disambiguation, citation analysis, and is
expanding its content and services in a community-centric approach, in
addition to journal publications and other scientific content. We
anticipate users will soon be submitting scientific documents, and
large scale recovery of historical OCR’ed material will take place,
with hundreds of thousands of documents from 3 to 300 pages long,
which will have to be mined for automatic generation of metadata.
Further, we will explore and expand initiatives for figure and table
extraction from the text, as well as contextual information on
Further information about the CERN fellowship program is available at
Further information about the position can be obtained by writing to
[log in to unmask]