The successful candidate will work closely with LOCKSS Program technical staff
to analyze publisher Web sites and build Web crawler plugins to process the
content for preservation. A LOCKSS plugin is specific to each publishing
platform and determines what content will be collected and preserved. Work
will be reviewed before being committed to production.
1. Analyze publisher Web sites, and their hierarchy, URL structure, layout,
2. Implement crawling strategies as LOCKSS plugins and perform quality
assurance on them.
3. Improve existing LOCKSS plugins by upgrading them to our current best
practices, refining them to account for incremental changes to publisher Web
sites, and addressing bug reports from end users.
Technical or business knowledge required:
• Java programming: one year of experience
• Knowledge of XML, HTML/XHTML, CSS
• Knowledge of URL structures
• Knowledge of regular expressions
• Familiarity with a UNIX-based operating system
• Proven inquisitiveness, curiosity, and ability to learn new tools quickly
• Strong analytical skills for effective problem solving
• Fierce attention to detail
• Understanding of quality control methods, procedures, and guidelines
• Excellent organizational and communication skills
• Ability to work as part of a small team
• Ability to work in a high-pressure, high-volume production environment and
to meet production standards on time and as specified
Four-year college degree or equivalent in Computer Science
30% Analyze publisher Web sites
10% Talk to publishers and platform vendors
40% Implement crawling strategies as LOCKSS plugins and perform quality
assurance on them
5% Assist senior engineers with miscellaneous tasks
10% Improve and upgrade existing LOCKSS plugins
5% Process content for preservation
Brought to you by code4lib jobs: http://jobs.code4lib.org/job/8274/