Web Archive Engineer - 60432
This position is double-posted at the 4P3 and 4P4 levels.
This is a four-year fixed-term position with the possibility of an extension.
Stanford University Libraries (SUL) is seeking a talented software engineer to
support the Web Archiving Service. This position is a key element in the
implementation and ongoing support of
SUL's Web Archiving Service. The Service will enable the archiving of
web content into the Stanford Digital Repository (SDR) on behalf of
Stanford librarians, faculty, and researchers and in support of the
University's needs for research, teaching, library collection building,
and regulatory compliance.
The Web Archiving Engineer will primarily develop and maintain software to
facilitate web archiving workflows and use cases: harvesting, data
management, quality assurance, discovery, indexing, access and analysis.
This will entail deployment, local optimization and possible
enhancement of community-developed open source web archiving tools.
Reporting to the Manager for Application Development and working closely with the
Web Archiving Service Manager, the successful candidate will be
responsible for developing, configuring and/or managing web archiving
systems and related digital library components; pioneering tools and
techniques for the collection, replay and preservation of the next
generation of web technologies; troubleshooting and resolving technical
issues related to Service operation; and streamlining the processing of
archived web content through the entire lifecycle.
Primary Responsibilities:
Systems Analysis, Architecture Design, Implementation and Administration (50%)
Provide technical analysis and software engineering support for web archiving
and related digital preservation activities at SUL. Install, configure
and manage Heritrix, Wayback Machine and other components necessary to
build an end-to-end service. Streamline the ingest of harvested and
other target content and associated metadata into repository, discovery
and access environments.
Operational Support (25%)
Collaborate with the Web Archiving Service Manager to troubleshoot and resolve
technical issues affecting harvest, replay and web archiving workflows.
Generate Wayback Machine and Lucene indexes to enable web archive
replay, full-text searching and metadata analysis.
Harvest Engineering (15%)
Develop tools and techniques to enable archival capture and replay of rich
media, streaming content and social media, as well as traditional web page
content. Administer web crawls to maximize data capture quality and
efficient use of limited resources.
Community Engagement (10%)
Play an active role in the cultural heritage web archiving community. Stay
abreast of evolving best practices and tools for web archiving and make
appropriate recommendations for local service enhancement.
- Demonstrated expertise with Ruby and Ruby on Rails application development.
- Demonstrated expertise deploying, configuring and managing Apache HTTP Server and Apache Tomcat.
- Demonstrated expertise with Unix/Linux and command-line utilities, such as awk, find, and grep.
- Demonstrated expertise with XML and XSLT.
- Demonstrated experience with relational database design and management, including implementing database applications for MySQL, Oracle or PostgreSQL.
- Adept at quickly learning new scripting and programming languages and making sense of unfamiliar architectures and applications.
- Demonstrated ability to write solid, simple, elegant code both independently and in a team-programming environment and within schedule limitations.
- Demonstrated ability to work collaboratively with multiple levels of staff and colleagues at peer institutions and within the open source community on projects from specification to launch. Excellent verbal and written communication skills.
- Demonstrated ability to apply best practices to technical projects, especially test-first development and automated testing. Must also make effective use of team collaboration tools, build management and version control.
- Demonstrated experience providing ongoing support for technical services, including experience monitoring and managing a solution.
- At the 4P3 level, four-year college degree or equivalent, with five to seven years of demonstrated experience.
- At the 4P4 level, four-year college degree or equivalent, with more than seven years of demonstrated experience.
- Demonstrated knowledge of web archiving tools, techniques, issues and trends.
- Demonstrated expertise with Lucene/Solr.
- Demonstrated expertise with distributed computing technologies, such as Hadoop, HBase and Pig.
- Demonstrated experience with file characterization tools, such as JHOVE, FITS, DROID and Apache Tika.
- Demonstrated experience with library-related metadata and metadata standards, particularly DC, MODS, MARC, METS and EAD.
- Demonstrated success participating in community-based open source projects, especially those relevant to SUL's Digital Library architecture, such as Fedora, Blacklight, Solr or Hydra.
- Demonstrated experience with library applications and technology, especially experience participating in relevant library open source efforts.
- Demonstrated experience working in an academic and/or library environment.
- Master's degree in Computer Science, Information Science or related field.
Manager, Application Development
Digital Library Systems & Services
Stanford University Library