**Job Summary**  
  
We are looking for an exceptional search engineer with battle-tested
experience in developing and deploying Lucene/Solr-based search applications.
You will be given the rare opportunity to lead the redesign of the search
infrastructure that powers Wikipedia and all other projects run by the
Wikimedia Foundation.

  
  
**The Challenge**  
  
Currently, the search infrastructure serves about 1,400 requests per second and
consists of a replicated server farm of 50 nodes in two datacenters that
provides search for over 800 wikis, including all of Wikipedia. Our current
codebase is in dire straits: it was developed around Lucene 2.3 and is plagued
by bitrot.

  
Some of the challenges that you will have to solve include:

  

  * Develop a near-realtime indexer
  * Develop a sharding strategy for the indexes
  * Develop a suite of precision/recall and performance benchmarks
  * Develop a method by which the Wikipedia community can help improve the quality of the search index
  * Develop solutions for long-standing feature / bug requests from the Wikimedia communities, including:
    * Index transcluded wikitext
    * Better tokenization of wikitext
    * Multi-language search support
    * Other relevant search bugs

  
  
**Your Background**  
  

  * You have multiple large-scale (>1M documents) Solr deployments under your belt and have experience with indexing non-Latin scripts
  * Unicode does not intimidate you
  * We are not just looking for experience: we also want somebody who sees improving search as an important step to better support both our editors and readers on all of the Wikimedia projects
  * You are very comfortable with Java and Maven and have an intimate knowledge of the Lucene and Solr libraries
  * Preferably, you have a formal education in computer science with a specialization in information retrieval and you speak one or more languages besides English
  * Experience with MediaWiki and the Wikipedia community in general is also a big plus
  * You are passionate about the free culture movement and know how to get your point across in a consensus-based environment
  
**About the Wikimedia Foundation**  
  
The Wikimedia Foundation is the non-profit organization that operates
Wikipedia, the free encyclopedia. According to comScore Media Metrix,
Wikipedia and the other projects operated by the Wikimedia Foundation receive
more than 482 million unique visitors per month, making them the 5th most
popular web property worldwide. Available in more than 270 languages,
Wikipedia contains more than 21 million articles contributed by a global
volunteer community of more than 100,000 people. Based in San Francisco,
California, the Wikimedia Foundation is an audited, 501(c)(3) charity that is
funded primarily through donations and grants. The Wikimedia Foundation was
created in 2003 to manage the operation of Wikipedia and its sister projects.
It currently employs 78 staff members. Wikimedia is supported by local chapter
organizations in 31 countries or regions.

  

[http://blog.wikimedia.org](http://blog.wikimedia.org)



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/5290/