We are looking for someone to join our technical team at NICTA to work on our distributed search system, the Lens. The Lens provides a free 'Innovation Cartography' service to the general public, allowing anyone to discover and share related innovation data easily. Over the coming year we will be adding many terabytes of new data, including global patents, scientific literature, and business and legal information.

Your initial work in this role will focus on improving and replacing our current data transform and import systems. These systems must accommodate a variety of data formats and be fast enough to process very large data sets. You will be vital to our future capability to provide massive amounts of data for public use.

Although your initial role concentrates on data processing and importing, you will also be an integral part of the wider development team. We share tasks and knowledge around the team, so you'll get to work with others and on other parts of the system. You'll be encouraged to learn, innovate and continually make the high-performance, distributed systems that make up the Lens even more awesome.

**Your responsibilities:**

* Develop high-performance, deployable software for importing massive textual data sets.
* Develop high-performance, deployable software for transforming those data sets between different data formats and making them accessible to high-use, public-access data services.

**You will need the following in your skills toolkit:**

* XML parsing techniques and frameworks: StAX, SAX, DOM.
* Java, in particular concurrent/multi-core processing.
* Experience in distributed systems, from design through to deployment/administration.
* Robust scripting, especially using bash, ruby or python.
* Good knowledge of Linux development/administration, including utilities such as ssh, scp, rsync, grep, find, tar, awk, etc.
* Specific technologies:
    * Lucene and/or Solr
    * Amazon EC2
    * Tomcat
    * MySQL
* Cutting-edge OCR at large scale (highly regarded)
* Patent data knowledge (highly regarded)

**About The Lens: Cambia and NICTA**

_Our goal is to greatly enhance the public good by creating an open and inclusive innovation system, which melds many disparate information sources, dramatically expanding the availability and discoverability of human knowledge. We think of it as 'Innovation Cartography': maps which allow us to discover otherwise unreachable knowledge._

_Working on the Lens is a lifestyle choice. We offer flexible work hours and a casual, friendly environment in exchange for passion and dedication. Our team is small, highly productive, open-minded about solutions and focused on delivering high-impact public goods._

**NICTA (National ICT Australia Ltd)** is Australia's Information and Communications Technology Research Centre of Excellence. NICTA develops technologies that generate economic, social and environmental benefits for Australia. NICTA collaborates with industry on joint projects, creates new companies, and provides new talent to the ICT sector through a NICTA-enhanced PhD program. With four laboratories around Australia and over 700 people, NICTA is the largest organisation in Australia dedicated to ICT research. NICTA is funded by the Australian Government through the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence Program.
NICTA is also funded and supported by the Australian Capital Territory, the New South Wales and Victorian Governments, the Australian National University, the University of New South Wales, the University of Melbourne, the University of Queensland, the University of Sydney, Griffith University, Queensland University of Technology and Monash University.

**Cambia** is a globally prominent not-for-profit social enterprise and the leading provider of free patent and intellectual property search and analysis. Cambia's mission is the democratization of problem solving using science and technology. Cambia is the founding partner of the Lens, together with NICTA (National ICT Australia) and QUT (Queensland University of Technology).

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/6410/