Hi Eric

The best place to look is probably http://meta.wikimedia.org/wiki/Alternative_parsers

I'm guessing the "non-parser dumper", which uses MediaWiki's internal code to do the rendering, might be a good choice.

regards

Dave Pattern
University of Huddersfield

________________________________

From: Code for Libraries on behalf of Eric Lease Morgan
Sent: Sun 10/09/2006 14:28
To: [log in to unmask]
Subject: [CODE4LIB] munging wikimedia

How do I go about munging wikimedia content?

After realizing that downloadable data dumps of Wikipedia are sorted by language code, I was able to acquire the 1.6 GB compressed data, uncompress it, parse it with Parse::MediaWikiDump, and output things like article title and article text.

The text contains all sorts of wikimedia mark-up: [[]], \\, #, ==, *, etc. I suppose someone has already written something that converts this markup into HTML and/or plain text, but I can't find anything. If you were to get the Wikipedia content, cache it locally, index it, and provide access to the index, then how would you deal with the Wiki mark-up?

--
Eric Lease Morgan
University Libraries of Notre Dame
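
For reference, the extraction step described above might look something like this minimal Perl sketch. It assumes a local, uncompressed copy of the pages-articles XML dump; the file name is just a placeholder:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parse::MediaWikiDump;

    # iterate over every page in the (uncompressed) XML dump
    my $pages = Parse::MediaWikiDump::Pages->new('pages-articles.xml');
    while ( defined( my $page = $pages->next ) ) {
        my $title = $page->title;
        my $text  = ${ $page->text };   # text() returns a reference to the wikitext
        print "$title\n";
        # ... cache, index, or otherwise munge $text here ...
    }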
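
As for dealing with the mark-up itself: short of a ready-made converter, a crude first pass can be done with regular expressions. This is only a sketch that handles a few common constructs ([[links]], ''emphasis'', == headings ==, list markers, and simple, non-nested {{templates}}); a real parser is needed for tables, nested templates, and embedded HTML:

    # rough first pass at reducing wikitext to plain text;
    # good enough for indexing, not for faithful rendering
    sub strip_wikitext {
        my $text = shift;
        $text =~ s/\{\{[^{}]*\}\}//gs;                 # drop simple {{templates}}
        $text =~ s/\[\[[^|\]]*\|([^\]]*)\]\]/$1/g;     # [[target|label]] -> label
        $text =~ s/\[\[([^\]]*)\]\]/$1/g;              # [[target]] -> target
        $text =~ s/'{2,}//g;                           # ''italic'' and '''bold'''
        $text =~ s/^=+\s*(.*?)\s*=+\s*$/$1/mg;         # == heading == -> heading
        $text =~ s/^[*#:;]+\s*//mg;                    # list and indent markers
        return $text;
    }

Calling strip_wikitext() on the $text from the loop above would yield roughly readable plain text, which may be all that's needed for a first indexing pass.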