sh0dan // VoxPod

Thursday, October 16, 2008

MDX Wikiparser 1.0

Here is version 1.0 of my wiki parser. It is still rather rough, but since you can download the most popular Wikipedias below, it is provided mainly for reference.

Requires JRE or JDK 1.5 or later.
Requires MySQL 5.0 or later.

Create a new schema in your database called wikindex
or something similar.
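
As a rough sketch of this step, assuming you keep the name wikindex, the statement could be generated with a small shell script like the one below. The variable names are made up for illustration and are not part of the parser. In MySQL 5.0.2 and later, CREATE DATABASE and CREATE SCHEMA are synonyms.

```shell
#!/bin/sh
# Schema name suggested in the post; change it if you picked another one.
SCHEMA="wikindex"

# Build the statement. IF NOT EXISTS makes the script safe to run twice.
SQL="CREATE DATABASE IF NOT EXISTS ${SCHEMA};"

# Print the statement so it can be reviewed or piped into the mysql client.
printf '%s\n' "$SQL"
```

If you save this as create_schema.sh (a hypothetical file name), you can pipe its output into the MySQL command-line client, for example with: sh create_schema.sh | mysql -u root -p. The root account here is only an assumption; any account that is allowed to create databases should work.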

To run the project from the command line, go to the dist folder and
type the following:

java -jar "WikiParser.jar" [parameters] "InputFile.xml.bz2" "OutputFile.txt"

I have also included a sample batch file that you can use as a basis for your own conversions. Be sure to adjust --databaseurl, --user and --password if needed.

For fast indexing, put your database on a ramdisk, or on an SSD if you have one. You can find a good free ramdisk for Windows 2000, XP and Vista here. 500MB should be enough for the English Wikipedia.

Download WikiParser v1.0.