sh0dan // VoxPod

Wednesday, September 10, 2008

Wikipedia mDict for Windows Mobile

I'm still having fun playing with the Wikipedia parser I posted below. The main reason I find this kind of exercise fun is the immense amount of data Wikipedia contains.

I took the time to implement the first two points on my "possible improvements" list below. I knew that a database would be the logical next step, so I fired up MySQL and implemented a simple first pass that indexes all links in Wikipedia. The second pass then uses these link statistics to select the most relevant articles.

That way I was able to get a much more consistent subset of Wikipedia, with redirects resolved inline.
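To illustrate the two-pass idea, here is a minimal Python sketch. It is not the actual converter: it keeps the link index in memory instead of MySQL, assumes pages arrive as a plain `title -> wikitext` dictionary, and ranks articles purely by inbound link count, which is only one plausible reading of "most relevant". The function names (`first_pass`, `second_pass`, `resolve`) and the `keep` parameter are made up for this example.

```python
import re
from collections import Counter

# [[Target]], [[Target#Section]] or [[Target|label]]
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(#[^\[\]|]*)?(\|[^\[\]]*)?\]\]")
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\[\]|#]+)", re.IGNORECASE)


def resolve(title, redirects, max_hops=5):
    """Follow redirect chains, giving up after max_hops to avoid loops."""
    for _ in range(max_hops):
        if title not in redirects:
            break
        title = redirects[title]
    return title


def first_pass(pages):
    """Collect redirects and count inbound links per resolved target."""
    redirects = {}
    raw_targets = []
    for title, text in pages.items():
        m = REDIRECT_RE.match(text)
        if m:
            redirects[title] = m.group(1).strip()
        else:
            raw_targets.extend(
                link.group(1).strip() for link in LINK_RE.finditer(text)
            )
    # Links to a redirect are credited to the article it points at.
    inbound = Counter(resolve(t, redirects) for t in raw_targets)
    return redirects, inbound


def second_pass(pages, redirects, inbound, keep):
    """Keep the `keep` most linked-to articles and resolve redirects inline."""
    articles = [t for t in pages if t not in redirects]
    articles.sort(key=lambda t: (-inbound[t], t))

    def rewrite(m):
        original = m.group(1).strip()
        target = resolve(original, redirects)
        # Keep the visible text; section anchors are dropped in this sketch.
        label = m.group(3) or ("" if target == original else "|" + original)
        return "[[" + target + label + "]]"

    return {t: LINK_RE.sub(rewrite, pages[t]) for t in articles[:keep]}
```

With this approach a link such as `[[USA]]` in a kept article becomes `[[United States|USA]]`, so the redirect page itself no longer has to be shipped in the dump.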

So here is a dump of the most popular Wikipedias for mDict, a free dictionary reader that can be used on Windows Mobile and Windows Smartphones:

(Updated May 2009)
Download from LegalTorrents™

Installation:
Copy the MDX file to your SD card or internal storage and select Library/Search All. mDict should then add Wikipedia to your library.


The latest version is for mDict 3.0.

Hosting kindly provided by LegalTorrents™. Any donations on the page will be split 15/85 between LegalTorrents and the Wikimedia Foundation.

Feel free to comment if you have specific requests.

26 Comments:

  • I can't seem to get yours to work; to confirm that it wasn't the phone at fault, I downloaded the 2006 dump that's been around and it worked fine. It gives me an error of "can't load mdx: fail to read file."

    By Blogger Brian, at 8:48 pm  

  • This comment has been removed by the author.

    By Blogger Klaus Post, at 10:06 pm  

  • Alright, I'll try downloading it again, thanks for that speedy response. It's weird, the file and your blog both say it should be 684 MB, but I'll see if it works this time. I'll let you know if it works!

    By Blogger Brian, at 11:06 pm  

  • Awesome! It worked this time, especially after double checking against the 717 you gave me. It's so great having this file over the old one that's been floating around for a couple of years. Thanks a lot! I'm a big wikipedia fanatic so it's great that I have all this intense information at my fingertips.

    By Blogger Brian, at 6:43 am  

  • Hi, I've downloaded your dump but encountered the following problem: every article (at least in the English wiki) ends after the first header, e.g. the article 'Computer' ends with 'History of computing', which means only a small fragment of the page is actually in the dump. I'm using MDict 2.5. Is there a solution for this?

    By Anonymous Mare, at 2:42 pm  

  • @mare: There is a fixed maximum length for all articles, to keep the file size down.

    This is because mDict cannot handle uncompressed inputs that are larger than 2GB. And right now the input is exactly 2GB.

    The other dictionaries have a more relaxed maximum size, but very long articles are still truncated.

    By Blogger Klaus Post, at 2:59 pm  

  • klaus... i downloaded the mdx file several times... I still cannot load mdx file in MDict. any suggestions?

    Frank

    By Anonymous Anonymous, at 3:51 pm  

  • I have tested them on mDict 2.5 and 3.0 alpha 2, where they work fine.

    By Blogger Klaus Post, at 10:27 pm  

  • Hi. Thank you for the Wikipedia mDict. I used the French database and I noticed that dates of birth or death are not always included (see Albert Einstein or Nicolas Sarkozy). Is it possible to fix this bug in the next update?
    djtarek

    By Anonymous Anonymous, at 4:02 pm  

  • Hi. I'm Chinese. The software, I mean "Mdict", is very, very wonderful.

    But I'm looking for the Pocket Oxford English Dictionary for Mdict (WM version).

    Can you give me a download web?

    thanks!

    By Blogger 朱九渊, at 10:22 pm  

  • Thank you very much, i've tried to make it myself for months without positive results. Could you build wiktionary as well?

    By Anonymous Anonymous, at 12:43 am  

  • many many thanks for your efforts, u prove tht best things in life come without a price tag. one question i wud like to ask u is " how do u do it??" is it possible for a computer layman like me to convert an entire website (my interest being www.emedicine.com) into an .mdx database??

    By Blogger yogi1982, at 8:18 pm  

  • @朱九渊: Try looking at the mDict homepage.

    @Anon: I will have a look. It might be a bug in my software.

    @Anon2: Dictionaries would need completely new conversion routines, which isn't on my to-do list.

    @yogi1982: The conversion software only works for wikis based on MediaWiki.

    By Blogger Klaus Post, at 8:48 pm  

  • Have your wiki download on an X5 and an X51v (with Lenny's L11 ROM). Worked on both straight out of the box.

    Thanks much
    Rich L

    By Anonymous Anonymous, at 9:15 am  

  • Hi, would it be possible to process the Czech Wikipedia?

    Thanks much
    Sid

    By Anonymous Sid, at 12:12 pm  

  • Excellent job, although I think a better encyclopedia for this program (MDict) would be one that included the introductory paragraph of every Wikipedia article (+1,000,000). So by highlighting an unknown word and using the 'word picker' icon that comes with the program, you could quickly get an extra paragraph or so. Just like a lexicon, only better.
    This program is not like Tomeraider, where you sit and learn for hours. You use mdict more like a reference tool, so to speak.

    By Anonymous KoTso, at 5:46 am  

  • Absolutely capital!
    My GF and I use your english version all the time.

    Is it possible to make a Norwegian and a Swedish Wikipedia MDict version?

    (She is Swedish and I'm Norwegian, so such versions would be a good fact source for info about our non-native country)

    By Blogger Thomas, at 9:57 pm  

  • Is it possible to make a Croatian Wikipedia MDict version?

    By Blogger Drazen, at 11:19 am  

  • What about spanish version?

    By Blogger spicajames, at 7:00 pm  

  • @spicajames: I will include a Spanish dump in the next update.

    @KotSo: I don't plan on doing different versions. Most eliminated pages are very specialized, or simply redirects. If you can find someone who has Java skills, he or she might be able to convert Wikipedia with different settings.

    By Blogger Klaus Post, at 10:41 pm  

  • Thanks sh0dan. Your choice for the data size was perfect, I think. For average ppc users who use 1-4Gb memory card, around 100Mb is too small and more than 1Gb is too big. 700Mb is not too big and not too small. :) from Korea

    By Anonymous Anonymous, at 9:52 am  

  • I have just downloaded it. It works great. It would have been better if the article titles (headwords) were displayed at the start of each article's contents. You know, the PDA version of MDict doesn't show long titles correctly in the input area. Split View doesn't resolve the problem, either. But considering how much of your precious time you have spent making this dictionary, I can only thank you. Keep up the good work sh0dan!

    By Anonymous Anonymous, at 10:50 am  

  • Dear Klaus... first of all, GREAT WORK.
    I installed the Italian version of Wikipedia, but I noticed that in the mdx file all the Wikipedia entries are there but truncated, not complete. Now... is it possible to get a FULL version of the wiki mdx file?
    Best regards.

    By Blogger manowar1978, at 2:55 am  

  • @manowar1978: All wikipedias are truncated somewhat. If you are a computer whiz, or know someone who is, you might be able to convert an uncut Wikipedia.

    In my next conversion I will probably make the non-English languages slightly larger.

    By Blogger Klaus Post, at 10:48 am  

  • next update ?

    By Blogger mctarek, at 8:41 pm  

  • This particular blog is really awesome as well as informative

    By Anonymous windows mobile, at 12:13 pm  
