sh0dan // VoxPod

Monday, May 25, 2009

Wikipedia for mDict - May 2009

Here are the updated Wikipedia editions for mDict.

Languages:
  • English
  • German
  • Spanish
  • Portuguese
  • Russian
  • Chinese
  • Japanese
  • Polish
  • Italian
  • Danish (direct download below)
Download from LegalTorrents.

Changes:
  • Title added to all pages.
  • Spanish version added.
  • Chinese version added.
  • English "Jumbo" edition added, with 2.3 million full articles.

Since the Danish Wikipedia is so small, here is a direct download link for it.

If you cannot download via BitTorrent, here is a mini version of the English Wikipedia (155,000 articles, very abbreviated).

If you are interested in helping with further updates, please contact me.

22 Comments:

  • I would like to see a Swedish version. Is that possible, and what do I have to do? Step-by-step instructions would be great!

    By Anonymous Niklas, at 8:50 pm  

  • I've created a copy of the Greek Wikipedia (dated 12-Jul-09). It has all 46202 articles and it's only 62MB.

    http://rapidshare.com/files/256984374/el-wiki-full-20090712.rar

    Thanks for your software, my friend!!!

    By Anonymous NiTroGen, at 5:49 pm  

  • Swedish wikipedia dumped by me.
    ~300 000 articles. 204MB
    http://downloads.hemkoll.nu/sv-wikipedia-20090712.mdx

    Converting 20090725 right now.
    I see many "skipped" articles. Why? I have set --minlinks 0

    By Anonymous Niklas, at 5:23 pm  

  • Swedish dump 2009-07-25: http://download.hemkoll.nu/sv-wikipedia-20090725.mdx 209MB

    By Anonymous Niklas, at 12:41 am  

  • @Niklas: Glad to hear it worked out! "Articles" that have a colon in their name are automatically skipped, as they do not contain anything useful. These are things like images, categories and similar autogenerated pages.

    By Blogger Klaus Post, at 7:17 pm  

  • Thanks Klaus. Could you please post the license that Wikipedia uses in HTML format? I see that you use that in the dictionary information. I'm a bit confused. It seems like Wikipedia uses two different licenses now. I know that they are changing the license, but it seems like the old and new ones are used at the same time now? Best regards!

    By Anonymous Niklas (niklas@hemkoll.nu), at 1:17 pm  

  • Thanks very much for this! I grabbed the small English one, the Japanese and the Chinese versions. However, I only seem to be able to get 58% of the jumbo one, and it has been stuck there for a week now. Any suggestions?

    By Blogger Ben, at 2:22 pm  

  • @Ben: You might want to try again, possibly using another client. There are usually about 10 seeds up, myself included, which should make it possible for you to download it all.

    @Niklas: Use the PC version of the mDict client, open a dictionary, then right-click and choose View Source.

    By Blogger Klaus Post, at 2:30 pm  

  • I have tried the "jumbo" version on my pda (XDA Zest) using MDict 3.0 and I get the error: "Open dictionary failed". Has anyone had this working using the PPC WM2003+ version of MDict? I'm pretty sure the dictionary file is intact and readable. Perhaps the index is too large for a 128MB device?

    By Anonymous Adam, at 9:28 pm  

  • Hi Klaus.
    The Japanese link in "Wikipedia for mDict - May 2009" seems to be dead. Could you fix it? It would be very kind if you sent the new link to me by mail: sieucrazy@gmail.com. I need it for my studies. Thanks!

    By Blogger Tron, at 5:39 pm  

  • @Tron: I just rechecked the link, and it works fine. I'm also seeding it myself, so it shouldn't be a problem to download it. Are you sure your Torrent program is working correctly?

    By Blogger Klaus Post, at 6:33 pm  

  • Klaus,

    Sorry for my bad English... I'm Brazilian. I have the Portuguese version - May/2009.

    When will you make a new version of Wikipedia in mdx?

    Thanks,

    Hugo

    hugo.ferreira@ibest.com.br

    By Anonymous Anonymous, at 12:31 pm  

  • I expect to create a new version when a 2010 dump has been created. Wikipedia is still working on the dump they started on the 1st of December, so within a month or two the next version should be ready.

    By Blogger Klaus Post, at 12:36 pm  

  • Looking forward to a new version!
    Any chance of a version with illustrations? Personally, I wouldn't mind using an SD card just for Wikipedia if it came to it.

    By Anonymous Anonymous, at 6:29 am  

  • Hi Klaus. Any news about the new 2010 version of wikipedia - mdx?

    Grateful

    Hugo

    By Anonymous Anonymous, at 4:32 am  

  • Great work Sh0dan!

    I would also like to participate in this work.

    By the way, how did you manage to create an MDX file larger than 2 GB? (I have seen this in your LegalTorrents Wikipedia English Jumbo, 2.7 GB.)

    If I'm not mistaken, you said in your other blog post that the source size for MDX builder is limited to 2GB.

    Thanks,

    By Anonymous Dre, at 6:53 am  

  • Hello Klaus,

    How are you?
    I wonder if you will still make the new version of Wikipedia (MDX) in Portuguese. We in the Brazilian community eagerly await the release of this new version. We use the offline Wikipedia a lot to study in our schools, because we have no connection to the Internet and the libraries have few books.

    I check your blog weekly, hoping to find a link to download the 2010 version of Wikipedia mdx in Portuguese.

    I hope to hear any further information by e-mail: hugof@hospitalalianca.com.br.

    Sorry for my bad English.

    Grateful,

    Hugo Carneiro

    By Anonymous Anonymous, at 3:23 pm  

  • Hi ShOdan,

    Really appreciate your work. Thanks.

    Have a Windows PPC (Xperia X1) where it all seems to work fine. Don't intend to use Mdict (ver 3.1) on this phone (various reasons) and was only for testing/troubleshooting purposes.

    Actually intend to use it on a Smartphone with Winmo6.5 (Samsung Omnia Pro B7320). Am using MDict 3.2 (WinMo 6.5 compliant) and pretty much like Adam, I get the following message:

    "Open dictionary failed: \Storage Card \en-wiki-jumbo.mdx, Fail to read file"

    Both memory cards are 4GB SD HC cards with FAT32 file system formatting.

    Now since it is the same file copied to both cards, there should not be a problem with the file itself.

    Please advise on how to resolve the issue. (on the Smartphone)

    Thanks

    (in case anyone else wants to suggest what to do pls mail at got2log@ gmail )

    By Blogger g, at 11:54 pm  

  • Thanks for sharing the source code. An update, based on your work, for October 2010 version is available in: http://ahuv.net/wikipedia

    Links to torrents:

    English: 3,483,000 items - Here
    Spanish: 1,361,000 items - Here
    French: 1,239,000 items - Here
    German: 1,033,000 items - Here
    Russian: 720,000 items - Here
    Portuguese: 681,000 items - Here
    Hebrew: 111,000 items
    Arabic: 183,000 items
    Persian: 109,000 items

    By Anonymous Or, at 10:40 am  

  • Hello,
    Thanks. I want to ask a few things before downloading:
    1) Are the articles in the medium en-wiki at their complete length?

    2) Also for the medium en-wiki: on what basis were the articles selected?
    I hope it was not random!

    Please tell me.

    By Blogger muhajer, at 8:31 pm  

  • muhajer:
    1) No, the complete version would be too large. It would not fit within the 4GB single-file size limit of your storage.

    2) Articles are mainly selected by a Google-esque algorithm: they are ranked by the number of articles linking TO the page, and a minimum article size is also applied.

    By Blogger Klaus Post, at 7:28 am  

  • Klaus


    Hello,

    _I am muhajer, but I forgot my password, so I am commenting as anonymous._

    Thank you for the reply.

    Is it 5000 bytes per article?

    That is very small, while the medium one is 1/4 of the big one, and it is 1/4 of the big one's size.

    So I think it must contain complete articles, not just 5000 bytes (1000 words) per article!

    Where does the size (800 MB) go then?

    I mean that every article in the medium one must be at its complete length.



    REGARDS

    By Anonymous Anonymous, at 9:29 pm  
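
Klaus's replies above mention two selection rules: titles containing a colon are skipped, and the remaining articles are ranked by the number of articles linking to them, with a minimum article size. The following is only a rough Python sketch of that idea, not the actual converter code. The `Article` type, the `select_articles` function and the `min_size` and `limit` parameters are hypothetical names made up for illustration; only the colon rule, the inbound-link ranking and the `--minlinks` option come from the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    # Hypothetical record; the real converter's data structures are not shown in the post.
    title: str
    text: str
    inbound_links: int  # number of other articles linking TO this page


def is_skipped(title: str) -> bool:
    # Rule described in the comments: any title with a colon is treated as
    # a non-article page (images, categories, and so on) and skipped.
    return ":" in title


def select_articles(
    articles: List[Article],
    min_links: int = 0,           # loosely mirrors the --minlinks option mentioned above
    min_size: int = 0,            # assumed minimum article size, in UTF-8 bytes
    limit: Optional[int] = None,  # assumed cap on how many articles to keep
) -> List[Article]:
    kept = [
        a for a in articles
        if not is_skipped(a.title)
        and a.inbound_links >= min_links
        and len(a.text.encode("utf-8")) >= min_size
    ]
    # Most-linked articles first, so a cap keeps the best-connected pages.
    kept.sort(key=lambda a: a.inbound_links, reverse=True)
    return kept if limit is None else kept[:limit]


if __name__ == "__main__":
    sample = [
        Article("Denmark", "x" * 100, 50),
        Article("Category:Denmark", "x" * 100, 500),
        Article("Copenhagen", "x" * 200, 80),
    ]
    for article in select_articles(sample, min_links=0, min_size=50):
        print(article.title, article.inbound_links)
```

This would also explain why articles still show up as "skipped" with `--minlinks 0`: the colon rule applies regardless of the link threshold. A rule this simple also drops any ordinary article that happens to have a colon in its title.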
