sh0dan // VoxPod

Wednesday, November 17, 2010

Updated Wikipedia mDict conversions

As you may have noticed, I have not spent any time on the Wikipedia mDict conversions, since I'm very busy with Rawstudio. A very friendly user called "Or" has converted the latest Wikipedia dumps. Here is his message:

---

Thanks for sharing the source code. An update, based on your work, for October 2010 version is available in: http://ahuv.net/wikipedia


Links to torrents:

English: 3,483,000 items - Here
Spanish: 1,361,000 items - Here
French: 1,239,000 items - Here
German: 1,033,000 items - Here
Russian: 720,000 items - Here
Portuguese: 681,000 items - Here
Hebrew: 111,000 items
Arabic: 183,000 items
Persian: 109,000 items

Wednesday, March 31, 2010

Rawstudio 2.0 design goals

I have written a blog post on rawstudio.org on the Rawstudio 2.0 design goals.

Click here to read it.

Tuesday, October 13, 2009

Intrinsics in GCC

I have recently done quite a lot of assembler for Rawstudio, and when I found out that GCC also has support for SSE intrinsics, I finally set out to learn how to use them.

I had done quite a lot of inline assembler using GCC's AT&T syntax, and it works ok, though the syntax is pretty horrible. It also has some serious input restrictions, with only 5 general purpose registers available, when you do x86-32 versions. This wouldn't normally be a problem, but when you do simultaneous 32 and 64 bit versions you don't know the size of a pointer, so passing an array of pointers becomes very tedious.

So for the Rawstudio vertical resampler, I decided to take the plunge and look into assembler intrinsics. The first version was just to learn the basic syntax, and involved a rather naive int -> float -> int conversion. The generated assembler on x86-64 was decent, and matched the reference (integer) performance. The second version was strictly integer, with 24 elements per pixel, and it far outperformed the C implementation.

One specific issue I encountered was a problem with doing SSE2 operations on 16 bit unsigned data, since there is no way of multiplying anything with more precision than 16 bit _signed_ data, but I will touch on that in a separate post.

Back to intrinsics: even though I have been very sceptical about them as a concept, I must admit that they allow for much greater complexity with very little effort. While you have to let go of the exact assembler generation, it does make C/C++ integration much easier, and you spend a lot less time chasing pointer errors and writing tedious loop code.

My next project was a much more ambitious DNG Color Profile processor. It involves an RGB -> HSV conversion, applying a trilinear interpolated 3D lookup table to the HSV data, processing Whitebalance, Exposure, Hue and Saturation, HSV -> RGB, so a very complex task. The reference implementation was completely done in float, so for starters, I thought I'd do the same.

The implementation processes four pixels in parallel, using one XMM register for each component. This proved to work very well, since you get both the advantages of planar (doing the same operations on all 4 components at the same time), and interleaved processing (have all components 'nearby').
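A minimal sketch of that layout, not the actual Rawstudio code: it assumes a hypothetical RGBX float pixel format (four floats per pixel, 16-byte aligned), loads four pixels, and transposes them so that each XMM register holds one component of all four pixels. The helper name value_and_chroma4 and the choice of computing the HSV value and chroma are only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE float intrinsics; emmintrin.h also pulls this in */

/* Hypothetical helper: for four RGBX float pixels (16 floats), compute the
   HSV "value" (max of R, G, B) and the chroma (max - min) of each pixel. */
static void value_and_chroma4(const float *rgbx, float *value, float *chroma)
{
    /* _mm_load_ps requires a 16-byte aligned address */
    assert(((uintptr_t)rgbx & 15) == 0);

    /* Interleaved on load: each register holds one whole pixel */
    __m128 r = _mm_load_ps(rgbx + 0);   /* R0 G0 B0 X0 */
    __m128 g = _mm_load_ps(rgbx + 4);   /* R1 G1 B1 X1 */
    __m128 b = _mm_load_ps(rgbx + 8);   /* R2 G2 B2 X2 */
    __m128 x = _mm_load_ps(rgbx + 12);  /* R3 G3 B3 X3 */

    /* Planar after the transpose: r = R0 R1 R2 R3, g = G0 G1 G2 G3, ... */
    _MM_TRANSPOSE4_PS(r, g, b, x);
    (void)x;  /* padding component, unused here */

    /* The same operation now runs on all four pixels at once */
    __m128 maxc = _mm_max_ps(r, _mm_max_ps(g, b));
    __m128 minc = _mm_min_ps(r, _mm_min_ps(g, b));

    _mm_storeu_ps(value, maxc);
    _mm_storeu_ps(chroma, _mm_sub_ps(maxc, minc));
}
```

The transpose is what gives the planar view, while the loads still touch one contiguous block of memory, so all components of a pixel stay close together.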

I did however notice a few gotchas:

1) Use _mm_set_X(a,b,c,d) sparingly.
GCC tends to use a "movss" combined with "pshuf" if a = b = c = d, and a combination of "mov" + unpack if they are not. If you are using constants, write them to an aligned variable and use _mm_load_X(ptr) instead, which has a much shorter dependency chain.

The only case where I found _mm_set to be faster was to transfer lookup values to xmm registers.
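As a rough illustration of the aligned-constant approach, here is a small sketch. The names kGain and apply_gain are made up for the example, and whether the load really beats _mm_set1_ps depends on the compiler version, so treat it as a pattern to measure rather than a guaranteed win.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Constant kept in 16-byte aligned memory so it can be fetched with a
   single aligned load (GCC attribute syntax). */
static const float kGain[4] __attribute__((aligned(16))) =
    { 1.5f, 1.5f, 1.5f, 1.5f };

/* Hypothetical helper: multiply n floats by the constant, four at a time. */
static void apply_gain(float *data, int n)
{
    assert(n % 4 == 0);  /* the sketch handles whole groups of four only */

    /* One load from memory instead of _mm_set1_ps(1.5f) */
    const __m128 gain = _mm_load_ps(kGain);

    for (int i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);
        _mm_storeu_ps(data + i, _mm_mul_ps(v, gain));
    }
}
```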

2) GCC intrinsics on i386.
A rather silly thing about intrinsics in GCC is that they require the "-msse2" switch to be present when compiling on i386 machines. The problem with this is that this switch also allows GCC to emit SSE2 code from ordinary C code, which will obviously crash on non SSE2 capable machines. My good friend Anders suggested that we should put the SSE2 code in a separate C-file and link them together. While this workaround should be able to do it, it seems quite silly that you cannot do runtime detection of SSE2, and just go from there.
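For reference, the detection and dispatch half of that workaround could look roughly like the sketch below. It uses __get_cpuid from GCC's cpuid.h; the function names are invented for the example. Everything is shown in one block for brevity, which compiles as-is on x86-64 where SSE2 is always enabled. On i386 the assumption is that add_sse2 would sit in its own C file built with -msse2, while the rest is built without it.

```c
#include <assert.h>
#include <cpuid.h>      /* GCC helper for the CPUID instruction */
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns non-zero if the CPU reports SSE2 (CPUID leaf 1, EDX bit 26). */
static int has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 1 not supported */
    return (edx >> 26) & 1;
}

/* Plain C fallback, safe on any CPU. */
static void add_c(int *dst, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* SSE2 path (assumed to live in a separate -msse2 file on i386). */
static void add_sse2(int *dst, const int *a, const int *b, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; i++)  /* leftover elements */
        dst[i] = a[i] + b[i];
}

typedef void (*add_fn)(int *, const int *, const int *, int);

/* Pick the implementation once at startup and call through the pointer. */
static add_fn pick_add(void)
{
    add_fn fn = has_sse2() ? add_sse2 : add_c;
    assert(fn != 0);
    return fn;
}
```

Calling pick_add() once and keeping the returned pointer avoids repeating the CPUID check in the inner loop.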

3) Debugging intrinsics
Coming from Visual Studio, debugging in GDB is a real pain in the ***. Furthermore, its support for intrinsics, or any assembler for that matter, is virtually non-existent. Breakpoints on intrinsics are largely ignored, you get no intrinsic name -> register map, etc. I had to resort to using printfs most of the time, though that was actually quite a bit easier with intrinsics than with inline assembler.
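The printf approach can be as simple as storing the register to a small array and printing it. The helpers below are a sketch with made-up names (format_m128, dump_m128), assuming the register holds four floats.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: write the four floats of v into buf as text. */
static int format_m128(char *buf, size_t size, const char *name, __m128 v)
{
    float f[4];
    assert(buf != NULL && size > 0);
    _mm_storeu_ps(f, v);  /* f[0] is the lowest element of the register */
    return snprintf(buf, size, "%s = [%g, %g, %g, %g]",
                    name, f[0], f[1], f[2], f[3]);
}

/* Hypothetical helper: print a register to stderr while debugging. */
static void dump_m128(const char *name, __m128 v)
{
    char buf[128];
    format_m128(buf, sizeof buf, name, v);
    fprintf(stderr, "%s\n", buf);
}
```

Note that _mm_set_ps takes its arguments from the highest element to the lowest, so dump_m128("v", _mm_set_ps(4, 3, 2, 1)) lists the elements starting with 1.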

The generated 64 bit code looks quite nice, with good instruction pairing - the irony being that the only processor that doesn't operate out-of-order is the Intel Atom, which doesn't run 64 bit code.

Other than that, the 32 bit SSE2 code obviously looks hideous, with frequent spills to the stack, but to be honest the code wasn't designed for 8 XMM registers, so that's to be expected.

In the end, the assembler ended up at about twice the speed of regular C code. The rest of the time is probably mostly due to the large number of table lookups, which don't get faster with SSE. I can't really see how I could have done this assembler in this time without intrinsics, because of the sheer complexity.

Sunday, May 31, 2009

MDX WikiParser v1.1

Here is a revised version of the Wikipedia to mDict converter.

Download: Wikiparser v1.1.

Changes from v1.0 to 1.1:
  • New DB design. Faster, takes a bit more space.
  • Better CPU scaling.
  • Fixed locked schema name - now you truly can call your DB schema what you like.
  • Fixed table formatting bug.
  • Result is 200% faster indexing and 50% faster processing on Quad-Core.

Monday, May 25, 2009

Wikipedia for mDict - May 2009

Here are the updated Wikipedia conversions for mDict.

Languages:
  • English
  • German
  • Spanish
  • Portuguese
  • Russian
  • Chinese
  • Japanese
  • Polish
  • Italian
  • Danish (direct download below)
Download from LegalTorrents.

Changes:
  • Title added to all pages.
  • Spanish version added.
  • Chinese version added.
  • English "Jumbo" edition added, with 2.3 million full articles.

Since the Danish wikipedia is so small, here is a direct download link for the Danish Wikipedia.

If you do not have access to BitTorrent downloads, here is a mini version of the English Wikipedia (155,000 articles, very abbreviated).

If you are interested in helping with further updates, please contact me.