sh0dan // VoxPod

Monday, May 19, 2008

Java revisited

The first "real" language I programmed in was Java, back in the 1.0 to 1.2 days. I've since spent a lot of time with C++ and similar languages, but I sometimes revisit Java when I have a task that is better suited to a lighter language.

I've recently gotten a new Windows Mobile phone, and I've previously written about development for it. This time, my personal challenge wasn't development for the platform, but rather creating dictionaries for mDict, a free dictionary application for Windows Mobile.

I found a 5-year-old Wikipedia dump for it, and it seemed to work quite well, so my challenge was to create an updated version of the Wikipedia database. The database is dumped as XML into a bzip2 file that can be downloaded from the site. But the material still had to be transformed from wiki syntax to a simple HTML representation. So my choice was Java, since it has very strong general text support.
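To give an idea of what the wiki-to-HTML step involves, here is a minimal sketch using java.util.regex. The class name WikiToHtml and the handful of rules (bold, italic, links, level-2 headings) are my own simplification for illustration, not the actual converter, and real wiki markup has many more cases.

```java
import java.util.regex.Pattern;

// Hypothetical, heavily simplified converter: only a few wiki constructs are handled.
class WikiToHtml {
    // Level-2 headings such as "== History ==" on a line of their own.
    private static final Pattern HEADING = Pattern.compile("(?m)^==[ \\t]*(.+?)[ \\t]*==[ \\t]*$");
    // Bold (''') must be replaced before italic (''), or the quotes get mismatched.
    private static final Pattern BOLD = Pattern.compile("'''(.+?)'''");
    private static final Pattern ITALIC = Pattern.compile("''(.+?)''");
    // [[Target|label]] keeps only the label; [[Target]] keeps the target text.
    private static final Pattern PIPED_LINK = Pattern.compile("\\[\\[[^\\]|]+\\|([^\\]]+)\\]\\]");
    private static final Pattern PLAIN_LINK = Pattern.compile("\\[\\[([^\\]|]+)\\]\\]");

    static String convert(String wiki) {
        String html = wiki;
        html = HEADING.matcher(html).replaceAll("<h2>$1</h2>");
        html = BOLD.matcher(html).replaceAll("<b>$1</b>");
        html = ITALIC.matcher(html).replaceAll("<i>$1</i>");
        html = PIPED_LINK.matcher(html).replaceAll("$1");
        html = PLAIN_LINK.matcher(html).replaceAll("$1");
        return html;
    }
}
```

Calling WikiToHtml.convert on each article text gives a string that can be written straight into the dictionary source. Links are flattened to plain text here because this sketch assumes nothing about how mDict handles cross-references.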

Netbeans IDE
I've previously used the Netbeans IDE, even back when it was called Forte. While it has always been quite good, Visual Studio has been my preference, since it is so much more responsive. But it seems like time has really been kind to Netbeans - it is now a pure joy to use. On a Core 2 with 2GB RAM, it is always responsive, and you get a massive amount of extra features. Code completion that actually works, dynamic compilation, automatic imports, refactoring tools, automatic interface implementation - and those are just the features for writing the code - there seems to be literally no end to the niceties.

Enough with the endless praise - the bottom line is that you can really feel just how little has actually been done to the C++ side of MS Visual Studio. I haven't done any GUI work for this project, but the last time I played with that side of Netbeans, it was quite a nice experience. Again, miles ahead of Visual C++.

The DTD to SAX(2) code wizard also deserves a mention here - that's just pure brilliance. Although I cannot understand why it hasn't been updated to support XSD yet.

Java itself
It is a common claim that Java is bloated, and in some ways it is - it includes library packages for everything but the kitchen sink. On the other hand, it is very nice to have (nearly) rock solid libraries without having to search the net for them. The sheer size does make it somewhat difficult to find the right things, but then again, you always have to know the capabilities of your language to avoid re-implementing features that are already present in the libraries.

I must say that I didn't miss the .cpp/.h way of separating code. The nature of bytecode, vs. compiled code and headers, makes things a lot easier, since you don't have to battle all the linker issues that inevitably pop up when you use external code. For this project, the only external code I had to use was a BZIP2 library, which plugged right in without any issues. For C++ I would also have had to use external libraries for SAX parsing and regular expressions, so a lot of pain was also spared there.
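As an illustration of how little code the bundled SAX parser needs, here is a small sketch that collects article titles from a dump. The class name TitleCollector is made up for this example, and the title element name is an assumption about the dump layout. The bzip2 decompression is left out, since it depends on the external library.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers the text of every <title> element it sees.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<String>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0); // start a fresh buffer for this title
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append rather than assign.
        if (inTitle) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            inTitle = false;
        }
    }

    // Parses the stream and returns the titles in document order.
    static List<String> collect(InputStream xml) {
        try {
            TitleCollector handler = new TitleCollector();
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Because SAX streams the document, the whole dump never has to fit in memory, which matters for a file of this size. The InputStream passed to collect would be whatever the decompression library hands back.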

At some point in development I figured that I might as well make my application multithreaded, and wow - how incredibly easy that is - at least compared to C++. The built-in method-level locking (the synchronized keyword) makes it a breeze. I had completely forgotten how easy that is. Obviously applications still have to be designed for multithreading, but adding job/worker functionality is just so easy.
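A rough sketch of the job/worker idea, assuming a shared queue guarded by synchronized methods. JobQueue and processAll are names invented for this example, and the upper-casing is only a stand-in for the real conversion work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical job queue: every method that touches shared state is synchronized.
class JobQueue {
    private final Queue<String> pending = new ArrayDeque<String>();
    private final List<String> results = new ArrayList<String>();

    synchronized void add(String job) { pending.add(job); }

    // Returns the next job, or null when no work is left.
    synchronized String take() { return pending.poll(); }

    synchronized void store(String result) { results.add(result); }

    synchronized int resultCount() { return results.size(); }

    // Queues all articles, runs workerCount threads until the queue is empty,
    // and returns how many results were produced.
    static int processAll(List<String> articles, int workerCount) {
        final JobQueue queue = new JobQueue();
        for (String article : articles) queue.add(article);

        List<Thread> workers = new ArrayList<Thread>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    String job;
                    while ((job = queue.take()) != null) {
                        queue.store(job.toUpperCase()); // stand-in for the real conversion
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) worker.join(); // wait for all workers to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return queue.resultCount();
    }
}
```

All jobs are queued before the workers start, so a null from take really does mean "done". With a producer that keeps adding jobs while workers run, the queue would need wait/notify or one of the java.util.concurrent queues instead.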

On a smaller note, it is very nice to find that Java now has type-safe containers. After learning about them in C++, I couldn't really figure out why they weren't in Java, but it seems they have been incorporated at last. Typecasting to Object and back seems daft.
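For anyone who hasn't seen them, this is roughly what the type-safe containers look like. TitleIndex is a made-up class for illustration; the point is that nothing is cast to or from Object anywhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index from article title to the offsets where it occurs.
class TitleIndex {
    // The compiler knows keys are Strings and values are lists of Integers.
    private final Map<String, List<Integer>> offsetsByTitle = new HashMap<String, List<Integer>>();

    void add(String title, int offset) {
        List<Integer> offsets = offsetsByTitle.get(title);
        if (offsets == null) {
            offsets = new ArrayList<Integer>();
            offsetsByTitle.put(title, offsets);
        }
        offsets.add(offset); // autoboxed to Integer, no cast needed
    }

    // Returns the first recorded offset for the title, or -1 if it is unknown.
    int firstOffset(String title) {
        List<Integer> offsets = offsetsByTitle.get(title);
        return offsets == null ? -1 : offsets.get(0);
    }
}
```

Trying to put anything other than an Integer into one of those lists is a compile error rather than a ClassCastException at runtime.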

Regular Expressions
I've previously bashed some of my friends and colleagues for using regular expressions, since they often seem like an overly complicated way of performing an otherwise simple task. Since they are an essential part of this project, I feel they deserve a special mention. I must say that they made my code a ton simpler, and for that they deserve praise - but honestly - writing them is torture.

One significant shortcoming I've found is that they stink at matching recursive material. If you want to match material where you need to find ((exactly where ((the)) outermost bracket pair)) ends)) you are into some heavy reg-exing. Furthermore, once I got it running, it would occasionally crash with a stack overflow on large texts. Nice.
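For comparison, here is a sketch of the alternative to regular expressions for this particular problem: a plain scan that counts nesting depth. This is not what my converter does, and BracketMatcher and findClose are names invented for this example. The (( and )) markers follow the example above.

```java
// Hypothetical helper: finds where a nested "((" ... "))" pair ends without regex.
class BracketMatcher {
    // Returns the index just past the "))" that closes the "((" at openIndex,
    // or -1 if there is no "((" at openIndex or it is never closed.
    static int findClose(String text, int openIndex) {
        if (!text.startsWith("((", openIndex)) return -1;
        int depth = 0;
        int i = openIndex;
        while (i < text.length() - 1) {
            if (text.startsWith("((", i)) {
                depth++;
                i += 2;
            } else if (text.startsWith("))", i)) {
                depth--;
                i += 2;
                if (depth == 0) return i; // outermost pair just closed
            } else {
                i++;
            }
        }
        return -1; // ran out of text while still nested
    }
}
```

Since this is a simple loop with a counter, there is no recursion, so the depth of nesting and the length of the text have no effect on stack usage.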

They are still the right tool for some jobs, but I cannot help feeling that there has to be a better way that avoids the horrible, unreadable syntax. Write once - debug forever. ;)


All in all, revisiting Java has been a great experience. I hope to publish the results soon. What I hope will be the final revision of the Wikipedia conversion is indexing and compressing right now.