alan little’s weblog

on languages and languages

3rd August 2005

Not much blogging lately, because I’ve been spending the time I would otherwise have spent writing (a) reading the utterly amazing Shantaram – more on that later, maybe – and (b) looking, again, for a web framework I can use for personal projects.

Regarding the web framework: because this is for something to use in my severely limited spare time, I want something that's quick to learn, and written in a language that’s powerful, elegant and enjoyable to use.

The language criterion immediately rules out C#/.Net and anything using java or perl. That would seem to leave python, ruby, smalltalk or lisp as the obvious language choices. Lisp appears, based on brief googling, to lack decent cross platform open source implementations; it’s a language I'd be interested in looking at one day if I ever have time, but not for now.

So what are the options in python, ruby & smalltalk?

Looking at python first, because I already have some time invested in learning the language and I like it … learnable in limited time immediately strikes Zope off the shortlist. Others? Too many. Although I’m doing this for personal use, if I’m putting my scarce and valuable time into learning something, I would like it to have reasonable-looking prospects of future support & development and maybe long term commercial potential. I briefly experimented with webware, which certainly appears to work, and cherrypy which I thought looked elegant and quite appealing, but neither of them looks very convincing to me in terms of momentum & long term prospects. The latest buzz thang in the python web world is django. Django looks like it could be interesting if the immediate pre-launch buzz actually does translate into longer term momentum, but it has one absolutely crippling flaw that disqualifies it from any serious consideration until they fix it, because …

… my other absolute must criterion, that the boring clumsy restrictive cobols-for-the-21st-century (Java, C#) do seem to get right and the fun, powerful, elegant open source languages by and large don’t, is the ability to work in a sensible way with text in languages other than English. Some of the projects I have in mind involve working with text in Russian and Sanskrit; and one thing I’ve learned from six years working in commercial IT in Germany is that the first thing you do to any prospective vendor’s software is throw a few umlauts in, then laugh and send them away when it immediately collapses in a heap.

The authors of django don’t initially seem to have given much thought to what every working programmer should know:

All that stuff about "plain text = ascii = characters are 8 bits" is not only wrong, it's hopelessly wrong, and if you're still programming that way, you're not much better than a medical doctor who doesn't believe in germs.

… although at least they now realise they have a serious problem with inability to handle non-ASCII data. Ian Bicking, too, has been getting frustrated lately with problems with python’s silly default string encoding and the way setting it to something sensible instead is made deliberately difficult and obscure. Although in my experience, once you have set it to something sensible – utf8 instead of ascii – assuming you have the access rights to do so in the python installation you happen to be using – then text handling in python is reasonably trouble-free.

UPDATE: Martijn Faassen points out, as I did in Ian’s comments, that changing the default encoding in your own python setup, and then coding happily along on the basis that the default encoding is something sensible, will cause your code to break when other people with stupid default encodings try to use it. And Glyph Lefkowitz in an excellent article tells us why we should all get used to the idea of explicitly handling Unicode and abandon the childish assumption that 8-bit byte strings are of any use whatsoever for representing text. He’s right, but that doesn’t mean lots of people will pay attention and do as he says.
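
To make Glyph's point a bit more concrete, here is a minimal sketch of what explicit handling can look like: bytes get decoded once on the way in, the program works on text, and text gets encoded once on the way out. It assumes a modern Python 3, where text and bytes are separate types (not the python of 2005), and the helper names decode_input, encode_output and shout are made up for illustration.

```python
# Sketch only: explicit encoding at the edges, text everywhere in between.
# Assumes Python 3; the function names are hypothetical, not from any framework.

def decode_input(raw: bytes, encoding: str = "utf-8") -> str:
    """Turn incoming bytes into text, naming the encoding explicitly."""
    return raw.decode(encoding)


def encode_output(text: str, encoding: str = "utf-8") -> bytes:
    """Turn text back into bytes, again naming the encoding explicitly."""
    return text.encode(encoding)


def shout(text: str) -> str:
    """Stand-in for real application logic; it only ever sees text."""
    return text.upper()


# Throw an umlaut in and see what happens.
raw = "über".encode("utf-8")   # what might arrive from a form or a file
text = decode_input(raw)       # four characters, stored as five bytes
result = encode_output(shout(text))

# The "plain text = ascii" assumption fails as soon as the umlaut shows up.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Nothing here depends on the default encoding of whoever runs it, which is the whole point of Martijn's objection.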

One of the things that deterred me from taking a closer look at Ruby on Rails when I first heard about it is that ruby’s handling of unicode appears to be significantly inferior to python’s. This isn’t just my impression – a search on comp.lang.ruby soon reveals lots of ruby-knowledgeable people saying it too.

With ruby, though, it isn’t just a case of American programmers forgetting to think about the other 95% of the world. It’s far more interesting than that – the Perfect as the enemy of the Good. Ruby’s author, Yukihiro “Matz” Matsumoto, being a very smart Japanese programmer, is probably more aware of and better informed about internationalisation and character set issues than almost anybody. The problem is that being both very well-informed and Japanese, he is acutely aware of Unicode’s faults and failings and possibly shares the commonly held Japanese view of it as a grossly inadequate Eurocentric racist kludge. (From what I understand, the early versions were grossly inadequate for Chinese & Kanji; that was fixed to some extent in later versions but some serious problems, plus probably quite a bit of lingering mistrust caused by the early versions, still remain).

Matz apparently has ideas & plans, possibly even a prototype, of a Grand Unified Solution To Everything in the field of text handling that will be vastly superior to Unicode. Sadly, though, it Isn’t Ready Yet; and until it is, current production versions of ruby are left to struggle on with a half-baked stopgap implementation of utf8. Which is kind of sad for what otherwise looks like a very nice language.

And Smalltalk? I did a Smalltalk project years ago, and I remember the language being very nice to develop in but impossibly slow at runtime on circa-1990 PCs. I assume Moore’s Law has long since fixed that. Python was the first OO language I worked with since then that was sufficiently nice in its own right that I didn’t resent it for the crime of Not Being Smalltalk. Smalltalk these days has a reputedly good cross platform open source implementation in the form of Squeak, and a strong contender for the current title of Most Interesting Web Framework in the form of Seaside. I downloaded them both and they worked first time. But. Squeak is such a weird self-contained world, I really don’t feel I have the time or the motivation to get into learning my way around it just for the sake of a web framework. Quick googling also makes it far from clear how complete, stable and well-integrated its Unicode support is. So not just now. I appreciate that this may be my loss.

all text and images © 2003–2008