Monday, 1 November 2010

Sir, Please Step Away...

A day or so ago, Slashdot pointed me to an interesting (or "interesting"?) article by Poul-Henning Kamp (PHK from here on) at ACM Queue. In the article PHK argues that programming languages should move on from being coded strictly in ASCII, and on to something more expressive, e.g. Unicode. While I may not have written much code in the last several years, I wrote a lot of it over the many years before that - all in ASCII, of course. I also cannot claim to be any sort of expert, guru (or even "guru"), let alone visionary, but I do believe that I have a good enough overview of the dark art of programming to be allowed an informed opinion in this matter. Even one that is opposed, and not very flattering, to PHK's proposal. And, as you must have guessed already, I will now tell you all about it...

The first, and probably the most obvious, objection to PHK's article is that it is not very rich in either good reasons for adopting a more complex way of coding, or in examples that might serve as teasers to draw in support for his viewpoint. The former amount to a dichotomy between "poor" ASCII, with its miserly set of 95 printable characters, and the alleged richness on offer if we adopt Unicode or a similarly complex and "rich" way of expressing computer code. The latter amount to quoting a couple of Unicode characters or strings, and even that is done in hexadecimal notation. There's also mention of colour, and a few other styles. This is very, very far from a strong case for "richer" source code.

And now, a few much more substantiated objections from yours truly...

First, a non-programmer one: natural language, in this case English. The English language, as we should all know, needs but 26 letters and a handful of punctuation marks. With this meagre set, outstanding literature was made possible, as well as science - including software engineering. It is unlikely that Chinese, with its thousands of glyphs, is any better (or worse) for expressing human thought (and feeling, for that matter). It is equally unlikely that computer languages will be made any better by simply allowing them to be expressed with more than 95 characters (195, 11950, ... where does it stop?). If anything, it will make them harder to learn. Don't forget, a Chinese speaker (or rather - writer) may never need to master every single one of the thousands of glyphs that make up the Chinese script. On the other hand, a programmer, if he is to master the language he's using, probably does need to learn every single little detail of its syntax - and that includes every single character available.

Next, while I will happily agree with PHK that character-only, black-and-white displays are a thing of the (ancient) past - and that it's good that they are - and that we may not all still use the (in)famous ASR-33 terminals, even PHK should be able to see that the keyboard in front of him (and me, and you) is, for all intents and purposes, the same one as on the ASR-33! We may have a couple more modifier keys, a numeric keypad (which just duplicates some keys), and a row of "function" keys, but we all in fact still do use the ASR-33. What this means is that we (still) have to jump through hoops to enter any of the "special" characters outside simple ASCII. Your options are, roughly: assign (and remember) extra key combinations to existing keys - some of them may require two modifier keys, too (e.g., Ctrl-Shift-P for ¶); or hunt for the desired "special" character in a Character Map-style application (as I did just now for that ¶). There may be other (and slightly better) ways, but they would all have to somehow bypass the simplicity of the standard QWERTY keyboard. Or, we'll all (or at least all programmers will) have to buy new, as yet unreleased, special keyboards with many more than 100-odd keys. Goodbye, coding on a laptop!
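For the curious, here is a minimal sketch in Python of where that dividing line sits. It is my own illustration, not anything taken from PHK's article, and the names `printable_ascii` and `is_plain_ascii` are made up for the occasion. It counts the printable ASCII characters and checks that ¶ falls outside them.

```python
# Printable ASCII runs from space (0x20) to tilde (0x7E), inclusive.
# The names below are hypothetical, chosen only for this illustration.
printable_ascii = [chr(cp) for cp in range(0x20, 0x7F)]


def is_plain_ascii(ch: str) -> bool:
    """Return True if ch is one of the printable ASCII characters."""
    return ch in printable_ascii


pilcrow = "\u00b6"  # the paragraph sign used as the example above

print(len(printable_ascii))      # how many characters a plain keyboard has to cover
print(is_plain_ascii("P"))       # an ordinary letter
print(is_plain_ascii(pilcrow))   # the "special" character
print(pilcrow.encode("utf-8"))   # its byte representation in UTF-8
```

The only point of the sketch is that everything reachable with plain keys fits in that one short range, and a character such as ¶ does not.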

PHK also calls for the line length conventions to be broken. With the large screens of today, why not let code stretch far more than a few dozen characters? Well, one good reason not to is that very long lines are very hard for a human to parse. For one, reading a very long line may (will?) require a person to turn their head from side to side - for each and every long line, row after row after row. So, reading code becomes not only a difficult mental task, but also a physically demanding one. Add neck injuries to carpal tunnel syndrome! Alternatively - probably because he realised the previous problem, even if he didn't say so - PHK calls for the extra width to be used to display subroutines alongside the code that invokes them - as if that were not already both possible and done. Every programmer's editor worth its salt allows you to open multiple windows and arrange them on screen in any way you wish (including the possibility for two or more windows to scroll synchronously). So nothing new there, then. A pointless point, if you will.

What's left then of PHK's arguments and suggestions?

Well, with not much of an argument for his case in the first place, what's left to comment on is the suggestion to use colour (as in colour coding of syntax), and presumably other text attributes (bold, underline, italics, strikethrough, or maybe a combination thereof?). These are all (or at least most of them) already in use in any decent programmer's editor. I guess PHK would want them standardised, or at least standardised for every new "rich" programming language. Aside from the fact that applying such attributes requires extra keystrokes (and good programmers are lazy and dumb, as you probably know already), there would be an uproar from all the people who chose, and got used to, all the different styles of colour coding in use today. Oh, and PHK never says what sort of thing different colours or styles would denote: maybe a "stop" in one style would make the code stop quicker, and in a less safe way, than just a plain "stop", or even a "stop" in yet another style (the last one maybe meaning "please stop? pretty please?")? And, what about all those colour blind people? How about dyslexics who already struggle with the English alphabet?

Here is a good (cheeky) example of what new PHK-style code might one day look like. Maybe it will soon take its place alongside all the other possible (and fun) renditions of the famous "Hello, world." program. That's all as may be, but I know a few things for sure. Or at least I have hopes that some things will never be coded in a colourful, Unicode-rich, mile-long-line-of-code way. Would you trust your life to an autopilot, a cruise-controlled car, or a robot surgeon if it was programmed in a language where "stop" in one style is different from "stop" in another? Or, a bit less (or more?) scarily, would you subject your child to a course in such a programming language? Even if they weren't dyslexic or colour blind? I know I wouldn't...