Saturday, March 31, 2007

Five things I hate about Python

It's been a meme of late to blog about 5 things you hate about your favorite programming language, in order to qualify you to complain about other languages. (It proves your "objectivity.") What I really wanted to do is blog about things I like about Python, but to go along with the current theme, here is my list of the top 5 things I hate about Python.

  1. Lack of support for blocks - I have a bit of an esoteric desire here: I want to use Python-like syntax to build a DSL for compiling into hardware description language, essentially using Python as a macro language on top of the DSL. I want to be able to create my own control structures (hardware_if:...., etc.). Python doesn't let me do this.

  2. Stdlib organization (or lack thereof) - Maybe this is inevitable, as the stdlib is made up of all sorts of things integrated from other packages, but it would be nice to have something closer to what the C# and Java people rave about in their respective languages.

  3. Inefficient tail calls (tail recursion) - Maybe not important to too many people since Python has such cool iteration support, but it would be nice to be able to write a tail-recursive function and know that it will be optimized into a loop.

  4. No "nonlocal" rebinding of functions - This will be fixed in Python 3, which is cool, but it's not there right now. Basically, you can read variables from enclosing scopes right now, but you can't "rebind them", that is, assign the name to a new object. All assignments are either in the local scope (default) or module scope (if you declare the name "global"). And honestly, the main reason I hate this is that it gives Paul Graham one more negative thing to say about Python relative to LISP.

  5. No dictionary comprehensions - And it's not gonna happen any time soon. List comprehensions are wonderful, and I'd like to build a dict in the same way. I know I can do dict((k,v) for k,v in seq). But I don't like the double-left-paren there. OK, so it's a nit. But wouldn't it be nicer just to say {k:v for k,v in seq}?
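Items 4 and 5 can be shown in a few lines. This is only a minimal sketch: make_counter, bump, seq and lookup are hypothetical names made up for illustration. The first half uses a common workaround for the missing rebinding (mutate a one-element list instead of assigning to the enclosing name), and the second half builds a dict from a generator of pairs, double parentheses and all.

```python
def make_counter():
    # "count = count + 1" inside bump() would create a new local name,
    # so the count lives in a one-element list that is mutated instead.
    count = [0]

    def bump():
        count[0] += 1  # mutates the enclosing list; no name is rebound
        return count[0]

    return bump


counter = make_counter()
counter()
counter()

# Building a dict from (key, value) pairs with a generator expression.
seq = [("a", 1), ("b", 2)]
lookup = dict((k, v) for k, v in seq)
```

The list trick works because reading a name from an enclosing scope is allowed; only assignment to that name is restricted.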



Well, there it is. 5 things I hate. Now I can get on to writing about the things I like.

Sunday, March 25, 2007

Dynamic Language Weenies?

I saw the linked article and couldn't help but get irritated. I know I shouldn't get this worked up about programming languages, but I did anyway. (Perhaps it was the unnecessarily abrasive tone of the author....) Rather than making you wade through the rather misinformed article, let me summarize and refute some of the author's main points here.


[Edit 3/31/07 @11:58pm - It looks like some of the points I complain about in this article have been changed in the original post. I haven't gone over the article line-by-line, but the article I am responding to was posted on 3/25/07, while the one currently posted is dated 3/26/07, so please take that into account when reading this (3/25/07) article. Some of the main points that were changed in the original post had to do with the use of the confusing terms "weak" and "strong" typing. There may be others.]

[Edit 4/5/07 @5:55pm - I should make it clear that when I refer to "static languages" in the article below, I do so in the same sense that the original author refers to "static languages" -- languages in the style of C, C++, Java, C#, Pascal, etc. I am aware of statically typed languages such as Haskell and ML which convey many of the productivity benefits of "dynamic" languages in a statically typed environment.]

Brief aside #1: Non-weenie credentials

I have worked as a commercial hardware and software developer in Real Jobs now for about 15 years. I have used, in production scenarios, C, C++, C#, Dynamic C (embedded programming), VHDL, Verilog, SQL, Javascript, and Python. I have written embedded microcontroller programs, C compilers targeting FPGAs and exotic dataflow (MONARCH) architectures, multiprocessor simulators, enterprise workflow applications, and high-volume web sites. I have implemented digital logic for the StarCore SC140s DSP core, as well as designed various IP digital logic cores, including a Viterbi decoder and an SHA-1 hashing engine. I am not a weenie.

Somewhat less brief aside #2: Muddled thinking about typing

Next, and this gets me every time, the author confuses the (at least) three axes of typing. The first axis is the strong/weak axis, and generally puts languages such as C, Perl and REXX on the weak end and most everything else on the strong end. The deciding question here is whether the language implicitly converts unrelated types without warning, allowing you to add the integer 0 to the string "10" and arrive at the result "010" (or is it 11? I forget.). Strongly typed languages will cry foul, where weakly typed languages will simply do what they think you mean and continue.
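To make "cry foul" concrete, here is a minimal sketch of how Python reacts to the 0 + "10" case. The helper name add_unrelated is hypothetical, invented for this illustration.

```python
def add_unrelated(number, text):
    # Python refuses to guess how an int and a str should be combined.
    try:
        return number + text
    except TypeError as exc:
        return "TypeError: %s" % exc


result = add_unrelated(0, "10")

# The conversion has to be spelled out, in whichever direction you mean.
as_number = 0 + int("10")
as_string = str(0) + "10"
```

Here as_number is the integer 10 and as_string is the string "010"; the language makes you pick one.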

The second axis is static versus dynamic typing [Edit 4/3/07], also known as early versus late binding. This has entirely to do with how names (variables) are resolved in the language. In statically typed languages, a name (variable) is associated with a type, and that name (variable) can never reference a value of any other type. In dynamically typed languages, a name may be associated with values of different types over its lifetime. Languages such as C/C++, Java, Pascal, Haskell, OCAML, etc. fall into the static category (with some dynamic capabilities in C++ and Java through runtime dynamic type casting), while languages such as Ruby, Python, etc. fall into the dynamic category. Many languages have support for both, including Lisp and the aforementioned Java and C++.
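A minimal sketch of the dynamic end of this axis, using a hypothetical name value: the same name is bound to values of three different types in turn, and the type travels with the value rather than with the name.

```python
value = 10
first_type = type(value).__name__

value = "ten"          # same name, now bound to a string
second_type = type(value).__name__

value = [1, 0]         # and now to a list
third_type = type(value).__name__

seen = [first_type, second_type, third_type]
```

After this runs, seen holds the type names int, str and list, in that order.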

The third axis is manifest versus implicit typing, and it is a fascinating axis. (Note that this axis is really only applicable to statically typed languages, so it might not really even be an axis in its own right, but I think it's worth looking at here.) Implicitly typed languages such as OCAML, although they are most definitely statically typed and compiled, actually perform quite advanced type inference on the code to determine what types you intended your variables to be, generally by analyzing which operations they participate in. Remarkably, an OCAML compiler is able to produce extremely optimized, statically and strongly typed code, even in the absence of explicit type declarations. RPython (part of the PyPy project) is an example of an implicitly typed subset of Python whose compiler is able to produce highly optimized, statically typed code.

The author of the "weenies" article conflates all three axes into strong versus weak, and puts C/C++ and Java on the "strong" side, with Ruby, Python, etc. on the "weak" side, while ignoring other languages such as Haskell, LISP, OCAML, etc. Which hopefully you can see is a gross oversimplification. If you're interested, my current language of choice, Python, is a strongly, dynamically typed language.

Aside #3: Ignorance of other strong advantages of dynamic languages

The author left out what I consider to be two of the most important features, productivity-wise, of my current chosen language, Python: built-in polymorphic containers and higher-order functions. (These are also present in most dynamic languages, but I'll discuss them in the context of Python, because that's what I'm familiar with.)

Built-in polymorphic containers

Python has, built into the language, the following containers: lists, tuples, dictionaries ('dicts'), and strings. And when I say that they are built-in, I mean not merely that the language includes facilities to manipulate the structures, but that it also includes a convenient literal syntax to define them. A list of the integers from 1 to 10, for instance, is represented as [1,2,3,4,5,6,7,8,9,10]. Lists have methods that mimic the union of arrays and linked lists in other languages, and even support recursive definition (not that I've used it):

>>> lst = [1,2,3]
>>> lst.append(lst)
>>> lst
[1, 2, 3, [...]]
>>> lst[3]
[1, 2, 3, [...]]
>>> lst[3][3]
[1, 2, 3, [...]]

Tuples are similar to "immutable lists", and are often used to return multiple values from a function:

a,b,c = foo(d,e,f)

This also illustrates a corollary property of tuples and lists, the "destructuring assignment", which allows you to "unpack" structured data in a very succinct way.
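A minimal sketch of both ideas, with a hypothetical function min_max_mean standing in for foo: the function returns one tuple, and the caller unpacks it into three names. Nested structures unpack the same way.

```python
def min_max_mean(values):
    # The comma-separated return value is packed into a single tuple.
    return min(values), max(values), sum(values) / float(len(values))


lo, hi, mean = min_max_mean([3, 1, 4, 1, 5])

# Destructuring also reaches into nested tuples.
(x, y), label = (2, 3), "offset"
```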

Dictionaries are nice syntactic sugar for hashtables, and allow you to use any "hashable" object as a key to index any Python object: {1:'foo', 2:'bar', 'the third item':'baz', (1,2):'baff'} This also illustrates that all these containers are polymorphic (though in practice the types in a container are usually restricted). Strings need little explanation, except to say that they are built in, unlike C/C++ strings.

Why do all these types make things easier? Mainly because they're already included. To use a list in C++, you have to #include the right header and declare the type of things that go in the list. To use a vector in C++, you also have to #include the right header (a different header, if I remember correctly) and declare the type of things that go in the vector. And good luck if you want to use a hash table. For that, you not only have to #include the header and declare the types of key and object, but you also have to provide a hashing function. Wouldn't it be nice if the language took care of all that for you? (Yes, it is very nice.) If you're using C++, you're also stuck with no literal syntax for specifying anything but the simplest structures, and the enormous pitfall of confusing STL string<>s with char[]s. Dynamic languages (like Python) so significantly lower the bar on creating data structures that you'll often see lists of dicts or tuples where C++ or Java would be littered with utility classes, type declarations (and in pre-generics Java, typecasts). I mean, come on -- do I really need to declare a pair class if I want a list of points? And a separate RGB class if I want color values? Give me a break.
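As a rough illustration of the points-and-colors complaint, here is a sketch that gets by on literals alone. All of the names (points, palette, pixels) are hypothetical, and whether a tuple or a small class is the better choice in a real program is a judgment call.

```python
# Points are plain (x, y) tuples; no pair class is declared anywhere.
points = [(0, 0), (3, 4), (6, 8)]

# Colors are (r, g, b) tuples, keyed by name in a dict literal.
palette = {"red": (255, 0, 0), "teal": (0, 128, 128)}

# A list of dicts stands in for a small record type.
pixels = [
    {"at": points[1], "color": palette["red"]},
    {"at": points[2], "color": palette["teal"]},
]

# Unpacking keeps the consuming code short as well.
distances = [(x * x + y * y) ** 0.5 for x, y in points]
red_levels = [pixel["color"][0] for pixel in pixels]
```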

Higher-order functions

Simply put, this is the ability to treat a function as an object in your code, and the utility of this feature is difficult to overstate. C++ really tries to do this with templates, and Boost::Lambda gets 95% of the way there, but wouldn't it be nice if you didn't have to jump through so many hoops? Python includes features such as map (apply a function to every element in a container), reduce (apply a 2-argument function cumulatively to the elements of a container until it's reduced to a single value), filter (find all elements in a container for which a predicate function returns true), and lambda (define an anonymous function inline). C++ has support for these, but you have to create a functor object (which Boost::Lambda makes mercifully simpler). Actual C++ functions are thus second-class citizens. If you are a C++ or Java programmer, it may never have occurred to you to write a function that takes a function as a parameter and returns a function as its result. If you are a dynamic language programmer, you probably wrote three of these last week. It's higher-order thinking, and it's simply not supported as well in most static languages.
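A minimal sketch of the four features named above, plus one function that takes functions and returns a function. The names compose, numbers and add_one_then_double are hypothetical. reduce is imported from functools so that the snippet also runs on Python versions where it is no longer a builtin, and the list() wrappers are there for versions where map and filter return iterators.

```python
from functools import reduce


def compose(f, g):
    # Takes two functions and returns a new one: x -> f(g(x)).
    return lambda x: f(g(x))


numbers = [1, 2, 3, 4, 5]

doubled = list(map(lambda n: n * 2, numbers))
evens = list(filter(lambda n: n % 2 == 0, numbers))
total = reduce(lambda a, b: a + b, numbers)

add_one_then_double = compose(lambda n: n * 2, lambda n: n + 1)
composed_result = add_one_then_double(3)
```

Nothing here needed a functor class or a template; the functions are ordinary values that get passed around like any other object.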

I should probably pause for a moment and make the point that the C++ templating system is a Turing-complete, dynamically typed, functional programming language that happens to be interpreted at compile time. (I have a compile-time factorial program that I can show you if you don't believe me.) Its syntax leaves much to be desired, but it's semantically much closer to the dynamic language camp than the language onto which it was bolted, C++.

OK, on to the author's main points, which he presents in a claim/reality format:


Claim: Weak Typing, Interpretation and Reduced Code Volume Increase Development Speed

Reality: No they don't, either individually or together. ....

My reality check: Dynamic typing, interpretation, and reduced code volume do indeed increase development speed.

Dynamic Typing

Have you ever tried to write a really static C++ program? You know, where you actually declare all the methods that don't modify your class as "const" methods? I tried it. Once. Might have tried it again if I didn't have to work with other people. Dynamically typed languages do increase development speed, although their impact is somewhat mitigated in larger projects where enforcement of interfaces becomes more important. Where they really shine, however, is in their "genericity." C++ tried to do generics with templates, and it succeeded to some extent. I'm sure there are other examples in other languages. Dynamic languages give you what are essentially C++ templated functions for every function you write.
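To show what "templated functions for every function you write" looks like in practice, here is a minimal sketch with a hypothetical function largest. It is written once, with no type parameters, and works for any elements that support the > comparison.

```python
def largest(items):
    # Works for any sequence whose elements can be compared with ">".
    best = None
    for item in items:
        if best is None or item > best:
            best = item
    return best


biggest_int = largest([3, 9, 2])
biggest_str = largest(["pear", "apple", "zebra"])
biggest_pair = largest([(1, 2), (1, 3), (0, 9)])
```

The trade-off is the one mentioned above: nothing checks the interface until the comparison actually runs, which matters more as a project grows.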

Interpretation

Interpretation helps, not so much because compile time is prohibitive in static projects, but because the REPL (read-eval-print loop) is so freaking easy. Want to try out something quickly? Paste it into your interactive shell. Static languages are beginning to understand this, with some IDEs providing a little interactive shell. But how long did it take to "invent" this feature (which was present in Lisp in the 1960s)? Interpretation also facilitates the exploration of new language features in a way that statically compiled languages have a really hard time keeping up with. Take it from someone who has written both interpreters and compilers: it is easier to add a feature to an interpreter than it is to a compiler. OCAML does some amazing things in their compiler. You're going to have a hard time convincing me they can extend the language more easily than the PyPy team, however.

Reduced Code Volume

Reduced code volume certainly does reduce development time trivially -- less typing. More importantly, however, it allows you to fit larger concepts onto one screenful of code. The units at which you are programming are larger. Also important to note is the correlation of bug count with source lines of code, independent of language used. That means that, roughly, 1000 lines of assembly has the same bug count as 1000 lines of Lisp. Which one do you think accomplishes more? Reduced code volume is easier and faster to code, debug, and maintain. I can't understand how the author could even imagine this not to be true.


Claim: Support From Major Companies Legitimizes DLs

Reality: No it doesn't. Companies know that fan boys like you are easy marks - an enthusiastic and indiscriminate market segment ripe for exploitation. They also know that you might spread your naive enthusiasms into your workplaces, opening up a corporate market for supporting tools.

My reality check: OK, fine. Companies are driven by profit, so I can accept that corporate profit chasing has little to do with the quality of a language. But this cuts both ways. Java has been pushed by Sun, and C# by Microsoft. Neither would have anywhere near the market share they currently have without their corporate backers.

But let's leave aside corporations "supporting" the languages. Let's look at those who actually get things done. Yahoo! Store was originally written in Lisp. BitTorrent in Python. Google and NASA use Python extensively. The OLPC project is using Python as their core O/S language. 37signals uses (and invented) Ruby on Rails. Reddit is Python (was Lisp). YouTube runs on Python. And tell me, how many thin-client applications use Java applets (static language) versus Javascript (dynamic language)? And that's even with the hellish problem of browser inconsistency.

Claim: As the Problems Change, People Use New Languages

Reality: As languages change, people remain the same. Software development is now, and always has been, driven by an obsession with novelty, and that is what drives language adoption. If there is a new problem to solve, that will simply make for a convenient excuse. Your misplaced enthusiasm simply perpetuates a cycle of self-defeating behaviour that prevents software development maturing into a true profession.

My reality check: Yes, people remain the same. However, the resources we use do not. CPU cycles and memory are relatively cheap today. That's why no one (except some embedded developers) can get away with saying they need to program in assembly language. Runtime performance is objectively less constraining now than it was 10 years ago for the same problems. Which means that all the things we wish we could have done in 1997 are available now. Like dynamic, interpreted languages.

Language is a tool for expressing ideas. Some languages express different ideas more easily, or with greater difficulty, than others. Try saying ninety-nine in French, if you don't believe me (quatre-vingt-dix-neuf, literally four twenty ten nine). Programming languages are no different. Things have been learned about better ways to express yourself since C++, Java, and C# were invented. C++, Java, and C# also chose to ignore certain things that were well-known in programming language research when they were invented, due to design decisions that were made in a technological context dissimilar to today's.

And for one further reality check, no, language adoption is not driven by novelty. Java introduced nothing whatsoever that was new. It started with known features of C++, removed a bunch of stuff, added a garbage collector that had been understood since the days of the VAX, and threw an ungodly amount of marketing behind it. Java is no longer new, but it is still widespread. C is certainly not new, and its popularity remains astonishingly high. Language adoption is driven by a wide range of factors, including but by no means dominated by novelty.

Claim: You Can Assess Productivity By Feel

Reality: No you can't. You're just trying to justify personal preference by hiding it behind a legitimate but definitionally complex term. If you've never taken measurements, you have approximately no idea what your productivity is like either with or without your favorite dynamic language. You certainly can't assess the difference between the two.

My reality check: The article's author hasn't taken measurements, either. But Lutz Prechelt at least has some data, where the linked article presents none. In fact, every study of which I am aware [Ed: 4/5/07] that compares productivity between compiled, manifestly and statically typed languages and interpreted, dynamically typed languages has the dynamic languages easily winning out.

But what the author ignores is that the "feel" of a language, while not providing objective evidence of its productivity, is an influencing factor in its productivity due to the increased motivation to work in a dynamic language. If I like how a language "feels", I will use it more. I will be a more motivated employee, producing more. If I do open source work, I will work more on it, producing more libraries and facilitating code reuse, which even the most jaded non-weenie must admit is a Good Thing.

Claim: Syntax Can Be Natural

Reality: All programming languages are arcane and cryptic, in different ways and to varying degrees. What is perceived as "natural" varies tremendously between individuals, depending upon their experience and background. Your mischaracterisation of a syntax as "natural" is just an attempt to retro-fit a philosophy to your personal preferences.

My reality check: No syntax is completely natural, but some have more in common with non-programming languages than others. For instance, Haskell has a wonderful syntax for specifying lists: the "list comprehension":

[x + 2*x + x/2 | x <- [1,2,3,4]]


OK, that looks weird if you've never seen it before. But does it have an analogue outside of programming? How about

{ x + 2*x + x/2 | x in {1,2,3,4} }

That's just about pure mathematical notation for a set. Python took a compromise approach and writes the more "verbal":

[ x + 2*x + x/2 for x in [1,2,3,4] ]

How do I say this in C++?

#include <list>

std::list<int> mklist() {
    std::list<int> l;
    for (int x = 1; x <= 4; x++)
        l.push_back(x + 2*x + x/2);
    return l;
}

Which feels more "natural" to you? I see in programming language design two big threads, of equal power but radically different approach, which I will name by certain scientists who inspired the respective approaches. One is the "Church" thread, where languages express mathematics. The other is the "Turing" thread, where languages command machines to accomplish tasks. Roughly, this puts languages into "declarative" and "imperative" camps. Dynamic languages pull ideas from both camps. Static languages (at least C/C++, Java, and C#) are heavily imperative, and have little support for declarative concepts. Sure, neither is particularly "natural," but dynamic languages have more expressive capabilities and can approach a "natural" syntax more easily than manifestly static languages.
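Pulling from both camps can be shown inside a single language. In this minimal sketch, mklist_imperative (a hypothetical name) mirrors the C++ loop step by step, while the comprehension states the same list as one expression. One assumption to flag: the sketch uses // so that the division is integer division, as it is for int operands in the C++ version.

```python
def mklist_imperative():
    # "Turing" style: tell the machine how to build the list, step by step.
    result = []
    for x in range(1, 5):
        result.append(x + 2 * x + x // 2)
    return result


# "Church" style: describe what the list is.
declarative = [x + 2 * x + x // 2 for x in [1, 2, 3, 4]]

imperative = mklist_imperative()
```

Both forms produce [3, 7, 10, 14]; the difference is only in how much machinery the reader has to step through to see it.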

Claim: A Strength Of My Language Is Its Community

Reality: If it it[sic], then you are in deep trouble, for your community appears to be dominated by juveniles who only take time out from self-gratification long enough to wipe the byproducts off their keyboard, then mindlessly flame anyone who does not share their adolescent enthusiasms.

My Reality Check: The community is a strength, no doubt about it. But communities are made up of all types. For every blithering idiot, there may be five or ten solid programmers pounding out production-quality code. I had to make a choice tonight -- write this article or work on my application. Maybe I made the wrong choice. Many of the juveniles of which you write don't have that choice, being without the requisite skills to create an application. Of course, that raises the question of what you were doing writing the article....

Claim: No Harm, No Foul

Reality: No Brain, No Pain.

And the dynamic languages people are juvenile?....