Indirection Mechanisms and Expressive Power

What defines a language's expressive power?

Expressive power is hard to pin down in programming languages. In this post, I want to explore how indirection mechanisms lend expressiveness to a language. By indirection, I mean something that happens at compile time or run time which was not explicitly specified in the code. By mechanism, I mean something with well-defined, rule-governed semantics.

Indirection mechanisms are typically what makes a program shorter, easier to write, and easier to reason about. Instead of writing machine code directly, you are relying on the compiler or runtime to do the heavy lifting.

I posit that the development of indirection mechanisms closely follows the development of multiplicative expressive power. One line of Python is worth far more than 100 lines of C and more than 1000 lines of Assembly. And rather than debate about expressive power in different languages, I'd like to simply catalog some of the more obvious and important indirection mechanisms out there.

While many languages might share a given indirection mechanism, I don't know all languages, so I am just going to go with my personal experience and group them roughly by programming languages I am familiar with. I would love to hear of more mechanisms, if you know of them.

Assembly

Depending on the features available, Assembly can be seen as simply a more convenient language to write in than machine language. But this convenience comes from a indirection.

  • Named instructions - The assembler looks up the name of an instruction in a table and converts it to the machine code. It is much more convenient to remember a mnemonic sequence of characters than an arbitrary number.
  • Labeled statements - Instead of calculating a jump offset from the current instruction, you just name a line of code and the assembler will calculate it for you. It is much less error prone and immune to changes in the program.

C

C has many indirection mechanisms. Most of them were not invented in C. However, from a historical and pedagogical perspective, C is a good contrast to Assembly. Both are relatively low-level and ubiquitous, yet C is a huge leap beyond Assembly in terms of indirection.

  • Automatic number coercion - Numbers in C are automatically coerced "up" when operated on, whereas in machine language, the values would have to be explicitly converted. You reduce mathematical errors because the machine can do a better job than you.
  • Function call semantics - Function calls compile to a lot of code, including copying the arguments to the stack. Function calling in C is more convenient.
  • Recursive expressions - In machine code and Assembly, instructions are written linearly in sequence. In C, however, expressions can be nested. The compiler will convert them to the appropriate sequence of instructions. Nested expressions are easier to understand.
  • Structured programming - Conditionals and loops provide a simpler, less error-prone way to code than machine code jumps.

Lisp

Lisp's main evaluation semantic is the source of much veneration and arguably is the Ur indirection mechanism. I am talking about the eval function as defined in the original Lisp paper by McCarthy. There are more mechanism, but they are too many to list here.

  • Eval - Most values (so-called atoms) evaluate to themselves, symbols are looked up in the current environment, and to evaluate a list (a function call), evaluate its elements, then apply the first element (hopefully it's a function!) to the rest of the elements.The indirection comes from the late-binding of the symbol to its value. The environment can change, so the function call can have different meanings at different points in the program. This greatly facilitates incremental development.
  • CLOS - The Common Lisp Object System defines many indirection mechanisms for doing object-oriented programming. They include method combinations (before and after methods), class-based polymorphism, and method dispatch. The most interesting indirection mechanism is that a class can define its own indirection mechanism.
  • Garbage collection - Instead of writing routines for managing the memory of your program, the runtime will take care of recouping memory when it is no longer needed. A great productivity boost.
  • Macros - Lisp macro calls look like function calls, but they are expanded into code at compile time. This indirection mechanism essentially allows you to extend the language within the language.

Smalltalk

Smalltalk defined object-oriented programming as we know it today, based on a simple indirection mechanism: message passing. It also was the first language (that I know of) to make significant use of development tools. Like Lisp, I cannot list all of the indirection mechanisms in Smalltalk. They are too numerous.

  • Message passing - The main semantic of Smalltalk is message passing. A message consists of the name and the arguments. The name is looked up in the vtable of the receiving object. If it is found, the associated method is returned and called on the objects. Otherwise, the name is looked up in the vtable's parent, and its parent, to the end of the vtable list. This vtable lookup is both dynamic (it allows for changes to the vtable at runtime) and provides polymorphism (different kinds of objects can respond to the same message).
  • Semantic refactoring - Development tools before Smalltalk were usually confined to enhanced text editors. Smalltalk let you refactor code, meaning changing code not at the textual level but at the structural-semantic level. For instance, performing a "rename" operation on a method would not only change the name of the method in that class, but also change the code for all of the callers. This is another indirection mechanism which is now taken for granted in the Java and .NET worlds.

Self

Self is a prototype based object-oriented language.

  • Slot access - In Self, to access a slot from an object, the name of the slot is looked up in the object. If it is found, the value is returned. Otherwise, the parent slot is accessed, and the slot name is looked up in the parent recursively. This allows one to build objects that are mostly identical to an existing object except in a small way.

Haskell

  • Typeclass polymorphism - Haskell, unlike Smalltalk, keeps the values (objects) separate from the vtables. It is the job of the compiler to reunite these two elements when necessary. This allows data types to be defined separately (even in different files) from the code defining the type's inclusion in typeclasses. The Haskell compiler, following well-defined semantics, will convert calls to a typeclass method into a call to a function with the vtable as an implicit argument. This enables a limited form of polymorphism, where the same name refers to different functions depending on the types.
  • Pattern matching - Pattern matching in Haskell means destructuring a value into component values while optionally binding names to values. It lets you write in one concise statement the equivalent of several conditionals and let statements.

Clojure

Clojure also deserves a mention for its championing of various mechanisms for controlling concurrent state.

  • Concurrent state (refs, agents, atoms, etc.) - The Clojure runtime provides many indirection mechanisms to deal with state. Refs, for instance, provide transactional semantics over different values. This allows one to write parallel programs without explicitly using locks.

Indirection mechanisms are semantic abstractions afforded by the language. Like all abstractions, they are prone to leakage. How a language deals with the leaks might be just as important a choice as which mechanisms the language supports. For instance, what happens when you pass an object a message it doesn't understand? Some common options include: compiler error, thrown exception, silent failure, magic, or undefined behavior. These are damage control at best.

A lot of the leakiness of abstractions has to do with how mechanical the indirection is, and how simple the rules of the mechanism are. A complex mechanism is hard to understand, while a non-deterministic mechanism will sometimes give unwanted behavior. And when things go wrong, it is hard to distinguish between complexity and non-determinism. We should therefore strive for simple, deterministic abstractions. And we should also strive to bake the failure into the model, and build systems to deal with abstraction failures upfront.

To conclude: Different languages make different choices of which mechanisms to provide, and hence the endless variety of languages we see today. It is my hope that we will find more such abstractions which allow programmers to be even more productive in the future. Further, perhaps there are ways to make existing mechanisms more robust and which fail in more appropriate ways.