The Content of Your Code

Summary: Code style is important, but way less important than content. Yet everyone talks about style because it's easier. Let's talk about content. I'll start with some bullet points.

In fiction writing, there is a fine, but visible, line between style and content. Style is your choice of words and grammar while you're telling a story. Content is the parts of the story itself. It's the characters, the plot devices, the motivations, etc.

Style is important. Classic works of literature usually have a good style. But content is more important. Good style can enhance a story. But a story can be retold many times, each with a different style, because a good character and story can stand on their own. No amount of style is going to save crappy storytelling with uninteresting characters. It didn't work for The Matrix 2 & 3, and man, style did not help at all in the Star Wars prequels.

You can roughly divide programming decisions along a similar line: coding style versus coding content. So much advice falls on the style side. It talks about variable naming, function length, and which parts of a language to avoid.

Where is all the stuff about code content? How do we choose an algorithm? How do we pick a data structure? Basically, how do we translate a real-world problem into a computational model? How do we determine if a program correctly models the problem? How do we judge if one model is better than another? The answer is simple. A model fits a problem if the structure of the model matches the structure of the problem.

Structure, structure, structure.

Here are some bullet points:

1. Choose the right collection

Here's a good example that we should all be familiar with. How do you choose between an array and a map? Well, if your problem is to do things in order, an array is the better choice because it is naturally ordered. If your problem is "I have an x and I need a y", a map is probably better, because maps associate one value with another. The data structure's properties mirror the properties of the problem.
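To make that concrete, here is a minimal sketch in Python, where a list plays the role of the array and a dict plays the role of the map. The registration steps, course codes, and room names are made up for illustration.

```python
# An ordered problem: "do these things in sequence". A list is naturally ordered.
steps = ["collect forms", "check prerequisites", "confirm enrollment"]
numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]

# A lookup problem: "I have a course code and I need a room". A dict associates
# one value with another.
room_for_course = {"MATH101": "Hall A", "HIST210": "Room 12"}
room = room_for_course["MATH101"]
```

You could force either problem into the other collection (search a list of pairs, or key a dict by position), but then the code has to supply the structure that the right collection gives you for free.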

2. Factor your code

Refactoring is improving the style of your code without changing the content. But factoring is changing the structure of your code to reveal the underlying structure of the problem. This is the only reliable way to get a one-to-one mapping between code and reality.

3. Determine the essential structure of the problem

I have written before about finding the essential idea in a problem. Object-oriented programming advice tends to recommend identifying each object in the real world and creating a class for it. So, if you're modeling students and courses, regardless of the problem you're solving, you should have a student class and a course class.

This practice comes from the early days of OOP, when it was still used a lot in simulations. I can see the benefit of representing each thing in your simulation as an object. But we're not building a simulation. A university registration system is not a university simulator. We are not simulating students. We are not simulating courses. We need to be looking at the problem we're trying to solve.

How is it already done?

I really think the best way is to look at the process that is already being used. If you're hired to replace a manual, pen-and-paper system, you have a head start over someone designing a new system from scratch. Computerizing an existing process is easier because the problem is already well-understood. Go ask the registrar's office how they are doing it.

Let's say that each department keeps a large list of all the courses it gives each semester. For each semester, the staff start a new notebook and make a page for each course. As students register, the staff write down their names. If a student unregisters, the staff cross the name out. They leave enough room after each course for more students, sometimes skipping a page. They put post-it notes sticking out the top so they can quickly turn to the page for a course when a student comes into the office.

Wow! Your job is now way easier. You just have to replicate that notebook in code. That is so much easier than modeling students and courses. And once it's done, you have a place for improvements that are only possible in a computer.
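Here is a rough sketch of what "replicating the notebook" could look like in Python. Every name in it (Notebook, Entry, register, unregister, roster) is my own invention for illustration, and it assumes a student is identified by name only, just as in the paper version.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One handwritten line on a course page."""
    student_name: str
    crossed_out: bool = False

@dataclass
class Notebook:
    """One department's notebook for one semester."""
    semester: str
    # Plays the role of the post-it tabs: course -> that course's page.
    pages: dict[str, list[Entry]] = field(default_factory=dict)

    def register(self, course: str, student_name: str) -> None:
        # Write the name at the bottom of the course's page.
        self.pages.setdefault(course, []).append(Entry(student_name))

    def unregister(self, course: str, student_name: str) -> None:
        # Cross out the first matching name that is not already crossed out.
        for entry in self.pages.get(course, []):
            if entry.student_name == student_name and not entry.crossed_out:
                entry.crossed_out = True
                return

    def roster(self, course: str) -> list[str]:
        # Read the page, skipping crossed-out names.
        return [e.student_name for e in self.pages.get(course, []) if not e.crossed_out]

notebook = Notebook("Fall")
notebook.register("MATH101", "Ada")
notebook.register("MATH101", "Grace")
notebook.unregister("MATH101", "Ada")
current = notebook.roster("MATH101")  # only Grace remains
```

Notice that the sketch copies the paper process faithfully, crossed-out names and all. That is deliberate: it is a starting point, not a finished design.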

Finding the essence

But let's say it's a new university, trying to get a head start on old universities by organizing everything on a computer. So there's no existing process. You've got to make it up.

What do you do if the structure is not obvious? How do you determine the structure of a poem? Reading. Re-reading. Underlining. Arrows. Notes. Clarifying definitions. Basically, look for structure. Dig it all up. Then use your judgment about what is important. There is often not a single kind of structure, but a constellation of structures.

4. Factor out the incidental from the essential

Incidental structure arises from the choice of solution rather than from the problem itself. The structure inherent in the problem is essential structure.

In the notebook used for registering students, there are some incidental implementation details that you don't want to replicate. In the notebook, they left some blank space after each course so that they wouldn't run out of room for more students. But running out of room is not an issue in a computer (no university is that big). So you can leave that part out.

But less obviously, the fact that there is a separate page for each course is also incidental. It's not important that the students in one class all be stored in a single place. What's important is that at any time, a student can register for the course (random access!) and that at any time, a teacher can list all students in a particular course (random access!). We need a way to produce the list of students in a course quickly enough, however the registrations are stored.
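As a small illustration that the per-course page is incidental, here is a Python sketch that stores registrations as one flat, unordered set of pairs and projects a course's list on demand. The function names and sample data are hypothetical.

```python
# (student, course) pairs in no particular order; there is no "page" per course.
registrations: set[tuple[str, str]] = set()

def register(student: str, course: str) -> None:
    registrations.add((student, course))

def students_in(course: str) -> list[str]:
    # Project the course's list out of the flat storage.
    # Sorting is only there to make the result stable.
    return sorted(s for s, c in registrations if c == course)

register("Grace", "MATH101")
register("Ada", "MATH101")
register("Ada", "HIST210")
math_students = students_in("MATH101")
```

This version scans every pair, so whether it is "quickly enough" depends on how many registrations there are. The point is only that both operations still work when no course has a place of its own.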

It turns out that, if you factor correctly and find the essence, your solution should be generic. Why? Because structure is generic. It is pure content. The correct solution to this problem is to make a system to manage many-to-many relationships. Relational databases can do this easily with a single join table. You could make a ManyToMany<A, B> class. Those implementation details are incidental. What's essential is the many-to-many part.
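For the curious, one possible shape for such a class is sketched below in Python. Only the name ManyToMany comes from the text above. The method names (add, remove, rights_of, lefts_of) and the choice to keep an index in each direction are assumptions of mine, not the one true design.

```python
from __future__ import annotations
from typing import Generic, Hashable, TypeVar

A = TypeVar("A", bound=Hashable)
B = TypeVar("B", bound=Hashable)

class ManyToMany(Generic[A, B]):
    """A generic many-to-many relation with an index in each direction."""

    def __init__(self) -> None:
        self._forward: dict[A, set[B]] = {}   # a -> every b related to it
        self._backward: dict[B, set[A]] = {}  # b -> every a related to it

    def add(self, a: A, b: B) -> None:
        self._forward.setdefault(a, set()).add(b)
        self._backward.setdefault(b, set()).add(a)

    def remove(self, a: A, b: B) -> None:
        # Removing a pair that was never added is a no-op.
        self._forward.get(a, set()).discard(b)
        self._backward.get(b, set()).discard(a)

    def rights_of(self, a: A) -> frozenset[B]:
        return frozenset(self._forward.get(a, ()))

    def lefts_of(self, b: B) -> frozenset[A]:
        return frozenset(self._backward.get(b, ()))

# Nothing in the class mentions students or courses; that comes from the caller.
enrollment: ManyToMany[str, str] = ManyToMany()
enrollment.add("Ada", "MATH101")
enrollment.add("Ada", "HIST210")
enrollment.add("Grace", "MATH101")
enrollment.remove("Ada", "HIST210")
in_math = enrollment.lefts_of("MATH101")   # students in a course
adas_courses = enrollment.rights_of("Ada")  # courses for a student
```

The same class would serve authors and books, or tags and posts, unchanged. That reuse is what "generic" means here.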

Conclusion

We need more discussion about the content of programs. Style is important, but we need people to create acronyms and rules of thumb for choosing program constructs. The Design Patterns book (and the movement around it) was important in this respect. It documented patterns of common structure. But it failed to do a good job teaching the how, and added an air of mystery.

Although this process is language-agnostic, Clojure is great for finding essential structure. One thing I like about Clojure is that the data structures are described in terms of their usage structure. And Rich Hickey has expressed many times that to understand a problem, to design a solution, we must pull things apart into their essential parts.

If you'd like to learn Clojure and see how it might help you think in terms of structure, I can recommend my LispCast Introduction to Clojure video course. It builds up skills from complete beginner to decomposing a problem into a generic solution.