In my opinion, the second most important idea in Computer Science is one that came directly out of computing. But I’m not so sure about that origin. Do you know? Let me know!
I’m not too sure whether this is actually a Computer Science idea, but I sure think it’s important. I’m going to assume it is a Computer Science idea for the remainder of this episode. But if you know differently, if you know that it is not a Computer Science idea, I would love to be corrected, because I’m only presuming it is.
The idea is this. You can build a reliable system out of unreliable parts. That’s the second most important idea in Computer Science. Traditionally, you would build systems to be reliable mainly in two ways.
One is to make the parts more reliable: make them work better for longer, make them out of better materials and to tighter and tighter tolerances, and make sure they follow the spec even more closely than you otherwise would, because they have to work as a system.
You just keep making sure all the parts are working better and better. When you put them together, the whole, in theory, works better and better. In general, that’s the case.
The other way is redundancy, which means having a secondary support for every function of your system, so that if one of them fails, which is inevitable, you can fail over to the backup.
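To make the redundancy idea concrete, here is a minimal sketch in Python. The names `call_with_failover`, `primary`, and `backup` are made up for illustration, and the failure is simulated. It only shows the shape of failing over to a backup, not how any particular system does it.

```python
from typing import Callable, Sequence


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each redundant replica in order and return the first success."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # this part failed; move on to the backup
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def primary() -> str:
    # Hypothetical primary that is always down in this sketch.
    raise ConnectionError("primary is down")


def backup() -> str:
    # Hypothetical backup that works.
    return "served by backup"


result = call_with_failover([primary, backup])
print(result)
```

The caller never sees the primary’s failure. It only sees an error if every replica fails.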
Now, what I believe is new in Computer Science, and in computing in general, is the idea of building a reliable system on top of an unreliable system. There are two examples I want to talk about.
The first one is the Internet, the idea of a self-routing packet network. You could imagine a network that is super reliable, meaning packets get delivered at a very high rate of success, something like 99.999 percent (five nines) of packets actually making it to their destination.
What we have instead is a network that assumes that packets won’t get there. It assumes that they won’t be delivered quickly, that they’ll get lost, that they’ll get dropped, and that the network itself, the lines and wires that make up the network and the nodes on the network, is going to fail.
What you want is a network that can heal itself, that can dynamically reroute packets when part of it fails, and then you can build protocols on top of the unreliable packet system to increase the reliability.
The internet is built with failure at the bottom. The basic unit, the packet, is unreliable. Yet by layering on a good protocol, we can make the system reliable. A reliable system out of unreliable parts. https://t.co/maVlH0shRH
— Eric Normand (@ericnormand) September 24, 2018
If you have something like TCP, it’s built on top of IP. IP is just the protocol for sending packets. TCP is a whole protocol built on top of IP: it has a handshake, it has acknowledgment packets that say, “Yes, I got it” or “I missed one,” and when packets come in out of order, it reorders them. All of that is part of TCP.
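Here is a toy sketch in Python of that layering idea. It is not real TCP. It is a simple stop-and-wait scheme with sequence numbers, acknowledgments, and retransmission, running over a simulated channel that drops packets. `LossyChannel`, `reliable_transfer`, the 30 percent loss rate, and the seed are all assumptions made up for this example.

```python
import random


class LossyChannel:
    """Stands in for the unreliable packet layer: each send may be dropped."""

    def __init__(self, loss_rate: float, seed: int = 0) -> None:
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)
        self.attempts = 0  # total packets put on the wire, data and acks

    def send(self, packet):
        """Return the packet if it got through, or None if it was lost."""
        self.attempts += 1
        if self.rng.random() < self.loss_rate:
            return None
        return packet


def reliable_transfer(messages, channel, max_tries=1000):
    """Stop-and-wait: resend each numbered packet until it is acknowledged."""
    received = []
    expected_seq = 0  # receiver state: next sequence number it will accept
    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            data = channel.send((seq, payload))
            if data is None:
                continue  # data packet lost: no ack comes back, so resend
            got_seq, got_payload = data
            if got_seq == expected_seq:  # new packet: deliver it exactly once
                received.append(got_payload)
                expected_seq += 1
            # A duplicate (got_seq < expected_seq) is not delivered again,
            # but it is still acknowledged.
            ack = channel.send(("ack", got_seq))
            if ack is not None:
                break  # sender saw the ack and moves on to the next packet
            # Ack lost: the sender cannot tell, so it resends the same packet.
        else:
            raise TimeoutError(f"gave up on packet {seq}")
    return received


channel = LossyChannel(loss_rate=0.3, seed=42)
messages = ["reliable", "system", "from", "unreliable", "parts"]
delivered = reliable_transfer(messages, channel)
print(delivered == messages, channel.attempts)
```

Every individual send can fail, yet the receiver ends up with every message, once each and in order. The cost is extra packets on the wire, which is what `channel.attempts` counts.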
I’ve looked for examples of this that predate computing, but I haven’t found any. That’s not to say that no one did it, just that I haven’t found it. Please correct me if I’m wrong.
The second example I want to mention is Erlang. Erlang has built into its core this assumption that things will fail. The assumption is that if you want to build something reliably, you have to assume that the parts are going to fail in an unknown way. That informs how you construct your software because if things are going to fail you need to build in the reliability at every level.
Erlang systems, if they’re well built, can have something like seven nines of uptime, which is crazy when you think about it. That’s so reliable. But this is while nodes are failing, networks are failing, and the software itself has bugs in it.
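To get a feel for what a number of nines means, here is a small piece of arithmetic in Python. It assumes a 365-day year and treats uptime as a simple fraction of the year. `downtime_seconds_per_year` is just a helper name made up for this sketch.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability with this many nines."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 5, 7):
    print(n, "nines:", round(downtime_seconds_per_year(n), 2), "seconds per year")
```

Under those assumptions, five nines works out to a little over five minutes of downtime a year, and seven nines to about three seconds.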
How is it possible that that happens? Well, there are a number of things you can do to build reliability on top of unreliable parts, and that’s the whole Erlang ethos. I think this is a remarkable idea: that you can use cheap parts and, through architecture and correct design, build something super reliable.
All right. My name is Eric Normand. If you agree or disagree, please let me know. I’m not an expert in the field of super-reliable systems. If you come from some other background and know that this was happening, in masonry or something, and I just didn’t know about it, please let me know.
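One of those things is letting a failing part crash and restarting it from a clean state. Here is a rough sketch of that idea in Python. This is not Erlang and not how its supervisors are implemented. `flaky_worker`, `supervise`, the 50 percent crash rate, and the restart limit are all made up for illustration.

```python
import random


def flaky_worker(job: int, rng: random.Random) -> int:
    """A hypothetical worker that crashes about half the time."""
    if rng.random() < 0.5:
        raise RuntimeError(f"worker crashed on job {job}")
    return job * job


def supervise(job: int, rng: random.Random, max_restarts: int = 50) -> int:
    """Run the worker, starting a fresh attempt after each crash."""
    for _ in range(max_restarts + 1):
        try:
            return flaky_worker(job, rng)
        except RuntimeError:
            continue  # let it crash, then start over from a clean state
    raise RuntimeError(f"job {job} exceeded {max_restarts} restarts")


rng = random.Random(7)
results = [supervise(job, rng) for job in range(5)]
print(results)
```

The worker itself is cheap and unreliable. The reliability comes from the layer wrapped around it, which is the architecture doing the work rather than the part.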