What do product and sum types have to do with data modeling?

Product and sum types allow us to exactly model any number of states with a lot of flexibility.

Transcript

Eric Normand: What do product and sum types have to do with data modeling? By the end of this episode, I hope to answer that question, because I missed it when I talked about product and sum types the last time.

Hi, my name is Eric Normand and I help people thrive with functional programming. A few episodes back, I talked about product and sum types and explained what they were. I'm not going to go over that again, because you can go listen to that one and get it all.

I did get a couple questions about why I mentioned them, and what they have to do with data modeling at all. I realized I had explained what they were, but I totally forgot to talk about why you would want to use them at all. Now, I'm going to try to explain that.

Let's look at a simple case, all right? You're modeling this domain, and there's certain cases that you have to capture. Let's say that there are 10 cases.

Now, one thing you could do is simply enumerate all of the cases. Let's say, it's 10 states that the system can be in. You can enumerate them so you have one type that has all 10 cases. That would be a sum type that has the 10 different constructors, let's say.

There's other ways you can hit the number 10 as well. You could have two 5s. You could have two types, let's say with five cases each, and then have an either between them. An either is a sum type, so it's going to sum the five and the five. That gives you 10.

Now, another way you could do it is, you could have a two times five, so not five plus five, but two times five. Meaning, you use a product type, let's say a tuple (a tuple is a product type), where the first element of the tuple has two cases and the second element has five. Now you've got the 2 x 5 and you've got 10 again.

This is the kind of thinking that you can do once you realize that there are these product and sum types that you can use.
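As a rough sketch of the three ways to reach 10 described above, here is one way to write them down in Python and count the values. All of the type and member names (`State`, `Left`, `Right`, `Flag`, `Pair`) are made up for illustration; only the case counts matter.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Union

# 1. One sum type that enumerates all 10 cases directly.
State = Enum("State", [f"S{i}" for i in range(10)])

# 2. A sum of two 5-case types: 5 + 5 = 10.
Left = Enum("Left", ["A", "B", "C", "D", "E"])
Right = Enum("Right", ["V", "W", "X", "Y", "Z"])
Either = Union[Left, Right]  # a value is one of Left OR one of Right

# 3. A product of a 2-case type and a 5-case type: 2 * 5 = 10.
Flag = Enum("Flag", ["ON", "OFF"])


@dataclass(frozen=True)
class Pair:
    flag: Flag   # 2 cases
    level: Left  # 5 cases


# List every value each model can hold, then count them.
enumerated = list(State)
summed = list(Left) + list(Right)
multiplied = [Pair(f, l) for f, l in product(Flag, Left)]

print(len(enumerated), len(summed), len(multiplied))  # prints: 10 10 10
```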

Why do you want to target the number of cases exactly? I've done a very detailed analysis of this for my book. It's not going to be in the book, because it's boring.

If you've got too many cases in your domain model, that means there's a case that doesn't really exist in the real world but does exist in your software, like it's possible there. Let's say you did 5 plus 6, and so that's 11.

There's 11 cases, but one of them shouldn't be used. At some point, you're going to either use it or have a conditional to make sure it's not being used.

Conditionals add complexity to your code. If you have extra cases, you're adding complexity.

What if you had 9 cases when there's really 10 cases in your domain? If you only have nine cases in your code, that means that you're probably going to be overloading one of those cases to make up for it.

You're probably going to be using one of those cases in two different ways and your code is going to have to have a conditional to figure out which one of the cases it really should be.

In both cases, in both of those situations, you are missing the perfect fit between your domain model and the domain itself, and adding complexity because of it.

An example of something very simple, an almost silly example of a time when you might have a misfit is...and this happens a lot. You have something like you're reading a sensor, like a thermometer.

Sometimes when you do a read, it doesn't give you a number. But the API you're given doesn't have a way to not return a number, because it's C or something. The return value is always a number.

What does it do? It overloads zero to be either zero or, oops, I didn't get a reading when you asked me for it, because the sensor was down, or you're trying to ask too quickly, or something like that.

It's overloading zero because there's this case that doesn't fit within the type. The type that they chose was int. There's this extra case that they want to represent and that's not in the type, so they had to overload one of the values of int with this meaning.

Now, maybe they could have chosen a better number, something that you're unlikely to ever read, like negative one million, or something like that.

Even if you had that, you still have a conditional that you're going to have to deal with. If you wanted to convert it to a better type, you'd still have a conditional in there, but at least it would be in one place, in the conversion code.
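Here is a minimal sketch of that conversion code, assuming the API uses a dedicated sentinel such as negative one million rather than zero. The names `NO_READING`, `to_reading`, and `describe` are hypothetical. `Optional[int]` is itself a small sum type: every int, plus one extra "no reading" case.

```python
from typing import Optional

# Assumed sentinel the C-style API returns when the sensor gave no reading.
NO_READING = -1_000_000


def to_reading(raw: int) -> Optional[int]:
    """The one place that knows about the sentinel."""
    return None if raw == NO_READING else raw


def describe(reading: Optional[int]) -> str:
    """Downstream code sees the missing case explicitly, not a magic number."""
    if reading is None:
        return "no reading"
    return f"{reading} degrees"


print(describe(to_reading(0)))           # prints: 0 degrees
print(describe(to_reading(NO_READING)))  # prints: no reading
```

With this shape, zero is just zero again, and the check for the sentinel lives in `to_reading` only.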

These things happen in real life. I have a memory of a system where you could accept both credit card and PayPal. With credit card, they were using Stripe and PayPal has its own ID.

The Stripe ID and the PayPal ID were stored in the same database table. To know if someone was a Stripe customer, the code would take that ID and run a regex on it to see if it looked like a Stripe ID. To see if it was PayPal, they'd run another regex on it to see if it looked like a PayPal ID.

Everywhere where you had to use this, you wanted to get the ID or figure out what to do with it, you had to run this regex on the field.

If they had known about product and sum types, they would have realized that this is actually a sum type. Instead of overloading this one field for the two, we should have an either PayPal or Stripe, or some other system like that, that would better fit the domain.

In fact, I have heard that in a recent upgrade, they have made that change because it was causing them a lot of pain. A lot of code which was just checking for, "Is this a Stripe ID or a PayPal ID?" Very duplicative code.
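A sketch of what that "either PayPal or Stripe" could look like, under the assumption that each ID is just a string. The class names, the `provider` function, and the ID values are all invented for illustration and are not the real system's code or real ID formats.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StripeId:
    value: str


@dataclass(frozen=True)
class PayPalId:
    value: str


# The sum type: a payment ID is exactly one of the two, and it says which.
PaymentId = Union[StripeId, PayPalId]


def provider(payment_id: PaymentId) -> str:
    """Dispatch on the case tag instead of running a regex on the string."""
    if isinstance(payment_id, StripeId):
        return "stripe"
    if isinstance(payment_id, PayPalId):
        return "paypal"
    raise TypeError(f"not a payment id: {payment_id!r}")


print(provider(StripeId("id-1")))  # prints: stripe
print(provider(PayPalId("id-1")))  # prints: paypal
```

The same string can sit inside either case, so nothing has to guess the provider from what the ID looks like.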

If you have these products and sum types, notice you've got plus and times. Product is times and sum is plus. You're able to target any number that you want, in different ways, in multiple ways.

Like I said at the beginning, you can have all 10 cases enumerated in one type, or you could break it up into two 5s and sum them. Or you break it up into three and seven and sum them. Or you could have two, six, and another two. There's all sorts of ways that you can break it up.

That's what product and sum types give you. They give you the flexibility to really dig deep into the domain and model it correctly, because you can just do the math and know, yes, this has all the possible cases, I'm not missing any, and I don't have any extra. And you have the flexibility to choose whatever way you want to break it up.

If you want to break it down into two fives and then sum them, you can do that. You don't have to have the one 10. If you want to break it down, like I said before, into 2 and 5 and multiply them, then you get the 10. You can do two times four and then add a two to get the 10.
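The "just do the math" part can be sketched as a tiny case counter. This is only an illustration: `Cases`, `Sum`, `Product`, and `count_cases` are hypothetical names, and the shapes are the decompositions of 10 mentioned in this episode.

```python
import math
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Cases:
    n: int  # a plain enumeration with n cases


@dataclass(frozen=True)
class Sum:
    parts: Tuple["Shape", ...]  # one of the parts: counts add


@dataclass(frozen=True)
class Product:
    parts: Tuple["Shape", ...]  # all of the parts together: counts multiply


Shape = Union[Cases, Sum, Product]


def count_cases(shape: Shape) -> int:
    """Sum is plus, product is times."""
    if isinstance(shape, Cases):
        return shape.n
    if isinstance(shape, Sum):
        return sum(count_cases(p) for p in shape.parts)
    return math.prod(count_cases(p) for p in shape.parts)


shapes = [
    Cases(10),                                        # all 10 enumerated
    Sum((Cases(5), Cases(5))),                        # 5 + 5
    Sum((Cases(3), Cases(7))),                        # 3 + 7
    Product((Cases(2), Cases(5))),                    # 2 * 5
    Sum((Product((Cases(2), Cases(4))), Cases(2))),   # 2 * 4 + 2
]
print([count_cases(s) for s in shapes])  # prints: [10, 10, 10, 10, 10]
```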

Once you open this door, you can see how easy it is to target a specific number of cases and it lets you analyze whether you're actually getting the right number of cases and avoid the problems of using a product type and not realizing that...

Let's say you wanted to target nine things. You used a two times five thinking "Oh, that 10th one, I'm never going to use it. It's going to be fine."

You don't realize the problems. What if you did three times four and you had three extra cases?

That's even worse. You're adding complexity to your software. You're creating conditions where a value is possible to be created, but not meaningful in your domain. You're asking for trouble. If you can avoid it, you might as well avoid it at the beginning.
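To make the cost of that extra case concrete, here is a small sketch of a 2 x 5 model used for a domain with only 9 meaningful states. The field values and the choice of which combination is meaningless are assumptions made up for the example.

```python
from itertools import product

# Hypothetical 2 x 5 model: a mode and a level.
modes = ["manual", "auto"]
levels = [1, 2, 3, 4, 5]

# Everything the model can represent: 2 * 5 = 10 values.
representable = set(product(modes, levels))

# Assumed for illustration: this one combination never occurs in the domain.
meaningless = {("manual", 5)}
meaningful = representable - meaningless


def is_valid(state) -> bool:
    """The conditional the extra case forces on the code that uses the model."""
    return state in meaningful


print(len(representable), len(meaningful))  # prints: 10 9
print(is_valid(("manual", 5)))              # prints: False
```

The value `("manual", 5)` can still be constructed, so code that receives a state has to call something like `is_valid` or trust that nobody ever builds it.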

Just to recap, we have product and sum types, which let you easily model any particular number of cases that you might have. They let you easily analyze how many cases you do have. They're very useful for having all the flexibility to target particular numbers, and being able to know how many you actually have.

You have both. It's both easy to create and very flexible, and it's easy to see how many cases you have, which helps in analysis.

All right, if you liked this episode, you should go to lispcast.com/podcast. There you'll find all the past episodes, including the one where I explain product and sum types and all the other ones with audio, video, and text transcripts.

You'll also find links to subscribe and to find me on social media. Get in touch. I love getting these questions. I love answering them on the podcast.

My name is Eric Normand. This has been my thought on functional programming. Thank you for listening, and rock on.