What is idempotence?

Idempotence means duplicates don't matter. It means you can safely retry an operation with no issues. The classic example is the elevator button: you press it twice and it does not call two elevators. We explore why we would want that property in an email server.

Transcript

Eric Normand: What is idempotence? Why does it help so much with programming in distributed systems? By the end of this episode, you will know how to implement idempotence in your own system.

Hi, my name is Eric Normand, and I help people thrive with functional programming. Idempotence is important because it captures the essence of the safe retry. Without safe retries, you really cannot implement safe distributed protocols.

What is idempotence? The essence of it is that if you ask twice, it's the same as asking once. It has the same effect. The classic example is the elevator button. You go into a bank of elevators. You press the button. It lights up. It calls the elevator. Then someone else comes in, and they go and they press the button, too, the same button. It's already lit up.

We know that that doesn't have an effect. We still want to do it for some reason. It's a just-in-case. Maybe they're right, maybe the signal didn't get to the elevator. It's worth trying because it can't hurt anything. That's the kind of thing that we want to instill in our distributed systems. Technically, it is an algebraic property.

When you're talking about pressing a button, this is an active effect you're having on the world. Whereas in algebra, it's a property of pure functions, mathematical functions. It means, if you capitalize the letters of a string twice, it doesn't matter. The first time is enough. Technically, if you apply F to a value, let's say F(x), it's the same as applying F again to that result, F(F(x)).

You do the double application of F. It has the same effect as the single application. You could just say that it means duplicates don't matter. I pressed the button twice. The second one doesn't matter. If I apply the same function twice, the second time does not matter. The first time matters. The second time, the third time, the fifth time, those do not matter.
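A minimal sketch of that property in Python, assuming a small helper named `is_idempotent_on` (a made-up name for illustration) that checks F(F(x)) == F(x) over a few sample values:

```python
def is_idempotent_on(f, values):
    """Return True if f(f(x)) == f(x) for every x in values."""
    return all(f(f(x)) == f(x) for x in values)


samples = ["hello", "Hello World", "ALREADY UPPER", ""]

# Uppercasing twice gives the same result as uppercasing once.
upper_ok = is_idempotent_on(str.upper, samples)

# Appending a character is not idempotent: the second application
# changes the result again.
append_ok = is_idempotent_on(lambda s: s + "!", samples)
```

Checking a handful of samples does not prove the property in general. It only shows what the equation is saying.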

Why is this important? In a distributed system, especially in a distributed system, we have this problem where messages over the network are unreliable. Basically, if you send a message, it might not get there and you won't know. You cannot know if it got there.

Sometimes, you know if it didn't get there. You get some connection-broken message, but sometimes you just don't hear back. It timed out.

Did it get there and the acknowledgement timed out, or did it never get there? Did the other system crash? Did it crash before it sent my email or after it sent my email? You don't know. It crashed, it's too late. Email is actually a good example because you don't want to send the same email twice.

Let's say you have an email server and you send it a message saying, "Please send this email to my customer." You don't hear back. You just don't hear an answer. What do you do? What happened? Do you send it again?

What if it did already send the email? Is it going to send the same email a second time? If it didn't send it and I don't send the message again, then the customer won't get the email.

This is really a real business problem. Idempotence would solve that. If I could send the same message again, and it won't break anything, it won't have a second effect. Just like that elevator button, I could send this message all day. I could send it a hundred times, and the email would only get sent once. That's a good thing.

What it lets you do is decouple. It's decoupling the number of effects that happen from the number of times that you request that effect. I can request it a hundred times, but it will only get sent once. That is something that you really want to have. You want to be able to retry safely with limited information.

I don't know if this went through, but I'm going to try again. That is a very nice property to have in your system.

How do you implement idempotence? The simple way, if we look at this email server, is you need some way of identifying the email. Some way of saying, "This is the ID of the email that I want to send. If I send you an email with the same ID again, don't send it a second time."

The server that's receiving it has to remember all of the IDs of the emails it has ever sent. That is for total complete idempotence. Usually, that's not practical. You can't remember every ID because it could be in the millions. They could date back for many years. It is unlikely that you're going to get a request that takes years to arrive.

In a practical case, you might have a window that says, "Well, we keep three days of IDs." That means that you can resend the same ID within those three days, and we won't send it a second time. You have to find some practical limit that balances the memory requirements and the retries that you're doing in your system.
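One way such a window could look, as a rough sketch only. The class `RecentIds`, its method `add_if_new`, and the injectable `clock` parameter are all hypothetical names chosen for this example, and the IDs live in memory only:

```python
import time


class RecentIds:
    """Remembers request IDs for a limited window of time (in-memory sketch)."""

    def __init__(self, window_seconds, clock=time.time):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the example can fake time
        self._first_seen = {}       # request ID -> time it was first seen

    def _forget_old_ids(self):
        cutoff = self.clock() - self.window_seconds
        expired = [rid for rid, t in self._first_seen.items() if t < cutoff]
        for rid in expired:
            del self._first_seen[rid]

    def add_if_new(self, request_id):
        """Record the ID and return True, or return False for a duplicate."""
        self._forget_old_ids()
        if request_id in self._first_seen:
            return False
        self._first_seen[request_id] = self.clock()
        return True


THREE_DAYS = 3 * 24 * 60 * 60
now = [0.0]                          # fake clock for the example
ids = RecentIds(THREE_DAYS, clock=lambda: now[0])

first = ids.add_if_new("email-42")   # new ID
now[0] += 60 * 60                    # one hour later
retry = ids.add_if_new("email-42")   # duplicate inside the window
now[0] += 4 * 24 * 60 * 60           # four more days pass
late = ids.add_if_new("email-42")    # window expired, ID was forgotten
```

The last call shows the trade-off: a retry that arrives after the window is treated as a brand-new request, so the window has to be longer than any retry you expect.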

Notice, it's very important, this concept of identity is very important. If you don't have a concept of identity, what does it mean to send the same message again? If I want to send two emails to this person, I need to be able to send two emails to them. I need some way of saying that they're different. If I want to retry, I want some way to say that this one is the same as that one.

You need some identity on your requests. If you're looking at an elevator button, there's probably an identity, deep in the electronics of this elevator service. It knows what button I pressed. It's third-floor-up, or fourth-floor-down. There's some identifier for that button, which allows it to light up, first of all, and stay lit until it needs to be turned off.

That identifier is probably used in multiple places. It puts in a request, "Oh, we need an up-elevator on the third floor because we know that button and what it means." It's also used to say, "Hey, I'm already sending that third floor elevator. I don't need to do that again." It's using that identifier.

Let's recap on this. Wait, no, I did not say how. I only did half of the how. The first half is you need an identity. The second half is, once you have that identity, you use a data structure with an operation that is already idempotent. A common idempotent data structure with an idempotent operation is a set, like an in-memory set.

If you have a set of numbers, you give each email a unique number. As the email server sends off emails, it remembers the number in a set, and just adds it to that set. If you add the same number to the set twice, the set doesn't change. You've got idempotence already.

Same for an elevator. If you have a button that has an ID, let's say it's like the string third-floor-up, or third-floor-down, or fourth-floor-up, fourth-floor-down, you save that into a set that says it's active. It's been requested. That means you can send it twice, and it won't have any effect, to send it twice.
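As a small sketch of the elevator case, assuming the button IDs are plain strings and using made-up names (`active_requests`, `press`):

```python
# The set of buttons that are currently lit (requested).
active_requests = set()


def press(button_id):
    """Record a button press. Pressing a lit button changes nothing."""
    active_requests.add(button_id)


press("third-floor-up")
press("third-floor-up")      # second press, same ID: the set is unchanged
press("fourth-floor-down")
```

After these three presses the set holds two entries, one per distinct button.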

Now, of course, this does not take into account the actual action, the effect that happens by pressing that button, which is to send an elevator to that floor. Same with the email. It doesn't take into account sending the email or not sending the email.

To determine whether you want to send it, it's pretty simple. Before you add the thing to the set, the ID, you ask the set, "Do you contain this ID?" If it does, then you're done. If it doesn't, you send the email and then you put the ID in the set. There are other data structures that are idempotent. If you have hash maps, those are idempotent.
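The check-then-send steps just described could be sketched like this. `EmailServer`, `send`, and the `deliver` callback are hypothetical names, the delivery is faked with a list, and the sketch ignores crashes and concurrent requests:

```python
class EmailServer:
    """Sends each email ID at most once (in-memory sketch)."""

    def __init__(self, deliver):
        self._deliver = deliver      # function that really sends the email
        self._sent_ids = set()       # IDs of emails already sent

    def send(self, email_id, to, body):
        """Return True if the email was sent now, False if it was a duplicate."""
        if email_id in self._sent_ids:
            return False             # already sent: nothing more to do
        self._deliver(to, body)
        self._sent_ids.add(email_id)
        return True


outbox = []                          # stands in for real delivery
server = EmailServer(deliver=lambda to, body: outbox.append((to, body)))

# Request the same email a hundred times: it goes out once.
for _ in range(100):
    server.send("email-42", "customer@example.com", "Your order shipped")

# A different ID is a different email, so it is sent.
server.send("email-43", "customer@example.com", "Your invoice")
```

The number of requests (101) and the number of emails in `outbox` (2) are decoupled, which is the point of the ID check.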

If you add the same key and value twice, then it has no extra effect. Another thing that you could consider idempotent is something like adding zero to a number. If you need some kind of idempotent addition, you can do that.
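A quick illustration with a Python dict standing in for the hash map. The variable names are made up, and the counter at the end is there only as a contrast:

```python
# Putting the same key and value twice leaves the map unchanged.
settings = {}
settings["theme"] = "dark"
after_once = dict(settings)
settings["theme"] = "dark"
after_twice = dict(settings)

# Adding zero any number of times leaves the number unchanged.
total = 10
total_plus_zero_twice = (total + 0) + 0

# By contrast, incrementing is not idempotent: every repeat counts.
clicks = {"count": 0}
clicks["count"] += 1
clicks["count"] += 1
```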

There are other data structures that have idempotence in them. They're more complicated and they're specialized usage. I'm not going to go into them.

Imagine they're like sets with other properties. You can add things in, but maybe they don't grow as fast as a set would grow. They're more probabilistic kinds of data structures.

I mentioned strings being uppercased. That's something that's idempotent as an operation. You could use that if, say, an uppercase name means something different from a regular-case name.

That means that you could do it twice, and it wouldn't have any extra effect. You probably use this in something, like your email system might lowercase all email addresses before it compares them. What if it's already lowercased? That doesn't matter. It's just going to lowercase everything. If it's already lowercased, there's no problem.
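For example, a sketch of that kind of normalization, assuming a hypothetical helper `normalize_email` that trims whitespace and lowercases before comparing:

```python
def normalize_email(address):
    """Trim surrounding whitespace and lowercase the address."""
    return address.strip().lower()


raw = "  Eric@LispCast.com "
once = normalize_email(raw)
twice = normalize_email(once)        # already normalized: no further change

same_person = normalize_email("ERIC@lispcast.com") == once
```

Because the function is idempotent, callers don't need to track whether an address has already been normalized. They can just normalize it again.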

Let's recap. Idempotence means duplicates don't matter. It's an algebraic property of certain functions, certain operations, but we extend it to actions in the world. We extend it to effects that we can have on the world, where we're saying requesting that effect twice is the same as requesting it once. Those duplicates don't matter there, also.

We need it in distributed systems so that we can have safe retries. It lets us decouple what gets done from how many times we request that it's done. You can easily implement it using idempotent data structures and operations. It requires a sense, a notion of identity in the messages.

Do yourselves a favor and look for some services that need to happen exactly once. Could be something like sending an email. Could be writing a message to a log. Could be some user setting in your user-panel, and wrap them in something like a data structure that makes them idempotent.
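One possible shape for such a wrapper, as a sketch under assumptions: the decorator name `idempotent`, the convention that the first argument is the request ID, and the in-memory result cache are all choices made for this example.

```python
import functools


def idempotent(action):
    """Wrap an action so each request ID runs it at most once.

    Repeated calls with the same ID return the first call's result.
    """
    results = {}                     # request ID -> result of the first call

    @functools.wraps(action)
    def wrapper(request_id, *args, **kwargs):
        if request_id not in results:
            results[request_id] = action(request_id, *args, **kwargs)
        return results[request_id]

    return wrapper


log_lines = []


@idempotent
def write_log(request_id, message):
    log_lines.append(message)
    return len(log_lines)            # position of this message in the log


first = write_log("evt-1", "user signed up")
again = write_log("evt-1", "user signed up")   # retry: nothing is appended
other = write_log("evt-2", "user logged in")
```

The cache here grows without bound and is lost on restart, so a real service would pair it with a retention window and some durable storage.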

Do me a favor please and share this with friends. If you found it valuable, they might find it valuable, too. Also, if you found it valuable, you probably want to subscribe. That way you'll get all of the other new episodes as they come out. You won't miss that value that you have already discovered.

I like to be in deep discussions with smart people. Please email me. I'm eric@lispcast.com or get in a discussion on Twitter. I try to use Twitter as a discussion medium. I'm @ericnormand with a D there.

Also, you can find me on LinkedIn. I'm trying to get better at LinkedIn. It's a little hard for me. If that's where you like to connect, let's connect and start having a conversation.

All right. See you later.