How do we evaluate a data model?

Recorded by Eric Normand. Published: October 31, 2022.

This is an episode of The Eric Normand Podcast.

We talk about how you can evaluate the two parts of a domain model.

Transcript

[00:00:00] How do we evaluate a data model?

[00:00:03] Hello, my name is Eric Normand. This is my podcast. Welcome.

[00:00:10] I'm continuing to pontificate, to think about these issues surrounding data models and data modeling. I want to talk about how to evaluate a data model. Some data models are better than others, and it is kind of complex, so I want to work it out, talk it out, and see if I can figure out how to teach this.

[00:00:47] So in my last recording, I talked about the three-part model of data models. [00:01:00] Okay. There's the real-world situation, which has a lot of information in it and is really complex. There's the conceptual model, which is much simpler. And there's also the encoding of that conceptual model. That is what I'm calling the domain model. They're in this weird relationship where you have to go through a conceptual model in order to go back to another model.

[00:01:34] Now, I also hinted at the fact that there are two steps you have to go through: you have to translate the situation into a conceptual model, and then encode the state of that conceptual model. Because there are two steps, there are actually two ways that it can go right or wrong. It can go wrong because your conceptual model is bad, and it [00:02:00] can independently go wrong because the

[00:02:04] encoding is no good for that conceptual model. I want to tease these apart because I feel, like I said before, that the creation of the conceptual model is actually the hardest part, and I want to have a way to evaluate how good or bad it is.

[00:02:28] This aspect, this idea that there's a conceptual model that can be either good or bad, is one of the things that's often left out of the modeling discussion. We often talk about the design of our code, the structure of the code, how things are laid out, and even more [00:03:00] about how things are defined in terms of other things, like in stratified design.

[00:03:07] And that's actually a good paper to bring up because there's a lacuna, I would say, in why the model they show is so good. They talk about it being good because it's stratified, so you can change things at the different layers in the design. You get this flexibility in what kinds of things you can change, depending on what needs to change.

[00:03:45] Right. I mean, I totally agree with that. It's obviously true when you read it, but the

[00:03:59] [00:04:00] trouble is... well, I'll give the example. They give this example using

[00:04:05] M.C. Escher style art, M.C. Escher paintings and drawings, where there is a lot of tiling and scaling and rotation. They give a really cool example of an encoding of that. Someone programmed this system where you could build graphics that are similar in style to an M.C. Escher by doing stuff like flipping

[00:04:44] an image, putting two images next to each other (which makes a new image), and scaling images. Then you could compose those up into other primitives, like, you know, give me a two-by-two layout of [00:05:00] this image. I mean, that's all very cool. The trouble is, what was really important was not the fact that it was done with stratified design. That does help the expressivity, and we do get into that in part three of the domain modeling book.

[00:05:22] It's got a lot of cool algebraic properties that make it work well as an API, but the real reason it worked so well

[00:05:36] was because they chose stuff like putting two images next to each other, flipping an image horizontally or vertically, and scaling an image. They looked at M.C. Escher drawings and analytically saw [00:06:00] that scaling is important, that tiling is important. It didn't simply fall out of the structure of the call graph, which is what the paper kind of implies.

[00:06:19] They actually found really useful concepts that created a conceptual model of how M.C. Escher paintings work.

[00:06:32] And that is the hard part, because a lot of people would say, "Oh, I'm doing graphics. I need an array of pixels, and I need a way to plot a pixel and change its color." That doesn't get you very close to drawing M.C. Escher paintings. You have all your work ahead of you.

[00:06:59] And now [00:07:00] you're just doing a for loop over some pixels, trying to make M.C. Escher paintings. You don't have any leverage. But suppose you identify this concept: if I take an image and another image and I put them next to each other, now I can do tiling, and tiling becomes an easy thing to express.

[00:07:29] I'm going to be able to tile so easily and make new things out of this tiling. But the key is the identification of tiling as the important concept. That is only facilitated by the rest, which is the style and the design of it: how things fit together, how they call each other, that kind of thing.
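To make the contrast with "an array of pixels" concrete, here is a minimal Python sketch of the kind of operations described above: putting two images next to each other, flipping, and composing those into a two-by-two layout. It is not the code from the stratified design paper. The `Image` type (a list of text rows) and all function names are hypothetical, chosen only for illustration.

```python
from typing import List

# Hypothetical stand-in for a picture: each string is one row of "pixels".
Image = List[str]


def beside(left: Image, right: Image) -> Image:
    """Put two images of equal height next to each other, making a new image."""
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    return [l + r for l, r in zip(left, right)]


def above(top: Image, bottom: Image) -> Image:
    """Stack one image on top of another of equal width."""
    if top and bottom and len(top[0]) != len(bottom[0]):
        raise ValueError("images must have the same width")
    return top + bottom


def flip_horizontal(image: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in image]


def two_by_two(image: Image) -> Image:
    """A composed operation: tiling falls out of beside and above."""
    row = beside(image, image)
    return above(row, row)


tile = ["/.", "./"]
pattern = two_by_two(beside(tile, flip_horizontal(tile)))
print("\n".join(pattern))
```

Nothing in `two_by_two` loops over individual pixels. The leverage comes from having chosen "next to each other" and "flip" as the concepts, which is the point the transcript is making.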

[00:07:49] So how do we

[00:07:59] [00:08:00] break this skill down, the skill of identifying these important concepts in the domain and encoding them into your program? How do we make that teachable? How do we take that skill and maybe make sub-skills that are each teachable? One thing that I think is important is to be able to evaluate a given domain model

[00:08:37] to see if it is really giving you what you need, if it is fitting the domain. So fit is one of the ideas I have: fit as a quality. [00:09:00] You can take a conceptual model and the domain model and check them for fit. Does this encoding really fit the domain, really fit the conceptual model?

[00:09:19] So, to dive deeper into fit for a second: the conceptual model is going to have a certain number of possible states that your system can be in. This is well understood in modeling, in mathematical modeling and simulations. There's just a certain number of states. You don't even have to know all of them.

[00:09:47] You just know that it's some finite number of states that they're in.

[00:09:51] Your encoding can also be in a certain number of states. We went over that when we were talking about [00:10:00] product types and sum types, meaning combinations and alternatives, and collections. All these pieces have a certain number of states that they can be in. Let me put it this way.

[00:10:23] The ideal is perfect overlap between

[00:10:28] the conceptual model and the encoding, meaning there is exactly one state in the conceptual model for every state in the encoding, and there's an unambiguous way of translating back and forth between the two. That is the ideal fit. That's perfect. Every single state [00:11:00] in the conceptual model can be represented in the encoding, and every state that's possible in the encoding has a meaning in the conceptual model.
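As a rough illustration of "exactly one state for every state, with an unambiguous translation back and forth", the sketch below checks fit by round-tripping every state in both directions. Everything here is an assumption made for the example: the conceptual model is stood in for by a set of plain words, the encoding is a Python `Enum`, and `is_perfect_fit`, `encode`, and `decode` are hypothetical names. The check also assumes `encode` and `decode` are defined for every state they are given.

```python
from enum import Enum


class Size(Enum):
    """The encoding: one member per size."""
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Stand-in for the conceptual model: the sizes people actually talk about.
CONCEPTUAL_SIZES = {"small", "medium", "large"}


def encode(concept: str) -> Size:
    return Size(concept)


def decode(encoded: Size) -> str:
    return encoded.value


def is_perfect_fit(conceptual_states, encoding_states, encode, decode) -> bool:
    """True when translating back and forth loses nothing in either direction."""
    concepts_ok = all(
        encode(c) in encoding_states and decode(encode(c)) == c
        for c in conceptual_states
    )
    encodings_ok = all(
        decode(e) in conceptual_states and encode(decode(e)) == e
        for e in encoding_states
    )
    return concepts_ok and encodings_ok


print(is_perfect_fit(CONCEPTUAL_SIZES, set(Size), encode, decode))
```

If the restaurant only sold small and large pizzas, the same encoding would no longer fit perfectly, because `Size.MEDIUM` would be a state with no meaning in the conceptual model.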

[00:11:12] Rarely do you have perfect fit, because there's often a trade-off between perfect fit and things like reuse and verbosity. To get something perfect like that, you often have to make it very inconvenient to program in.

[00:11:33] We talked a little bit about this when we talked about collections. It's often the case that you'll use a collection even though you only have, you know, at most three toppings on your pizza. You'll still use a collection, which could have any number of ingredients. So you have this situation where your [00:12:00] data structure can encode it.

[00:12:02] It can encode four toppings. It can encode a thousand toppings, but you are going to somehow limit it to three. You need to avoid the situation where you have a thousand toppings in that collection, because it doesn't have a meaning in your model. You can't make that pizza.
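One common way to "somehow limit it to three" is a check at construction time. The following is a minimal sketch under that assumption. The `Pizza` class, the `MAX_TOPPINGS` constant, and the topping names are hypothetical, and the limit of three is just the rule from the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Business rule taken from the example; another shop might pick a different limit.
MAX_TOPPINGS = 3


@dataclass(frozen=True)
class Pizza:
    # A tuple can hold any number of toppings, so the type alone allows
    # states (four toppings, a thousand toppings) that the model does not.
    toppings: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Runtime check that rejects the states with no meaning in the model.
        if len(self.toppings) > MAX_TOPPINGS:
            raise ValueError(
                f"a pizza can have at most {MAX_TOPPINGS} toppings, "
                f"got {len(self.toppings)}"
            )


ok = Pizza(toppings=("mushroom", "olive"))

try:
    Pizza(toppings=("mushroom", "olive", "pepper", "onion"))
except ValueError as error:
    print(error)
```

The collection still has the extra states. The check only guarantees that a `Pizza` holding one of them is never successfully constructed.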

[00:12:27] Oh man, that just brought up a big topic in my mind that I don't want to get into right now, because you could argue: well, yeah, your business says you have to have at most three toppings on your pizza, but this other business could have, like, probably up to five.

[00:12:52] Why not five? And sure, yes. We won't get into how you deal with situations like [00:13:00] that. I want to put that off to another episode. Okay. So this is the idea of fit: you want a one-to-one mapping. And there's an analysis of the kinds of failures that this mapping can have. One is that you have states from the conceptual model that you cannot represent in your encoding.

[00:13:31] Okay? That seems to me to be a big problem. Just imagine there's some pizza that looks like you can order it, and everyone wants to order it, but you can't actually write it down in your system. That just seems really unfortunate, like a show-stopping bug, basically. The other way [00:14:00] is that you have things that you can write down that don't make sense as a pizza.

[00:14:13] We already talked about the pizza with a thousand toppings, which is possible to write down if you're using, let's say, an array for toppings, because there's nothing syntactically stopping you from doing that. You would probably have to do some kind of runtime check, unless you had, I don't know, some special type system, like dependent types.

[00:14:40] So you have this problem where something that you can write down doesn't make sense. It's basically an error to have that. So there are these two problems, and you can [00:15:00] imagine the Venn diagram where you have your conceptual model states and your encoding states. You want a big overlap with, you know, nothing left out.

[00:15:12] You don't want anything that's just in the conceptual model but not in the encoding, and you don't want anything that's just in the encoding and not in the conceptual model. Now, in my experience, it's better to have stuff that you can encode that doesn't have meaning than to have something that has meaning that you cannot encode, right?
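The two regions of that Venn diagram can be computed directly when the state spaces are small enough to list. This is only a sketch: `fit_report` is a hypothetical helper, and it describes an encoding as a dictionary from each encodable state to the meaning it carries, or `None` when it carries none.

```python
from typing import Dict, Hashable, Optional, Set, Tuple


def fit_report(
    conceptual: Set[str],
    meaning_of: Dict[Hashable, Optional[str]],
) -> Tuple[Set[str], Set[Hashable]]:
    """Return (missing, meaningless) for an encoding.

    missing:     conceptual states that no encoding state stands for.
    meaningless: encoding states that stand for nothing in the conceptual model.
    """
    represented = {m for m in meaning_of.values() if m is not None}
    missing = conceptual - represented
    meaningless = {e for e, m in meaning_of.items() if m not in conceptual}
    return missing, meaningless


sizes = {"small", "medium", "large"}

# Encoding 1: a boolean "is_large" flag. It cannot write down a medium pizza.
as_flag = {False: "small", True: "large"}

# Encoding 2: an integer from 0 to 3. The value 3 can be written down but means nothing.
as_int = {0: "small", 1: "medium", 2: "large", 3: None}

print(fit_report(sizes, as_flag))
print(fit_report(sizes, as_int))
```

In the terms used above, the integer encoding is the preferable failure: every pizza can still be written down, and the stray value can be rejected with an error.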

[00:15:40] If there's a pizza that you want people to be able to order, you'd better be able to write it down. It's better to err on the side of having things that you can encode that don't really mean anything and just throw an error. It's better to be on that [00:16:00] side if you have to make the choice.

[00:16:02] And we often do, for expediency, or because we're inheriting an existing model and we can only modify it in small ways. We can't, you know, replace it. So we're going to have to deal with it: when we add this new feature, it's going to make these other states possible to write down, but we don't mind that they don't mean anything.

[00:16:25] We'll never translate them to anything, and we'll somehow prevent them from happening in the code. Okay, so that's simply the fit. There's another dimension, which I've mentioned briefly, which is that the mapping from the conceptual model to the domain model might be really awkward. It might be [00:17:00] really hairy code. It might be awkward for the programmer to write, but also awkward for the computer.

[00:17:10] You know, it could be like, "Oh, this is an n-cubed algorithm to write this thing down," and it's just not acceptable. Now, we talked about some awkwardness before. Well, no, maybe we didn't talk about it. I was just writing it in my book. So, one thing is, if you can make 1500 different pizzas, you could just give each one a number, right?

[00:17:36] From one to 1500. And now you're done, right? You have a state for each thing. You have perfect fit, if you assume that you have runtime checks to make sure you're not using a number that's too high or [00:18:00] too low. It's just integers, so you just have a number for each one.

[00:18:06] So you have perfect fit, but it's super awkward. Let's think of some specifics. How does someone order what they want? It doesn't really capture any of the important ideas of the pizza. It doesn't capture that a pizza has a choice between three different sizes.

[00:18:30] It just kind of throws a number there. It makes it really hard to ask questions like how many small pizzas were ordered and how many large pizzas were ordered. There's nothing in a number that tells you if it's small, medium, or large. How many pizzas had mushrooms? Is this a popular topping?

[00:18:54] These are really hard to answer, and I would just call it awkward. You [00:19:00] could do it, but just think of the code you'd have to write for all these questions, these queries. And if you did it, you would probably be doing the work of modeling it anyway, right? You'd probably have to say, "Well, I need a table of all the ones with tomato sauce on them."
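The awkwardness of the numbered encoding shows up as soon as you try to write one of those queries. In the sketch below, the menu is deliberately smaller than the 1500 pizzas in the example (three sizes and four optional toppings, which gives 48 combinations), and the topping names, `MENU`, and the helper functions are all hypothetical. The point is only that answering "how many small pizzas?" from order numbers requires a table that already contains the structure.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations, product
from typing import FrozenSet, Iterable, List


class Size(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Assumed topping names, for illustration only.
TOPPINGS = ("mushroom", "olive", "pepper", "onion")


@dataclass(frozen=True)
class Pizza:
    size: Size
    toppings: FrozenSet[str]


# Every subset of the toppings, from none to all four.
TOPPING_SETS = [
    frozenset(c)
    for r in range(len(TOPPINGS) + 1)
    for c in combinations(TOPPINGS, r)
]

# The "table" you end up building anyway: pizza number n is MENU[n - 1].
MENU: List[Pizza] = [Pizza(s, t) for s, t in product(Size, TOPPING_SETS)]


def pizza_for_number(n: int) -> Pizza:
    """Runtime check that the number is neither too low nor too high."""
    if not 1 <= n <= len(MENU):
        raise ValueError(f"no pizza numbered {n}")
    return MENU[n - 1]


def count_small(order_numbers: Iterable[int]) -> int:
    return sum(1 for n in order_numbers if pizza_for_number(n).size is Size.SMALL)


def count_with_topping(order_numbers: Iterable[int], topping: str) -> int:
    return sum(1 for n in order_numbers if topping in pizza_for_number(n).toppings)


orders = [1, 17, 48]
print(count_small(orders), count_with_topping(orders, "mushroom"))
```

The integers alone have perfect fit, but every query goes through `pizza_for_number`, so the structured `Pizza` is doing the real work.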

[00:19:28] So you're already thinking about the important concept of tomato sauce and doing some modeling work. Okay. So those are the two things: fit and convenience. There are other things, like simplicity and precision, but I think fit is probably [00:20:00] the best, most general-purpose one.

[00:20:06] I'm just going to stick with that for now, unless I find something better. With this idea of fit, you just imagine the Venn diagram, the overlap between the two. You want the overlap to be 100%. I think that's pretty good. And then there's the convenience: how hard is it to write?

[00:20:32] It actually hints at the level two thinking, where you start talking about operations. If the operations are so important for seeing whether something is convenient, just start with the operations, right? That's where you want to get to. But it's just a hint, because you still need to be able to data model, and that is more concrete.

[00:20:55] So we should start there. Okay. Now for the hard part. [00:21:00] How do you know? I mean, in this example, the pizza model, the conceptual model, is kind of already given to you: small, medium, large. Everyone's been to a restaurant where things are already divided up in that way to order stuff.

[00:21:27] But I've also been to pizza restaurants where all the pizzas are named. They all have names, so it's not so clear that there is a system to them, and often there isn't, because there are little contextual things you need to know about each one. Sometimes they'll just give the name and a little description, but it doesn't tell you that there's no cheese on this pizza.

[00:21:59] Like [00:22:00] on the marinara, there's no cheese. They don't tell you. They just say, "Oh, it's got basil and tomato sauce," and there's nothing in there that says no cheese. So maybe it's not so obvious when you're starting a new pizza restaurant that there is a system. Maybe you don't want a system, and you do want 10 named pizzas,

[00:22:32] just like the numbered pizzas we talked about before. Maybe with 10, it's not so awkward, so it's fine. But we have this system where we have 1500. It's already pre-done, where you can choose among three sizes, different kinds of sauces, whether you want cheese or not, and four different toppings that you can put on them.

[00:22:56] It's got a ton of different states it can be in, and someone [00:23:00] had to do that. Someone had to conceptualize: what are the things that distinguish one pizza from another, and what is their structure? They had to think, "Well, you can't have a pizza that's both small and large."

[00:23:26] That one's kind of obvious, right? A baby would know that. But what about "you can't have tomato sauce and pesto sauce"? That's not so obvious, but someone made that rule. Someone put that structure into the menu, into the conceptual model, and that had to be decided somewhere.
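A rule like "not both tomato and pesto" is a decision about structure, and the encoding can either carry that decision or ignore it. The sketch below contrasts the two under stated assumptions: there are exactly two sauces, every pizza has one of them, and the class names `LoosePizza` and `MenuPizza` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations
from typing import FrozenSet


class Sauce(Enum):
    TOMATO = "tomato"
    PESTO = "pesto"


@dataclass(frozen=True)
class LoosePizza:
    # A set of sauces: it can also say "both" and "neither",
    # which this menu does not offer.
    sauces: FrozenSet[Sauce]


@dataclass(frozen=True)
class MenuPizza:
    # An alternative: exactly one sauce, which is the rule someone decided on.
    sauce: Sauce


# Enumerate every state each encoding can be in.
loose_states = [
    LoosePizza(frozenset(c))
    for r in range(len(Sauce) + 1)
    for c in combinations(Sauce, r)
]
menu_states = [MenuPizza(s) for s in Sauce]

print(len(loose_states), len(menu_states))
```

The loose encoding has four states for a menu with two, so half of what it can write down has no meaning. Choosing the alternative type is how the menu's rule ends up in the code.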

[00:23:57] And so what is the process of [00:24:00] deciding this conceptual model, of figuring it out?

[00:24:05] What's the process of finding that? Wouldn't it be nice if we could make an image that's two images next to each other? Wouldn't it be nice if we could make an image that was the flipped version of another image? Because look how much M.C. Escher flipped.

[00:24:28] This is the kind of thing that seems the most magical, and I'm trying my best to come up with ways of doing this. One skill that I think is useful, and that we don't practice enough, is to try out different models. Don't stop at one. Stopping [00:25:00] at one is a big problem. But often people will even just start with something that's way too obvious, or way too done before, or, let's even say, way too general,

[00:25:15] because they're afraid they'll get stuck. One example would be: well, let's not be too specific yet when we're first starting. Let's just make an image be an array of pixels, and then we'll just write into that array the stuff we need. An array of pixels is perfectly general, right? And we'll make it mutable so that we can write on it in any way.

[00:25:50] And so they've baked in this model that does [00:26:00] not give you any leverage, because they're afraid to paint themselves into a corner. But really, you have to make decisions when you're domain modeling. You have to say: yes, technically, physically, we can put a spoon of tomato sauce and a spoon of pesto and spread it out.

[00:26:29] Yes, we can do that, but we're not going to. Yes, we can flip images and basically draw them backwards when we are drawing an image onto a 2D array of pixels, but we're not going to do that. We're going to represent the picture at a higher level, not as pixels on a grid, but as some transformation of another picture.

[00:26:59] [00:27:00] You have to make that decision, and at some point you might regret it, right? But, I don't know what to say. You've got to take the risk, because there's the other risk, which is that your code will be so hard to write, because you haven't built in the leverage, that you're going to regret not

[00:27:41] building in the leverage from the beginning. Okay. The other one that I want to bring up is the idea of a metaphor. I've given a talk about this. I call it Building Composable Abstractions. I gave the talk at [00:28:00] Clojure/conj in Austin. When was that? 2016. And the idea was that

[00:28:11] there's this magic that happens when you choose a metaphor: you are able to talk about it instantly. It's like you have a first implementation of a thing, because the metaphor is the implementation. You can kind of run it in your head. You can compare it against what you're seeing and be surprised if the thing you've built

[00:28:35] doesn't work as you expect. You can answer questions about it, and it can be shared with other people, because they can build the metaphor as well. So if you build this metaphor of how your system is supposed to work, you're tapping into your physical intuition about things. Now, this won't always work.

[00:28:58] I think that [00:29:00] making this metaphor often works and often doesn't. It doesn't work for two reasons. One, people aren't that good at it, so now it's another skill to teach. And two, you can't guarantee that the metaphor will be right. It's mainly a way of generating ideas for structures.

[00:29:33] And one more thing that I kind of hinted at, but I want to make more explicit: I believe that getting good at encoding, so that you can go directly to your programming language faster, will give you [00:30:00] a faster, richer feedback loop that can help you do the hard work of coming up with a conceptual model and analyzing that conceptual model.

[00:30:11] So going all the way, fully to the data model, will help you do that. And also the stuff we learn in level two and level three, where we do operation modeling and then algebraic modeling, that too will help you maybe go even faster. It's just harder to do, so you have to build up those skills first.

[00:30:39] All right. I'm going to leave it at that. I've exhausted all my ideas. If you have a better idea, please let me know, because this is really hard work. My name is Eric Normand. Thank you for listening, [00:31:00] and as always, rock on.