Eric Normand: It was obvious in 1971 and even in 1958 that AI programs suffered from a lack of generality. It is still obvious, and now there are many more details.
Hello, my name is Eric Normand. Welcome to my podcast. Today we’re reading from “The 1971 Turing Award Lecture” by John McCarthy. It’s a little more complicated than that, but he won in 1971 and he gave a lecture. It wasn’t published in 1972, even though usually they’re published the next year.
I guess he didn’t like what he said. He didn’t like how the talk turned out, so it was never published. He did finally publish something 15 years later, so quite a long time afterward.
In this talk, he’s trying to give a feel for what he was trying to say in 1971, plus some updates. It’s an oddball that we haven’t seen the likes of yet, where the published text is actually from much later.
We’re going to get into it in a moment. Before we do, I want to talk about my new book, “Grokking Simplicity.” Look, I’ve even got the T-shirt on, if you’re watching the video.
Grokking Simplicity is all about functional programming. It is for people who know at least one programming language and have a couple years, maybe three years of experience working on commercial software and know some of the pitfalls in it. It talks about functional programming.
I hope it is a book that starts a conversation, a discussion in the literature for commercial software. There’s a lot of academic books on functional programming, but I think we need to start talking about how it applies in the industry. Please go check it out.
It’s available on manning.com and also amazon.com, and probably other stores but I’m not aware of where else it is. I’d love to hear what you think about it. Please leave a review. If you like it, tell your friends and thank you for buying it.
I usually read from the biography. I’ll read a couple of things. I didn’t find the biography to be that insightful, but there are some facts that I like to look at. He was born in 1927 in Boston, so he’s about eight years younger than the last Turing laureate, who was born in 1919.
The next generation, he could have had that person as his PhD advisor or something. He was a mathematician and I look at him mainly as a logician, trying to find a logic that was appropriate for using on a machine to reason about the world.
In the way that we humans can make judgements about how the world must be or we can plan some set of actions to make something happen. For instance, he uses the example of stacking blocks, if you want to get block b on top of block c, first you have to take block a off of it and then move this over here and take this out of the box and then put it on top.
There’s all these actions you have to take in order to make the situation the way you want it. Computers are bad at reasoning about that.
His talk is mostly about that. It’s about all these systems of logic trying to solve this problem of generality. It’s easy to construct a very small logical system that is consistent, that can solve tiny little problems but then you start introducing new ideas into it, and it quickly explodes. It’s not a scalable solution.
He has worked at the Massachusetts Institute of Technology and at Stanford. He’s worked with Marvin Minsky. He is the inventor of the term “artificial intelligence,” very early in the field. He also has worked with, did I say Marvin Minsky? He also is the inventor of Lisp.
I didn’t know this, but the biography says something interesting that 16 Turing awards have been given to people who have been affiliated with the Stanford AI Lab, which is the project that McCarthy started. It talks a lot about all the students that came under him. This is a big part of his legacy. He is the first person to write a computer chess program.
Very early in artificial intelligence, a very important figure in computer science. Another thing that’s not mentioned in here is that he was instrumental in getting the if-then-else statement put into ALGOL. Thank you for doing that. Before, they would have used some goto. He was in favor of something more structured.
Of course, now, if-then-else is common in all programming languages. Let’s get into the Turing Award lecture itself. Like I said, on the top of the page it says, The 1971 Turing Award Lecture, but he didn’t write this until 1986. I’m inferring, I’m presuming that he gave the talk in 1971.
He was trying to summarize this idea of generality in artificial intelligence and couldn’t do it, and didn’t like the way his lecture turned out. Then, [laughs] 10 or 15 years later, he decided, “Oh, I guess I’d better finally write that up.” They allowed him to publish this 15 years later.
It tries to summarize the approaches that had been attempted in the past to deal with this problem of generality. The problem of generality was not well understood. They knew that it was a problem, but they did not understand it. This is the thing I read at the beginning. I’ll start.
“It was obvious in 1971 and even in 1958 that AI programs suffered from a lack of generality. It is still obvious, and now there are many more details.” You remember, this was written in 1986. For 30 years, they’ve been trying to solve this problem. Also, I want to say, I have so much to say on this. I have a Master’s degree and I studied Artificial Intelligence.
I have thought about a lot of these problems, even decades after this was published. There has also been like this resurgence in the last 20 years of artificial intelligence, no, not even 20 years. Let’s say 10 years or less in artificial intelligence and more of a neural net machine learning approach.
There’s a lot more going on than what was happening back then. There’s a lot to say. I’m going to try to bring my understanding to the field. Unfortunately, a lot of the stuff that he’s going to talk about, if you’ve done any reading on it at all, is going to seem naïve, at least from our perspective now.
The reason it seems naïve is because by doing this investigation that the field of artificial intelligence was doing — trying to do stuff in a computer that normally only people can do — they brought up a lot of problems with the way we perceived ourselves and how we thought we made sense of the world.
This is an ongoing process, and the effect that artificial intelligence has had on cognitive science, on psychology, on neuroscience is profound. Looking back at it, it often seems like, “Wow, people really had no idea about how we thought.” It’s true, and they’re the ones who discovered it.
They had all these challenges with things we think we’re doing. We think chess is this really hard thing, but it’s actually one of the easier things for a computer to solve. I hope to interject with lots of little stories and stuff. The problem is I might forget them, because I’ve lived in it so long that I often forget that it’s not common knowledge.
OK, I’m just going to get started. Let’s go. “The first gross symptom of this lack of generality is that a small addition to the idea of a program often involves a complete rewrite, beginning with the data structures. Some progress has been made in modularizing data structures, but small modifications of the search strategies are even less likely to be accomplished without rewriting.”
He’s talking about how when we’re dealing with stuff in the world, we’re always learning new little facts, little things about what’s going on. These are small changes, but in software it often requires you to just start over from the data structures, the basic ideas that you’ve got encoded in your program.
We need to modularize this so that we don’t have to keep changing our data structures at least. Then, of course you’ve got different data structures. You’re going to need search strategies that are different because one search strategy that works on this data structure is not efficient so you need one that works on the new data structure. It’s not an easy problem.
“Another symptom is that no one knows how to make a general database of common-sense knowledge that could be used by any program that needed the knowledge.” We don’t know how to just put facts of common-sense knowledge — birds fly, objects fall when you release them — all that little stuff that even five-year-olds know. We don’t know how to put that into a database and make it useful.
“When we take the logic approach to AI, lack of generality shows up in that the axioms we devise to express common-sense knowledge are too restricted in their applicability for a general common-sense database. In my opinion, getting a language for expressing general common-sense knowledge for inclusion in a general database is the key problem of generality in AI.”
This is 1986, remember, and he’s making this strong claim that we need a language to express general common-sense knowledge and we want to put that into a database. That is the key problem and if we solve that problem, we’ll unlock the next wave of problems, the next challenges on the way to generality in AI.
Now, 35 years later, we do have some common-sense databases. There’s a thing called Cyc, C-Y-C, that people have been writing for decades. They hired a team. They were writing common-sense statements into a database with the idea that when you hit a certain quantity of them, there would be a qualitative change in the kinds of reasoning you can do.
It’s unclear at this point whether that’s actually helpful. There are attempts at basically reading everything — reading books, Wikipedia, Web pages — and trying to get these facts out, not have a human involved to get the facts out but getting them out automatically.
It’s not clear that that’s going to lead anywhere. Once you have them, you have to represent them, and that’s what he’s talking about here. Then you need a language: you have to be able to write the facts down in some format that’s convenient and efficient for search, and then you do this big search with inference and stuff.
The engine that infers over these statements would also need to be devised. It’s not clear that that’s going to help at this point in 2021. OK, but he’s going to go over the problems and I think it’s very useful to understand this problem more deeply.
Friedberg discussed a completely general way of representing behavior and provided a way of learning to improve it. Namely, the behavior is represented by a computer program, and learning is accomplished by making random modifications to the program and testing the modified program.
The Friedberg approach was successful in learning only how to move a single bit from one memory cell to another. Imagine, this was back in ’58, ’59, you have this program that can solve some problem and you want it to learn to do it. You want to improve, you want to make it better. You make random changes to the program and imagine this is in machine code.
You change some bytes and then run it and see if it’s better. A very naïve approach, but you’ve got to try those, [laughs] in case they work. “It was shown by Simon to be inferior to testing each program thoroughly and completely scrapping any program that wasn’t perfect. No one seems to have attempted to follow up the idea of learning by modifying whole programs.” So it didn’t work.
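Just to make the mutate-and-test loop concrete, here’s a toy sketch in Python. Everything in it is invented for illustration: the little register instruction set, the copy-one-value task, and the fitness scoring. Friedberg’s actual system mutated machine code, which is exactly what made small improvements so unlikely.

```python
import random

# Toy Friedberg-style learning: a "program" is a list of simple register
# operations, and learning is random mutation plus testing. The target
# task is to copy register 0 into register 1.
OPS = ["copy_0_to_1", "zero_1", "inc_1", "noop"]

def run(program, r0):
    """Execute the toy program on registers [r0, 0]; return register 1."""
    regs = [r0, 0]
    for op in program:
        if op == "copy_0_to_1":
            regs[1] = regs[0]
        elif op == "zero_1":
            regs[1] = 0
        elif op == "inc_1":
            regs[1] += 1
    return regs[1]

def fitness(program):
    """Count how many test inputs the program copies correctly."""
    return sum(run(program, x) == x for x in range(10))

def learn(steps=2000, seed=42):
    """Random modification and testing, keeping mutants that score no worse."""
    random.seed(seed)
    program = [random.choice(OPS) for _ in range(4)]
    best = fitness(program)
    for _ in range(steps):
        mutant = list(program)
        mutant[random.randrange(len(mutant))] = random.choice(OPS)
        score = fitness(mutant)
        if score >= best:
            program, best = mutant, score
    return program, best

program, best = learn()
print(best)  # reaches the perfect score of 10
```

Even on a task this trivial, the search has almost no structure to exploit: a mutation either stumbles onto the right instruction or it doesn’t, which is the point about random modification being a very special way to change behavior.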
The defect of the Friedberg approach is that while representing behaviors by programs is entirely general, because we know that software is Turing complete, modifying behaviors by small modifications to the programs is very special.
You’re trying to make this general change, or you’re trying to solve the problem in general of being able to learn any new behavior by these random mutations and selection.
You want some particular thing. [laughs] You wanted to learn some particular thing, and random and particular don’t really go well together.
A small conceptual modification to a behavior is usually not represented by a small modification to the program, especially if machine language programs are used and any one small modification to the text of a program is considered as likely as any other.
“While Friedberg’s problem was learning from experience, all schemes for representing knowledge by programs suffer from similar difficulties when the object is to combine disparate knowledge or to make programs that modify knowledge.”
It didn’t work. We needed to try. There was, at the time, the notion that we have this new thing called programs that seem to have at least the potential for solving any problem. It requires human ingenuity to craft the solution as a program, to write the program.
So far, any problem we’ve attempted has either been solved, or we could see how we could do it if we had more resources, bigger RAM, faster processing, or it would just take too much time to write the program.
Add to that the Church-Turing thesis about the universality of the Turing machine, and this was it. This was all we needed to do any kind of computation. There was this sense that we have this thing that’s universal now. Can’t we just make small changes to this program and it’ll learn new behavior?
Turns out that representation mattered. He talks about this before, that it’s about the language you need to express the statements. Machine code, even though it’s universal, is not the best way to express this kind of knowledge.
“Allen Newell, Herbert Simon, and their colleagues first proposed the General Problem Solver in 1957. The initial idea was to represent problems of some general class as problems of transforming one expression into another by means of a set of allowed rules.”
“In my opinion, GPS was unsuccessful as a general problem solver because problems don’t take this form in general. [laughs] Most of the knowledge needed for problem solving and achieving goals is not simply representable in the form of rules for transforming expressions.”
“However, GPS was the first system to separate the problem-solving structure of goals and subgoals from the particular domain.” GPS was an interesting attempt. Herb Simon, very interesting guy. He won the Turing Award and the Nobel Prize.
Herb Simon was the guy who came up with the notion of satisficing, that humans don’t optimize, they satisfice. He won the Nobel Prize in Economics for that. We’ll get to him when we get to his award.
He was solving this problem of generality by taking expressions and having transformation rules that applied to expression, so pattern match on the expression and say if this is true, then these other things must also be true.
You have this huge set of allowed transformation rules, and you keep applying them recursively. You should be able to generate, basically, given enough time, all true statements that derive from those rules.
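To sketch what “transforming one expression into another by means of a set of allowed rules” can look like, here’s a tiny Python toy: rewrite rules applied over and over, with a search for a sequence that reaches the goal expression. The string-rewrite rules are made up for this example; GPS’s actual rule format and means-ends analysis were more elaborate.

```python
from collections import deque

# Toy rewrite rules: each pair says "this substring may become that one".
RULES = [
    ("A", "AB"),   # an A may grow a B after it
    ("B", "BC"),   # a B may grow a C after it
    ("CC", "C"),   # duplicate Cs collapse
]

def neighbors(expr):
    """Apply each rule at each position, yielding transformed expressions."""
    for lhs, rhs in RULES:
        start = 0
        while (i := expr.find(lhs, start)) != -1:
            yield expr[:i] + rhs + expr[i + len(lhs):]
            start = i + 1

def transform(start, goal, limit=10000):
    """Breadth-first search for a sequence of rule applications."""
    queue, seen = deque([(start, [start])]), {start}
    while queue and limit:
        limit -= 1
        expr, path = queue.popleft()
        if expr == goal:
            return path
        for nxt in neighbors(expr):
            # Bound expression length so the search space stays finite.
            if nxt not in seen and len(nxt) <= 2 * len(goal):
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

print(transform("A", "ABC"))  # ['A', 'AB', 'ABC']
```

Notice the separation he credits GPS with: `transform` knows nothing about the domain, only about applying whatever rules it is given.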
The problem was that these transformations are not enough. Most problems are not solved by just simple transformations. You need more ability to…We’ll get into this, but you need this ability of having variables and talking about things in general and not just specifics.
He says that it’s important, because it separated the idea of an engine for problem solving that was separate from the domain.
Now he’s going to talk about production systems. He is summarizing all these attempts and where they failed.
“Production systems represent knowledge in the form of facts and rules. Unlike logic-based systems, these facts contain no variables or quantifiers. New facts are produced by inference, observation, and user input.
“The result of a production system pattern match is a substitution of constants for variables in the pattern part of the rule. Consequently, production systems do not infer general propositions.”
Again, we’re talking about the generality problem. They can’t learn these general rules. They have to work on particulars. For instance, you could have a rule like: move block C, because C is on top of B and you want to move B, so you have to move C off of B first.
It can’t figure out the rule in general: “Hey, if you want to move a thing (that’s a variable) and there’s something on top of it, move the thing that’s on top first.” That’s a general rule, and it can’t ever deduce it.
He has another example. “For example, consider the definition that a container is sterile if it is sealed against entry by bacteria, and all the bacteria in it are dead. A production system (or a logic program) can only use this fact by substituting particular bacteria for the variables.
“Thus, it cannot reason that heating a sealed container will sterilize it given that a heated bacterium dies, because it cannot reason about the unenumerated set of bacteria in the container.”
He’s going to talk about that more later. He’s going to bring up the same example later. Just to explain again, it can only reason about…you could say it like this. You can reason about one bacterium at a time. Heating X will kill it if X is a bacterium.
OK, but it can’t say, “Oh, therefore, if I heat this dish, all the bacteria in it are going to die and so I should heat the whole dish.” It can’t make that leap to a set of bacteria that are contained in it.
Something has to like list out all the bacteria in it and then it can decide, “Ah, yes, there are no more bacteria, because I’ve heated each one.”
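Here’s a minimal Python sketch of the limitation being described: a production-system-style rule that pattern-matches facts and substitutes constants for its variable, so every conclusion it draws is about one particular named individual. The predicates, the named bacteria, and the forward-chaining loop are all invented for the illustration.

```python
# Ground facts: every fact names particular constants, no variables.
facts = {("bacterium", "b1"), ("bacterium", "b2"), ("heated", "dish")}

def rule_heated_bacterium_dies(facts):
    """If the dish is heated and ?x is a bacterium, conclude ?x is dead."""
    new = set()
    if ("heated", "dish") in facts:
        for pred, const in facts:
            if pred == "bacterium":      # ?x bound to a constant
                new.add(("dead", const))
    return new

def forward_chain(facts, rules):
    """Apply rules until no new ground facts are produced."""
    changed = True
    while changed:
        derived = set().union(*(r(facts) for r in rules))
        changed = not derived <= facts
        facts = facts | derived
    return facts

result = forward_chain(facts, [rule_heated_bacterium_dies])
print(("dead", "b1") in result)  # True: each *named* bacterium dies
# But the system can never derive "all bacteria in the dish are dead".
# It has no way to reason about the unenumerated set, only the constants
# b1 and b2 that happen to be listed.
```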
“Representing knowledge in logic. It seemed to me in 1958 that small modifications in behavior are most often representable as small modifications in beliefs about the world, and this requires a system that represents beliefs explicitly.
“The 1960 idea for increasing generality was to use logic to express facts in a way independent of the way the facts might subsequently be used.”
I want to pause here. Remember, he’s a mathematician, he’s a logician, and so he’s approaching this as we will have some statement of fact about the world, and we’ll use that in different ways.
We’re not making a production rule, which was the last section we talked about that has a very specific use, like to move this block from here to there. It’s much more general, like moving a block changes its location and that’s a declarative statement about the world.
For all A, if A is a bacterium, heating it will kill it. Then that is a statement about the world, that’s not a statement about any particular bacterium.
“It seemed then, and still seems that humans communicate mainly in declarative sentences rather than in programming languages for good, objective reasons that will apply whether the communicator is a human, a creature from Alpha Centauri or a computer program.”
He’s making a claim about the universality of the usefulness of this, that speaking in declarative sentences is more useful than describing a program.
Just as an example, I know that when I was a kid, we had this exercise of having to describe how to make a peanut butter and jelly sandwich. You had to give all these instructions, place the jar on the table, open the jar by twisting the lid counterclockwise, open the drawer, find a knife, take it out, close the drawer and if you forgot…then the teacher would enact it.
If you forgot a step, like your program would have a bug and the teacher could not finish the sandwich but you could describe making a peanut butter sandwich that way. Or you could have a declarative statement like, “A peanut butter sandwich is two slices of bread with a layer of peanut butter between them.”
That might not be the best way to represent it but if you say it like that, the person on the other side that’s hearing this statement is an intelligent person, has intelligence and can figure out all the steps themselves of how to make the sandwich.
It reminds me of the difference between American-style recipes, which are very step-by-step, versus French-style recipes, which are a little paragraph about what’s in the dish. [laughs] It’s crazy looking at the difference, because they are very different.
You have to know a lot more about cooking to do the French-style recipe, whereas the American style is made for someone who knows how to turn on a stove but doesn’t know how to sauté an onion. There are things you can assume that the Americans know, but there are things you can’t, whereas the French recipes assume a lot more knowledge about what’s going on.
They’ll say, “This is a dish of…” It’ll even say, “Cook sautéed onions, carrots, potatoes, etc., together in a pot until tender. Serve in a bowl.” It’s that general a statement of a recipe, whereas the American version would be a whole page.
It would say, one medium onion, finely chopped. It’ll say, heat the pan on medium heat, add oil to the pan, put the onion in the pan, and stir occasionally until the onion is translucent. [laughs] It’s just the details that you have to write in there are crazy.
If you can assume knowledge on the other side, if you can assume that the agent on the other side can do some deductive reasoning and has a similar set of knowledge, experience, and skills in the world, then declarative sentences are much better for communication and for getting things done, because they can work in multiple situations.
OK, let me continue. “The advantage of declarative information is one of generality. The fact that when two objects collide, they make a noise may be used in particular situations to make a noise, to avoid making a noise, to explain a noise, or to explain the absence of noise.” In theory, the idea is that I can write one statement that can be used in all these different situations.
It is a true statement, it is generally true, so I can save time as the programmer, because I’m writing just one statement and the computer can use it in multiple situations. “Once one has decided to build an AI system that represents information declaratively, one still has to decide what kind of declarative language to allow.”
Now you have this new problem of writing a language that can express this. “Every increase in expressive power carries a price in the required complexity of the reasoning and problem-solving programs.” You go from the simplest systems that are just constant symbols, then you add some predicate symbols, some variables.
Now you’re in first-order logic, and it’s just really hard. You need a much better engine for running these programs.
He talks about, Prolog is this local optimum. It’s interesting because in 1971 Prolog didn’t exist, but in 1986, it did. This is something that he could look back on. He says, “Prolog represents a local optimum in this continuum because Horn clauses are medium expressive, but can be interpreted directly by a logical problem solver.”
“One major limitation that is usually accepted is to limit the derivation of new facts to formulas without variables, that is, to substitute constants for variables and then do propositional reasoning. It appears that most human daily activity involves only such reasoning.”
People aren’t doing science all the time when they’re going about their day. They’re not inferring what goes up must come down, which is a universal statement with a universally quantified variable.
They’re saying if I drop this apple, it’s going to hit the ground. Everything is instantiated to constants. This apple, that ground, it’s going to fall.
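That restriction, substituting constants for the variables up front and then reasoning purely propositionally over the resulting variable-free formulas, can be sketched like this in Python. The dropped/hits_ground rule and the constants are made up for the example.

```python
constants = ["apple", "stone"]

def instantiate(rule_schema, constants):
    """Ground out a one-variable rule schema at every known constant."""
    return [rule_schema(c) for c in constants]

# "For all x: dropped(x) -> hits_ground(x)", grounded at each constant.
ground_rules = instantiate(
    lambda c: ((f"dropped({c})",), f"hits_ground({c})"),
    constants,
)

def propositional_closure(facts, ground_rules):
    """Plain propositional modus ponens over variable-free formulas."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in ground_rules:
            if all(p in facts for p in premises) and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = propositional_closure({"dropped(apple)"}, ground_rules)
print("hits_ground(apple)" in result)  # True
print("hits_ground(stone)" in result)  # False: the stone was never dropped
```

Every conclusion is about a named constant; the universal statement itself never participates in the reasoning, it only gets stamped out ahead of time.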
“A Prolog program can sterilize a container only by killing each bacterium individually and would require that some other part of the program successively generate the names of the bacteria. It cannot be used to discover or rationalize canning, sealing the container and then heating it to kill all the bacteria at once.”
It has the same problem, that it cannot generalize. This is something that we do all the time. It’s not science exactly, but we can do these specific quantifications.
You can say stuff like all the people in the room, and there might be 20 people in the room. We don’t have to list them all. We can infer things from that in a way that might not be well expressed in predicate logic.
In logic, you would have to say, “For all people, if the person is in the room, then they heard me talk”. This is the way you would express that in logic. Notice that you have this “for all”, this variable that’s quantified over all people. Then there’s a conditional in there.
When we say for all the people in the room, all the people in the room heard me, we might not be doing that. That might just be a shortcut, and then we have some heuristic that’s like, “Were they in the room?” Yes? No?
It’s not this thing that we’re applying to all people and then narrowing it with an if statement. This is my interjection here. This isn’t what he’s saying.
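The two readings can be written out directly. This is my own illustration, not anything from the paper; the names and the in_room/heard relations are made up.

```python
people = ["ann", "bo", "carol", "dan"]
in_room = {"ann", "bo", "carol"}
heard = {"ann", "bo", "carol"}

# Logic-style reading: for ALL people, if in the room, then they heard me.
logic_reading = all((p not in in_room) or (p in heard) for p in people)

# "Shortcut" reading: quantify only over the room's occupants directly.
shortcut_reading = all(p in heard for p in in_room)

print(logic_reading, shortcut_reading)  # True True
```

Both come out true here, but the second never touches dan at all: it ranges over the restricted set directly instead of filtering all people through a conditional.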
Now I’ll read from his paper. “My own opinion is that reasoning and problem-solving programs will eventually have to allow the full use of quantifiers and sets and have strong enough control methods to use them without combinatorial explosion.”
“While the 1958 idea,” the one of this logic, “was well received, few attempts were made to embody it in programs in the immediately following years. I spent most of my time on what I regarded as preliminary projects, mainly LISP. My main reason for not attempting an implementation was that I wanted to learn how to express common sense knowledge in logic first.”
He wanted to do the expression first before making the logic engine. You can see why, as a logician himself, he could sit there and work out simple problems himself and know whether a language was expressive enough. Why should you write the engine first and then later learn this doesn’t do what I need it to do?
It’s much better to work out a few problems on a piece of paper, learn that there’s some missing gap in this inference and we can’t figure out why these rules don’t work. We need some universal quantifier here, that kind of thing. You can work it out before you make an engine to do the deduction.
“McCarthy and Hayes made the distinction between, this is a hard word for me, epistemological and heuristic aspects of the AI problem and asserted that generality is more easily studied [laughs] epistemologically.
“The distinction is that the epistemology is completed when the facts available have as a consequence that a certain strategy is appropriate to achieve the goal. Whereas the heuristic problem involves the search that finds the appropriate strategy.”
This is what I was saying that there’s two problems. One is having the information encoded in the right way, that’s the epistemology of it. Then there’s the heuristic aspect, which is actually taking all that and doing a practical search to generate the desired knowledge. That’s what I was talking about, where you could do it on paper, just by working on the epistemology of it.
“The common-sense information possessed by humans would be written as logical sentences and included in the database.” He’s talking about his own work here, his own approaches. “Any goal seeking program could consult the database for the facts needed to decide how to achieve its goal, especially facts about the effects of actions.
“The much-studied example is the set of facts about the effects of a robot, trying to move objects from one location to another, this led in the 1960s, to the situation calculus, which was intended to provide a way of expressing the consequences of actions, independent of the problem.”
You have a database and you put a bunch of facts about the world in it, especially facts about the consequences of actions. This might be like: if you paint something, it changes its color, right? Something like that.
He has this formula, it’s not code, it’s a formula: s' = result(e, s). You have the current situation s, some event e happens, and the result is a new situation s'.
“Notice that the situation calculus applies only when it is reasonable to reason about discrete events, each of which results in a new total situation. Continuous events and concurrent events are not covered.”
I’m not going to read all the code that he has here, the axioms, but one thing that’s clear is that you have to state quite a lot of stuff, like the result of moving a thing that’s at position X to position Y is that it is now at Y, things like that.
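A rough Python sketch of the situation-calculus shape, where an event maps one total situation to a new total situation. The block-world attributes and the explicit copying are my invention, to show how much has to be carried forward by hand.

```python
def result(event, situation):
    """Return the new total situation produced by an event."""
    action, block, value = event
    # A new *total* situation: every attribute of every block must be
    # carried forward explicitly, then the one changed attribute updated.
    new = {b: dict(attrs) for b, attrs in situation.items()}
    if action == "move":
        new[block]["location"] = value
    elif action == "paint":
        new[block]["color"] = value
    return new

s0 = {"b1": {"location": "table", "color": "red"}}
s1 = result(("move", "b1", "box"), s0)

print(s1["b1"]["location"])  # box
print(s1["b1"]["color"])     # red: copied forward explicitly
```

The copy step is the tedium he’s about to describe: every attribute an action does not change still has to be accounted for, and adding a new attribute means revisiting every action.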
“The facts that were included in the axioms had to be delicately chosen in order to avoid the introduction of contradictions arising from the failure to delete a sentence that wouldn’t be true in the situation that resulted from an action”, so it was very tedious.
You have to say everything that’s still true, all the things that are not true anymore, anything that has changed. You had to rewrite the whole description of the situation in every rule. And if you added a new piece of information that was being tracked… Say you had location and you had color, so you have those two things.
You describe all the rules: moving something doesn’t change its color, and painting something doesn’t change its location, right? You say all that stuff, but now you add a new thing, like its rotation, so you can flip the blocks.
Now you have to say: flipping the block doesn’t change its location or its color, and painting it changes its color but not its location and not its orientation. You have to start describing everything that changes and doesn’t change. It’s very hard.
“A problem with the situation calculus axioms is that they were again not general enough. This was the qualification problem. Putting an axiom in a common-sense database asserting that birds can fly, clearly the axiom must be qualified in some way, since penguins, dead birds, and birds whose feet are encased in concrete can’t fly.” He’s describing this new problem that he’s calling the qualification problem.
You can make a general statement like, “Birds can fly.” If you send that to a person, they’d be like, “Yeah, sure that sounds right,” but then you’re like, “But what about dead birds. Huh, did you think of that?”
Then you can [laughs] say, “Huh, what about if I cut their wings off?” Aha. [laughs] Then you’re like, “Well, but you didn’t say that. You didn’t say that this bird was tied down or in a cage.” He needed to invent this new thing, because you can always come up with all these exceptions.
“Formalized non-monotonic reasoning provides a formal way of saying that a bird can fly unless there is an abnormal circumstance, and of reasoning that only the abnormal circumstances whose existence follows from the facts being taken into account will be considered.”
He’s trying to formalize this thing that people do. If I say, “Birds can fly,” then later I tell you…and you’re like, “Yes, that’s right.” Then I say, “What about a penguin?” Then you say, “Well, you didn’t say penguins.” Penguins are special. Most birds can fly, penguins are special. You don’t have to list all of the exceptions.
He’s trying to formalize this. If you were describing a situation, if you leave out all…You have to make it possible to leave out the details, because you can’t describe everything about the situation. If you leave out a detail, like it’s a penguin, that’s OK.
You’re just going to assume it’s not a penguin. You’re going to assume that the birds can fly, it’s true, and you don’t have to know all the details of its wings and whether it’s healthy and whether it’s alive. You can leave that stuff out and the reasoning can continue and infer the general case without the exceptions.
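Here’s a toy Python version of that default: a bird flies unless an abnormality follows from the facts we actually stated. A closed-world check over a finite fact set stands in for McCarthy’s circumscription here, which is a much subtler formalism; the names and the list of abnormal circumstances are invented.

```python
# What we stated about each individual: only the facts actually given.
facts = {
    "tweety": {"bird"},
    "pingu": {"bird", "penguin"},
    "polly": {"bird", "dead"},
}

# Circumstances that would defeat the "birds fly" default.
ABNORMAL = {"penguin", "dead", "wings_clipped"}

def abnormal(name):
    """An abnormality holds only if it follows from the stated facts."""
    return bool(facts[name] & ABNORMAL)

def flies(name):
    """Default: birds fly, unless some abnormal circumstance is known."""
    return "bird" in facts[name] and not abnormal(name)

print(flies("tweety"))  # True: nothing abnormal was stated, so assume it
print(flies("pingu"))   # False: the penguin exception defeats the default
```

The reasoning is non-monotonic in exactly the sense described: adding the fact that tweety is a penguin would retract a conclusion we had already drawn.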
The frame problem is another problem that he’s going to describe. “The frame problem occurs when there are several actions available, each of which changes certain features of the situation. Somehow it is necessary to say that an action changes only the features of the situation to which it directly refers.
“When there is a fixed set of actions and features, it can be explicitly stated which features are unchanged by an action, even though it may take a lot of axioms. If additional features of situations and additional actions may be added to the database, we face the problem that the axiomatization of an action is never completed.”
Let me explain this problem here. This is the frame problem. Other researchers have defined it differently; it’s similar, but this is his version. He was first, so he’s defining it how he defined it. You want to make this database, and we talked about this before. If you say painting a block changes its color, you also have to describe that it does not change its location.
It does not change its orientation. Let’s say you have a complete set of all that. It’s all well described. Now you add another variable in there that the system is keeping track of, you have to go through all your rules and add this.
Of course, your rules are growing because you’re adding this new variable that can change. You’re increasing the amount of stuff you have to put in the database with each addition of a thing. Obviously that’s not scalable; you’d never finish. You need a meta rule that says, “Well, if I don’t mention location, it’s not going to change. Just use the same location as before.”
If I mention paint and the paint changes, then it changes, but if I don’t mention paint, the color doesn’t change. We need some way of solving this frame problem. He’s got the situation calculus, and he’s showing that your statements, your axioms, become a lot less wordy because you have this thing called the AB aspect.
It’s a way of describing, there’s a bunch of stuff that’s there, that’s not changing. This treats the qualification problem, because any number of conditions that may be imagined as preventing moving or painting can be added later and asserted to imply the corresponding AB aspect.
It treats the frame problem in that we don’t have to say that moving doesn’t affect colors and painting location. Now when you add a new axiom and a new thing that can change about the situation, you don’t have to change all your existing rules. That’s very important.
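That meta rule, "anything I don't mention stays the same," can be sketched in a few lines. This is my own illustration of the idea, not McCarthy's situation-calculus axioms; the fluent names are made up.

```python
# A toy sketch of the frame "meta rule" (my own illustration, not
# McCarthy's axioms): each action lists only the fluents it changes,
# and everything not mentioned is carried over unchanged. Adding a new
# fluent later (say, orientation) requires no edits to existing actions.
def apply_action(state, effects):
    new_state = dict(state)    # frame rule: copy every fluent forward...
    new_state.update(effects)  # ...then change only what the action mentions
    return new_state

s0 = {("color", "A"): "red", ("location", "A"): (0, 0)}
# Painting says nothing about location, so location persists by default.
s1 = apply_action(s0, {("color", "A"): "blue"})
```

Without the copy-forward default, every action would need an explicit "does not change location, does not change orientation, ..." axiom for every fluent in the database.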
“Even with formalized non-monotonic reasoning, the general common-sense database still seems elusive. The problem is writing axioms that satisfy our notions of incorporating the general facts about a phenomenon.
“Whenever we tentatively decide on some axioms, we are able to think of situations in which they don’t apply, and a generalization is called for. Moreover, the difficulties that are thought of are often ad hoc, like that of the bird with its feet encased in concrete.”
Ad hoc, meaning you don’t have a rule. If you say all birds can fly, and you say bird A has its feet encased in concrete, does the system have a way of representing that that means it’s too heavy to fly? [laughs] Now you need a whole bunch more statements and axioms in your database. It’s exploding the size of the database needed.
I want to interject here, at this point, because I think we’ve gotten enough of a sense. We’re almost at the end now, but we have a real sense of the difficulty of this logical approach.
Back when AI started, this is a caricature of what was thought at the time. It seems naive now, so it’s hard to describe it without making it seem so cartoonish.
The idea was smart people reason through situations, and logic is something that smart people have learned how to do. It seems to be able to describe all these situations.
Therefore, machines can do this basic logic because it’s very formal. That might be a good way to make a computer able to think and reason, on the assumption that thinking is basically reasoning. That was the characterization: thinking is reasoning.
Another caricature was: if we’re trying to make intelligence, well, intelligent people can play good chess. You have to be pretty intelligent to play chess. Let’s make a computer that can play chess. That way we will be simulating intelligence.
On the flip side, they thought the stuff like walking and manipulating blocks with a robot using a camera was a simple problem. It must be easy, because even a two-year-old can do that.
It turns out that that was exactly backward. The stuff that a two-year-old can do turns out to still be pretty hard. We’re making good strides these days in it, because we have a lot more experience with it and faster computers.
It turns out that that stuff that two-year-olds can do is actually the hardest stuff. Two-year-olds can recognize shapes, and people, and can talk about it and make simple statements about the world. They can run around. They can jump and climb. In unknown situations, there’s all this stuff that’s really hard.
This is the kind of thing that AI was discovering, like, “Hey, wait a second. We thought chess was hard because it’s hard for people.” It takes a lot of study and practice to become good at chess, but anybody can learn to walk. It turns out it’s the opposite. The current best explanation is that walking is something that took millions of years to evolve.
This kind of reasoning that we do with our forebrain is actually a recent invention, evolutionarily speaking, a recent emergent phenomenon. It’s hard for us because it’s so recent. It hasn’t had enough time to get well-developed.
We also seem to be one of the only species that can do it. Some dogs can solve problems. Monkeys can solve problems. Dolphins can solve problems, but it’s a handful of species that can do that. Then we have this other ability of being able to reason in more general terms.
We are able to come up with systems of physical laws that have predictive power that we can use to build spaceships and stuff.
All of that, this is the same thing that people thought like, “Oh, that’s the hard thing. That’s intelligence.” Actually, that stuff is fairly easy for the machine to do because it is so formal. The best logic that you do is very formal. At least at the time, that’s what was thought.
This easy stuff, things that even non-intelligent people can do, [laughs] animals do it. A bird can stand on two legs and walk. It doesn’t seem to be that hard, but that is actually something that’s very hard to do with a mechanical approach to…You’re not using logic. Let’s put it that way. [laughs] You’re not using logic to walk down the street.
This is very similar to this chess idea. We’re now at a place where we do have walking robots. They are not doing logic, and we know that. We have computer programs that can recognize faces, and they’re not doing logic. There’s this monkey wrench that’s been thrown into this approach in general. That maybe we’re not even doing logic when we are doing logic. [laughs]
We might use logic to write down a proof so that we can communicate it and have it formally verified by somebody else. But that’s very hard for us. Even the best logicians have to have a lot of concentration, they need to have the door closed and quiet. They need to really focus. It takes all of our brain power to do that.
Meanwhile, we can make plans to go to the grocery store. We can plan a vacation. We can do all this stuff. It does not seem to take all that skill. There must be something else we’re doing besides logic.
This is the perspective of a grad student in the early 2000s looking back on this time: that we’re not doing logic. It seems like there’s logic in it. But all these problems that he’s talking about, the frame problem, the qualification problem, these are things that you deal with in logic, but they’re not problems that we are having. When we are talking, we know what we mean.
I think in 1986, they were still very optimistic about logic and having databases of, “We just need more facts, more facts.” Now I think we know that is not the case. Obviously, you need the system to know about the world, but it’s not going to be in logic.
If you have a two-year-old and you tell it, “Make it so that B is on top of A.” They’re not doing a math problem. They have some other system that…Maybe it’s even a special purpose system of object configuration engine that just knows, “Oh, move this. Move that. It’s done.” It’s a special purpose piece of hardware in our brains that knows how to reason about that.
It feels intuitive because it’s not conscious. We are not doing logic. We’re not working out, “Well, if A is on B and B is on C, then we have to move C to B.” We’re not doing that.
Reification: reasoning about knowledge, belief, or goals requires extending the domain of objects reasoned about. Sentences like precedes(on(block2, block3), on(block1, block2))…We’re trying to make a statement about what has to be done first. You’re saying that block two being on block three precedes block one being on block two.
You’re trying to stack blocks in a certain order. You have to do them in the right order. You make a logical statement, and it says, “This situation has to precede that situation.” On(block1, block2) has to be regarded as an object in the first-order language. This process of making objects out of sentences and other entities is called reification.
He’s realizing that for certain situations, you need to start talking about knowledge itself. You need axioms about the axioms. This is reification.
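Reification is easier to see in code: a sentence becomes a value you can talk about. This is my own illustration with made-up names, not McCarthy's first-order formalization.

```python
# A toy sketch of reification (my own illustration): the sentence
# "block2 is on block3" becomes an ordinary first-class object, so we
# can state further facts about it, such as which goal precedes which.
from collections import namedtuple

On = namedtuple("On", ["above", "below"])             # a reified sentence
Precedes = namedtuple("Precedes", ["first", "second"])  # a fact about sentences

# "Block two being on block three precedes block one being on block two."
goal = Precedes(On("block2", "block3"), On("block1", "block2"))
```

The `Precedes` fact is not about blocks at all; its arguments are sentences, which is exactly the move reification makes.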
Now we’re going to talk about context and making it formal. “Whenever we write an axiom, a critic can say that the axiom is true, only in a certain context. With a little ingenuity, the critic can usually devise a more general context in which the precise form of the axiom doesn’t hold.
“Consider the sentence, ‘The book is on the table.’ The critic may propose to haggle about the precise meaning of ‘on,’ inventing difficulties about what can be between the book and the table or about how much gravity there has to be in a spacecraft in order to use the word ‘on.’
“Thus, we encounter Socratic puzzles over what the concepts mean in complete generality and encounter examples that never arise in life. There simply isn’t a most general context.” I’m going to describe this problem a little bit better. Then I’m going to talk about my own personal experience with reasoning.
You have the sentence, “The book is on the table.” You can really start to nitpick what does “on” mean? What if the book is on a piece of paper that’s on the table? Does that still count? How do you represent that?
What if the book is on a box that’s on the table? Is the book on the table still? The book is floating above the table? It’s suspended from the ceiling? Is that on the table?
What’s the difference between a column of air between the book and the table, and a box being there? There’s a real problem. You can go to this higher context: what if you don’t have gravity? The book is touching the table, but there’s no up or down. You could say the table is on the book, and that should be acceptable too, right?
This is a problem you get in logic, where you’re trying to define context and of course, universal statements. Then you move to a new context and universal statements don’t apply so much anymore, or you get weird results. You get weird statements that the system says must be true. It’s a problem.
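One way to picture formalized context is to make truth relative to a context label, loosely in the spirit of McCarthy's later ist(context, sentence) notation. This sketch is my own illustration, not his formalism, and the context names are hypothetical.

```python
# A toy sketch of context-relative truth (my own illustration, loosely
# in the spirit of McCarthy's ist(context, sentence) notation). The same
# sentence can hold in one context and fail, or be undecided, in a more
# general one.
FACTS = {
    ("everyday-physics", "on(book, table)"): True,
    ("zero-gravity",     "on(book, table)"): False,  # "on" stops making sense
}

def ist(context, sentence):
    """Is this sentence true in this context? None means undecided."""
    return FACTS.get((context, sentence))
```

The axiom about the book never claims to hold in the most general context; it only claims to hold in the everyday one.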
From my personal experience: let’s say I’ve been doing a lot of programming. I woke up early, started programming early in the day, and then programmed all through the afternoon. Because I’ve been dealing with a computer that requires such pedantic specificity to work with, I cannot reason on a human level anymore.
A friend will say something and I will nitpick it. “Do you mean duh-duh-duh?” Or, “Do you mean that and this?” That’s not really the precise way to say it. I have to apologize to them. I say, “Wait, I’m sorry. I know what you mean. [laughs] Don’t worry about being so precise. I’ve just been dealing with a computer all day.”
There must be something else happening in our brains, in our minds, that allows us to reason without being quite so precise and somehow short-circuit all these logical puzzles, what he calls Socratic puzzles, about the world. Even if our friends make precision errors in their sentences, we know what they mean, we can work around it.
This is showing that what we’re doing truly is not pure logic. There are approaches that try something like Marvin Minsky’s “Society of Mind,” where instead of having one monotonic…let’s call it monolithic logic engine, and this huge database of facts that the engine can use to infer things…
Maybe we have a large number of small special-purpose engines, each with maybe a different approach to the problem. Then there might also be systems that can choose between those other subsystems: which subsystem would have a useful answer to this problem at this time? They’re activated at different times by different hormonal states.
If we’re stressed, we might be using this one. If we’re more relaxed and focused, we’ll be using this other subsystem. This approach kicks the problem down the road, but it shows that maybe a single big logic engine is not the only solution. Maybe you could have a small logic engine that runs on a small number of facts, but that’s very expensive and slow.
We’re going to try to do it with some heuristics first and try to solve the problem in a satisficing way.
I mentioned before that maybe we have a special-purpose, special-reasoning subsystem in our brains that can solve these little cube-on-cube problems, like how to move the cubes so that they’re properly stacked in the right order.
We can solve these problems without resorting to this general-purpose problem solver. There are even, nowadays, doubts and good arguments that we aren’t general. We don’t have general-purpose intelligence.
We can’t even imagine the purposes and the contexts in which we don’t operate all the time.
This is, though, the way that, say, scientific revolutions happen, with a total paradigm shift. It requires thinking in a new context that no one had imagined before. No one knew how to do the reasoning in that new context.
This is one of those things where we assume that we’re better than we are or that we’re more intelligent than we are. We’re trying to make the computer be as intelligent as we imagine we are, but we’re not that intelligent. [laughs] We’re not that good.
Or maybe our definition of intelligence is wrong. We don’t need to be so smart in everything we do. We can rely on habit. We can rely on culture to give us pat answers to problems with no optimal solution. We just pick an arbitrary one that seems to work and everyone agrees on. We just work with that.
We don’t need to solve a logic problem with these Socratic problems all the time. He’s still trying to formalize this idea of context.
“Humans find it useful to say, ‘The book is on the table,’ omitting reference to time and precise identifications of what book and what table. This problem of how general to be arises whether the general common-sense knowledge is expressed in logic, in program, or in some other formalism.
“A possible way out involves formalizing the notion of context and combining it with the circumscription method of nonmonotonic reasoning.”
This is where he’s leaving it off. I assume that that means that this is what he was working on in 1986, or this was at least the approach he thought was the most promising. Of course, he doesn’t conclude. It just stops, and there you go. That’s the end.
I think I’ve hashed it out pretty well. This problem of generality in AI has caused us to think a lot about ourselves and what we do when we’re thinking. There’s a much better understanding of our psychology now and all the biases that we have, and that those biases served an evolutionary purpose.
Whenever I learn or I read about a bias that I hadn’t heard about before, there’s always an explanation, like why would this be useful? It’s obviously not optimal, but why is it useful?
It’s probably a case where evolution stumbled on something optimized for energy conservation, so that we would be more likely to survive and procreate. Energy being a scarce resource, we make all these assumptions.
We have, for instance, loss aversion. Why would we want loss aversion? It might be that [laughs] you might hold on to something that you have longer than is wise, in that most of the time, you’re going to get to keep it. Most of the time.
There’s these weird situations that you might get in, where you would be better off…You might lose your life if you don’t drop the thing you have, but you want to hold on to it. Those situations are so rare.
Nature, natural selection, evolution has found some balance between dropping what you have to conserve something else, versus holding on to what you have, in the hopes that you’ll get to keep it. It’s found a balance that…People often talk about situations and certainly in the modern world, where this loss aversion doesn’t make sense.
For instance, to an economist, earning $10 is the same as saving $10. The situation is you spend the $10 and then you earn $10, or you just save the $10. To an economist, it’s the same.
The end situation is the same, but to a person it feels a lot different. Losing $10 that you spent time earning feels different from gaining another $10.
In a world of scarcity, where a new $10 might not come, or a new tree full of berries might not be found, holding on to the one you have seems to make a lot of sense. Anyway, what I’m trying to get at is that our reasoning is not optimal, and it might not even be logical. It’s not logic with finely-tuned parameters. It’s not logic.
It’s stuff like, “don’t let go,” “hold on to stuff,” and “defend what you have.” That “defend what you have” is not like a computer printout, like a calculation that detects a 70 percent chance that you will lose what you…It’s not like that. You feel territorial. That’s a hormonal flush that you get. It’s not reason.
Later, you might make up a reason why you did that, “Oh, I thought I would be able to win in the fight if I defended it.” You make that up. But no, you just had a flush of hormones that made you want to keep this thing and defend it against intruders.
I think about this AI and people saying, “Oh, AI hasn’t been successful.” It has been successful. It has posed questions about reasoning, about intelligence, that have been fruitful. It’s led to a lot of the development of computer science in general.
We could list programming languages, data structures, databases.
SQL, because it’s a logic language, was once considered AI. Now that it works and it’s open-source software that people take for granted, they forget that it comes from the AI world. I think more fruitful is that AI has asked questions, has forced us to ask questions about ourselves.
We’ve learned so much about who we are as a species, what intelligence is, what is the nature of intelligence? How could we possibly work? How do we do what we do?
I think the real contribution of artificial intelligence is asking these really profound, deep questions. You look back at 1986, and it seems naive, but it’s only because of the questions asked in 1986 that we have our current understanding. It’s very important.
Thank you so much for listening. Please tell your friends. If you liked this episode, you can always subscribe and you’ll get the new ones. I’m going to continue with these Turing Award winners. I don’t know. I’m learning a lot, let’s put it that way. It’s edifying.
My name is Eric Normand. This has been a Turing Award lecture reading, John McCarthy. Thank you for being there. As always, rock on.