My definition of intelligence has been refined over time.
In my article about the human mind being a simulation engine, I already wrote out some of the points I am going to make here, but my thinking on the topic has become clearer.
The human mind is composed of tens of thousands of models of our world. I suspect that the majority of neurons, especially in the neocortex, are contributing to model building.

What exactly is a model? In science, a model is a simplified description of some phenomenon. In math, it is often a formula for how numbers move through a system. You can think of a toy car: it can drive around, it has a horn and toy wheels, but it’s not a real car, just a simplified version of one. You can think of a cartoon strip as a model as well: it takes away many parts and focuses on some specific aspect.

The human mind makes models for everything. All of our vision is built from models: models of what lines look like, shapes, objects, animals, etc.
Concepts are the same thing as models. If you understand something, you know how it works at some level; that means you have a mental model of it.
So a human mental model is a simplified representation of phenomena in our real world. With this definition, learning means building better models over time that explain the world with increasing accuracy. These models have the following properties:
Composed of other models and features
In the machine learning world, when a machine learning model (a similar but different kind of model) is built, features are fed to the algorithm so it can learn from the data. For example, let’s say you want to build a system that distinguishes between cat and dog images: features could be weight, size, paw size, tail length, fur color, nose size, etc. With our mental models, everything we are calling a feature here is a model in and of itself. We have models in our mind of fur: it is a coat on many animals to keep them warm, it is made of hair, it grows, humans can wear it, it is flammable, etc. We have models for legs: humans have 2, other animals usually have 4, they are used for moving, they have feet attached, etc. These models feed into other models like the “dog model”, and the “dog model” feeds into other models like the “mammal model”, the “living creature model”, and the “4 legged animal model”. The more understanding we have of the lower level models, the more understanding we may have of the higher level models. Think of how we have 2 different models of gravity in use: Newton’s Law of Gravitation and Einstein’s General Theory of Relativity. It is models all the way down. Here is a thought experiment I did on what an infant’s early models might look like.
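Here is a tiny sketch of this “models made of models” idea in Python. The Concept class and the specific parts are made up purely for illustration, not a claim about how the brain actually stores anything:

```python
# Toy sketch: every "feature" of a concept is itself a concept,
# so the knowledge base is models all the way down.

class Concept:
    def __init__(self, name, parts=None):
        self.name = name
        self.parts = parts or []  # lower-level models this model is built from

    def all_submodels(self):
        """Recursively collect every model this model depends on."""
        found = []
        for part in self.parts:
            found.append(part.name)
            found.extend(part.all_submodels())
        return found

hair = Concept("hair")
fur = Concept("fur", parts=[hair])      # fur is itself built from a hair model
leg = Concept("leg")
dog = Concept("dog", parts=[fur, leg])  # "dog" composes lower-level models
mammal = Concept("mammal", parts=[dog]) # and feeds into higher-level ones

print(dog.all_submodels())     # ['fur', 'hair', 'leg']
print(mammal.all_submodels())  # ['dog', 'fur', 'hair', 'leg']
```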
Composable and Compositional
All the models can be combined to make new models, and they are compositional in that when you add multiple models together you get new meaning. For example, “purple dog”, “dog house”, and “dog ships” are instantly understandable. Most models can be combined to make new meaning, but many combinations don’t make sense: “chicken wrench”, “skinny fat”, or “hungry pillows”. I explain this in more detail here.
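A toy sketch of that asymmetry, where the trait sets and requirements are invented for illustration:

```python
# Toy sketch: composing two concept models yields a new concept,
# but only some combinations are semantically coherent.

OBJECTS = {
    "dog":    {"physical", "animate"},
    "pillow": {"physical"},           # not animate
}
PROPERTIES = {
    "purple": {"requires": "physical"},  # any physical thing can be purple
    "hungry": {"requires": "animate"},   # only animate things get hungry
}

def compose(prop, obj):
    """Combine a property model with an object model into a new concept."""
    if PROPERTIES[prop]["requires"] in OBJECTS[obj]:
        return f"{prop} {obj}"  # a coherent new concept
    return None                 # the combination carries no clear meaning

print(compose("purple", "dog"))     # 'purple dog'  -> instantly understandable
print(compose("hungry", "pillow"))  # None          -> doesn't make sense
```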
Interlinked and hierarchical models
I discussed 2 ideas above: that all models are made of other models, and that these models are composable and compositional, meaning they can be combined in limitless ways. Since all these models are loosely interlinked, you can have rich understandings of many concepts, even concepts that you have never directly experienced like “fat unicorns”. To give an example of this interlinking, let’s look at the concept of a dog again. Someone asks you to take care of their dog and you agree to help out, even though you have never had your own dog. Since it’s a living creature, you know that you will need to feed it. Since it’s a mammal with a brain similar to a human’s, it probably has a personality that you can see. It will probably need to go for a walk, which means you will need a leash. It may need cleaning since it has a fur coat. It may need some toys to play with. You need to understand all these other models to be able to use them.
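Here is a toy sketch of that dog-sitting inference; the link structure and implications are invented to show the traversal, nothing more:

```python
# Toy sketch: because models are interlinked, knowing "dog" lets you
# pull in obligations from the models it links to, even if you have
# never owned a dog yourself.

LINKS = {"dog": ["living creature", "mammal", "fur coat"]}
IMPLICATIONS = {
    "living creature": "needs feeding",
    "mammal":          "has a personality you can see",
    "fur coat":        "may need cleaning",
}

def what_to_expect(concept):
    """Walk the linked models and gather what each one implies."""
    return [IMPLICATIONS[c] for c in LINKS[concept] if c in IMPLICATIONS]

print(what_to_expect("dog"))
# ['needs feeding', 'has a personality you can see', 'may need cleaning']
```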
Generalized and invariant knowledge
Humans learn models of everything and store information that is invariant and generalized in a way that has not been replicated in computers. Our internal models of dogs can recognize all breeds of dogs, drawings of dogs, cartoon dogs, dogs from different angles, dogs that are only partially visible, dogs that are deformed, stick figures of dogs, etc. All of this after seeing only a few examples, where computers may need thousands. This invariance does not apply only to vision, but to all of our senses. You can touch different types of cups, such as plastic cups, paper cups, glass cups, tall cups, tiny cups, and fat cups, and recognize them all as cups. You can hear different meows and recognize them all as cats. Generalized knowledge representation in computers has been studied for a long time with little success. It’s only recently, with deep learning, that we have started to make bigger progress, but we are still very far away from human-level generalization.
Multi-modal models
In addition to these models having generalized knowledge, they also store information in a cross-modal way, meaning sensory input from each of your five senses is cross-linked with the others. To continue the example above, you can recognize a dog by sight, but also from hearing it bark, from touching it, from smelling it, and, if you are from certain countries, maybe from tasting it. The human mind stores multi-modal representations for many of the concepts it learns. This multi-modal representation allows us to combine and use concepts in different ways. For example, your mental model of a chicken probably captures how they look, how they move, the sounds they make, and what they taste like. In English, people sometimes call other people chicken because chickens are easily scared.
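A toy sketch of one concept indexed by several modalities; the chicken entries are illustrative stand-ins for whatever the real representations are:

```python
# Toy sketch: one concept stored across modalities, so any single
# sense can retrieve the whole model.

CHICKEN = {
    "vision": "small bird, red comb, feathers",
    "sound":  "cluck",
    "taste":  "like chicken",
    "motion": "head-bobbing walk",
}

def recognize(modality, observation, concepts):
    """Match an observation in one modality against stored models."""
    for name, model in concepts.items():
        if model.get(modality) == observation:
            return name, model  # retrieving by one sense returns all senses
    return None

name, model = recognize("sound", "cluck", {"chicken": CHICKEN})
print(name, "->", model["vision"])  # hearing a cluck recalls what it looks like
```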
They can be attached to symbols
Many of these models have symbols or words attached to them. So you can write or read the word “dog” and all the previous models and concepts will appear in your mind. Language is another variation of the multi-modal models we have in our minds. Not every model you have has a word attached to it, though. For example, you may feel a certain way any time the sun is shining down while you sit under your favorite tree. You don’t have a name for this, but it’s something you have experienced reliably for years and it is part of your habits. In fact, we use many of our models subconsciously, such as the model you have for predicting where a football will land. We use words and symbols as shortcuts to get directly to a specific group of concepts that we want to hold in our attention.
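As a toy sketch, the symbol layer can be pictured as a sparse index over the model store (the model ids and descriptions here are invented):

```python
# Toy sketch: words are shortcut keys into the model store;
# some models exist with no word attached at all.

MODELS = {
    "m1": {"desc": "four-legged barking pet"},
    "m2": {"desc": "that feeling under your favorite tree on a sunny day"},
}
SYMBOLS = {"dog": "m1"}  # sparse: m2 exists but never got a word

def read_word(word):
    """A written or spoken symbol is a shortcut straight to a model."""
    return MODELS[SYMBOLS[word]]["desc"]

print(read_word("dog"))            # 'four-legged barking pet'
print("tree feeling" in SYMBOLS)   # False: the model exists, the word doesn't
```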
Motion and Time
The fundamental difference between animals and plants is that one moves and the other doesn’t; one operates in real time while the other is on a different time scale. Motion has a fundamental representation in all of our models. Our models store sequences like short movies. Think about every verb in the dictionary: to jump, to run, to slide, to spin, to turn, to eat, to do homework, etc. As you read those words, every single one of those models conjures up a little movie in your head of the movement of those actions. Just as we store the sequence of ABCs in our minds, we store sequences for everything else, such as adjectives and nouns. By storing the motions of concepts, we can do all kinds of prediction, like estimating where a ball will land or when the car next to you will pass you.
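To make the ball example concrete, here is a minimal sketch of a motion model as a stepped-forward “movie” of states, using simple projectile physics with illustrative numbers:

```python
# Toy sketch: predicting where a thrown ball lands by stepping a
# stored motion model forward in time (basic projectile kinematics).

def simulate_throw(vx, vy, dt=0.01, g=9.81):
    """Step the ball forward frame by frame until it hits the ground."""
    x, y = 0.0, 0.0
    while True:
        x += vx * dt   # horizontal motion
        vy -= g * dt   # gravity slows the climb, speeds the fall
        y += vy * dt
        if y <= 0.0:
            return x   # predicted landing spot

print(round(simulate_throw(vx=5.0, vy=10.0), 2))  # roughly 10.2 meters
```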
Simulation engine and causality
All of these models can be run inside the mind as a simulation engine. These simulations allow us to predict, plan, understand the past, imagine, and make theories about anything that we want. With this simulation engine, we can also run scenarios in our minds to understand cause and effect. We can run scenarios like “what happens if I leave the infant alone all night”, “what will happen if I skip school today”, or “what will happen if I try to make it through this yellow light”. I go into more detail here.
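As a toy sketch, a scenario run can be pictured as rolling a world model forward from a hypothetical action; the causal chains here are made up for illustration:

```python
# Toy sketch: "what if" reasoning as a rollout of a (made-up) world model.

WORLD_MODEL = {
    "skip school":          ["miss the lesson", "parents find out", "get grounded"],
    "run the yellow light": ["light turns red mid-intersection", "possible ticket"],
}

def simulate(action):
    """Imagine the causal chain an action sets off, without acting."""
    return WORLD_MODEL.get(action, ["unknown outcome"])

for consequence in simulate("skip school"):
    print("then:", consequence)
```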
Autonomous Unsupervised Learning
All of our base models are learned automatically, without the specific training required by modern machine learning. It is most likely via some form of the free energy principle or predictive coding, which have not been successfully implemented in computers yet. The free energy principle posits that all of life is learning to minimize surprise. Another way to say it is “learning is predicting the future”: if we predict incorrectly, that is surprise, and the model must update its internal state. To give a more concrete example, if you think you see a cup and you move and instead you see a chicken, that would elicit surprise and your mind would automatically update its model of “cup-ness”. Human babies and other animals interact with their environment and are able to learn many things: if I close my eyes, my vision changes; if I move my arm, I should feel this stimulus; I should see this object if I continue to move; etc. A human baby’s first few years are all about bootstrapping and building these basic models. The base models start off very primitive, but they bootstrap each other, and as the baby gets older, more sophisticated concepts like eating, crawling, walking, and talking eventually form.

Contrast that with computers, where we must train them ourselves by giving them millions of pieces of training data for them to understand a few image classes. So the majority of the models we learn are learned in an autonomous way. We do have the ability to focus our learning via curriculum and schooling. For example, we can choose our major in college, but by the time you get there, you already have thousands of mental models that your college major will build on top of. When we train computers, we have to tell them ahead of time “these are pictures of dogs, these are pictures of cats”. The human mind automatically says “this thing looks like a dog, but it’s different enough that we will call it a cat”. We have not figured out how to automatically create new classes or objects in computers; this is an active area of research. My favorite hypothesis of how biological systems are able to do this is “Markov Blankets“.
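Here is a minimal sketch of that surprise-driven loop. The scalar “belief” and the running-average update rule are illustrative stand-ins, not a claim about the actual free energy computation:

```python
# Toy sketch of predictive-coding-style learning: predict the next
# observation, measure surprise (prediction error), and update the
# internal model in proportion to how wrong the prediction was.

def update_model(belief, observation, learning_rate=0.5):
    surprise = observation - belief     # prediction error
    belief += learning_rate * surprise  # nudge the model to reduce future surprise
    return belief, abs(surprise)

belief = 0.0  # e.g. expected "cup-ness" of the thing in front of you
for obs in [1.0, 1.0, 1.0, 1.0, 0.0]:  # four cup sightings, then a chicken
    belief, surprise = update_model(belief, obs)
    print(f"belief={belief:.2f} surprise={surprise:.2f}")
# surprise shrinks as the model improves, then spikes when the chicken
# appears, forcing the model of "cup-ness" to update
```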
Amorphous drifting models
There is one main difference between a computer database and a human mind database that I want to emphasize: computers are deterministic and humans are not. In a computer, every time you look up the record for “grandmother” in a database, you will get the exact same row and data. When a human retrieves information about “grandmother”, different neurons fire and different information is retrieved every time. There is not a single grandmother neuron that fires; instead, there are whole probabilistic clusters of neurons that fire for a model. Every time the concept of “grandmother” is retrieved, different neurons will fire, and many of those same neurons in that cluster will fire for other models. Earlier I mentioned that models are interlinked: neurons that represent one model can also represent models and features of many other models. The model for “living organism” may fire for dog, human, and grandmother. On top of that, a paper was just released showing that memories drift between neurons: “Drifting neuronal representations: Bug or feature?”
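A toy sketch of that contrast, with invented unit numbers standing in for neurons; note the overlap between the two clusters:

```python
# Toy sketch: a concept lives in an overlapping, probabilistic cluster
# of units, so each retrieval activates a different subset -- unlike a
# database row, which returns identical bytes every time.

import random

CLUSTERS = {
    "grandmother":     {1, 2, 3, 4, 5, 6},
    "living organism": {4, 5, 6, 7, 8, 9},  # shares units 4, 5, 6
}

def retrieve(concept, k=4):
    """Sample k units from the concept's cluster; differs per retrieval."""
    return set(random.sample(sorted(CLUSTERS[concept]), k))

print(retrieve("grandmother"))  # e.g. {1, 3, 4, 6}
print(retrieve("grandmother"))  # e.g. {2, 4, 5, 6} -- not the same units
```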
Compared to Deep Learning
The main reason deep learning and neural networks are better than previous technologies is that the features are automatically learned. A deep learning algorithm takes training data and pushes it through layers of artificial neurons to learn a progressively more abstract representation of the data (using the backpropagation algorithm). Before deep learning, programmers would hand-build features to feed into a learning model. This process is painful, laborious, and often wrong, because you don’t know if the feature has enough representative power (I’ve built lots of these old-school models). In this idea of a model building engine, each of the features that feed into a model is a model in and of itself, and they are all learned in an autonomous, unsupervised fashion. It could be that we just need to modify current neural networks, but it’s more likely we need a brand new architecture.
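To make the contrast concrete, here is a minimal backprop sketch in plain numpy (illustrative only): the hidden layer learns its own features for XOR from raw inputs, where the old approach would have required someone to hand-build them:

```python
# Toy sketch: a tiny two-layer network learning XOR with backpropagation.
# The hidden layer discovers its own features; nobody designs them.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # raw input -> learned features
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # learned features -> prediction
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)             # hidden features: learned, not hand-built
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)  # output error signal
    d_h = (d_out @ W2.T) * h * (1 - h)   # error pushed back a layer
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(0, keepdims=True)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(0, keepdims=True)

print(out.round(2).ravel())  # typically close to [0, 1, 1, 0]
```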
So what does this all mean? Well, many people have characterized the core cognitive algorithm of the mind in many different ways: the mind is a computer, the mind is a time machine, the mind is a prediction engine, the mind is like a telephone network, the mind is a meme machine, etc. All of these cover different aspects of the mind, but I do believe “the mind as an autonomous interlinked model building database” covers most of it. There are a lot of smart computer scientists and neuroscientists trying to chip away at this problem. One of the most promising areas of research to me is grid cells and the hippocampus. The more I look at what the mind is doing and where we are with technology, the more I think it’s further away. Is there at least a tiny model of this idea we could start to build? 🙂