Book Notes: On Intelligence by Jeff Hawkins
Aug 29, 2005, 2:22a
On Intelligence is a brilliant book that proposes what I like to think of as a "grand unifying theory" for how the brain works. I highly recommend the book to anyone who has any interest in the brain. Notes from the book:
- Redwood Neuroscience Institute is the research institute Hawkins started in 2002 to study the brain. It recently merged with a department at Berkeley and is now called the Redwood Center for Theoretical Neuroscience.
- Numenta is the company Hawkins founded to commercialize the research that the Redwood Center uncovers.
- "Complexity is a symptom of confusion, not a cause." (5)
- "It is the ability to make predictions about the future that is the crux of intelligence." (6)
- "A thesis of this book is that understanding cannot be measured by external behavior ... it is instead an internal metric of how the brain remembers things and uses its memories to make predictions." (20)
- "A neural network is unlike a computer in that it has no CPU and doesn't store information in a centralized memory. The network's knowledge and memories are distributed throughout its connectivity - just like real brains." (24)
- Hawkins believes 3 things are essential to understanding the brain:
1) Inclusion of time in brain function - real brains process rapidly changing streams of information
2) Importance of feedback - for every fiber feeding forward into the cortex, there are 10 fibers feeding back to the senses
3) Account for the physical architecture of the brain - neocortex is a repeating hierarchy
- "A real brain and a three-row neural network are built with neurons, but have almost nothing else in common." (28)
- Auto-associative memories: neural nets connected using lots of feedback. These memories fed the output of each neuron back into the input. With this feedback loop, when a pattern of activity was imposed on the artificial neurons, they formed a memory of that pattern. The auto-associative memory associated patterns with themselves. (30)
- "The most important property [of auto-associative memories] is that you don't have to have the entire pattern you want to retrieve in order to retrieve it ... The auto-associative memory can retrieve the correct pattern, as it was originally stored, even though you start with a messy version of it." (30)
- "An auto-associative memory can be designed to store sequences of patterns, or temporal patterns. This feature is accomplished by adding a time delay to the feedback. With this delay, you can present an auto-associative memory with a sequence of patterns, similar to a melody, and it can remember the sequence." (30)
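A toy sketch of the pattern-completion idea, not code from the book. It assumes a Hopfield-style network (one common kind of auto-associative memory) with +1/-1 units; the function names `train` and `recall` and the two stored patterns are made up for illustration.

```python
def train(patterns):
    """Build a weight matrix from +1/-1 patterns with a Hebbian outer-product rule."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue, max_sweeps=10):
    """Feed every unit's output back into the network until the state settles."""
    state = list(cue)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two hypothetical stored patterns (chosen to be easy to tell apart).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train(stored)

noisy = list(stored[0])
noisy[0] = -noisy[0]  # a "messy version": one element flipped
restored = recall(weights, noisy)
print(restored == stored[0])  # prints True
```

The network is only given the corrupted cue, yet it settles back onto the pattern as it was originally stored. Storing sequences would additionally need the delayed feedback Hawkins describes, which this sketch leaves out.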
- "Scientific frameworks are often difficult to discover, not because they're complex, but because intuitive but incorrect assumptions keep us from seeing the correct answer." (32)
- Sometimes the engineering solution differs radically from nature's solution (e.g. planes fly differently than birds do). But Hawkins believes that in order to understand how intelligence and understanding work you must look at the underlying natural structure (the brain), though AI researchers have been hesitant to do so.
- "If you are born without a cerebellum or it is damaged, you can lead a pretty normal life. However, this is not true for most other brain regions; most are required for basic living, or sentience." (41)
- The neocortex is about 2 millimeters thick and has six layers. Humans are smarter because our cortex covers a larger area relative to body size. The human neocortex contains 30B neurons.
- Idea for an experiment: Surgically increase the mass of a rat's neocortex to see if intelligence/behavior is affected
- If you damage or lose your
- right parietal lobe, you lose the ability to perceive or conceive of anything on the left side of your body or in the left half of space around you
- left frontal region known as Broca's area, you lose the ability to use the rules of grammar, although your vocabulary and comprehension of words is unchanged
- fusiform gyrus, you lose the ability to recognize faces
- Basic sensory hierarchy in the brain:
- visual information enters via the primary visual area (V1), which is concerned with low-level features such as tiny edge-segments, small-scale components of motion, binocular disparity, and basic color and contrast information
- V1 feeds info to other areas, such as V2, V4, and IT
- V4 responds to objects of medium complexity, such as star shapes in different colors
- motor, auditory, and somatosensory inputs also have a similar hierarchy
- eventually, all of this sensory info passes into "association areas"
- When the axon from one neuron touches the dendrite of another, they form small connections called synapses; synapses are where the nerve impulse from one cell influences the behavior of another cell
- When 2 neurons spike at nearly the same time, the connection strength between them will be increased, in a process known as Hebbian learning
- "The formation and strengthening of synapses is what causes memories to be stored." (48)
- It's estimated that our neocortex has 30 trillion synapses, and this is apparently sufficient to store all the things you can learn in a lifetime
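A minimal sketch of the Hebbian rule mentioned above, under heavy simplification: spikes are 0/1 values in discrete time steps, and "nearly the same time" is taken to mean "in the same step". The function name `hebbian_train` and the learning rate are assumptions for illustration.

```python
def hebbian_train(pre_spikes, post_spikes, weight=0.0, learning_rate=0.1):
    """Raise the weight once for every time step in which both neurons spike."""
    for pre, post in zip(pre_spikes, post_spikes):
        if pre and post:
            weight += learning_rate
    return weight


# Hypothetical spike trains for two connected neurons.
pre = [1, 0, 1, 1, 0, 1]
post = [1, 0, 0, 1, 1, 1]
print(round(hebbian_train(pre, post), 2))  # three coincident spikes strengthen the link
```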
- Vernon Mountcastle looked at the brain, noticed that all regions of the cortex look the same, and suggested that since they all look the same, perhaps they are all performing the same basic operation, based on a single cortical algorithm. The thing that makes each region specialize is how they are connected to each other and to other parts of the central nervous system, which is dictated by our genes. This makes sense since the brain is so plastic - if a specific region is damaged, other regions routinely take over.
- "For example, newborn ferret brains can be surgically rewired so that the animals' eyes send their signals to the areas of the cortex where hearing normally develops. The surprising result is that the ferrets develop functioning visual pathways in the auditory portions of their brains. In other words, they see with brain tissue that normally hears sounds." (54)
- "Apparently no area of cortex is content to represent nothing." (54)
- The optic nerve sends 1,000,000 fibers to the brain, and the auditory nerve sends 30,000 fibers
- Saccade - the sudden movement your eye makes 3 times every second to fixate on a new point
- Proprioceptive system - system of sensors that tell us about our joint angles and bodily position
- "In fact, your brain can't directly know where your body ends and the world begins. Neuroscientists studying body image have found that our sense of self is a lot more flexible than it feels. For example, if I give you a little rake and have you use it for reaching and grasping instead of using your hand, you will soon feel that it has become a part of your body. Your brain will change its expectations to accommodate the new patterns of tactile input. The rake is literally incorporated into your body map." (60)
- Now here's an interesting experiment: take a fake plastic hand and place it in front of you. Get someone to stroke both your fake hand and your real hand (out of sight). After a short time, you'll actually feel the sensations being applied to the fake hand as if it were your own.
- Using your tongue to see: "The subject wears a small camera on his forehead and a chip on his tongue (12 x 12 electrodes). Visual images are translated pixel for pixel into points of pressure on the tongue ... the brain quickly learns to interpret the patterns correctly [as vision] ... [a person blind] tried on the tongue unit and saw images for the first time since his childhood ... Images initially experienced as sensations on the tongue were soon experienced as images in space." (61)
- "It doesn't matter where [which senses] the patterns come from; as long as they correlate over time in consistent ways, the brain can make sense of them." (62) Synesthesia (mixing senses, e.g. seeing colors when hearing sounds) may be the result of a pattern detection bug in the wiring of the brain.
- "A system running the neocortical algorithm will be intelligent based on whatever kinds of patterns we choose to give it." (63)
- Neurons are slow compared to transistors; they can fire about 200 times per second (200 Hz) while modern computers can do 1 billion operations per second (1 GHz). However, a human can perform significant tasks in less than a second (e.g. recognize a cat in a photo) - this is Hawkins' "100-step rule": a human can do a lot more in just 100 steps than a computer, or even a massively parallel computer, can in a similar number of steps, no matter how large or how fast. Hawkins takes this as evidence that the brain "doesn't compute the answers to problems; it retrieves the answers from memory." (68)
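The arithmetic behind the "100 steps" can be checked quickly. The 200 Hz and 1 billion operations per second figures come from the note above; the half-second task duration is my assumption for "less than a second", and `serial_steps` is a made-up helper name.

```python
def serial_steps(rate_hz, task_seconds):
    """Longest chain of one-after-another steps that fits in the given time."""
    return int(rate_hz * task_seconds)


TASK_SECONDS = 0.5  # assumed time to recognize a cat in a photo

neuron_steps = serial_steps(200, TASK_SECONDS)
computer_steps = serial_steps(1_000_000_000, TASK_SECONDS)
print(neuron_steps, computer_steps)  # prints 100 500000000
```

So whatever the brain does in that half second, no chain of neurons can be more than about 100 firings long, while a serial computer gets hundreds of millions of operations in the same window.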
- "The entire cortex is a memory system. It isn't a computer at all." (68)
- Hawkins' core thesis:
The neocortex stores sequences of patterns
The neocortex recalls patterns auto-associatively
The neocortex stores patterns in an invariant form
The neocortex stores patterns in a hierarchy
- "Truly random thoughts don't exist. Memory recall almost always follows a pathway of association." (71)
- Your memory of the alphabet is a sequence of patterns. [I also can count up to 15 in Hindi in the same way. I never remember what number a specific word maps to, but I do remember the sequence, and I count it to find what number a word maps to.]
- "Most of the information [in your brain] is sitting there idly waiting for the appropriate cues to invoke it." (73)
- "Our brains fill in what they miss with what they expect to hear. It's well established that we don't actually hear all the words we perceive." (74)
- "We call this chain of memories thought, and although its path is not deterministic, we are not fully in control of it either." (75)
- "Artificial auto-associative memories fail to recognize patterns if they are moved, rotated, rescaled, or transformed in any of a thousand other ways, whereas our brains handle these variations with ease." (76) The brain can do this pattern-matching easily because it isn't as literal; instead, it matches the incoming patterns with what Hawkins calls an invariant representation, which is an archetype of sorts. It is a pattern of salient "keys" that help the brain define and identify an object or concept. The problem of understanding how the cortex forms invariant representations remains one of the biggest mysteries in science.
- One example of invariant representation is illustrated by your ability to recognize a melody in any key. The memory actually stores the important relationships in the song (the pitch intervals), not the actual notes.
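A small sketch of why storing intervals makes a melody key-independent. Notes are written as MIDI note numbers (60 is middle C); the melodies and the function names `intervals` and `same_melody` are made up for illustration, and this says nothing about how the cortex actually does it.

```python
def intervals(notes):
    """Semitone steps between consecutive notes: the 'relationships' in the song."""
    return [b - a for a, b in zip(notes, notes[1:])]


def same_melody(first, second):
    """Two note sequences count as the same melody if their intervals match."""
    return intervals(first) == intervals(second)


tune_in_c = [60, 60, 67, 67, 69, 69, 67]     # hypothetical tune starting on C
tune_in_d = [n + 2 for n in tune_in_c]       # same tune, two semitones higher
scale = [60, 62, 64, 65, 67, 69, 71]         # a different sequence of notes

print(same_melody(tune_in_c, tune_in_d))  # prints True
print(same_melody(tune_in_c, scale))      # prints False
```

The two versions of the tune share no notes after the transposition, yet their interval lists are identical, which is the sense in which the stored form is invariant.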
- "To make a specific prediction, the brain must combine knowledge of the invariant structure with the most recent details." (83)
- When we look around a room, our brain is actively forming predictions of what we expect to see, and when we see something that we don't correctly predict/expect, it is immediately noticed.
- "What we perceive is a combination of what we sense and of our brains' memory-derived predictions." (87)
- "Prediction is not just one of the things your brain does. It is the primary function of the neocortex, and the foundation of intelligence. The cortex is an organ of prediction ... Even behavior is best understood as a by-product of prediction." (89)
- Predictions are essentially our expectations of how the world around us will behave, which is also tied closely with happiness - we're happy when things exceed our expectations (e.g. unexpected free food!) and we're unhappy when things fall short of our expectations (e.g. due to traffic, it took an hour to get to where you were going instead of the 30 min you expected).
- Bizarre yet makes sense: "Right after New York City stopped running elevated trains, people called the police in the middle of the night claiming that something woke them up. They tended to call around the time the trains used to run past their apartments." (93) Their brains expected a sound at that time each night, and when that prediction didn't come true, it drew the person's attention to it and woke them up.
- Another example of how your brain fabricates predictions based on current inputs - "filling in": due to your optic nerve, there is a blind spot in each eye. However, even if you're looking at a rug with one eye, your brain will "fill in" the pattern it expects to see in your blind spot so you don't see the blind spot at all.
- "We don't even have to assume the cortex knows the difference between sensations and behavior; to the cortex they are both just patterns." (100)
- Hawkins contends that behavior is just the brain observing a self-fulfilling prophecy: "Instead I believe the cortex predicts seeing the arm, and this prediction is what causes the motor commands to make the prediction come true. You think first, which causes you to act to make your thoughts come true." (102)
- Humans have greater cortical control over motor commands than other mammals: "If you damage the motor cortex of a rat, the rat may not have noticeable deficits. If you damage the motor cortex of a human, he or she becomes paralyzed." (103)
- "Thus, intelligence and understanding started as a memory system that fed predictions into the sensory system. These predictions are the essence of understanding. To know something means that you can make predictions about it ... These predictions are our thoughts, and, when combined with sensory input, they are our perceptions. I call this view of the brain the memory-prediction framework of intelligence." (104)
- "[In your cortex] what is actually happening flows up, and what you expect to happen flows down." (113)
- The higher up you go in the cortex (e.g. from V1 to IT) you expect to see fewer changes over time.
- All predictions are learned by experience. If there are consistent patterns among the inputs flowing into your brain, your cortex will use them to predict future events.
- I expected there to be a smoother gradient of communication from V1 to IT, not this rigid 4 step communication.
- "V1 is made up of numerous separate little cortical areas that are only connected to their neighbors indirectly, through regions higher up in the hierarchy." (122)
- Invariant representations are formed in every cortical region.
- "Each region of the cortex learns sequences, develops what I call 'names' for the sequences it knows, and passes these names to the next region higher in the cortical hierarchy ... This 'name' is a group of cells whose collective firing represents the set of objects in the sequence ... By collapsing predictable sequences into 'named objects' at each region in our hierarchy, we achieve more and more stability the higher we go. This creates invariant representations." (129-130) A bit hand-wavy, but interesting nonetheless.
- "If lower regions of the cortex fail to predict what patterns they are seeing, they consider this an error and pass the error up the hierarchy. This is repeated until some region does anticipate the pattern." (133) This escalation happens all the way up to the hippocampus, which creates the first new memory. This memory will eventually get pushed down into the neocortex and to lower levels over time. Without the hippocampus, a person can't form any new memories, and Hawkins' approach provides a logical answer for why this is.
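A very rough toy model of this escalation, purely my own illustration. Each region is reduced to a set of pattern labels it can anticipate; the region names are borrowed from the visual hierarchy above, and the labels and the function name `escalate` are invented.

```python
def escalate(pattern, hierarchy):
    """Walk up the hierarchy and return the first region that anticipates the pattern."""
    for region_name, known_patterns in hierarchy:
        if pattern in known_patterns:
            return region_name
    # Nothing in the cortex predicted it, so it ends up at the top.
    return "hippocampus"


# Bottom-to-top list of regions with hypothetical learned patterns.
hierarchy = [
    ("V1", {"edge"}),
    ("V4", {"edge", "star shape"}),
    ("IT", {"edge", "star shape", "face"}),
]

print(escalate("edge", hierarchy))           # prints V1
print(escalate("face", hierarchy))           # prints IT
print(escalate("never seen it", hierarchy))  # prints hippocampus
```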
- The human cortex has several hundred million microcolumns, each of which functions as a basic unit when it comes to pattern learning, matching, and predicting. Information flows mostly in the direction of the column - horizontally at layer 1 and vertically in layers 2 through 6. 90% of the synapses in a column come from outside the column. "The large number of synapses connecting cells in a column to other parts of the brain provide each column with the context it needs in order to predict its activity in many different situations." (141)
Detailed schematic of a column:
- Each column is made up of 6 layers.
- Converging inputs from lower regions always arrive at layer 4 - the main input layer
- These inputs also connect to layer 6
- Layer 4 cells project to layers 2 and 3 in their column
- Layers 2 and 3 project to the next higher column
- Layer 6 cells project down to layer 1 of a lower column (feedback)
- Layer 1 cells project laterally to layer 1 of horizontally adjacent columns
- Layers 2, 3 and 5 have dendrites in layer 1, and so accept signals from layer 1
- Layers 2 and 3 project down to layers 5 and 6
- The large cells in layer 5 appear to be responsible for movement both in the motor cortex and the visual cortex (e.g. saccades)
- Delayed self-feedback occurs via the thalamus - layer 5 cells project to the thalamus, which then projects back to layer 1 of the same column. Hawkins suspects this is the delayed feedback that lets auto-associative memory models learn sequences.
- Layer 1 provides a way of converting an invariant representation into a more detailed and specific representation. For example, if you know that the next note must be five notes away (either a C or a G), layer 1 signals activate the 2 columns corresponding to these specific predictions. We can think of the input from above to layer 1 as the name of a song, and the inputs from horizontal columns as where we are in the song. Thus layer 1 carries much of the information we need to predict when a column should be active.
- In this way, layer 1 contains info about which columns were just active in this region of the cortex.
- Is consciousness controlled in the thalamus? If it's damaged, the person is left in a persistent vegetative state.
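To keep the column wiring straight, it can help to write the schematic down as a small directed graph. This is only my transcription of the bullets above: the node names (`lower_region`, `neighbor_L1`, and so on) are invented, and treating "has dendrites in layer 1" as an edge from L1 is a simplification.

```python
from collections import deque

# Each key sends signals to the nodes in its list.
WIRING = {
    "lower_region": ["L4", "L6"],          # converging inputs from below
    "L4": ["L2", "L3"],
    "L2": ["higher_region", "L5", "L6"],
    "L3": ["higher_region", "L5", "L6"],
    "L5": ["thalamus"],                    # start of the delayed feedback loop
    "thalamus": ["L1"],
    "L6": ["lower_region_L1"],             # feedback down the hierarchy
    "L1": ["L2", "L3", "L5", "neighbor_L1"],
}


def reachable(graph, start):
    """Breadth-first search: every node a signal starting at `start` can reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Input arriving at layer 4 can reach the thalamus and come back to layer 1.
print("L1" in reachable(WIRING, "L4"))  # prints True
# Layer 1 activity never drives layer 4 in this schematic.
print("L4" in reachable(WIRING, "L1"))  # prints False
```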
4 key questions about the cortical algorithm:
1) How does a region of cortex classify its inputs? Layer 4 cells receive inputs from many lower levels, and if a cell receives the "right" combination, it will fire, which has the effect of classifying the inputs as ones that would fire this specific column. A column with strong input should prevent other columns from firing. This is done via inhibitory projections. How this "right" combination is identified and learned is not currently well understood.
2) How does it learn sequences of patterns? The entire column becomes active (layer 4, then 2 and 3, then 5) after receiving a signal from below. If layer 1 of the column is active at the same time, synapses to layers 2, 3 and 5 are strengthened due to the simultaneous firing. If this occurs often enough, these layer 1 synapses become strong enough to make layers 2, 3 and 5 fire even when a layer 4 cell hasn't fired - meaning parts of the column can become active without receiving input from below. In this way, cells in layers 2, 3, and 5 learn to "anticipate" when they should fire based on input from layer 1. Before learning, the column can only become active with input from a layer 4 cell. After learning, the column can become partially active via memory.
3) How does it form a constant pattern or "name" for a sequence? Layer 2 cells stay on during learned sequences, and represent the name of a sequence of inputs. They will present a constant pattern to higher cortical regions as long as our region can predict what columns will be active next. Layer 3b cells fire when a column becomes active unexpectedly and don't fire if the activation is expected. Layer 3a cells' only job is to prevent layer 3b cells from firing when they see the appropriate pattern in layer 1. Layer 2 cells could stay active even when their column isn't active, driven by inputs from layer 6 cells in the columns above (via layer 1 of their own column). It is as if the higher region sends the name of a melody to layer 1 below. This event causes a set of layer 2 cells to fire, one for each of the columns that will be active as the melody is heard.
4) How does it make specific predictions? Let's say, based on the current observed melody, the higher column is expecting the next musical interval to be a fifth. It activates all columns that represent fifths, such as C-G, D-A, and E-B. Also, let's say that the last note you heard was a D. So now in layer 2 we have activity in all columns that are fifths, and in layer 4 we have partial input to all columns representing intervals involving D. The intersection of these two sets represents our answer, the column representing D-A. A layer 6 cell that receives lower inputs (observations) on the way to layer 4 and higher inputs (expectations) from layer 1 (via layer 3) will fire - Prediction becomes reality.
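The melody example in question 4 boils down to a set intersection, which is easy to sketch. Columns are written as (from note, to note) pairs; the fifths are the ones named above, while the list of D-based intervals and the function name `predict` are assumptions for illustration.

```python
def predict(expected_from_above, partial_from_below):
    """Columns that are both expected by the higher region and supported from below."""
    return expected_from_above & partial_from_below


# Top-down: the higher region expects "a fifth", so every fifth column is primed.
fifth_columns = {("C", "G"), ("D", "A"), ("E", "B")}

# Bottom-up: the last note heard was D, so columns for intervals from D get partial input.
# (Hypothetical selection of such columns.)
columns_from_d = {("D", "E"), ("D", "F"), ("D", "A")}

print(predict(fifth_columns, columns_from_d))  # prints {('D', 'A')}
```

Neither input alone picks a single column; only the combination of the invariant expectation and the most recent detail does.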
- "Every moment in your waking life, each region of your neocortex is comparing a set of expected columns driven from above with the set of observed columns driven from below. Where the 2 sets intersect is what we perceive." (156)
- Daydreaming or thinking is based on the projections from layer 6 into layer 4, which means our predictions become the input. This allows us to see the consequences of our own predictions.
- The memory-prediction model of cortex requires that synapses far from the cell body be able to detect specific patterns, which is not what most scientists today think to be possible
- The higher columns learn patterns of lower "objects" - The unexpected result of this learning is that, during repetitive learning, representations of objects move down the cortical hierarchy (due to projections from L6 to L1)
- "I believe layer 4 pattern classification starts at the bottom and moves up. But as it does, we start forming sequences that then move down. It is the memory of sequences I am suggesting re-form lower and lower in the cortex." (166)
- The hippocampus is the top dog in the cortical pyramid, where new memories are first born before being pushed down into the neocortex.
- "The more you know, the less you remember." (171) Because there are fewer novel behaviors that trigger new memory formation.
- When 2 regions of cortex connect to each other in a hierarchical way, they also connect indirectly through the thalamus. This second pathway only transfers info up the hierarchy, not down. This pathway can either be closed or open. Hawkins speculates that this second pathway is responsible for imagination. If you attend to something, it pushes simple inputs up to higher levels of the visual pathway. Hawkins contends that "the alternate pathway through the thalamus is the mechanism by which we attend to details that normally we wouldn't notice." (173) The thalamus sends the raw data to the higher region. I don't think we know how this alternate pathway is opened up.
Epochs of intelligence
1) DNA as the medium for memory - individuals could not learn and adapt in their lifetimes, they could only pass on the DNA-based memory via their genes.
2) Modifiable nervous systems that can learn memories in their lifetimes - individuals still cannot communicate their learnings to others. This epoch included the creation and expansion of the neocortex.
3) Learned memories can be communicated to others, including future generations - much of our learning is built by standing on the shoulders of giants, which is only possible thanks to language.
- I predict that the 4th epoch will be characterized by instant, precise, global communication of information where everyone knows everything and new information is communicated instantly, precisely, and globally as it is created.
- "We want art to be familiar yet at the same time to be unique and unexpected." (186-187) Makes sense because we want it to trigger some existing columns but also trigger the creation of new patterns in existing columns.
- "Creativity is mixing and matching patterns of everything you've ever experienced or come to know in your lifetime. It's saying 'this is kinda like that.' The neural mechanism for doing this is everywhere in our cortex." (187)
- Einstein's brain had more support cells (glia) per neuron than average. It showed an unusual pattern of grooves (sulci) in the parietal lobes - a region thought to be important for mathematical ability and spatial reasoning. It was also 15% wider than most other brains.
Train yourself to be more creative:
1) Assume up front that there is an answer for what you're trying to solve
2) Let your mind wander
3) Ponder the problem often but also do other things so the cortex will have the opportunity to find an analogous memory
- "To this day I still hear people claim that computers should adapt to users. This isn't always true. Our brains prefer systems that are consistent and predictable, and we like learning new skills." (192)
- "I think consciousness is simply what it feels like to have a cortex." (194) Consciousness includes 2 attributes: 1) self-awareness and 2) qualia - the idea that feelings associated with sensations are somehow independent of sensory input.
- Self-awareness is synonymous with forming declarative memories, which are memories that you can recall and talk about to someone else (they can be expressed verbally - e.g. where I was last weekend, but not how to ride a bike).
- I disagree with Hawkins when he states that "your belief that you were conscious disappeared only when your declarative memory was erased." (197) I frequently don't remember what I did yesterday, but I don't think that yesterday didn't happen.
- IDEA: add nerve endings to the brain of a rat and see how its behavior changes
- Hawkins believes that qualia is caused in part by the fact that our brains don't have nerve endings and thus can't be fit into the model that the brain creates based on its sensory inputs.
- "Most of what you perceive is not coming through your senses; it is generated by your internal memory model." (202)
- EXPERIMENT: show a child a black and white world for several years; how will they react to color?
- Me: the world is not an inevitability.
- Based on this understanding of the brain, what would we build? "Start with a set of senses to extract patterns from the world ... Next, attach to these senses a hierarchical memory system that works on the same principles as the cortex ... Once our intelligent machine has created a model of the world, it can then see analogies to past experiences, make predictions of future events, propose solutions to new problems, and make this knowledge available to us." (209)
Issues when building this new intelligent machine:
1) Capacity - Need 8 trillion bytes of memory to represent 32 trillion synapses of 2 bits each synapse
2) Connectivity - Brain is connected all across, which would be difficult to wire in hardware. I propose using software instead. Hawkins recommends using the same line for multiple communications, but recognizes that this may be the most significant technical challenge.
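The capacity figure in item 1 is simple arithmetic, sketched here as a sanity check. The synapse count and bits per synapse are the numbers from the note; `memory_bytes` is a made-up helper name.

```python
def memory_bytes(synapses, bits_per_synapse):
    """Bytes needed to store every synapse at the given precision."""
    return synapses * bits_per_synapse // 8


needed = memory_bytes(32 * 10**12, 2)
print(needed)  # prints 8000000000000, i.e. 8 trillion bytes
```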
- 3 obvious applications of cortical-based memory systems: speech recognition, vision, and smart cars.
- "One way we can glimpse the future of intelligent machines is to think of aspects of the technology that will scale well. That is, which attributes of intelligent machines will grow cheaper and cheaper, faster and faster, or smaller and smaller. Things that grow at exponential rates rapidly outpace our imagination and are most likely to play a key role in the most radical evolutions in future technology." (222-223)
4 attributes that will scale well:
1) Speed: Intelligent machines will think a million times faster than a human brain. 2 machines could converse a million times faster than 2 humans.
2) Capacity: Human brain constrained by biology (width of pelvis, caloric intake), but this isn't true for machines. They would have deeper hierarchies and therefore deeper understandings. Wider regions mean more details could be remembered or perceived with greater acuity.
3) Replicability: Once one has learned, easy to reproduce same learning.
4) Sensory Systems: Not constrained to the 5 human senses, intelligent machines would have new sensory inputs (e.g. global weather, entire electromagnetic spectrum, etc.)
Testable Predictions of Hawkins' Model:
1) We should find cells in all areas of cortex that show enhanced activity in anticipation of a sensory event, as opposed to in reaction to a sensory event.
2) The more spatially specific a prediction can be, the closer to primary sensory cortex we should find cells that become active in anticipation of an event.
3) Cells that exhibit enhanced activity in anticipation of sensory input should be preferentially located in cortical layers 2, 3 and 6 and the prediction should stop moving down the hierarchy in layers 2 and 3.
4) One class of cells in layers 2 and 3 should preferentially receive input from layer 6 cells in higher cortical regions. Corollary: We should find another class of cells in layers 2 or 3 whose apical dendrites form synapses preferentially with axons originating in nonspecific regions of the thalamus. These cells predict next items in a sequence.
5) A set of "name" cells described in prediction 4 should remain active during learned sequences.
6) Another class of cells in layers 2 or 3 (different from the name cells referred to in predictions 4 and 5) should be active in response to an unanticipated input, but should be inactive in response to an anticipated input.
7) Unanticipated events should propagate up the hierarchy. The more novel the event the higher the unanticipated input should flow. Completely novel events should reach the hippocampus.
8) Sudden understanding should result in a precise cascading of predictive activity that flows down the cortical hierarchy.
9) The memory-prediction framework requires that pyramidal neurons can detect precise coincidences of synaptic input on thin dendrites.
10) Representations move down the hierarchy with training.
11) Invariant representations should be found in all cortical areas.
- Nov 15, 2010, 8:44a
wonderful! this is one of the best articles here... i dunno why there are no comments on this!! I'm gonna get a copy of this book asap! thanks! BTW i'm a first year UG stud... just finished high school.. and i'm not much into mathematics.. so is this worth reading for a stud like me? or is the level of the book too high? just wanna know if i can comprehend the things...
- Nov 15, 2010, 6:16p
Hawkins writes this for the non-mathematician with an interest in the brain, so it'll probably be perfect for you. Go for it.