**Warning: This is a response to the book On Intelligence by Jeff Hawkins. It was originally written for a Cognitive Science course. Ahead lies 4,000 words' worth of historical computer science, bumbling neurobiology, and a bit of armchair philosophy. Read at your own peril.**
Jeff Hawkins, author of On Intelligence, has a bone to pick with both Artificial Intelligence researchers and neuroscientists: neither group, he claims, seems concerned with determining the nature of intelligence. For their part, computer scientists have long resisted the notion that the structure of the brain is important in promoting intelligent behavior. Neuroscientists, meanwhile, have not devoted as much energy as Hawkins would like to proposing any sort of unified theory or framework of cognitive function.
The sins Hawkins accuses computer scientists of committing can seemingly be attributed to an over-reverence for Alan Turing. If it is true that a Turing Machine can compute anything that can be computed, and if we believe that one of the primary functions of the brain is to compute, then it seems perfectly plausible to suggest that the brain is nothing more than a biological implementation of a Turing Machine. Since all Universal Turing Machines are effectively equivalent, it further seems reasonable to insist that digital computers ought to be perfectly suited to playing host to intelligence.
Hawkins also argues that Artificial Intelligence researchers were led astray from the very beginning by the Turing Test. Turing lived at a time when psychology was still dominated by behaviorism: the notion that intelligence could be judged only by observable behavior. The Turing Test endorses this thinking, and while it is not a benchmark that researchers seriously pursue, virtually all testing of machine intelligence that has followed in its wake has likewise focused on input-output benchmarking.
The largely fruitless results of historical AI research suggest that maybe this is not the right way to do things. We have made programs and algorithms that can perform all manner of seemingly complicated activities, from playing chess to approximating optimized configurations for complex systems. The “simple” things that we would really like computers to do, though (vision, language acquisition, motor control), have seen only minimal progress. If the brain is following formal, procedural algorithms in the fashion of a Turing Machine, we obviously have not found them. More likely, though, the brain does something different.
Hawkins suggests that the brain, while obviously capable of traditional computation, must also do something more elaborate, and that we ought to try replicating that process if we want to create computers and/or programs that exhibit intelligence. Computer scientists came up with this same idea decades ago, and it led to Neural Networks. This was a good start, but it was not taken far enough. As soon as toy three-layer networks produced some interesting results, researchers returned to navel-gazing instead of taking further steps to mimic brain behavior. But who can blame them when we still understand so little about the brain?
This leads us to Hawkins’ frustrations with the neuroscience community. Chiefly, he thinks that we simply have not put in enough effort to determine how the brain, and specifically the cortex, works. We have gathered reams of experimental data about brain activity. We have rough maps of where all sorts of phenomena are processed in the brain -- from linguistic syntax to motor control to immediate optical stimulus. What we do not have is anything concrete about how these signals interact to produce what we think of as intelligence.
This is not really surprising in itself. The brain is an incredibly complex and sensitive organ, and it is very hard to perform any sort of accurate experimentation on it. All of our current methodologies necessarily sacrifice either spatial or temporal resolution, and we really need both in order to say anything meaningful. Still, Hawkins encourages boldness. We cannot spend all of our time simply collecting data without anything to use that data for, and our current theories are too low-level and low-risk to be of interest. We need a theory that can explain the phenomenon of intelligence as a whole, both in order to understand ourselves and in order to imbue future generations of machines with this property.
To this end, Hawkins suggests an overarching theory to describe the nature and purpose of brain function as it relates to intelligence. Such a theory, while likely flawed, would give direction to our research and change the way we think about our work, both in neuroscience and in A.I. Ultimately, On Intelligence is Hawkins’ attempt to lay out such a theory: the Memory Prediction Framework.
In short, the Memory Prediction Framework suggests this: the function of the brain that gives rise to intelligence is not simple computation. Computation would suggest nothing more than stimulus-response pairings. Instead, Hawkins claims that the human brain strives to use previously identified patterns in order to predict and change the future.
For Hawkins, the seat of what we think of as intelligence -- agency, intentionality, adaptability, even creativity and consciousness -- is the cortex. The cortex is not exclusive to humans, but the most notable difference in brain structure between us and other mammals is the sheer size of ours, and in particular of the neocortex, its evolutionarily newest part. Since humans are so far beyond other animals in our ability to understand and control our environments and to pass on our knowledge to subsequent generations, this enlarged cortex ought to play the primary role.
Hawkins proposes that the chief function of the cortex is to contain a model of the world as we have experienced it. At its most abstract, ignoring all of the biology involved, this is accomplished by forming what Hawkins calls invariant memories. We recognize patterns that occur together -- the shape of a hand, the sensation of heat on our skin, the sound of a musical interval -- and group the sensations together into a single mental concept, recognizable even when the specifics -- the starting pitch, the orientation of the hand, the location of the burn -- change.
From these patterns, we construct composites of patterns. From letters we pick up words and phrases and stories. Eyes and nose and lips become a face. Intervals expand into phrases, which combine to make songs. Simple patterns, universally, become building blocks for more complicated concepts, both in the abstract and in the specific. The more we experience a pattern, the more accessible it becomes and the more foundational it is to creating new patterns. In a world where we often encounter combinations of stimuli that seem completely unrelated, these models help us come to reasonable conclusions about our surroundings. If you hear an animal roar but look around and see that you are still in your kitchen, you don’t assume that a bear got into the house, but that somebody has the television on too loud.
The power of these models is apparent: if we know what follows from what we are experiencing right now, we know how to respond. We see a bottle teetering on a table and we can steady it before it falls. We see an old wound threatening to reopen during an argument and we preemptively make peace. We build tools to initiate a chain of events that leads to a desired goal, whether that is an improved harvest or a man setting foot on the moon. By predicting how the environment will respond to our actions or what state naturally follows from the current one, we exercise control over reality in a way that simpler organisms cannot. We nudge the trajectory of the world to be more favorable for us.
What is even more remarkable is that we can create invariant “memories” about things we have never witnessed. We have developed tools, notably language, that allow us to pass on our experiences to others, providing them the ability to respond appropriately to situations before they have ever encountered them. Further, for lack of a better term, we have the ability to imagine. We can create mental worlds rooted in the model of reality that we have built but distinctly different. We can tweak parameters, ask ourselves how things would change if certain patterns coincided in a new way. We can test the outcome of a course of action without taking on the risk ourselves. This gives us our unique capacity for invention in both the practical and artistic senses.
All of this is well and good, and we can provide ample anecdotal evidence to convince ourselves of it based purely on reason. The question is, can it be supported by biology? Unfortunately, there is still so much about the brain, and especially the cortex, that is a mystery to us that little can be said conclusively. Hawkins does, however, offer up a description of what is known about the cortex and how this could endorse the Memory Prediction Framework. This is where my expertise wanes, but I will relay Hawkins’ lesson to the best of my ability:
Physically, the cortex is a thin sheet of “grey matter” (the cell bodies of neurons) wrapped around the evolutionarily older brain, with “white matter” (the axons connecting neurons across regions) lying beneath it. The cortex is divided physically into six layers of neurons. Each neuron is connected at many points to the neighbors in its own vertical column, as well as to neurons in neighboring columns and a plethora of other neurons distributed throughout the rest of the cortex. It is well known that, in the visual regions of the cortex, these connections form further topological hierarchies, and Hawkins thinks it natural that this should also be the case for other regions, as we will see. Signals begin in the lower levels of these hierarchies and then collect and travel upward as more elaborate relationships between signals are processed.
These upward connections have been studied widely, but the first important thing that Hawkins focuses on is the fact that there are actually more feedback connections traveling down the hierarchies than there are connections transmitting data forward. Most theories of the brain seem to discount the importance of these connections, but we will see shortly that they take on prominence in the Memory Prediction Framework.
There is one other important feature of cortical design that we have to discuss first. The cortex is remarkably consistent in its physical makeup, with the same six-layered structure across its entirety. In the late 1970s, this led researcher Vernon Mountcastle to put forth a theory that has largely been dismissed ever since: that there is no significant functional differentiation in the cortex. That is, all regions of the cortex, whether they process sight, language, or movement, follow some universal algorithm. This, in part, is why Hawkins assumes logical hierarchies within all regions of the cortex and not just the regions that process vision.
As evidence for this claim, Hawkins offers two arguments. The first is the extreme plasticity of the cortex. We know, for example, that violinists have larger areas of their cortex dedicated to controlling the movement of the fingers on their left hand. Individuals born deficient in one sense seem to have larger regions devoted to their others. In one particularly shocking experiment, a man who had lost his sight was able to regain a form of “sight” by having a camera send electrical impulses to his tongue. This somatosensory information was processed in the visual regions of the cortex.
The second is that there is no reason to believe the brain processes different senses differently. The brain itself senses nothing directly, after all, and all senses can ultimately be described in the same way. Whether it is a collection of light rays striking the retina or a longitudinal pressure wave reaching the cochlea, all of our perception boils down to spatio-temporal patterns. If the brain is just processing spatio-temporal patterns at every turn (as the Memory Prediction Framework already suggests it does), then it should not matter what the input mechanism for these patterns is, just that the pattern is delivered for interpretation.
From here on, I will adopt this theory that all inputs to the cortex are equivalent. I may use language that seems specific to the way we perceive a particular sense, notably sight, but please understand that I am referring to any generic sense.
Most of what researchers have observed in the brain concerns how these signals are propagated upward through the (logical) hierarchy. A particular image or impression is received from the sensory organs and transmitted to the cortex. A subset of neurons immediately connected to that sense fires in accordance with the received image and, in firing, passes a signal upward. The next level of the hierarchy receives this pattern as its input, just as the level below received the raw image, and again fires off a subset of its member neurons. This continues until the signal reaches levels high enough for us to make conscious sense of what is being perceived. These groupings of firing neurons, notably the particular groupings that result in recognition, embody the patterns that constitute our invariant memories.
What is interesting is that, while the lowest regions are rapidly modulating due to constantly changing input (because of the saccadic motion of the eye, the continuous flow of sound through the air, or whatever else), higher regions -- regions where things in the world are recognized -- stay active for much longer. This suggests that these higher regions respond to increasingly general patterns. Hawkins also suggests that there are specific temporal patterns, that is, sequences of spatial patterns, to which these regions respond, and that they remain active for as long as they keep receiving the expected sequence of spatial patterns.
Tying into the importance of these temporal patterns, Hawkins now returns to the significance of the feedback connections, the ones connecting logically higher levels of the hierarchy to those closer to the actual perceived input. In short, Hawkins’ claim is that, once a pattern has been recognized at a higher level, it tells the levels below which input to expect next. In effect, we prime ourselves to see what follows from what we are currently seeing, and we begin responding to it before it even occurs. This is the “prediction” element of the Memory Prediction Framework.
When the actual input defies the expected input, we are jarred out of the current pattern, and the image once again propagates normally until we can replace our previous pattern with one that more accurately represents what we are truly seeing and resume a more appropriate course of action.
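To make the upward flow concrete, here is a toy sketch in Python. It is not Hawkins’ model or Numenta’s software; it simply assumes that each level of the hierarchy stores a lookup table (the hypothetical `levels` argument) that names fixed-size groups of patterns from the level below, and passes those names upward, echoing the eyes-nose-lips example earlier in this post.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each level maps a tuple of lower-level
# pattern names to the name of the higher-level pattern they form.
Level = Dict[Tuple[str, ...], str]


def propagate_up(inputs: List[str], levels: List[Level], group: int = 2) -> List[List[str]]:
    """Pass a sensory pattern up a toy hierarchy; return the activity at every level."""
    activity = [inputs]
    current = inputs
    for level in levels:
        # Each unit at this level looks at a small, fixed-size patch of the level below.
        chunks = [tuple(current[i:i + group]) for i in range(0, len(current), group)]
        # "?" marks a patch this level has never learned, i.e. nothing is recognized.
        current = [level.get(chunk, "?") for chunk in chunks]
        activity.append(current)
    return activity


if __name__ == "__main__":
    face_levels = [
        {("eye", "eye"): "eyes", ("nose", "lips"): "lower face"},
        {("eyes", "lower face"): "face"},
    ]
    for row in propagate_up(["eye", "eye", "nose", "lips"], face_levels):
        print(row)
    # ['eye', 'eye', 'nose', 'lips']
    # ['eyes', 'lower face']
    # ['face']
```

Each level only ever sees names produced by the level below it, never the raw input, which is the sense in which every level "receives its input just as the lower level did." A real cortex would, of course, tolerate noise and variation rather than requiring exact matches.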
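The predict-then-compare loop described in the last few paragraphs can be sketched as a toy sequence recognizer. This is a hypothetical illustration, not Hierarchical Temporal Memory: the class name `TemporalRegion` and the musical sequences are invented for the example. The region holds the sequences it has learned, exposes the set of inputs it expects next (standing in for the downward feedback), stays with its current pattern while those predictions are met, and on a mismatch abandons the pattern and starts recognizing again from the surprising input.

```python
from typing import Dict, List, Optional, Set


class TemporalRegion:
    """Toy region that recognizes learned sequences and predicts the next input."""

    def __init__(self, sequences: Dict[str, List[str]]):
        self.sequences = sequences
        self._reset([])

    def _reset(self, history: List[str]) -> None:
        self.history = history
        # Keep only the sequences whose opening matches what has been seen so far.
        self.candidates = [
            name for name, seq in self.sequences.items()
            if seq[:len(history)] == history
        ]

    def predict(self) -> Set[str]:
        """The inputs this region expects next (its top-down 'prediction')."""
        n = len(self.history)
        return {
            self.sequences[name][n]
            for name in self.candidates
            if n < len(self.sequences[name])
        }

    def recognized(self) -> Optional[str]:
        """Name of the current pattern, once only one candidate remains."""
        return self.candidates[0] if len(self.candidates) == 1 else None

    def observe(self, symbol: str) -> bool:
        """Feed one input; return True if it violated the prediction."""
        if symbol in self.predict():
            self._reset(self.history + [symbol])
            return False
        # Surprise: drop the old pattern and re-recognize from this input upward.
        self._reset([symbol])
        return True


if __name__ == "__main__":
    region = TemporalRegion({
        "scale": ["C", "D", "E", "F", "G"],
        "arpeggio": ["C", "E", "G", "C"],
    })
    print(region.observe("C"), sorted(region.predict()))  # False ['D', 'E']
    print(region.observe("E"), region.recognized())       # False arpeggio
    print(sorted(region.predict()))                       # ['G']
    print(region.observe("B"), region.recognized())       # True None
```

Note how recognition stays stable across several inputs once "arpeggio" is identified, while the unexpected "B" knocks the region out of its pattern entirely, roughly mirroring the "jarred out of the current pattern" behavior described above.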
All of this feedback and neural priming predisposes us to see patterns with which we are already familiar. This can help us resolve ambiguities in the environment and perform on-the-fly error correction, but it can also cause us to gloss over seemingly unimportant distinctions. We see this any time we automatically correct a minor spelling or grammatical mistake or even when we try our hands at a “spot the differences” exercise. We are very good at seeing what we want to see because that makes our lives easier and our processing swifter.
So, we now understand how memories are activated and how they are used to predict the immediate future. But how are they formed? Hawkins endorses a simple mechanism known as Hebbian Learning, after the psychologist Donald Hebb. Simply put, when neurons fire at the same time, the synapses between them are strengthened. This means that the firing of any member of a neuron pattern increases the odds of the other neurons in the pattern firing as well. These patterns identify an element of a “memory”, and the patterns themselves are stored in the synapses. The more we see a pattern, the more readily we recognize it in the future, even if not all of the elements of that pattern are available in the immediately presented image.
Hawkins has a lot more to say about how the brain operates: the significance of the columnar alignment of neurons as a processing unit; the nature and method of inhibition between neurons and columns; the role of the hippocampus as the topmost hierarchical element of the cortex; the importance of the thalamus as a gateway between cortical regions. Here, though, the nuances are beyond me, and I cannot hope to do all of them justice. I take Hawkins at his word in these arguments, in part because he offers a number of testable hypotheses that would need to be confirmed for the Memory Prediction Framework to stand up.
For instance, Hawkins suggests that we will eventually be able to identify downward cascades of “predictive” activity through neural hierarchies that coincide with sudden understanding. Novel events, on the other hand, should be seen propagating upward toward the hippocampus. He identifies specific sorts of cells that should exist in particular cortical layers and that should become active in anticipation of an input, or respond differently depending on whether their input is expected or unexpected.
Either because of a lack of interest or continuing limitations in our monitoring technology, it would seem that none of Hawkins’ hypotheses have been either confirmed or refuted in the years since he wrote On Intelligence. If Hawkins really wants to promote his theories or provide some evidence of their correctness without waiting for technology to catch up, the best route might be to implement them successfully in software. To this end, Hawkins has already started a company called Numenta, with the goal of producing machine learning packages that model a variation on what he calls Hierarchical Temporal Memory -- a learning architecture that mimics his theories about the design and behavior of the cortex.
Hawkins talks about future generations of machines that utilize Hierarchical Temporal Memory to process patterns and make predictions about any phenomenon imaginable. Computers are already better than we are at processing large amounts of data. The electrical signals that travel through a microprocessor are orders of magnitude faster than the electrochemical impulses that drive neural activity. There is no telling what we could discover with intelligent machines churning away and making “informed” predictions based on all of the information we could feed into them.
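Hebb’s rule is simple enough to sketch in a few lines. The version below is a minimal, hypothetical illustration assuming binary (on/off) units: every time two units fire together, the weight between them grows by a fixed rate. Recall is modeled, as an added assumption borrowed from associative-memory networks rather than from Hawkins, as a single step in which any unit whose summed input from the active cue units reaches a threshold switches on, so a partial pattern can call up the rest. The class name `HebbianNet` and the threshold value are invented for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


class HebbianNet:
    """Toy network of named binary units with symmetric Hebbian weights."""

    def __init__(self, rate: float = 1.0):
        self.rate = rate
        self.weights: Dict[FrozenSet[str], float] = {}

    def learn(self, active: Set[str]) -> None:
        """Strengthen the synapse between every pair of co-active units."""
        for a, b in combinations(sorted(active), 2):
            key = frozenset((a, b))
            self.weights[key] = self.weights.get(key, 0.0) + self.rate

    def weight(self, a: str, b: str) -> float:
        return self.weights.get(frozenset((a, b)), 0.0)

    def complete(self, cue: Set[str], units: Set[str], threshold: float) -> Set[str]:
        """One recall step: switch on any unit whose input from the cue reaches threshold."""
        recalled = set(cue)
        for u in units - cue:
            drive = sum(self.weight(u, c) for c in cue)
            if drive >= threshold:
                recalled.add(u)
        return recalled


if __name__ == "__main__":
    net = HebbianNet()
    for _ in range(3):
        net.learn({"eyes", "nose", "lips"})  # a frequently seen pattern
    net.learn({"eyes", "ears"})              # a rarely seen one
    units = {"eyes", "nose", "lips", "ears"}
    print(sorted(net.complete({"eyes", "nose"}, units, threshold=4.0)))
    # ['eyes', 'lips', 'nose']
```

Because "lips" co-fired with "eyes" and "nose" three times, it receives enough input to be recalled from the partial cue, while "ears" does not, illustrating how repeated exposure makes a pattern easier to recognize from incomplete input.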
This is especially apparent when we consider the sorts of input such machines could process. We have already committed to the notion that senses are arbitrary and interchangeable, so why should machines be limited to making decisions based on our senses? Imagine computers that operate and make extrapolations based on input from novel senses, like sonar or barometric pressure, as easily as we do sight. Such computers would drastically increase our ability to, say, predict weather patterns or plan unmanned spaceflight. And that is just the beginning. Imagine how much more effective the approach could be if we actually had computer architectures that integrate memory and processing the way the brain seems to.
But one important question remains: would such a machine be intelligent? This is where we need to start holding Hawkins accountable for some of his early rhetoric. For all of his grand talk early on about the superiority of the brain, Hawkins still appears to believe that intelligence can be replicated by digital machines. This would not necessarily be human intelligence, complete with all of the intangible qualities that make life fascinating, but at least the pattern-driven, future-predicting intelligence that he believes is the key to our success as a species. So, whether he commits to it directly or not, Hawkins ultimately believes that the brain is nothing more than an augmented Turing Machine: a Turing Machine optimized for the sorts of feedback-driven, memory-intensive algorithms he has described, but a Turing Machine nonetheless.
Moreover, for all of his complaints about behaviorism, Hawkins is ultimately insisting that behavior is the benchmark of intelligence. He simply shifts the scale from the macro (measurable action) to the micro (method of signal processing). If prediction is the defining measure of intelligence, does the implementation matter? It seems unreasonable to suggest that it does, in which case Hawkins is not really offering us anything new. Prediction, of a sort, has been the goal of A.I. all along. Deep Blue was able to “predict” the correct move at each juncture to beat Kasparov. In the abstract realm, the Chinese Room is able to “predict” an appropriate response to a written inquiry in an unknown language. We already have all kinds of learning systems, ranging from traditional Neural Networks to Bayesian modeling, that seek to make predictions based on previous observations without making any effort at faithfully modeling the cortex. Why are these implementations less capable of intelligence than Hawkins’?
Truly, as regards prediction, Hawkins’ solution differs only in approach. That approach may have advantages, but it does not give us intelligence by itself. Since prediction alone would appear to be insufficient (unless we suddenly want to change our minds and ascribe intelligence to existing implementations of A.I., actual or theoretical), we need to look for another primary feature of intelligence and ask whether it is fundamental to the Memory Prediction Framework and Hierarchical Temporal Memory.
Off the cuff, I would argue that this defining feature of intelligence is willfulness -- the ability to direct (if not select) our thoughts, to forge mental connections of our own volition rather than programmatically. Truth be told, I am not certain that Hierarchical Temporal Memory can provide that or that the Memory Prediction Framework can explain it. Hawkins, for his part, seems certain: he suggests that consciousness is simply “the feeling of having a [sufficient] cortex”. But while his description of the brain and the neural connections within the cortex certainly provides mechanisms and conduits through which thought could be consciously directed, the driving force is nowhere to be found.
So, if Hawkins’ framework cannot successfully explain intelligence in individuals, we cannot expect it to imbue machines with intelligence. Hierarchical Temporal Memory makes no stronger case for understanding than passing the Turing Test does. That does not mean his model is useless. The feedback and priming systems that Hierarchical Temporal Memory contains seem well suited to monitoring and making predictions about real-time systems, whether in vision or in other crucial signal monitoring. Hawkins’ work may be no less artificial than the other routes we have taken, but it is still a potentially beneficial supplement.
For the neuroscientist, Hawkins’ ideas would seem to possess more significant value. The importance of feedback, pattern recognition, and pre-conscious prediction in cognition makes a world of sense at a cursory glance, even if the mechanisms that guide it may not be uniform throughout the cortex. Even if we limit the Memory Prediction Framework to being a description of cortical function and not of intelligence as a whole, though, Hawkins still runs into hot water. For one thing, he discounts the importance of the old brain in intelligence far more than is acceptable. More than that, there has to be a reason that Mountcastle’s theories of universal cortical function have not gained more traction over the past 30-some years.
I may be willing to accept that uniformity could well be the norm for processing sensory input. Patterns in where we process the senses are easily explained by nerve connections from those senses, and all of the signals are ultimately translated into the same sorts of neural firings. How does this explain phenomena like Broca’s area, though? Why would virtually all humans process linguistic syntax, something with no direct connection to a single sense or to the outside world at all, in the same region of the cortex? There does not seem to be an easy “path of least resistance” explanation available here.
Still, I have to end by giving Hawkins credit. His goal was never to provide a definitive answer to these problems, but a starting point. On Intelligence does provide an intriguing, if likely flawed, account of what intelligence could be. I suspect it will not stand the test of time, but it may well provide a useful stepping stone. By taking the risk of proposing not only a theory but standards by which it can be tested, Hawkins has left future researchers with the ammunition to tear apart his framework and iteratively replace it with better ones. That risk can only lead us closer to the truth.
Bibliography
Hawkins, Jeff. On Intelligence (New York: St. Martin’s Press, 2004).
Hebb, D.O. The Organization of Behavior (New York: Wiley and Sons, 1949).
Mountcastle, Vernon B. “An Organizing Principle for Cerebral Function: The Unit Model and the Distributed System” in The Mindful Brain (Cambridge, Mass: MIT Press, 1978).
Legacy Content. Numenta. http://www.numenta.com/legacy.php. Accessed March 6, 2012.
Thursday, March 8, 2012
Computers, the Cortex, Prediction, and Intelligence
Labels: Computer Science, coursework, philosophy