Saturday, December 27, 2008

The Subtle Structure of the Everyday Physical World = The Weakness of Abstract Definitions of Intelligence

In my 1993 book "The Structure of Intelligence" (SOI), I presented a formal definition of intelligence as "the ability to achieve complex goals in complex environments." I then argued (among other things) that pattern recognition is the key to achieving intelligence, via the algorithm:
  • Recognize patterns regarding which actions will achieve which goals in which situations
  • Choose an action that is expected to be good at goal achievement in the current situation
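
As a minimal code sketch (hypothetical names, not code from SOI), this meta-algorithm might look like:

  from collections import defaultdict

  class PatternBasedAgent:
      # Toy sketch of the SOI meta-algorithm: learn which actions achieve
      # which goals in which situations, then act on the learned patterns.

      def __init__(self):
          # learned patterns: (situation, action, goal) -> success estimate
          self.success = defaultdict(lambda: 0.5)  # neutral prior
          self.counts = defaultdict(int)

      def observe(self, situation, action, goal, achieved):
          # recognize patterns regarding which actions achieve which goals
          # in which situations -- here, just a running success average
          key = (situation, action, goal)
          self.counts[key] += 1
          self.success[key] += (float(achieved) - self.success[key]) / self.counts[key]

      def choose_action(self, situation, goal, actions):
          # choose the action expected to be best at goal achievement
          # in the current situation
          return max(actions, key=lambda a: self.success[(situation, a, goal)])
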
The subtle question in this kind of definition is: How do you average over the space of goals and environments? If you average over all possible goals and environments, weighting each one by its simplicity perhaps (so that success with simple goals/environments is rated higher), then you have a definition of "how generally intelligent a system is," where general intelligence is defined in an extremely mathematically inclusive way.

The line of thinking I undertook in SOI was basically a reformulation in terms of "pattern theory" of ideas regarding algorithmic information and intelligence that originated with Ray Solomonoff; and Solomonoff's ideas have more recently been developed by Shane Legg and Marcus Hutter into a highly rigorous mathematical definition of intelligence.
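
In rough form (my paraphrase, using their notation), Legg and Hutter define the universal intelligence of an agent \pi as its expected performance V^{\pi}_{\mu} summed over all computable environments \mu, weighted by simplicity:

  \Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}

where K(\mu) is the Kolmogorov complexity of \mu, i.e. the length of the shortest program computing it.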

I find this kind of theory fascinating, and I'm pleased that Legg and Hutter have done a more thorough job than I did of making a fully formalized theory of this nature.

However, I've also come to the conclusion that this sort of approach, without dramatic additions and emendations, just can't be very useful for understanding practical human or artificial intelligence.

What is Everyday-World General Intelligence About?

Let's define the "everyday world" as the portion of the physical world that humans can directly perceive and interact with -- this is meant to exclude things like quantum tunneling and plasma dynamics in the centers of stars, etc. (though I'll also discuss how to extend my arguments to these things).

I don't think everyday-world general intelligence is mainly about being able to recognize totally general patterns in totally general datasets (for instance, patterns among totally general goals and environments). I suspect that the best approach to this sort of totally general pattern recognition problem is ultimately going to be some variant of "exhaustive search through the space of all possible patterns" ... meaning that approaching this sort of "truly general intelligence" is not really going to be a useful way to design an everyday-world AGI or significant components of one. (Hutter's AIXItl and Schmidhuber's Godel Machine are examples of exhaustive-search-based AGI methods.)

Put differently, I suspect that all the AGI systems and subcomponents one can really build are SO BAD at solving this general problem, that it's better to characterize AGI systems
  • NOT in terms of how well they do at this general problem
but rather
  • in terms of what classes of goals/environments they are REALLY GOOD at recognizing patterns in
I think the environments existing in the everyday physical and social world that humans inhabit are drawn from a pretty specific probability distribution (compared to, say, the "universal prior," a standard probability distribution that assigns higher probability to entities describable using shorter programs), and that for this reason, looking at problems of compression or pattern recognition across general goal/environment spaces without everyday-world-oriented biases is not going to lead to everyday-world AGI.
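
For reference, Solomonoff's universal prior (roughly stated) assigns a string x the probability

  m(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|}

where U is a universal prefix machine and |p| is the length of program p, so that entities generated by shorter programs receive more weight.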

The important parts of everyday-world AGI design are the ones that (directly or indirectly) reflect the specific distribution of problems that the everyday world presents to an AGI system.

And this distribution is really hard to encapsulate in a set of mathematical test functions, because we don't know what this distribution is.

And this is why I feel we should be working on AGI systems that interact with the real everyday physical and social world, or the most accurate simulations of it we can build.

One could formulate this "everyday world" distribution, in principle, by taking the universal prior and conditioning it on a huge amount of real-world data. However, I suspect that simple, artificial exercises like conditioning distributions on text or photo databases don't come close to capturing the richness of statistical structure in the everyday world.
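
Schematically, this conditioning is just Bayes' rule with the simplicity prior: given a body of real-world data D, the weight assigned to an environment \mu becomes

  P(\mu \mid D) \propto 2^{-K(\mu)} \, P(D \mid \mu)

and the "everyday world" distribution is whatever this posterior converges to as D grows rich enough -- which is exactly the part we can't write down explicitly.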

So, my contention is that
  • the everyday world possesses a lot of special structure
  • the human mind is structured to preferentially recognize patterns related to this special structure
  • AGIs, to be successful in the everyday world, should be specially structured in this sort of way too
To incorporate this everyday-world bias (or other similar biases) into the abstract mathematical theory of intelligence, we might say that intelligence relative to goal/environment class C is "the ability to achieve complex goals (in C) in complex environments (in C)."

And we could formalize this by weighting each goal or environment by a product of
  • its simplicity (e.g. measured by program length)
  • its degree of membership in C (considering C as a fuzzy set)
One can also create a formalization of this idea using Legg and Hutter's approach to defining intelligence.
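
For instance, as a sketch in their notation (with \chi_C denoting the fuzzy membership degree of \mu in C), one could define

  \Upsilon_C(\pi) = \sum_{\mu} 2^{-K(\mu)} \, \chi_C(\mu) \, V^{\pi}_{\mu}

which reduces to their original definition when C includes everything.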

One can then characterize a system's intelligence in terms of which goal/environment sets C it is reasonably intelligent for.

OK, this does tell you something.

And, it comes vaguely close to Pei Wang's definition of intelligence as "adaptation to the environment."

But, the point that really strikes me lately is how much of human intelligence has to do, not with this general definition of intelligence, but with the subtle abstract particulars of the C that real human intelligences deal with (which equals the everyday world).

Examples of the Properties of the Everyday World That Help Structure Intelligence

The propensity to search for hierarchical patterns is one huge example of this. The fact that searching for hierarchical patterns works so well, in so many everyday-world contexts, is most likely because of the particular structure of the everyday world -- it's not something that would be true across all possible environments (even if one weights the space of possible environments using program-length according to some standard computational model).

Taking it a step further, in my 1993 book The Evolving Mind I identified a structure called the "dual network", which consists of superposed hierarchical and heterarchical networks: basically a hierarchy in which the distance between two nodes in the hierarchy is correlated with the distance between the nodes in some metric space.

Another high level property of the everyday world may be that dual network structures are prevalent. This would imply that minds biased to represent the world in terms of dual network structure are likely to be intelligent with respect to the everyday world.
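
As a toy illustration of this correlation (a minimal sketch, not from The Evolving Mind): take a balanced binary hierarchy whose leaves sit at integer positions on a line, and check that the tree distance between two leaves correlates with their distance on the line.

  import itertools, math

  DEPTH = 6
  LEAVES = range(2 ** DEPTH)   # leaf i sits at position i on a line

  def tree_distance(i, j):
      # edges from leaf i up to the lowest common ancestor and back down;
      # the highest differing bit of i and j gives that ancestor's height
      return 2 * (i ^ j).bit_length()

  def pearson(xs, ys):
      n = len(xs)
      mx, my = sum(xs) / n, sum(ys) / n
      cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
      sy = math.sqrt(sum((y - my) ** 2 for y in ys))
      return cov / (sx * sy)

  pairs = list(itertools.combinations(LEAVES, 2))
  tree_d = [tree_distance(i, j) for i, j in pairs]
  metric_d = [abs(i - j) for i, j in pairs]
  print("tree/metric distance correlation:", round(pearson(tree_d, metric_d), 3))

A mind biased toward dual network representations would similarly tend to group spatially nearby things under nearby nodes in its concept hierarchy, which is what the positive correlation expresses.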

The extreme commonality of symmetry groups in the (everyday and otherwise) physical world is another example: they occur so often that minds oriented toward recognizing patterns involving symmetry groups are likely to be intelligent with respect to the real world.

I suggest that the number of properties of the everyday world of this nature is huge ... and that the essence of everyday-world intelligence lies in the list of these abstract properties, which must be embedded implicitly or explicitly in the structure of a natural or artificial intelligence for that system to have everyday-world intelligence.

Apart from these particular yet abstract properties of the everyday world, intelligence is just about "finding patterns in which actions tend to achieve which goals in which situations" ... but this simple meta-algorithm is far less than 1% of what it takes to make a mind.

You might say that a sufficiently generally intelligent system should be able to infer these general properties from looking at data about the everyday world. Sure. But I suggest that would require massively more processing power than an AGI that embodies, and hence automatically utilizes, these principles. It may be that the problem of inferring these properties is so hard as to require a wildly infeasible AIXItl / Godel Machine type system.

Important Open Questions

A few important questions raised by the above:
  1. What is a reasonably complete inventory of the highly-intelligence-relevant subtle patterns/biases in the everyday world?
  2. How different are the intelligence-relevant subtle patterns in the everyday world, versus the broader physical world (the quantum microworld, for example)?
  3. How accurate a simulation of the everyday world do we need to have, to embody most of the subtle patterns that lie at the core of everyday-world intelligence?
  4. Can we create practical progressions of simulations of the everyday world, such that the first (and cruder) simulations are very useful to early attempts at teaching proto-AGIs, and the development of progressively more sophisticated simulations roughly tracks progress in AGI design and development?
The second question relates to an issue I raised in a section of The Hidden Pattern, regarding the possibility of quantum minds -- minds whose internal structures and biases are adapted to the quantum microworld rather than to the everyday human physical world. My suspicion is that such minds will be quite different in nature, to the point that they will have fundamentally different mind-architectures -- but there will also likely be some important and fascinating points of overlap.

The third and fourth questions are ones I plan to explore in an upcoming paper, an expansion of the AGI-09 conference paper I wrote on AGI Preschool. An AGI Preschool as I define it there is a virtual world defining a preschool environment, with a variety of activities for young AIs to partake in. The main open question in AGI Preschool design at present is: How much detail does the virtual world need to have, to support early childhood learning in a sufficiently robust way? In other words, how much detail is needed so that the AGI Preschool will possess the subtle structures and biases corresponding to everyday-world AGI? My AGI-09 conference paper didn't really dig into this question due to length limitations, but I plan to address this in a follow-up, expanded version.

6 comments:

artiphys said...

at the risk (always) of being way out of my depth here...

It seems to me that what you are getting at is the issue of generality vs. specialization. The assumption is that specialization implies more optimal strategies, within the specific domain. In data compression, our bitrate is dependent on our prediction model. A better prediction means more certainty, i.e. less entropy, and according to Shannon's law, fewer bits (bits == log2(1/P)). However, if our data defies our model, a highly specialized compression algorithm (one finely tuned to the usual probabilities) blows up, and does worse than a "stupider" scheme that makes fewer assumptions about the data.
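
Concretely, something like this toy calculation (idealized entropy coding, made-up numbers):

  import math

  def expected_bits(true_p, model_p):
      # cross-entropy: sum of p_true * log2(1/p_model) over symbols
      return sum(p * math.log2(1 / q) for p, q in zip(true_p, model_p))

  true_dist   = [0.7, 0.1, 0.1, 0.1]       # actual symbol probabilities
  tuned_model = [0.7, 0.1, 0.1, 0.1]       # specialized model, assumptions hold
  naive_model = [0.25, 0.25, 0.25, 0.25]   # "stupider" uniform model

  print(expected_bits(true_dist, tuned_model))   # ~1.36 bits/symbol (the entropy)
  print(expected_bits(true_dist, naive_model))   # 2.0 bits/symbol

  # but when the data defies the specialized model's assumptions:
  skewed_dist = [0.1, 0.1, 0.1, 0.7]
  print(expected_bits(skewed_dist, tuned_model))  # ~3.04 bits/symbol -- blows up
  print(expected_bits(skewed_dist, naive_model))  # still 2.0 bits/symbol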

Isn't there an analogous situation in AI? Something like, the more specialized a mind is in a particular domain, the better it is at analyzing that domain, given equal resources. In practice, that should mean something along the lines of more accurate induction given fewer instances. In other words, knowing your stuff. Tiger Woods as opposed to Uncle Jim. But if Tiger is asked to play chess against Garry Kasparov...

But the interesting issue to me is, what is the tradeoff, from an evolutionary point of view, between specialization and generality? In the case of legged locomotion, which I've spent some time banging my head against, one sees a wide variation in nature. Horses and deer can walk pretty much out of the box, but humans take close to a year to get the hang of it. Is that because human locomotion is so much more difficult? I doubt it. It's not bipedalism; many birds can do that shortly after birth. I suspect that human evolution simply traded specialization for general cognition skills, and slow development was worth the price. Note that humans can learn all sorts of athletic skills, most of which seem beyond the capabilities of your average animal (throwing, catching, jumping, etc.) Animals can be trained, but they don't come up with this stuff on their own. Anecdotally, they seem more specialized and less flexible in their ability to learn physical skills.

Digging myself further into this hole, it seems to me that an argument can be made that even in the human case (assuming it's more general than most or all other animals), evolution has provided quite a bit of prior data, in the forms you suggest, such as the intrinsic nature of symmetry and hierarchy in diverse everyday problems. But can't we therefore think of evolution as actually being the general AGI, and human beings as representing the system after a whole bunch of learning (a billion years' worth) has taken place? Physics is the prior for evolution; the existence of complex life, and the mechanics of evolution, are the prior for everyday-world intelligence.

The specifics of our planet and its biosphere may impose a well-defined barrier between evolution and phenotypic learning, but perhaps this is an artifact of this particular system. In other words, a 'true' AGI might include elements that roughly mimic the time scale and generality of evolution, and other elements that are closer in nature to memory and "actual" learning. However, the demarcation between these two phases may be less distinct; or there could be three or more phases; or perhaps there's a way to engineer it so the transition from generality to specific knowledge is continuous.

dan miller

Anonymous said...

I don't understand what you are talking about at all. I'm guessing you want to understand the idea of intelligence to build a robot. I recommend you say things in a very basic, simple, clear way so everyone can understand you. I'm just being honest.

artiphys said...

> Erin Rachel Young said...

> I don't understand what you are talking about at all.

Out of curiosity, are you responding to the original post, or my comment?

-artiphys

Anonymous said...

Artiphys,

Ben gave me a link to his webpage. On his webpage it has a link to his blog. I was reading his blog and wondering what he was talking about. I guess that is a career blog. I'm kind of responding to the original post. I didn't even look at what you said.

Sincerely,
Erin

memetic warrior said...

"The extreme commonality of symmetry groups in the (everyday and otherwise) physical world is another example: they occur so often that minds oriented toward recognizing patterns involving symmetry groups are likely to be intelligent with respect to the real world."

I'm very interested in these matters. I know the work of Marcus Hutter, Schmidhuber and Solomonoff. The Solomonoff general theory of inductive inference is a cornerstone in the understanding not only of intelligence but also of life in general. Don't forget that living beings compute things, that is, execute algorithms that respond to inputs in order to generate outputs, not only at the neural-intelligent level but also at the chemical level (many chemical levels, deeply entangled, let's say, from the cellular to the body level). An amoeba computes things. It also adjusts the parameters of its algorithms according to the results and further inputs. That implies pattern recognition as well. Therefore, intelligence may be just an application of pattern recognition to higher-level phenomena, but the pattern recognition of inputs is pervasive at all scales of life. It seems that life and evolution try to anticipate the future and react to it through the search for computational algorithms that are accurate enough but no more, so that their algorithmic complexity is minimal. An algorithmically complex world would make evolutionary progress impossible: there is no swimming animal that lives in aquatic environments where the turbulence is too great.

Into this picture enters the research of Max Tegmark, a cosmologist who draws a clear line of thought from observational cosmology to deep philosophical and mathematical consequences: all the observed data point to the most plausible hypothesis that the universe is infinite and that there are many levels of multiverses with different laws; some of these levels are part of the cosmological consensus, deduced scientifically from the experimental data. The unavoidable conclusion is that the weak anthropic principle rules; therefore our local universe has just the right structure for life to exist. And this is the point at which I return to the highlighted phrase: yes, life and evolution impose many restrictions on the everyday world in order to exist: simplicity, continuity and smoothness are required for computability and for the discovery of solutions-adaptations in the fitness landscape of the living being. Simplicity means predictability; that is, in the first place, a mathematical universe where events obey laws that at the macroscopic level appear smooth and continuous, that is, discoverable by the simple mechanism of natural selection and stimulus-response.

I mean, as you said, the everyday world: although the real laws seem complicated, they can be approximated with simple algorithms. General relativity and quantum mechanics needed 4.5 billion years of evolution to be discovered by a living being, but there are dozens of simple laws that are consequences of these fundamental laws that living beings used from the first moment in their computations, such as chemical laws, fluid dynamics, Aristotelian physics, frequencies of light and darkness, and so on. There is a recurring repetition or reuse of mathematical structures from the more fundamental laws down to the everyday laws, and among everyday laws. It seems that the world is made to help evolution by allowing reuse of computational structures for phenomena that have something in common. This is more noticeable at the intelligent level: set theory, number theory, and so on are pervasive for every intelligent phenomenon. One can say this is just mathematics; but the finding is that reality is mathematical, and simple, at that. If mathematicity is freedom from contradictions, there are many alternative mathematical structures that the world might use instead of the simple ones. After all, why use mathematics in the first place? The answer to all of this is the anthropic principle applied to our local universe, via the restrictions that natural selection imposes on the laws of the everyday world. I think that intelligence is just a piece in this puzzle. This perspective may add little, in the first place, to the question of what intelligence is, but I think it can in the middle term, because some other areas of knowledge can then be brought into the field in new ways: for example entropy, information theory, the arrow of time, probability, etc. It can allow us to restrict the general, intractable problem, as you said, of general-general intelligence to the general problem of in-this-world intelligence.

I found a nice web site about computations at the chemical level:
A Tour of Molecular Information Theory

memetic warrior said...

Artiphys:

"Horses and deer can walk pretty much out of the box, but humans take close to a year to get the hang of it. Is that because human locomotion is so much more difficult?"

The reason is just adaptation: horses _need_ to walk and run from the first moment on, because their sole protection against predators is their speed. Human babies do not, because they have the protection of their parents. Because primates also did not need to, hominids developed a learned mechanism for walking instead of a hardcoded one. The former is, I guess, less demanding of precious genetic code.

But paradoxically human babies DO have an instinctive capability for swimming, which they forget after a few months, like other instincts -- apparently because some hominids lived in swamps, so their babies were born underwater.

We would be born walking upright now if our ancestors had had need of it.