Notes on Clark Chapter 3
("Mind and World: The Plastic Frontier")
Econ 308: Agent-Based Computational Economics
- Last Updated: 16 June 2006
- Latest Course Offering: Spring 2006
- Course Instructor: Professor Leigh Tesfatsion (tesfatsi AT iastate.edu)
- Syllabus for Econ 308
- Basic Reference:
- Andy Clark, Being There: Putting Brain, Body, and World Together
Again, MIT Press, Cambridge, MA, 1998 (paper), ISBN: 0-262-53156-9
Basic Concepts
- Associative Engine (Clark, p. 53)
- A vision of the brain as an organ engaging in environmental
interactions through an iterated series of simple pattern-completing
computations. The latter involve learning to reliably associate
particular stimuli (e.g., partial facial representations) with particular
responses (e.g., names of individuals).
- Artificial Neural Network (ANN) (Clark, p. 53)
- An artificial neural network is a deliberately constructed
information processing system consisting of nodes (input, hidden, and
output), connections (edges), and connection weights whose basic architecture
is inspired by actual biological neural systems (neurons, axons, synapses,
and synaptic weights).
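The node/connection/weight structure described above can be sketched concretely. Below is a minimal illustrative example (not from Clark): a tiny ANN with two input nodes, two hidden nodes, and one output node, whose hand-picked connection weights let it "complete the pattern" for XOR, a mapping a single node cannot compute.

```python
import math

def sigmoid(x):
    """Smooth squashing function applied at each node."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    """One forward pass: input nodes -> hidden layer -> output node.
    The last entry of each weight list is a bias term."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1])
              for ws in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_output[:-1], hidden)) + w_output[-1])

# Hand-picked illustrative weights: one hidden node approximates OR, the
# other NAND, and the output node approximates AND of the two.
w_hidden = [[10.0, 10.0, -5.0],    # ~ OR of the two inputs
            [-10.0, -10.0, 15.0]]  # ~ NAND of the two inputs
w_output = [10.0, 10.0, -15.0]     # ~ AND of the hidden nodes

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", round(forward([a, b], w_hidden, w_output)))
```

In practice the weights would of course be learned by a training method rather than set by hand, which is the subject of the next two entries.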
- ANN Supervised Training Method (Clark, p. 52)
- A process involving the adjustment of the connection weights in an
ANN by means of some form of supervised error-correction procedure.
- Gradient Descent (Clark, p. 57)
- A particular type of ANN supervised training method under which the
actual outputs at the ANN output nodes are compared against desired outputs,
the resulting overall error (deviation) is calculated, and the various ANN
connection weights are then adjusted up or down to decrease this error.
Consequently, the rate of change (gradient) in the ANN connection weights is
always in the direction of lower error (descent).
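The weight-adjustment step can be illustrated with the simplest possible case: a single linear output node trained by gradient descent on squared error. This is only a sketch of the idea (the data and learning rate are made up); a real ANN would repeat the same logic across many nodes and layers.

```python
# Training data: targets generated by y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b, rate = 0.0, 0.0, 0.1  # initial weight, bias, and learning rate

for _ in range(2000):
    # Gradient of the mean squared error with respect to each weight.
    grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / len(data)
    # Step each weight *against* its gradient: downhill in error.
    w -= rate * grad_w
    b -= rate * grad_b

print(round(w, 2), round(b, 2))  # converges near w = 2, b = 1
```

Each pass compares actual against desired outputs, computes the overall error, and nudges every weight in the direction that lowers that error, exactly the descent property described above.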
- External Scaffolding (Clark, p. 60)
- (From Clark Chapter 2): "(We) may often solve problems
by `piggy-backing' on reliable environmental properties. This
exploitation of external structure is what I mean by the term
scaffolding." In any given circumstance, this external
structure might include other people, linguistic tools (language),
environmental aspects, and/or intrinsic bodily dynamics (e.g., the
properties of muscles).
- Pragmatic Action (Clark, p. 64)
- "Pragmatic action is action undertaken because of a
need to alter the world to achieve some physical goal (e.g., one
must peel potatoes before boiling them)."
- Epistemic Action (Clark, p. 64)
- "Epistemic action, in contrast, is action whose
primary purpose is to alter the nature of our own mental tasks. ...
(T)he changes we impose are driven by our own computational and
information-processing needs."
- Epistemic Credit (Clark, p. 69)
- "Individual brains should not take all the credit for the
flow of thoughts or the generation of reasoned responses. Brain and
world collaborate in ways that are richer and more clearly driven by
computational and informational needs than was previously
suspected."
Key Issues
1. Major lessons of ANN research?
(Clark, p. 58): "The major lesson of neural network research, I
believe, has been to thus expand our vision of the ways a physical
system like the brain might encode and exploit information and
knowledge."
(Clark, p. 59): "(C)ognitive science can no longer afford
simplifications that take the real world and the acting organism out
of the loop -- such simplifications may obscure the solutions to
ecologically realistic problems that characterize active embodied
agents such as human beings. ... (A)bstracting away from the
real-world poles of sensing and acting deprives our artificial
systems of the opportunity to simplify or otherwise transform their
information-processing tasks by the direct exploitation of
real-world structure."
(Clark, p. 59-60): "Artificial neural networks...present an
interesting combination of strengths and weaknesses. (B)enefits
accrue because the systems are, in effect, massively parallel
pattern completers. ... (T)hey are not intrinsically well suited to
highly sequential, stepwise problem solving of the kind involved in
logic and planning... A summary characterization might be `good at
Frisbee, bad at logic' -- a familiar profile indeed. ... (ANNs) are
fast but limited systems that, in effect, substitute pattern
recognition for classical reasoning."
(Clark, p. 60): "(W)e are generally better at Frisbee than at logic.
Nonetheless, we are also able ... to engage in long-term planning and
to carry out sequential reasoning. If we are at root associative
pattern-recognition devices (like ANNs), how do we do it? ... One
(factor) merits immediate attention. It is the use of our old
friend, external scaffolding."
2. Mistaking Mind for the Brain Alone?
(Clark, p. 61): "The combination of basic pattern-completing
abilities and complex, well-structured environments may thus enable
us to haul ourselves up by our own computational bootstraps."
(Clark, p. 61): "(C)lassical rule-and-symbol based AI may have made
a fundamental error, mistaking the cognitive profile of the agent
plus the environment for the cognitive profile of the naked
brain..."
(Clark, p. 62): "Not all animals are capable of originating
(external scaffoldings), and not all animals are capable of
benefiting from them once they are in place. The stress on external
scaffolding thus cannot circumvent the clear fact that human brains
are special. But the computational difference may be smaller and
less radical than we sometimes believe."
(Clark, pp. 63-64): Jigsaw puzzle example. Humans solve jigsaw
puzzles by "(p)icking up pieces, rotating them to check for
potential spatial matches, and then trying them out... Imagine, in
contrast, a system that first solved the whole puzzle by pure
thought and then used the world merely as the arena in which the
already-achieved solution was to be played out. This crucial
difference is nicely captured by David Kirsh and Paul Maglio (1994)
as the distinction between pragmatic and epistemic
action."
(Clark, p. 64): "(T)he classic (cognitive science/AI) image bundles
into the machine a set of operational capacities which in
real life emerge only from the interactions between machine
(brain) and world."
(Clark, pp. 65-66): "(E)xternal structures (including external
symbols like words and letters) are special insofar as they allow
types of operations not readily (if at all) performed in the inner
realm. A more complex example (than Scrabble) is performance on the
computer game Tetris. ... (In) the case of Tetris the internal and
external operations must be temporally coordinated so closely that
the inner and outer systems (the brain/CNS and the on-screen
operations) seem to function together as a single integrated
computational unit." (NOTE: CNS = Central Nervous System)
(Clark, pp. 68-69): "It is (the) methodological separation of the
tasks of explaining mind and reason (on the one hand) and explaining
real-world, real-time action taking (on the other) that a cognitive
science of the embodied mind aims to question. ... (H)uman reasoners
are truly distributed cognitive engines: we call on external
resources to perform specific computational tasks, much as a
networked computer may call on other networked computers to perform
specific jobs. ... The true engine of reason, we shall see, is
bounded neither by skin nor skull."
Questions Posed by Moderators During In-Class Discussion
1. Why is mean-squared error (MSE) used in supervised training of
ANNs?
As Patrick Jordan correctly noted, the use of squared errors (error
= discrepancy between desired and actual output at some output node) protects
against positive errors cancelling out negative errors. Note, however, that
a similar protection against cancellation would also be provided by the use
of mean *absolute* error (MAE) in which the magnitudes of the errors are
summed without regard for their signs. Each of these "penalty cost"
functions implicitly imposes a different kind of penalty on different kinds
of errors. The use of squared errors in MSE heavily penalizes large errors
(discrepancies) relative to small errors, in a nonlinear manner. In
contrast, the use of MAE penalizes errors linearly in proportion to their
magnitude.
A wide variety of other penalty cost functions might also be
considered. A "Bayesian" statistician would say that the exact choice of
penalty cost function should be tailored to the problem at hand, reflecting
the user's relative "disutility" for different error patterns in a particular
problem context. A "classical" statistician might prefer to work with MSE
because of an underlying presumption that the errors are random variables
with a known "normal" (bell-shaped) distribution, in which case the MSE (a
sum of squared normally distributed random variables) would itself be a random
variable with a known distribution.
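The difference between the two penalty functions is easy to see numerically. The sketch below (illustrative numbers, not from the discussion) compares two error patterns with the same total magnitude: four small errors versus one large error. MAE scores them identically, while MSE penalizes the concentrated large error much more heavily.

```python
def mse(errors):
    """Mean squared error: squares each error before averaging."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error: averages error magnitudes, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)

small_errors = [0.5, 0.5, 0.5, 0.5]   # four small discrepancies
one_big_error = [2.0, 0.0, 0.0, 0.0]  # one large discrepancy, same total

print(mae(small_errors), mae(one_big_error))  # identical under MAE
print(mse(small_errors), mse(one_big_error))  # very different under MSE
```

Both functions also prevent positive and negative errors from cancelling, since squaring and taking absolute values each discard the sign.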
2. Is SUPERVISED training necessary for ANNs?
ANNs can range from highly supervised to highly unsupervised. More
precisely, ANNs can range all the way from being completely hard-wired
(connections and connection weights all pre-specified by a human designer) to
being totally self-organizing (connections and connection weights can all
freely adapt in response to successive inputs). Self-organizing
ANNs exhibit what is referred to as unsupervised training because
there is no top-down controller guiding the self-organization process.
A useful discussion of ANN training covering this full range is given by
Stan Franklin in his well-known monograph Artificial Minds, MIT Press,
Cambridge, MA, 1997.
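A minimal sketch of unsupervised (self-organizing) adaptation, not taken from Franklin's book: simple winner-take-all competitive learning. No supervisor supplies target outputs; each node's weight simply drifts toward the inputs it "wins," and the two nodes end up discovering the two clusters in the input stream on their own.

```python
import random
random.seed(0)

# Input stream drawn from two clusters (around 0.0 and 10.0); note that
# no desired outputs are ever provided.
inputs = ([random.gauss(0.0, 0.5) for _ in range(50)] +
          [random.gauss(10.0, 0.5) for _ in range(50)])
random.shuffle(inputs)

prototypes = [4.0, 6.0]  # two nodes' weights, free to adapt to the inputs
rate = 0.2
for x in inputs:
    # The node whose weight is closest to the input "wins"...
    i = min(range(len(prototypes)), key=lambda j: abs(prototypes[j] - x))
    # ...and moves its weight toward that input.
    prototypes[i] += rate * (x - prototypes[i])

print(sorted(round(p, 1) for p in prototypes))  # near the two cluster centers
```

The organization emerges bottom-up from the input statistics, which is exactly what "no top-down controller" means in this context.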
3. What happens if the supervisor training an ANN is wrong about
something?
This is another interesting question. While it may be impossible for a
particular ANN ever to learn the "correct" solution with a poor supervisor
(poorly designed penalty cost function), one could imagine that a population
of ANNs with different types of supervisors might evolve to a point where
only "good" supervisors remain active (survival of the fittest solution).
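The population idea can be caricatured in a few lines. In this toy sketch (entirely illustrative), each ANN-plus-supervisor pair is abstracted down to the single value the supervisor trains it toward; fitness is measured against what the world actually rewards, and selection-with-mutation weeds out the badly trained learners even though no individual supervisor is ever "corrected."

```python
import random
random.seed(1)

TRUE_VALUE = 7.0  # what the environment actually rewards (assumed)

# Each supervisor trains its learner toward a possibly wrong target value.
supervisors = [random.uniform(0.0, 14.0) for _ in range(20)]

for generation in range(30):
    # Fitness: closeness of the trained behavior to reality.
    supervisors.sort(key=lambda t: abs(t - TRUE_VALUE))
    survivors = supervisors[:10]          # poorly supervised learners die off
    children = [t + random.gauss(0.0, 0.5) for t in survivors]  # mutate
    supervisors = survivors + children

print(round(sum(supervisors) / len(supervisors), 1))  # population near 7.0
```

Selection operates on outcomes, so "good" supervisors come to dominate without anyone evaluating the supervisors directly.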
4. How many epistemic actions should be taken in any given situation?
This interesting question is closely related to several more
traditionally posed questions arising in statistics and control theory
generally. In statistical sequential decision making, a key question is when
to stop sampling data and decide on an action (the "optimal stopping rule"
problem). In control theory, the issue is when to stop collecting
information about an incompletely understood system and to start exploiting
the information you already have by choosing an action (control) conditional
on this information in an attempt to achieve some optimization objective (the
"dual control" problem).
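A crude (and deliberately suboptimal) stopping rule makes the flavor of these problems concrete: keep sampling a noisy quantity until the standard error of the estimate falls below a tolerance, then stop and act on the estimate. The numbers below are illustrative assumptions, not from any of the cited literature.

```python
import math
import random
random.seed(42)

def sample():
    """One noisy observation of an unknown quantity (true mean 5.0)."""
    return random.gauss(5.0, 2.0)

obs = [sample(), sample()]
while True:
    n = len(obs)
    mean = sum(obs) / n
    var = sum((x - mean) ** 2 for x in obs) / (n - 1)
    # Stop once the standard error of the mean is small enough
    # (with a cap so the loop always terminates).
    if math.sqrt(var / n) < 0.25 or n >= 1000:
        break
    obs.append(sample())

print(n, round(mean, 2))  # samples taken, and the estimate acted upon
```

An *optimal* stopping rule would weigh the cost of one more sample against its expected informational value, which is precisely what the sequential-decision and dual-control literatures formalize.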
5. How many "processors" (nodes) do we need to model the human mind?
As various class members noted, this will presumably depend on the
problem or situation we are considering. A number of CAS researchers are
explicitly focusing on this type of issue in various guises.
Here's an interesting point. Under relatively weak assumptions,
ANNs are known to be "universal approximators." For example, any continuous
function that maps input into output (where the input is restricted to some
closed bounded domain) can be approximated arbitrarily closely by a suitably
constructed ANN. The difficulty is that this non-constructive theorem does
not say anything specific about how to carry out this construction. For
example, the theorem says nothing about HOW MANY hidden node layers or HOW
MANY nodes per hidden node layer would be needed to achieve good
approximation.
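The approximation property itself is easy to demonstrate for a particular case (demonstration, not proof, and my choices of 50 hidden nodes and tanh activations are arbitrary assumptions): fix one hidden layer of random tanh units and solve for the output weights by least squares, then check how closely the network reproduces sin(x) on a closed bounded domain. NumPy is assumed available.

```python
import numpy as np
np.random.seed(0)

# Target: a continuous function on the closed bounded domain [0, pi].
x = np.linspace(0.0, np.pi, 200)
target = np.sin(x)

# One hidden layer of 50 tanh nodes with random input weights and biases.
n_hidden = 50
w = np.random.uniform(-2.0, 2.0, n_hidden)
b = np.random.uniform(-2.0, 2.0, n_hidden)
hidden = np.tanh(np.outer(x, w) + b)  # 200 x 50 matrix of hidden activations

# Solve for the output-layer weights by linear least squares.
out_w, *_ = np.linalg.lstsq(hidden, target, rcond=None)
approx = hidden @ out_w

print("max abs error:", float(np.max(np.abs(approx - target))))
```

Note this sidesteps rather than answers the hard question in the text: 50 nodes happened to suffice here, but the theorem gives no general recipe for choosing the architecture.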
6. Does the brain have a CPU?
Well, if it does, where might it be located? Brain scans suggest that
widely dispersed areas of the brain tend to be activated when a person
undertakes a task. Note that this question is distinct from the
question of internal representations -- the latter could be supported by
dispersed sensory inputs, the issue being whether the dispersed sensory inputs
are ever "fused" into a single coherent mental representation
of some aspect of the world (e.g., a spatial map, Rodney Brooks's coke can,
etc.).
7. And how do YOU play Tetris (if you do)?
Copyright © 2006 Leigh Tesfatsion. All Rights Reserved.