Can a decision-maker directly update an expected utility function via
transitional utility assessments in a manner analogous to Bayes' rule for
updating a probability distribution via transitional probability assessments?
As detailed in a 1982 Theory and Decision article titled "A Dual Approach
to Bayesian Inference and Adaptive Control"(pdf,774K),
the short answer is "yes."
I first posed this question in my 1975 Ph.D. thesis in economics, written at the University of Minnesota under the
direction of Clifford Hildreth and Leonid Hurwicz. This thesis,
titled "Two Essays on Individual Choice," was subsequently
published in part as
Tesfatsion (1980a).
A major component of this thesis was the development of a conditional expected utility model for
boundedly rational decision makers in which utility and probability were
symmetrically treated. That is, utility and probability were both
conditioned on the decision-maker's selected goals and actions (g,a), and both were
defined over the same set of possible (g,a)-conditioned events. Given this symmetry, I wondered whether
it might be both feasible and useful to derive a "Bayes' rule for utility" in
parallel to the standard Bayes' rule for probability.
More precisely, many sequential decision techniques require the successive
updating of a probability distribution defined over the state-of-the-world
for some decision environment. For certain types of problems, however, an
understanding of this probability distribution has no intrinsic value.
Ultimate interest focuses on the criterion function -- e.g. expected utility
-- which is indirectly updated by means of the updated probability
distribution. Suppose attention is shifted from the state-of-the-world as a
random event to the criterion function, itself, as a random function of the
state. Could the criterion function then be estimated and updated directly,
by-passing explicit probability distribution updating altogether?
An affirmative answer was provided in a series of studies
Tesfatsion (1976),
Tesfatsion (1977),
Tesfatsion (1978a),
and
Tesfatsion (1978b).
In these studies I showed that consistent directly updated expected utility estimates can be obtained for an
interesting class of sequential decision problems by filtering over a
sequence of past utility assessments. Under weak plausible restrictions, the
sequence of actions maximizing the updated expected utility estimates
converges to a local maximum of the true expected utility function.
Moreover, suitable selection of a "prior" expected utility function at the
beginning of the initial period can guarantee, in some cases, the convergence
of these actions to a global maximum of the true expected utility function.
I called this method criterion filtering.
Criterion filtering represents one possible approach to sequential
decision-making that decreases computational complexity while retaining the
essence of the Bayesian message: prior and sample information are to be
combined to form updated expected utility evaluations over the set of
feasible actions.
Without the inclusion of a prior expected utility function in the criterion
filter, the selection of an action in accordance with criterion filtering
reduces to the following simple opportunity cost maxim analogous to the
maximum likelihood principle for determining parameter estimates: Select an
action today that would have yielded maximum average utility had it been
selected over "similar" past periods. With the inclusion of a prior
expected utility function in the criterion filter, the selection of an action
in accordance with criterion filtering is analogous to the Bayesian
determination of a parameter estimate by maximization of a posterior
probability density function.
At this point I was using dynamic programming extensively in other ongoing
research. Consequently, I wondered if criterion filtering could be extended
to the direct updating of dynamic programming value (cost-to-go) functions.
An affirmative answer was provided in
Tesfatsion (1979).
Specifically, this article demonstrates the feasibility of
directly updating dynamic programming value functions on the basis of
sequentially obtained utility assessments, bypassing the need for explicit
probability updating via Bayes' rule. The performance of the method is
systematically explored for a class of adaptive control problems by means of
computational experiments.
In Tesfation (1980b)
and Tesfatsion (1984)
I extended criterion filtering to sequential game frameworks with boundedly rational players.
The basic idea of criterion filtering motivated in part my
development (with Robert Kalaba) of the
"Flexible Least Squares (FLS)"
approach to the sequential estimation of dynamic nonlinear systems with
poorly understood structure. I have also used criterion filtering for the modeling of learning agents in
some of my agent-based computational economics research, particularly in my research on
the
Trade Network Game (TNG).
It appears that criterion filtering has some interesting
connections to the elegant Q-learning theory independently developed
by Charles Watkins (1989, "Learning From Delayed Rewards," Ph.D. Thesis,
Cambridge University) and to temporal-difference learning as
elaborated by Richard S. Sutton and Andrew G. Barto (Reinforcement Learning: An
Introduction, The MIT Press, Third Printing, 2000).
Leigh Tesfatsion (1984),
"Games, Goals, and Bounded Rationality",
Theory and Decision, Vol. 17, pp. 149-175.
The published article is available from
SpringerLink.
Abstract: This article presents a generalization of the
standard N-person game with flexible information requirements suitable for
players constrained by certain types of bounded rationality. In particular,
strategies (complete contingency plans) are replaced with partial contingency
plans augmented by goals. Both utility and probability are conditioned on
selected goals and actions (g,a), and both are defined over the same set of
possible (g,a)-conditioned events.
Well-known existence theorems for Nash equilibria and Nash bargaining
solutions are extended to this context. For adaptive sequential games, the
symmetrical treatment of payoffs and probability assessments permits players
to learn their successive moves via "criterion filtering."
That is, the expected utility criterion
function of each player can be directly updated in each decision period using
transitional utility assessments in a manner analogous to Bayes' rule for
updating probability distributions using transitional probability
assessments.
Leigh Tesfatsion (1982),
"A Dual Approach to Bayesian Inference and Adaptive Control"(pdf,774K),
Theory and Decision, Vol. 14, pp. 177-194.
The published article is available from
SpringerLink.
Abstract: This article surveys results established to
date for the "criterion filtering" approach to adaptive control.
Criterion filtering bypasses the
usual preliminary updating of probability distributions via transitional
probability assessments (Bayes' rule) and focuses instead on the direct
updating of the criterion function, itself, via transitional return
assessments.
Leigh Tesfatsion (1980a),
"A Conditional Expected Utility Model for
Myopic Decision Makers", Theory and Decision, Vol. 12, pp.
185-206.
The published article is available from
SpringerLink.
Abstract: This article formulates and axiomatizes a
conditional expected utility model that allows a decision maker to specify
actions in the form of partial rather than complete contingency plans and
to simultaneously choose goals and actions in end-mean pairs. The resulting
model is shown to be a generalization of the
"Small World" model developed by Leonard Savage (The Foundations of
Statistics, 1972). In this conditional expected utility model, both
utility and probability are conditioned on selected goals and actions (g,a), and
both are defined over the same set of possible (g,a)-conditioned events. For adaptive sequential
decision problems, this symmetrical treatment of utility and probability
permits agents to learn via "criterion filtering."
That is, the expected utility criterion
function can be directly updated in each decision period using transitional
utility assessments in a manner analogous to Bayes' rule for updating
probability distributions using transitional probability assessments.
Abstract: In the absence of contrary information, it
would seem prudent for competitors to attribute to their opposition the same
level of rationality they attribute to themselves. Using a
simple but interesting C3 (command, control, and communication)
problem for illustration, a method is proposed for incorporating symmetrical
rationality without resorting to the general multistage game framework which
has proved difficult to apply in practice. A "criterion filtering" technique
is then proposed for the approximate solution of the resulting model that does not
require integration operations
and that appears to be especially well suited for C3 problems
with finite admissible control sets.
Leigh Tesfatsion (1979),
"Direct Updating of Intertemporal Criterion
Functions for a Class of Adaptive Control Problems"(pdf,2M),
IEEE Transactions
on Systems, Man, and Cybernetics, Vol. SMC-9, pp. 143-151.
The published article is available from
IEEE Xplore.
Abstract: This article extends the criterion filtering
approach
[ Tesfatsion (1978a)]
to intertemporal stochastic
optimization problems. It demonstrates the feasibility of directly updating
dynamic programming value functions on the basis of sequentially obtained
return assessments, bypassing the need for explicit probability updating via
Bayes' rule. The performance of the method for a class of adaptive control
problems is systematically explored using computational experiments.
Robert Kalaba and Leigh Tesfatsion (1978a),
"Two Solution Techniques
for Adaptive Reinvestment: A Small-Sample Comparison", Journal of
Cybernetics, Vol. 8, pp. 101-111.
Abstract: Two adaptive control techniques are compared
in the context of an adaptive reinvestment two-armed bandit problem. The
first technique is Bayesian dynamic programming. The distinctive
feature of the second technique -- "criterion filtering" -- is the direct estimation
and updating of the criterion function without recourse to explicit probability updating via
Bayes' rule. The control selections generated by the two techniques
are shown to closely approximate each other. An explanation for this close
approximation is provided by means of an equivalence theorem for the
specification of the criterion function.
Leigh Tesfatsion (1978a),
"A New Approach to Filtering and Adaptive Control",
Journal of Optimization Theory and Applications, Vol.
25, pp. 247-261.
The published article is available from
SpringerLink.
Abstract: A new approach to adaptive control is
proposed, referred to as criterion filtering. The principle
distinguishing feature of criterion filtering is the direct updating of the
current expected return function by means of a filtering operation on a
vector of past return functions. The data storage and computational problems
often associated with explicit probability updating via Bayes' rule are thus
avoided.
Leigh Tesfatsion (1978b),
"A New Approach to Filtering and Adaptive
Control: Stability Results"(pdf,911K),
Applied Mathematics and Computation,
Vol. 4, No. 1, pp. 27-44.
The published article is available from
Science Direct.
Abstract:
In a companion study
Tesfatsion (1978a),
a new approach to adaptive control is proposed
referred to as "criterion filtering." The principle distinguishing feature
of criterion filtering is the direct updating of the current expected return
function by means of a filtering operation on a vector of past return
functions. In this paper convergence properties are established for a simple
linear criterion function filter designed for a class of adaptive control
problems typified by a well-known two-armed bandit problem.
Leigh Tesfatsion (1977),
"A New Approach to Filtering and Adaptive Control: Optimality Results",
Journal of Cybernetics, Vol. 7, pp.133-146.
Abstract:
In a companion study
Tesfatsion (1978a),
a new approach to adaptive control is proposed
referred to as "criterion filtering." The principle distinguishing feature
of criterion filtering is the direct updating of the current expected return
function by means of a filtering operation on a vector of past return
functions. In this paper sufficient conditions are established for control
variables selected in accordance with a simple linear criterion filter to
converge to a global maximum of the true criterion function. When states
depend nontrivially on control variable selection, the decision maker
determines the trade-off between rate of convergence and global optimality by
the choice of the greatest lower bound for the prior (initial period)
criterion function. When states are independent of control variable
selection, the asymptotic global optimality of control variable selections
holds under weak restrictions.
Leigh Tesfatsion (1976),
"Bayes' Theorem for Utility"(pdf,867K),
Discussion Paper 76-65, Center for Economic Research, University of Minnesota.
Abstract:
This paper develops in depth the analogy between the direct updating of expected return functions on the basis of transitional return assessments and the use of Bayes' Rule to update probability distributions on the basis of transitional probability assessments.