Backward induction

From Wikipedia, the free encyclopedia

Process of reasoning backwards in sequence

Not to be confused with backpropagation.

Backward induction is the process of determining a sequence of optimal choices by reasoning from the endpoint of a problem or situation back to its beginning using individual events or actions.^[1] Backward induction involves examining the final point in a series of decisions and identifying the optimal process or action required to arrive at that point. This process continues backward until the best action for every possible point along the sequence is determined. Backward induction was first utilized in 1875 by Arthur Cayley, who discovered the method while attempting to solve the secretary problem.^[2]

In dynamic programming, a method of mathematical optimization, backward induction is used for solving the Bellman equation.^[3]^[4] In the related fields of automated planning and scheduling and automated theorem proving, the method is called backward search or backward chaining. In chess, it is called retrograde analysis.

In game theory, a variant of backward induction is used to compute subgame perfect equilibria in sequential games.^[5] The difference is that optimization problems involve one decision maker who chooses what to do at each point in time, whereas game theory problems involve the interacting decisions of several players. In this situation, it may still be possible to apply a generalization of backward induction, since it may be possible to determine what the second-to-last player will do by predicting what the last player will do in each situation, and so on. This variant of backward induction has been used to solve formal games from the beginning of game theory. John von Neumann and Oskar Morgenstern suggested solving zero-sum, two-person formal games through this method in their Theory of Games and Economic Behavior (1944), the book which established game theory as a field of study.^[6]^[7]

Decision-making example

Optimal-stopping problem

Consider a person evaluating potential employment opportunities for the next ten years, denoted as times $t=1,2,3,\ldots,10$. At each time $t$, they are offered one of two types of job: a 'good' job offering a salary of $\$100$ or a 'bad' job offering a salary of $\$44$. Each job type has an equal probability of being offered. Upon accepting a job, the individual will keep that job for the entire remainder of the ten-year duration.

This scenario is simplified by assuming that the individual's entire concern is their total expected monetary earnings, without any variable preferences for earnings across different periods. In economic terms, this is a scenario with an implicit interest rate of zero and a constant marginal utility of money.

Whether the person in question should accept a 'bad' job can be decided by reasoning backwards from time $t=10$.

At $t=10$, the total earnings from accepting a 'good' job are $\$100$; the total earnings from accepting a 'bad' job are $\$44$; the total earnings from rejecting the available job are $\$0$. Therefore, if they are still unemployed in the last period, they should accept whatever job they are offered at that time for greater income.
At $t=9$, the total earnings from accepting a 'good' job are $2\times \$100=\$200$ because that job will last for two years. The total earnings from accepting a 'bad' job are $2\times \$44=\$88$. The total expected earnings from rejecting a job offer are $\$0$ now plus the value of the next job offer, which will be either $\$44$ with probability 1/2 or $\$100$ with probability 1/2, for an average ('expected') value of ${\frac {\$100+\$44}{2}}=\$72$. Since even a 'bad' job is worth $\$88>\$72$, whichever job is available at $t=9$ should be accepted.
At $t=8$, the total earnings from accepting a 'good' job are $3\times \$100=\$300$; the total earnings from accepting a 'bad' job are $3\times \$44=\$132$. The total expected earnings from rejecting a job offer are $\$0$ now plus the total expected earnings from waiting for a job offer at $t=9$. As previously concluded, any offer at $t=9$ should be accepted, and the expected value of doing so is ${\frac {\$200+\$88}{2}}=\$144$. Since $\$144>\$132$, at $t=8$ total expected earnings are higher if the person waits for the next offer rather than accepting a 'bad' job.

By continuing to work backwards, it can be verified that a 'bad' offer should only be accepted if the person is still unemployed at $t=9$ or $t=10$; a bad offer should be rejected at any time up to and including $t=8$. Generalizing this example intuitively, it corresponds to the principle that if one expects to work in a job for a long time, it is worth picking carefully.
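
The recursion in this example can be checked numerically. The following is a minimal Python sketch, not part of the original example: the function name solve_job_search and the variable names are illustrative assumptions, and value[t] denotes the expected total earnings of a person who is still unemployed at the start of period t, before seeing that period's offer.

```python
# Backward induction for the ten-period job-offer example.
HORIZON = 10
SALARIES = {"good": 100, "bad": 44}  # each type is offered with equal probability


def solve_job_search(horizon=HORIZON, salaries=SALARIES):
    """Return (value, accept).

    value[t]  -- expected total earnings if still unemployed at the start of t
    accept[t] -- set of job types that are worth accepting at time t
    """
    value = {horizon + 1: 0.0}  # nothing can be earned after the last period
    accept = {}
    for t in range(horizon, 0, -1):       # t = 10, 9, ..., 1
        periods_left = horizon - t + 1    # an accepted job is held from t to the end
        wait = value[t + 1]               # earn 0 now, then act optimally from t + 1
        accept[t] = set()
        expected = 0.0
        for job, salary in salaries.items():
            take = salary * periods_left
            if take >= wait:
                accept[t].add(job)
            expected += max(take, wait) / len(salaries)
        value[t] = expected
    return value, accept


value, accept = solve_job_search()
for t in range(HORIZON, 0, -1):
    print(t, value[t], sorted(accept[t]))
```

Under these assumptions the sketch reproduces the figures above: waiting is worth 72 at the start of $t=10$ and 144 at the start of $t=9$, and a 'bad' job is accepted only at $t=9$ and $t=10$.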

A dynamic optimization problem of this kind is called an optimal stopping problem because the issue at hand is when to stop waiting for a better offer. Search theory is a field of microeconomics that applies models of this type to matters such as shopping, job searches, and marriage.

Game theory

In game theory, backward induction is a solution methodology that follows from applying sequential rationality to identify an optimal action for each information set in a given game tree. It develops the implications of rationality via individual information sets in the extensive-form representation of a game.^[8]

To solve for a subgame perfect equilibrium with backward induction, the game should be written out in extensive form and then divided into subgames. Starting with the subgame furthest from the initial node, or starting point, the expected payoffs listed for this subgame are weighed, and a rational player will select the option with the higher payoff for themselves. The highest payoff vector is selected and marked. To solve for the subgame perfect equilibrium, one should continually work backwards from subgame to subgame until the starting point is reached. As this process progresses, the initial extensive-form game will become shorter and shorter. The marked path of vectors is the subgame perfect equilibrium.^[9]
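
As an illustration of this procedure, the sketch below solves a small perfect-information game tree by recursion. It is a hedged example rather than a standard implementation: the Node class, the backward_induct function, and the two-player tree with its payoffs are all hypothetical, and ties are broken in favor of the first-listed action.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A decision node (player and children) or a terminal node (payoffs only)."""
    player: int = -1                              # index of the player who moves; unused at a leaf
    children: dict = field(default_factory=dict)  # action label -> Node
    payoffs: tuple = ()                           # payoff vector at a terminal node


def backward_induct(node, path=()):
    """Return (payoff_vector, plan).

    plan maps every decision node, identified by the sequence of actions
    leading to it, to the action chosen there.
    """
    if not node.children:
        return node.payoffs, {}
    plan = {}
    best_action, best_payoffs = None, None
    for action, child in node.children.items():
        payoffs, sub_plan = backward_induct(child, path + (action,))
        plan.update(sub_plan)
        # The mover compares only their own component of each payoff vector.
        if best_payoffs is None or payoffs[node.player] > best_payoffs[node.player]:
            best_action, best_payoffs = action, payoffs
    plan[path] = best_action
    return best_payoffs, plan


# Hypothetical tree: player 0 moves first, then player 1 responds.
tree = Node(player=0, children={
    "L": Node(player=1, children={"l": Node(payoffs=(2, 1)), "r": Node(payoffs=(0, 0))}),
    "R": Node(player=1, children={"l": Node(payoffs=(3, 0)), "r": Node(payoffs=(1, 2))}),
})

payoffs, plan = backward_induct(tree)
print(payoffs, plan)
```

The returned plan specifies a choice at every decision node, including nodes that are never reached, which is what distinguishes a subgame perfect strategy profile from a single path of play.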

Multi-stage game

The application of backward induction in game theory can be demonstrated with a simple example. Consider a multi-stage game involving two players planning to go to a movie.

Player 1 wants to watch The Terminator, and Player 2 wants to watch Joker.
Player 1 will buy a ticket first and tell Player 2 about her choice.
Next, Player 2 will buy his ticket.

Once they both observe the choices, the second stage begins. In the second stage, players choose whether to go to the movie or stay home.

As in the first stage, Player 1 chooses whether to go to the movie first.
After observing Player 1's choice, Player 2 makes his choice.

For this example, payoffs are added across different stages. The game is a perfect information game. The normal-form matrices for these games are:

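*Stage 1* (each cell lists Player 1's payoff, then Player 2's payoff)
Player 1 (rows) \ Player 2 (columns)	Joker	Terminator
Joker	3, 5	0, 0
Terminator	1, 1	5, 3

*Stage 2* (each cell lists Player 1's payoff, then Player 2's payoff)
Player 1 (rows) \ Player 2 (columns)	Go to Movie	Stay Home
Go to Movie	6, 6	4, -2
Stay Home	-2, 4	-2, -2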

Extensive form for the Joker-Terminator game

The extensive form of this multi-stage game is shown in the diagram. The steps for solving this game with backward induction are as follows:

1. Analysis starts from the final nodes.
2. Player 2 faces 8 subgames at the final decision nodes, in each of which he chooses "go to movie" or "stay home".
- Player 2 would make 8 possible comparisons in total, choosing the option with the highest payoff in each.
- For example, considering the first subgame, Player 2's payoff of 11 for "go to movie" (ending at (9, 11)) is higher than his payoff of 3 for "stay home" (ending at (7, 3)). Player 2 would therefore choose "go to movie."
- The method continues for every subgame.
3. Once Player 2's optimal decisions have been determined (bolded green lines in the extensive form diagram), analysis starts for Player 1's decisions in her 4 subgames.
- The process is similar to step 2, comparing Player 1's payoffs in order to anticipate her choices.
- Subgames that would not be chosen by Player 2 in the previous step are no longer considered because they are ruled out by the assumption of rational play.
- For example, in the first subgame, the choice "go to movie" offers a payoff of 9 since the decision tree terminates at the reward (9, 11), considering Player 2's previously established choice. Meanwhile, "stay home" offers a payoff of 1 since it ends at (1, 9), so Player 1 would choose "go to movie."
4. The process repeats for each player until the initial node is reached.
- For example, Player 2 would choose "Joker" for the first subgame in the next iteration because a payoff of 11 ending in (9, 11) is greater than "Terminator" with a payoff of 6 at (6, 6).
- Player 1, at the initial node, would select "Terminator" because it offers a higher payoff of 11 at (11, 9) than "Joker", which has a reward of 9 at (9, 11).
5. To identify a subgame perfect equilibrium, one needs to identify a route that selects an optimal subgame at each information set.
- In this example, Player 1 chooses "Terminator" and Player 2 also chooses "Terminator." Then they both choose "go to movie."
- The subgame perfect equilibrium leads to a payoff of (11, 9).
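
The walkthrough above can be reproduced with a short Python sketch. It assumes that the rows of both payoff tables belong to Player 1, that the columns belong to Player 2, and that each cell lists Player 1's payoff first; this reading agrees with the terminal payoffs quoted in the steps. The names total and solve are illustrative only.

```python
from itertools import product

MOVIES = ("Joker", "Terminator")
MOVES = ("Go to Movie", "Stay Home")

# Payoffs are (Player 1, Player 2); keys are (Player 1's choice, Player 2's choice).
STAGE1 = {("Joker", "Joker"): (3, 5), ("Joker", "Terminator"): (0, 0),
          ("Terminator", "Joker"): (1, 1), ("Terminator", "Terminator"): (5, 3)}
STAGE2 = {("Go to Movie", "Go to Movie"): (6, 6), ("Go to Movie", "Stay Home"): (4, -2),
          ("Stay Home", "Go to Movie"): (-2, 4), ("Stay Home", "Stay Home"): (-2, -2)}


def total(m1, m2, a1, a2):
    """Sum the stage payoffs along one complete path of play."""
    s1, s2 = STAGE1[(m1, m2)], STAGE2[(a1, a2)]
    return (s1[0] + s2[0], s1[1] + s2[1])


def solve():
    # Player 2's last move at each of the 8 final decision nodes.
    p2_move = {
        (m1, m2, a1): max(MOVES, key=lambda a2: total(m1, m2, a1, a2)[1])
        for m1, m2, a1 in product(MOVIES, MOVIES, MOVES)
    }

    def after_p1_move(m1, m2, a1):
        return total(m1, m2, a1, p2_move[(m1, m2, a1)])

    # Player 1's second-stage move at each of her 4 nodes, anticipating Player 2.
    p1_move = {
        (m1, m2): max(MOVES, key=lambda a1: after_p1_move(m1, m2, a1)[0])
        for m1, m2 in product(MOVIES, MOVIES)
    }

    def after_tickets(m1, m2):
        return after_p1_move(m1, m2, p1_move[(m1, m2)])

    # Player 2's ticket choice after observing Player 1's ticket.
    p2_ticket = {m1: max(MOVIES, key=lambda m2: after_tickets(m1, m2)[1])
                 for m1 in MOVIES}
    # Player 1's ticket choice at the initial node.
    p1_ticket = max(MOVIES, key=lambda m1: after_tickets(m1, p2_ticket[m1])[0])

    m1, m2 = p1_ticket, p2_ticket[p1_ticket]
    a1 = p1_move[(m1, m2)]
    a2 = p2_move[(m1, m2, a1)]
    return (m1, m2, a1, a2), total(m1, m2, a1, a2)


path, payoffs = solve()
print(path, payoffs)
```

With these tables the sketch selects "Terminator" for both players followed by "Go to Movie" for both, with total payoffs (11, 9).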

Limitations

Backward induction can be applied only to limited classes of games. The procedure is well-defined for any game of perfect information with no ties of utility. It is also well-defined and meaningful for games of perfect information with ties, although in such cases it leads to more than one perfect strategy. The procedure can be applied to some games with nontrivial information sets, but it is not applicable in general. It is best suited to solving games with perfect information: if not all players are aware of the other players' actions and payoffs at each decision node, then backward induction is not so easily applied.^[10]

Ultimatum game

Economics

Entry-decision problem

Consider a dynamic game in which the players are an incumbent firm in an industry and a potential entrant to that industry. As it stands, the incumbent has a monopoly over the industry and does not want to lose market share to the entrant. If the entrant chooses not to enter, the payoff to the incumbent is high (it maintains its monopoly) and the entrant neither loses nor gains (its payoff is zero). If the entrant enters, the incumbent can "fight" or "accommodate" the entrant. It fights by lowering its price, which runs the entrant out of business (so the entrant incurs exit costs, a negative payoff) and damages the incumbent's own profits. If it accommodates the entrant, it loses some of its sales, but a high price is maintained and it receives greater profits than it would by lowering its price (though lower than monopoly profits).

If the incumbent accommodates when the entrant enters, the best response for the entrant is to enter (and gain profit). Hence the strategy profile in which the entrant enters and the incumbent accommodates if the entrant enters is a Nash equilibrium consistent with backward induction. However, if the incumbent is going to fight, the best response for the entrant is to not enter, and if the entrant does not enter, it does not matter what the incumbent chooses to do in the hypothetical case that the entrant does enter. Hence the strategy profile in which the incumbent fights if the entrant enters, but the entrant does not enter, is also a Nash equilibrium. However, were the entrant to deviate and enter, the incumbent's best response would be to accommodate, so the threat of fighting is not credible. This second Nash equilibrium can therefore be eliminated by backward induction.
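
The difference between the two Nash equilibria can be made concrete with numbers. The payoffs in the sketch below are hypothetical, chosen only to respect the ordering described above (monopoly profit above accommodation profit above fighting profit for the incumbent; a loss for the entrant if fought, zero if it stays out, a gain if accommodated). The function names are likewise illustrative.

```python
from itertools import product

# Hypothetical payoffs, written as (entrant, incumbent).
STAY_OUT = (0, 2)
IF_ENTER = {"fight": (-1, 0), "accommodate": (1, 1)}

ENTRANT = ("enter", "stay out")
INCUMBENT = ("fight", "accommodate")  # the incumbent's plan in case entry occurs


def payoff(entrant, incumbent):
    return IF_ENTER[incumbent] if entrant == "enter" else STAY_OUT


def nash_equilibria():
    """Profiles from which neither player gains by deviating unilaterally."""
    found = []
    for e, i in product(ENTRANT, INCUMBENT):
        u_e, u_i = payoff(e, i)
        entrant_ok = all(payoff(e2, i)[0] <= u_e for e2 in ENTRANT)
        incumbent_ok = all(payoff(e, i2)[1] <= u_i for i2 in INCUMBENT)
        if entrant_ok and incumbent_ok:
            found.append((e, i))
    return found


def backward_induction():
    """Solve the incumbent's post-entry decision first, then the entry decision."""
    reply = max(INCUMBENT, key=lambda i: IF_ENTER[i][1])
    entry = "enter" if IF_ENTER[reply][0] > STAY_OUT[0] else "stay out"
    return (entry, reply)


print(nash_equilibria())
print(backward_induction())
```

Under these assumed payoffs the enumeration finds both equilibria, (enter, accommodate) and (stay out, fight), while backward induction returns only (enter, accommodate).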

Finding a Nash equilibrium in each decision-making process (subgame) yields a subgame perfect equilibrium. Strategy profiles that are subgame perfect equilibria therefore exclude actions such as non-credible ("incredible") threats used to "scare off" an entrant. If the incumbent threatens to start a price war with an entrant, it is threatening to lower its price from the monopoly price to slightly below the entrant's. That threat is impractical, and not credible, if the entrant knows that a price war would not actually happen because it would result in losses for both parties. Unlike a single-agent optimization, which might include suboptimal or infeasible equilibria, a subgame perfect equilibrium accounts for the actions of the other player, ensuring that no player reaches a subgame mistakenly. In this case, backward induction yielding subgame perfect equilibria ensures that the entrant will not be convinced by the incumbent's threat, knowing that it is not a best response in the strategy profile.^[13]

Unexpected hanging paradox

Main article: Unexpected hanging paradox

The unexpected hanging paradox is a paradox related to backward induction. The prisoner described in the paradox uses backward induction to reach a false conclusion. The description of the problem assumes it is possible to surprise someone who is performing backward induction. The mathematical theory of backward induction does not make this assumption, so the paradox does not call into question the results of this theory.

Common knowledge of rationality

Backward induction works only if both players are rational, i.e., always select an action that maximizes their payoff. However, rationality is not enough: each player should also believe that all other players are rational. Even this is not enough: each player should believe that all other players know that all other players are rational, and so on, ad infinitum. In other words, rationality should be common knowledge.^[14]

Limited backward induction

Limited backward induction is a deviation from fully rational backward induction. It involves enacting the regular process of backward induction without perfect foresight. Theoretically, this occurs when one or more players have limited foresight and cannot perform backward induction through all terminal nodes.^[15] Limited backward induction plays a much larger role in longer games as the effects of limited backward induction are more potent in later periods of games.

A four-stage sequential game with a foresight bound

Experiments have shown that in sequential bargaining games, such as the Centipede game, subjects deviate from theoretical predictions and instead engage in limited backward induction. This deviation occurs as a result of bounded rationality, where players can only perfectly see a few stages ahead.^[16] This allows for unpredictability in decisions and inefficiency in finding and achieving subgame perfect Nash equilibria.

There are three broad hypotheses for this phenomenon:

The presence of social factors (e.g. fairness)
The presence of non-social factors (e.g. limited backward induction)
Cultural difference

Violations of backward induction are predominantly attributed to the presence of social factors. However, data-driven model predictions for sequential bargaining games (using the cognitive hierarchy model) have highlighted that in some games limited backward induction can play a dominant role.^[17]

Within repeated public goods games, team behavior is affected by limited backward induction: team members' initial contributions are higher than their contributions towards the end. Limited backward induction also influences how regularly free-riding occurs within a team's public goods game. Early on, when the effects of limited backward induction are low, free-riding is less frequent, whilst towards the end, when the effects are high, free-riding becomes more frequent.^[18]

Limited backward induction has also been tested for within a variant of the race game. In the game, players would sequentially choose integers inside a range and sum their choices until a target number is reached. Hitting the target earns that player a prize; the other loses. Partway through a series of games, a small prize was introduced. The majority of players then performed limited backward induction, as they solved for the small prize rather than for the original prize. Only a small fraction of players considered both prizes at the start.^[19]
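
For reference, the fully rational benchmark in a race game of this kind can be computed by backward induction over the running total. The sketch below assumes hypothetical parameters that are not taken from the cited experiment: two players alternately add an integer from 1 to max_step, and the player who brings the total exactly to target wins. It covers only the single-prize game and does not model the small-prize variant.

```python
def solve_race(target, max_step):
    """Return (winning, best).

    winning[s] -- True if the player about to move at running total s can force a win
    best[s]    -- a winning increment at s, present only when winning[s] is True
    """
    winning, best = {}, {}
    for s in range(target - 1, -1, -1):  # work backwards from totals nearest the target
        winning[s] = False
        for k in range(1, max_step + 1):
            nxt = s + k
            if nxt > target:
                break
            # Win by hitting the target now, or by leaving the opponent a losing total.
            if nxt == target or not winning[nxt]:
                winning[s] = True
                best[s] = k
                break
    return winning, best


winning, best = solve_race(target=15, max_step=3)
losing_totals = [s for s in range(15) if not winning[s]]
print(losing_totals, best[0])
```

With these illustrative parameters the totals 3, 7 and 11 are losing for the player about to move, so the first mover can force a win by moving to 3. A player with limited foresight would, in effect, run this recursion over only the last few totals.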

Most tests of backward induction are based on experiments, in which participants are only to a small extent incentivized to perform the task well, if at all. However, violations of backward induction also appear to be common in high-stakes environments. A large-scale analysis of the American television game show The Price Is Right, for example, provides evidence of limited foresight. In every episode, contestants play the Showcase Showdown, a sequential game of perfect information for which the optimal strategy can be found through backward induction. The frequent and systematic deviations from optimal behavior suggest that a sizable proportion of the contestants fail to properly backward induct and myopically consider the next stage of the game only.^[20]

Notes

^[1] "Non-credible threats, subgame perfect equilibrium and backward induction", Game Theory, Cambridge University Press, pp. 317–332, 2012-05-31, retrieved 2024-04-04.
^[2] Rust, John (9 September 2016). Dynamic Programming. The New Palgrave Dictionary of Economics: Palgrave Macmillan. ISBN 978-1-349-95121-5.
^[3] Adda, Jerome; Cooper, Russell W. (2003-08-29). Dynamic Economics: Quantitative Methods and Applications. MIT Press. ISBN 978-0-262-01201-0.
^[4] Mario Miranda and Paul Fackler, "Applied Computational Economics and Finance", Section 7.3.1, page 164. MIT Press, 2002.
^[5] Drew Fudenberg and Jean Tirole, "Game Theory", Section 3.5, page 92. MIT Press, 1991.
^[6] MacQuarrie, John. "4, Fundamentals". Mathematics and Chess. University of St Andrews. Retrieved 2023-11-25.
^[7] von Neumann, John; Morgenstern, Oskar (1953). "Section 15.3.1.". Theory of Games and Economic Behavior (Third ed.). Princeton University Press.
^[8] Watson, Joel (2002). Strategy: an introduction to game theory (3 ed.). New York: W.W. Norton & Company. p. 63.
^[9] Rust, John (9 September 2016). Dynamic Programming. The New Palgrave Dictionary of Economics: Palgrave Macmillan. ISBN 978-1-349-95121-5.
^[10] Watson, Joel (2002). Strategy: an introduction to game theory (3 ed.). New York: W.W. Norton & Company. p. 188.
^[11] Kamiński, Marek M. (2017). "Backward Induction: Merits And Flaws". Studies in Logic, Grammar and Rhetoric. 50 (1): 9–24. doi:10.1515/slgr-2017-0016.
^[12] Camerer, Colin F (1 November 1997). "Progress in Behavioral Game Theory". Journal of Economic Perspectives. 11 (4): 167–188. doi:10.1257/jep.11.4.167. JSTOR 2138470. Archived from the original on 14 December 2022. Retrieved 19 December 2019.
^[13] Rust J. (2008) Dynamic Programming. In: Palgrave Macmillan (eds) The New Palgrave Dictionary of Economics. Palgrave Macmillan, London.
^[14] Aumann, Robert J. (January 1995). "Backward induction and common knowledge of rationality". Games and Economic Behavior. 8 (1): 6–19. doi:10.1016/S0899-8256(05)80015-6.
^[15] Marco Mantovani, 2015. "Limited backward induction: foresight and behavior in sequential games," Working Papers 289, University of Milano-Bicocca, Department of Economics.
^[16] Ke, Shaowei (2019). "Boundedly rational backward induction". Theoretical Economics. 14 (1): 103–134. doi:10.3982/TE2402. hdl:2027.42/147808. S2CID 9053484.
^[17] Qu, Xia; Doshi, Prashant (1 March 2017). "On the role of fairness and limited backward induction in sequential bargaining games". Annals of Mathematics and Artificial Intelligence. 79 (1): 205–227. doi:10.1007/s10472-015-9481-7. S2CID 23565130.
^[18] Cox, Caleb A.; Stoddard, Brock (May 2018). "Strategic thinking in public goods games with teams". Journal of Public Economics. 161: 31–43. doi:10.1016/j.jpubeco.2018.03.007.
^[19] Mantovani, Marco (2013). "Limited backward induction". CiteSeerX 10.1.1.399.8991.
^[20] Klein Teeselink, Bouke; van Dolder, Dennie; van den Assem, Martijn; Dana, Jason (2022). "High-Stakes Failures of Backward Induction".
