Movatterモバイル変換

Expectiminimax

From Wikipedia, the free encyclopedia

Variation of the minimax algorithm

Expectiminimax
Class	Search algorithm
Worst-case performance	${\mathcal {O}}(b^{m}n^{m})$ , where $n {\displaystyle n}$ is the number of distinct dice throws
Best-case performance	${\mathcal {O}}(b^{m})$ , in case all dice throws are known in advance

Theexpectiminimax algorithm is a variation of theminimax algorithm, for use inartificial intelligence systems that play two-playerzero-sum games, such asbackgammon, in which the outcome depends on a combination of the player's skill andchance elements such as dice rolls. In addition to "min" and "max" nodes of the traditional minimax tree, this variant has "chance" ("move by nature") nodes, which take theexpected value of a random event occurring.^[1] Ingame theory terms, an expectiminimax tree is the game tree of anextensive-form game ofperfect, butincomplete information.

In the traditionalminimax method, the levels of the tree alternate from max to min until the depth limit of the tree has been reached. In an expectiminimax tree, the "chance" nodes are interleaved with the max and min nodes. Instead of taking the max or min of theutility values of their children, chance nodes take a weighted average, with the weight being the probability that child is reached.^[1]

The interleaving depends on the game. Each "turn" of the game is evaluated as a "max" node (representing the AI player's turn), a "min" node (representing a potentially-optimal opponent's turn), or a "chance" node (representing a random effect or player).^[1]

For example, consider a game in which each round consists of a single die throw, and then decisions made by first the AI player, and then another intelligent opponent. The order of nodes in this game would alternate between "chance", "max" and then "min".^[1]

Pseudocode

[edit]

The expectiminimax algorithm is a variant of theminimax algorithm and was firstly proposed byDonald Michie in 1966.^[2]Itspseudocode is given below.

function expectiminimax(node, depth)if node is a terminal nodeor depth = 0return the heuristic value of nodeif the adversary is to play at node        // Return value of minimum-valued child nodelet α := +∞foreach child of node            α := min(α, expectiminimax(child, depth-1))else if we are to play at node        // Return value of maximum-valued child nodelet α := -∞foreach child of node            α := max(α, expectiminimax(child, depth-1))else if random event at node        // Return weighted average of all child nodes' valueslet α := 0foreach child of node            α := α + (Probability[child] × expectiminimax(child, depth-1))return α

Note that for random nodes, there must be a known probability of reaching each child. (For most games of chance, child nodes will be equally-weighted, which means the return value can simply be the average of all child values.)

Expectimax search

[edit]

Expectimax search is a variant described inUniversal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability (2005) by Tom Everitt andMarcus Hutter.

Alpha-beta pruning

[edit]

Bruce Ballard was the first to develop a technique, called *-minimax, that enablesalpha-beta pruning in expectiminimax trees.^[3]^[4] The problem with integratingalpha-beta pruning into the expectiminimax algorithm is that the scores of a chance node's children may exceed the alpha or beta bound of its parent, even if the weighted value of each child does not. However, it is possible to bound the scores of a chance node's children, and therefore bound the score of the CHANCE node.

If a standard iterative search is about to score the $i {\displaystyle i}$ th child of a chance node with $N {\displaystyle N}$ equally likely children, that search has computed scores $v_{1},v_{2},\ldots ,v_{i-1}$ for child nodes 1 through $i-1$ . Assuming a lowest possible score $L {\displaystyle L}$ and a highest possible score $U {\displaystyle U}$ for each unsearched child, the bounds of the chance node's score is as follows:

${\text{score}}\leq {\frac {1}{n}}\left((v_{1}+\ldots +v_{i-1})+v_{i}+U\times (n-i)\right)$

${\text{score}}\geq {\frac {1}{n}}\left((v_{1}+\ldots +v_{i-1})+v_{i}+L\times (n-i)\right)$

If an alpha and/or beta bound are given in scoring the chance node, these bounds can be used to cut off the search of the $i {\displaystyle i}$ th child. The above equations can be rearranged to find a new alpha & beta value that will cut off the search if it would cause the chance node to exceed its own alpha and beta bounds:

$\alpha _{i}=N\times \alpha -\left(v_{1}+\ldots +v_{i-1}\right)+U\times (n-i)$

$\beta _{i}=N\times \beta -\left(v_{1}+\ldots +v_{i-1}\right)+L\times (n-i)$

Thepseudocode for extending expectiminimax with fail-hardalpha-beta pruning in this manner is as follows:

function *-minimax(node, depth, α, β)if node is a terminal nodeor depth = 0return the heuristic value of nodeif node is a max or min nodereturn the minimax value of the nodelet N = numSuccessors(node)    // Compute α, β for childrenlet A = N * (α - U) + Ulet B = N * (β - L) + Llet sum = 0foreach child of node        // Limit child α, β to a valid rangelet AX = max(A, L)let BX = min(B, U)        // Search the child with new cutoff valueslet score = *-minimax(child, depth - 1, AX, BX)        // Check for α, β cutoff conditionsif score <= Areturn αif score >= Breturn β        sum += score        // Adjust α, β for the next child        A += U - v        B += L - v    // No cutoff occurred, return scorereturn sum / N

This technique is one of a family of variants of algorithms which can bound the search of a CHANCE node and its children based on collecting lower and upper bounds of the children during search. Other techniques which can offer performance benefits include probing each child with a heuristic to establish a min or max before performing a full search on each child, etc.

References

[edit]

^^a ^b ^c ^dRussell, Stuart Jonathan; Norvig, Peter; Davis, Ernest (2010).Artificial Intelligence: A Modern Approach. Prentice Hall. pp. 177–178.ISBN 978-0-13-604259-4.
^Michie, D. (1966). "Game-Playing and Game-Learning Automata".Advances in Programming and Non-Numerical Computation. pp. 183–200.doi:10.1016/B978-0-08-011356-2.50011-2.ISBN 978-0-08-011356-2.
^Ballard, Bruce W. (September 1983). "The *-minimax search procedure for trees containing chance nodes".Artificial Intelligence.21 (3):327–350.doi:10.1016/S0004-3702(83)80015-0.
^Hauk, Thomas; Buro, Michael; Schaeffer, Jonathan (2006). "Rediscovering *-Minimax Search".Computers and Games. Lecture Notes in Computer Science. Vol. 3846. pp. 35–50.doi:10.1007/11674399_3.ISBN 978-3-540-32488-1.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Expectiminimax&oldid=1292114646"

Categories:

Hidden categories:

[8]ページ先頭

Movatterモバイル変換

Pseudocode

Expectimax search

Alpha-beta pruning

See also

References