Latest revision as of 08:15, 25 August 2020

Home * Automated Tuning * Minimax Tree Optimization

Bonanza full cast, 1962 [1]

Minimax Tree Optimization (MMTO),
a supervised tuning method based on move adaptation, devised and introduced by Kunihito Hoki and Tomoyuki Kaneko [2]. An MMTO predecessor, the initial Bonanza-Method, was used in Hoki's Shogi engine Bonanza in 2006, winning the WCSC16 [3]. The further improved MMTO version of Bonanza won the WCSC23 in 2013 [4].

Move Adaptation

A chess program has a linear evaluation function e(p,ω), where p is the game position and ω the feature weight vector to be adjusted for optimal play. The optimization procedure iterates over a set of positions selected from games assumed to be played by an oracle, each with a desired move given. All possible moves from this position are made and the resulting positions evaluated. Each move obtaining a higher score than the desired move adds a penalty to the objective function to be minimized, for instance [5]:

MmtoObjectiveFunction1.jpg

Here, p.m is the position after move m in p, dp is the desired move in p, ℳ′p is the set of all legal moves in p excluding dp, and H(x) is the Heaviside step function. The numerical procedures to minimize such an objective function are complicated, and the adjustment of a large-scale vector ω seemed to present practical difficulties concerning partial differentiation and local versus global minima.
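As a sketch, the penalty count described above might look as follows; the position/move representation and the `evaluate` callable are illustrative placeholders, not part of the original method.

```python
def heaviside(x):
    """H(x): 1 for x > 0, else 0."""
    return 1.0 if x > 0 else 0.0

def move_adaptation_objective(positions, evaluate):
    """Sum a unit penalty for every legal move that scores higher than
    the desired move, over all training positions.

    `positions` holds (p, desired_move, other_moves) tuples, where
    other_moves is M'_p (all legal moves in p except the desired one),
    and `evaluate(p, m)` returns e(p.m, w). Both are placeholders for
    the engine's own data structures.
    """
    total = 0.0
    for p, dp, other_moves in positions:
        desired = evaluate(p, dp)
        for m in other_moves:
            total += heaviside(evaluate(p, m) - desired)
    return total
```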

MMTO

MMTO improves on this by performing a minimax search (one or two plies plus quiescence), by using a grid-adjacent update, and by adding an equality constraint and L1 regularization to achieve scalability and stability.

Objective Function

MMTO's objective function consists of the sum of three terms, where the first term J(P,ω) on the right side is the main part.

MmtoObjectiveFunction2.jpg

The other terms JC and JR are constraint and regularization terms: JC(P,ω) = λ0g(ω'), where ω' is a subset of ω, g(ω') = 0 is an equality constraint, and λ0 is a Lagrange multiplier; JR(P,ω) = λ1|ω''| is the L1 regularization term, where λ1 is a constant > 0 and ω'' is a subset of ω. The main part of the objective function is similar to the H-formula of the Move Adaptation chapter:
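The two auxiliary terms are simple to state in code; the concrete constraint function g below is an assumption chosen only for illustration.

```python
def J_R(w_pp, lam1):
    """L1 regularization term: lambda1 * |w''|, with lambda1 > 0 and
    w'' a subset of the weight vector."""
    return lam1 * sum(abs(wi) for wi in w_pp)

def J_C(w_p, g, lam0):
    """Constraint term: lambda0 * g(w'), where g(w') = 0 is the equality
    constraint and lambda0 a Lagrange multiplier. The callable `g` is a
    placeholder for whatever constraint the engine actually imposes."""
    return lam0 * g(w_p)
```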

MmtoObjectiveFunction3.jpg

where s(p,ω) is the value identified by the minimax search for position p, and T(x) = 1/(1 + exp(ax)) is a sigmoid function whose slope is controlled by a, approaching a step function as a grows.
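A minimal sketch of T(x); note that T(0) is always 0.5 and that a larger a sharpens the curve toward a step.

```python
import math

def T(x, a=1.0):
    """T(x) = 1/(1 + exp(a*x)); the slope at x = 0 steepens as `a`
    grows, so the curve approaches a step function."""
    return 1.0 / (1.0 + math.exp(a * x))
```

If the argument is taken as s(p.dp,ω) − s(p.m,ω) (an assumption of this sketch, matching the H-formula above), alternatives scoring above the desired move feed a negative argument into T and receive a penalty near 1.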

Optimization

The iterative optimization process has three steps:

  1. Perform a minimax search to identify PV leaves π_ω(t)(p.m) for all child positions p.m of position p in training set P, where ω(t) is the weight vector at the t-th iteration and ω(0) is the initial guess
  2. Calculate a partial-derivative approximation of the objective function using both π_ω(t)(p.m) and ω(t). The objective function employs a differentiable approximation of T(x), as well as a constraint and a regularization term
  3. Obtain a new weight vector ω(t+1) from ω(t) by using a grid-adjacent update guided by the partial derivatives computed in step 2. Go back to step 1, or terminate the optimization when the objective function value converges

Because step 1 is the most time-consuming part, it is worth considering omitting it by assuming the PV does not change between iterations. In their experiments, Hoki and Kaneko performed 32 iterations of steps 2 and 3 without re-running step 1.
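The three steps can be sketched as a loop; `objective` and `grad_approx` stand in for the minimax-search-based objective and its partial-derivative approximation, which are assumptions of this sketch rather than the original implementation.

```python
def sgn(x):
    """Sign function: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def mmto_optimize(w, objective, grad_approx, h=1, max_iters=1000):
    """Iterate the grid-adjacent update until the objective value stops
    improving. `objective(w)` and `grad_approx(w)` are placeholders for
    the search-based objective and its derivative approximation."""
    best = objective(w)
    for _ in range(max_iters):
        partials = grad_approx(w)                                  # step 2
        w_new = [wi - h * sgn(gi) for wi, gi in zip(w, partials)]  # step 3
        val = objective(w_new)
        if val >= best:   # no further improvement: treat as converged
            break
        w, best = w_new, val
    return w
```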

Grid-Adjacent Update

MMTO uses a grid-adjacent update to obtain ω(t+1) from ω(t), changing each weight by a small integer h according to the sgn function of the partial-derivative approximation.

MmtoGridAdjacentUpdate.jpg
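Interpreted literally, the rule moves each integer weight by at most one grid step of size h, in the direction opposite the sign of its partial derivative; this is a sketch of that rule, not the original code.

```python
def sgn(x):
    """Sign function: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def grid_adjacent_update(w, partials, h=1):
    """w(t+1)_i = w(t)_i - h * sgn(dJ/dw_i): each weight moves to an
    adjacent point of the integer grid, or stays put when the partial
    derivative is zero."""
    return [wi - h * sgn(gi) for wi, gi in zip(w, partials)]
```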

Partial Derivative Approximation

In each iteration, feature weights are updated on the basis of the partial derivatives of the objective function.

MmtoPartialDifferentation1.jpg

The JR derivative is treated in an intuitive manner: sgn(ωi)λ1 for ωi ∈ ω'', and 0 otherwise.

The partial derivative of the constraint term JC is 0 for ωi ∉ ω'. Otherwise, the Lagrange multiplier λ0 is set to the median of the partial derivatives in order to maintain the constraint g(ω') = 0 in each iteration. As a result, ∆ω′i is h for n feature weights, −h for n feature weights, and 0 for one feature weight, where the number of feature weights in ω' is 2n + 1.
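Why the median preserves the constraint can be seen in a small sketch. Here the illustrative constraint is g(ω') = sum(ω') − c, whose partial derivatives are all 1, so subtracting the median from the 2n+1 partials makes n of them positive, n negative, and one zero; the sign conventions and the constraint choice are assumptions of this sketch.

```python
def sgn(x):
    """Sign function: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def constrained_grid_update(w_p, partials, h=1):
    """Grid-adjacent update of the 2n+1 weights in w', shifting the
    partials by their median (i.e. absorbing it into the Lagrange term).
    With distinct partials, n weights move by -h, n by +h and one stays,
    so sum(w') is preserved under the illustrative sum constraint."""
    med = sorted(partials)[len(partials) // 2]  # median of 2n+1 values
    return [wi - h * sgn(gi - med) for wi, gi in zip(w_p, partials)]
```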

Since the objective function with the minimax values s(p, ω) is not always differentiable, an approximation is used based on the evaluation of the PV leaf:

MmtoPartialDifferentation.jpg

where T'(x) = d/dx T(x).
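For reference, differentiating T(x) = 1/(1 + exp(ax)) gives the closed form T'(x) = −a·T(x)·(1 − T(x)); the sketch below checks this against a central finite difference.

```python
import math

def T(x, a=1.0):
    """T(x) = 1/(1 + exp(a*x))."""
    return 1.0 / (1.0 + math.exp(a * x))

def T_prime(x, a=1.0):
    """Closed-form derivative: dT/dx = -a * T(x) * (1 - T(x))."""
    t = T(x, a)
    return -a * t * (1.0 - t)
```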

See also

Publications

Forum Posts

References

  1. Photo of the full cast of the television program Bonanza on the porch of the Ponderosa from 1962. From top: Lorne Greene, Dan Blocker, Michael Landon, Pernell Roberts. This episode, "Miracle Maker", aired in May 1962, Author: Pat MacDermott Company, Directional Public Relations, for Chevrolet, the sponsor of the program. During the 1950s and 1960s, publicity information was often distributed through ad or public relations agencies by the network, studio, or program's sponsor. In this case, the PR agency was making this available for Chevrolet--the little "plug" about their vehicles is seen in the release. Category:Bonanza (TV series) - Wikimedia Commons
  2. Kunihito Hoki, Tomoyuki Kaneko (2014). Large-Scale Optimization for Evaluation Functions with Minimax Search. JAIR Vol. 49, https://www.jair.org/papers/paper4217.html, pdf: https://pdfs.semanticscholar.org/eb9c/173576577acbb8800bf96aba452d77f1dc19.pdf
  3. Kunihito Hoki (2006). Optimal control of minimax search result to learn positional evaluation. 11th Game Programming Workshop (Japanese)
  4. Takenobu Takizawa, Takeshi Ito, Takuya Hiraoka, Kunihito Hoki (2015). Contemporary Computer Shogi. Encyclopedia of Computer Graphics and Games, https://link.springer.com/referenceworkentry/10.1007/978-3-319-08234-9_22-1
  5. Description and Formulas based on Kunihito Hoki, Tomoyuki Kaneko (2014). Large-Scale Optimization for Evaluation Functions with Minimax Search. JAIR Vol. 49, https://www.jair.org/papers/paper4217.html, pdf: https://pdfs.semanticscholar.org/eb9c/173576577acbb8800bf96aba452d77f1dc19.pdf
