'''[[Main Page|Home]] * [[Automated Tuning]] * Minimax Tree Optimization'''

[[FILE:Bonanza full cast 1962 larger.jpg|border|right|thumb|240px|
[https://en.wikipedia.org/wiki/Bonanza Bonanza] full cast, 1962 <ref>[https://commons.wikimedia.org/wiki/File:Bonanza_full_cast_1962_larger.jpg Photo] of the full cast of the television program [https://en.wikipedia.org/wiki/Bonanza Bonanza] on the porch of the [https://en.wikipedia.org/wiki/Ponderosa_Ranch Ponderosa] from 1962.
From top: [https://en.wikipedia.org/wiki/Lorne_Greene Lorne Greene], [https://en.wikipedia.org/wiki/Dan_Blocker Dan Blocker], [https://en.wikipedia.org/wiki/Michael_Landon Michael Landon], [https://en.wikipedia.org/wiki/Pernell_Roberts Pernell Roberts]. This episode, "Miracle Maker", aired in [http://en.wikipedia.org/wiki/List_of_Bonanza_episodes May 1962],
Author: Pat MacDermott Company, Directional Public Relations, for [https://en.wikipedia.org/wiki/Chevrolet Chevrolet], the sponsor of the program. During the 1950s and 1960s, publicity information was often distributed through ad or public relations agencies by the network, studio, or program's sponsor.
In this case, the PR agency was making this available for Chevrolet--the little "plug" about their vehicles is seen in the release. [https://commons.wikimedia.org/wiki/Category:Bonanza_(TV_series) Category:Bonanza (TV series) - Wikimedia Commons]</ref> ]]

'''Minimax Tree Optimization''' (MMTO),<br/>
a [[Supervised Learning|supervised]] [[Automated Tuning|tuning method]] based on [[Automated Tuning#MoveAdaption|move adaptation]],
devised and introduced by [[Kunihito Hoki]] and [[Tomoyuki Kaneko]] <ref>[[Kunihito Hoki]], [[Tomoyuki Kaneko]] ('''2014'''). ''[https://www.jair.org/papers/paper4217.html Large-Scale Optimization for Evaluation Functions with Minimax Search]''. [https://www.jair.org/vol/vol49.html JAIR Vol. 49], [https://pdfs.semanticscholar.org/eb9c/173576577acbb8800bf96aba452d77f1dc19.pdf pdf]</ref>.
An MMTO predecessor, the initial '''Bonanza-Method''', was used in Hoki's [[Shogi]] engine [[Bonanza]] in 2006, winning the [[WCSC16]] <ref>[[Kunihito Hoki]] ('''2006'''). ''Optimal control of minimax search result to learn positional evaluation''. [[Conferences#GPW|11th Game Programming Workshop]] (Japanese)</ref>.
The further improved MMTO version of Bonanza won the [[WCSC23]] in 2013 <ref>[[Takenobu Takizawa]], [[Takeshi Ito]], [[Takuya Hiraoka]], [[Kunihito Hoki]] ('''2015'''). ''[https://link.springer.com/referenceworkentry/10.1007/978-3-319-08234-9_22-1 Contemporary Computer Shogi]''. [https://link.springer.com/referencework/10.1007/978-3-319-08234-9 Encyclopedia of Computer Graphics and Games]</ref>.

=Move Adaptation=
A chess program has a [[Evaluation#Linear|linear evaluation function]] e(p,&omega;), where '''p''' is the [[Chess Position|game position]] and '''&omega;''' the feature weight vector to be adjusted for optimal play.
The optimization procedure iterates over a set of positions selected from games assumed to be played by an [[Oracle|oracle]], each with a desired move given.
All possible [[Moves|moves]] from such a position are [[Make Move|made]] and the resulting positions [[Evaluation|evaluated]].
Each move obtaining a higher score than the desired move adds a penalty to the [https://en.wikipedia.org/wiki/Loss_function objective function] to be minimized, for instance <ref>Description and Formulas based on [[Kunihito Hoki]], [[Tomoyuki Kaneko]] ('''2014'''). ''[https://www.jair.org/papers/paper4217.html Large-Scale Optimization for Evaluation Functions with Minimax Search]''. [https://www.jair.org/vol/vol49.html JAIR Vol. 49], [https://pdfs.semanticscholar.org/eb9c/173576577acbb8800bf96aba452d77f1dc19.pdf pdf]</ref>:

[[FILE:mmtoObjectiveFunction1.jpg|none|text-bottom]]
Here, '''p.m''' is the position after move '''m''' in '''p''', '''d<sub>p</sub>''' is the desired move in '''p''', &#8499;′<sub>p</sub> is the set of all legal moves in '''p''' excluding '''d<sub>p</sub>''',
and '''H(x)''' is the [https://en.wikipedia.org/wiki/Heaviside_step_function Heaviside step function]. The numerical procedures to minimize such an objective function are complicated,
and the adjustment of a large-scale vector &omega; seemed to present practical difficulties considering [https://en.wikipedia.org/wiki/Partial_derivative partial derivatives] and local versus global [https://en.wikipedia.org/wiki/Maxima_and_minima minima].
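
For illustration, a minimal sketch of the per-position penalty term, assuming the evaluations e(p.m,&omega;) of the desired move and of all other legal moves are already available as plain scores (hypothetical inputs, not the authors' code):
<pre>
// Sketch of the move-adaptation penalty for a single training position.
// desiredScore  = e(p.d_p, w), the evaluation after the desired move
// siblingScores = e(p.m, w) for all other legal moves m in M'_p
#include <vector>

// Heaviside step: 1 if a sibling move scores higher than the desired move, else 0.
static int heaviside(double x) { return x > 0.0 ? 1 : 0; }

double positionPenalty(double desiredScore, const std::vector<double>& siblingScores) {
    double penalty = 0.0;
    for (double s : siblingScores)
        penalty += heaviside(s - desiredScore);   // H(e(p.m,w) - e(p.d_p,w))
    return penalty;                               // number of moves outscoring d_p
}
</pre>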

=MMTO=
MMTO improves on plain move adaptation by performing a [[Minimax|minimax search]] (one or two [[Ply|ply]] plus [[Quiescence Search]]), by a grid-adjacent update, and by using an [https://en.wikipedia.org/wiki/Constraint_(mathematics) equality constraint] and [https://en.wikipedia.org/wiki/Regularization_(mathematics) L1 regularization] to achieve [https://en.wikipedia.org/wiki/Scalability scalability] and [https://en.wikipedia.org/wiki/Stability stability].
==Objective Function==
MMTO's [https://en.wikipedia.org/wiki/Loss_function objective function] consists of the sum of three terms, where the first term J(P,&omega;) on the right side is the main part.
[[FILE:mmtoObjectiveFunction2.jpg|none|text-bottom]]
The other terms J<sub>C</sub> and J<sub>R</sub> are constraint and regularization terms.
J<sub>C</sub>(P,&omega;) = &lambda;<sub>0</sub>g(&omega;'), where &omega;' is a subset of &omega;, g(&omega;')=0 is an [https://en.wikipedia.org/wiki/Constraint_(mathematics) equality constraint], and &lambda;<sub>0</sub> is a [https://en.wikipedia.org/wiki/Lagrange_multiplier Lagrange multiplier].
J<sub>R</sub>(P,&omega;) = &lambda;<sub>1</sub>|&omega;<nowiki>''</nowiki>| is the [https://en.wikipedia.org/wiki/Regularization_(mathematics) L1 regularization] term,
where &lambda;<sub>1</sub> is a constant > 0 and &omega;<nowiki>''</nowiki> is a subset of &omega;. The main part of the objective function is similar to the H-formula of the Move Adaptation section:

[[FILE:mmtoObjectiveFunction3.jpg|none|text-bottom]]
where s(p,&omega;) is the value identified by the [[Minimax|minimax]] search for position p. '''T(x) = 1/(1 + exp(ax))''' is
a [https://en.wikipedia.org/wiki/Sigmoid_function sigmoid function] whose slope is controlled by '''a''', approaching the [https://en.wikipedia.org/wiki/Heaviside_step_function Heaviside step function] in the limit.
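
A corresponding sketch of the main term with the step function replaced by the sigmoid, assuming the scores s(p.m,&omega;) from the shallow search are given as plain numbers; the sign convention of '''a''' is an assumption made to match the H-formula above:
<pre>
// Sketch of MMTO's main objective term for a single training position, with the
// step function replaced by the sigmoid T(x) = 1/(1 + exp(a*x)).
// The sign of 'a' is assumed chosen so that T increases when a sibling move
// outscores the desired move (matching the H-formula above).
#include <cmath>
#include <vector>

double T(double x, double a) { return 1.0 / (1.0 + std::exp(a * x)); }

// desiredScore  = s(p.d_p, w), siblingScores = s(p.m, w) for the other legal moves,
// both obtained from a shallow minimax search plus quiescence search.
double mainObjective(double desiredScore, const std::vector<double>& siblingScores, double a) {
    double sum = 0.0;
    for (double s : siblingScores)
        sum += T(s - desiredScore, a);
    return sum;
}
</pre>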

==Optimization==
The iterative optimization process has three steps:
# Perform a [[Minimax|minimax]] [[Search|search]] to identify [[Principal Variation|PV]] [[Leaf Node|leaves]] '''&pi;<sub>&omega;(t)</sub><sup>p.m</sup>''' for all child positions '''p.m''' of position '''p''' in training set '''P''', where '''&omega;(t)''' is the weight vector at the t-th iteration and '''&omega;(0)''' is the initial guess
# Calculate a partial-derivative approximation of the objective function using both '''&pi;<sub>&omega;(t)</sub><sup>p.m</sup>''' and '''&omega;(t)'''. The objective function employs a differentiable approximation of T(x), as well as a constraint and regularization term
# Obtain a new weight vector '''&omega;(t+1)''' from '''&omega;(t)''' by using a grid-adjacent update guided by the partial derivatives computed in step 2. Go back to step 1, or terminate the optimization when the objective function value converges

Because step 1 is the most time-consuming part, it is worth considering omitting it by assuming the PV does not change between iterations.
In their experiments, Hoki and Kaneko performed steps 2 and 3 thirty-two times in succession without re-running step 1.
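
The overall loop might be organized as in the following control-flow sketch, where the search, gradient, and update routines are caller-supplied placeholders for steps 1 to 3, and several inner iterations reuse the PV leaves from one search:
<pre>
// Control-flow skeleton of the MMTO iteration; search, grad and update are
// caller-supplied placeholders for steps 1, 2 and 3 described above.
#include <vector>

template <class SearchFn, class GradFn, class UpdateFn>
void mmto(std::vector<int>& w, SearchFn search, GradFn grad, UpdateFn update,
          int outerIterations, int innerIterations) {
    for (int t = 0; t < outerIterations; ++t) {
        auto pvLeaves = search(w);                    // step 1: shallow minimax search (expensive)
        for (int k = 0; k < innerIterations; ++k) {   // e.g. 32 inner iterations reusing the PV
            auto g = grad(pvLeaves, w);               // step 2: partial-derivative approximation
            update(w, g);                             // step 3: grid-adjacent update
        }
    }
}
</pre>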

==Grid-Adjacent Update==
MMTO uses a grid-adjacent update to obtain '''&omega;(t+1)''' from '''&omega;(t)''': each weight changes by at most a small integer '''h''', in the direction given by the [https://en.wikipedia.org/wiki/Sign_function sgn function] of the approximated partial derivative.
[[FILE:mmtoGridAdjacentUpdate.jpg|none|text-bottom]]
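
A minimal sketch of this update, assuming integer feature weights and an already computed vector of approximate partial derivatives:
<pre>
// Grid-adjacent update: every integer weight moves by at most the small step h,
// in the direction opposite to the sign of its approximate partial derivative.
#include <cstddef>
#include <vector>

static int sgn(double x) { return (x > 0.0) - (x < 0.0); }

void gridAdjacentUpdate(std::vector<int>& w, const std::vector<double>& grad, int h) {
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] -= h * sgn(grad[i]);   // w_i(t+1) = w_i(t) - h * sgn(dJ/dw_i)
}
</pre>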

==Partial Derivative Approximation==
In each iteration, feature weights are updated on the basis of the [https://en.wikipedia.org/wiki/Partial_derivative partial derivatives] of the objective function.
[[FILE:mmtoPartialDifferentation1.jpg|none|text-bottom]]
The J<sub>R</sub> derivative is treated in an intuitive manner: [https://en.wikipedia.org/wiki/Sign_function sgn](&omega;<sub>i</sub>)&lambda;<sub>1</sub> for &omega;<sub>i</sub> &Element; &omega;<nowiki>''</nowiki>, and 0 otherwise.

The partial derivative of the [https://en.wikipedia.org/wiki/Constraint_(mathematics) constraint] term J<sub>C</sub> is 0 for &omega;<sub>i</sub> &NotElement; &omega;'.
Otherwise, the [https://en.wikipedia.org/wiki/Lagrange_multiplier Lagrange multiplier] &lambda;<sub>0</sub> is set to the [https://en.wikipedia.org/wiki/M-estimator#Median median] of the partial derivatives
in order to maintain the constraint g(&omega;') = 0 in each iteration. As a result, ∆&omega;′<sub>i</sub> is '''h''' for '''n''' feature weights, '''−h''' for '''n''' feature weights, and 0 for one feature weight,
where the number of feature weights in &omega;′ is 2n + 1.
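
A small sketch of that choice, assuming the raw partial derivatives over the constrained weights &omega;′ are already known; balancing them around their median lets the grid-adjacent update move '''n''' weights up, '''n''' down, and leave one unchanged:
<pre>
// Choosing lambda_0 as the median of the raw partial derivatives over the
// constrained subset w'. Once the constraint term is taken into account,
// n adjusted derivatives are positive, n are negative and one is zero, so the
// grid-adjacent update leaves the equality constraint g(w') = 0 intact.
#include <algorithm>
#include <cstddef>
#include <vector>

double medianLagrangeMultiplier(std::vector<double> partials) {  // size assumed odd (2n + 1)
    std::size_t mid = partials.size() / 2;
    std::nth_element(partials.begin(), partials.begin() + mid, partials.end());
    return partials[mid];
}
</pre>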

Since the objective function with the minimax values s(p, &omega;) is not always [https://en.wikipedia.org/wiki/Differentiable_function differentiable],
it is approximated by using the evaluation of the [[Principal Variation|PV]] [[Leaf Node|leaf]]:
[[FILE:mmtoPartialDifferentation.jpg|none|text-bottom]]
where T'(x) = d/dx T(x).
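
A rough sketch of the resulting gradient contribution of one training position, assuming a [[Evaluation#Linear|linear evaluation]] so that the partial derivative of e at a PV leaf is simply the leaf's feature value; the feature vectors and scores of the PV leaves are hypothetical inputs taken from step 1:
<pre>
// Partial-derivative approximation for one training position p, assuming a
// linear evaluation e(p,w) = sum_i w_i * f_i(p), so that de/dw_i is the
// feature value f_i at the PV leaf pi_w^{p.m}.
#include <cmath>
#include <cstddef>
#include <vector>

struct Leaf {                        // PV leaf reached from one child position p.m
    std::vector<double> features;    // f_i at the leaf
    double score;                    // s(p.m, w)
};

double Tprime(double x, double a) {  // derivative of T(x) = 1/(1 + exp(a*x))
    double t = 1.0 / (1.0 + std::exp(a * x));
    return -a * t * (1.0 - t);
}

// Accumulates dJ/dw_i over all sibling comparisons of one position.
void accumulateGradient(std::vector<double>& grad, const Leaf& desired,
                        const std::vector<Leaf>& siblings, double a) {
    for (const Leaf& l : siblings) {
        double weight = Tprime(l.score - desired.score, a);
        for (std::size_t i = 0; i < grad.size(); ++i)
            grad[i] += weight * (l.features[i] - desired.features[i]);
    }
}
</pre>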

=See also=
* [[Eval Tuning in Deep Thought]]
* [[NNUE]]
* [[Texel's Tuning Method]]

=Publications=
* [[Kunihito Hoki]] ('''2006'''). ''Optimal control of minimax search result to learn positional evaluation''. [[Conferences#GPW|11th Game Programming Workshop]] (Japanese)
* [[Tomoyuki Kaneko]], [[Kunihito Hoki]] ('''2011'''). ''Analysis of Evaluation-Function Learning by Comparison of Sibling Nodes''. [[Advances in Computer Games 13]]
* [[Kunihito Hoki]], [[Tomoyuki Kaneko]] ('''2011'''). ''The Global Landscape of Objective Functions for the Optimization of Shogi Piece Values with a Game-Tree Search''. [[Advances in Computer Games 13]]
* [[Kunihito Hoki]], [[Tomoyuki Kaneko]] ('''2014'''). ''[https://www.jair.org/papers/paper4217.html Large-Scale Optimization for Evaluation Functions with Minimax Search]''. [https://www.jair.org/vol/vol49.html JAIR Vol. 49], [https://pdfs.semanticscholar.org/eb9c/173576577acbb8800bf96aba452d77f1dc19.pdf pdf]
* [[Takenobu Takizawa]], [[Takeshi Ito]], [[Takuya Hiraoka]], [[Kunihito Hoki]] ('''2015'''). ''[https://link.springer.com/referenceworkentry/10.1007/978-3-319-08234-9_22-1 Contemporary Computer Shogi]''. [https://link.springer.com/referencework/10.1007/978-3-319-08234-9 Encyclopedia of Computer Graphics and Games]

=Forum Posts=
* [http://www.talkchess.com/forum/viewtopic.php?t=55084 MMTO for evaluation learning] by [[Jon Dart]], [[CCC]], January 25, 2015
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=64189&start=21 Re: Texel tuning method question] by [[Jon Dart]], [[CCC]], June 07, 2017

=References=
<references />
'''[[Automated Tuning|Up one Level]]'''
