Minimax Tree Optimization

'''Minimax Tree Optimization''' ('''MMTO'''),
a [[Supervised Learning|supervised]] [[Automated Tuning|tuning method]] based on [[Automated Tuning#MoveAdaption|move adaptation]],
devised and introduced by [[Kunihito Hoki]] and [[Tomoyuki Kaneko]] <ref>[[Kunihito Hoki]], [[Tomoyuki Kaneko]] ('''2014'''). ''[https://www.jair.org/papers/paper4217.html Large-Scale Optimization for Evaluation Functions with Minimax Search]''. [https://www.jair.org/vol/vol49.html JAIR Vol. 49], [https://pdfs.semanticscholar.org/eb9c/173576577acbb8800bf96aba452d77f1dc19.pdf pdf]</ref>.
An MMTO predecessor, the initial '''Bonanza-Method''', was used in Hoki's [[Shogi]] engine [[Bonanza]] in 2006, winning the [[WCSC16]] <ref>[[Kunihito Hoki]] ('''2006'''). ''Optimal control of minimax search result to learn positional evaluation''. [[Conferences#GPW|11th Game Programming Workshop]] (Japanese)</ref>. The further improved MMTO version of Bonanza won the [[WCSC23]] in 2013 <ref>[[Takenobu Takizawa]], [[Takeshi Ito]], [[Takuya Hiraoka]], [[Kunihito Hoki]] ('''2015'''). ''[https://link.springer.com/referenceworkentry/10.1007/978-3-319-08234-9_22-1 Contemporary Computer Shogi]''. [https://link.springer.com/referencework/10.1007/978-3-319-08234-9 Encyclopedia of Computer Graphics and Games]</ref>.
=Move Adaptation=
A chess program has a [[Evaluation#Linear|linear evaluation function]] '''e(p,&omega;)''', where '''p''' is the [[Chess Position|game position]] and '''&omega;''' the feature weight vector to be adjusted for optimal play.
The optimization procedure iterates over a set of positions selected from games assumed to be played by an [[Oracle|oracle]], each with a desired move given.
All possible [[Moves|moves]] from this position are [[Make Move|made]] and the resulting positions [[Evaluation|evaluated]].
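A minimal [https://www.python.org/ Python] sketch of this loop, assuming a hypothetical position object with ''legal_moves'', ''make'' and ''features'' methods (sign conventions after making a move are omitted for brevity); it counts how many alternative moves the current weights rank at least as high as the oracle move:
<pre>
# Minimal sketch of move adaptation (hypothetical Position API with
# legal_moves(), make(move) and features() methods).

def evaluate(position, weights):
    """Linear evaluation e(p, w): dot product of features and weights."""
    return sum(f * w for f, w in zip(position.features(), weights))

def adaptation_error(position, desired_move, weights):
    """Number of moves the weights rank at least as high as the oracle move."""
    desired_score = evaluate(position.make(desired_move), weights)
    return sum(1 for m in position.legal_moves()
               if m != desired_move
               and evaluate(position.make(m), weights) >= desired_score)
</pre>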
=MMTO=
MMTO improves on move adaptation by performing a shallow [[Minimax|minimax search]] (one or two [[Ply|plies]] plus [[Quiescence Search|quiescence]]), by a grid-adjacent update, and by using an [https://en.wikipedia.org/wiki/Constraint_(mathematics) equality constraint] and [https://en.wikipedia.org/wiki/Regularization_(mathematics) L1 regularization] to achieve [https://en.wikipedia.org/wiki/Scalability scalability] and [https://en.wikipedia.org/wiki/Stability stability].
==Objective Function==
MMTO's [https://en.wikipedia.org/wiki/Loss_function objective function] consists of the sum of three terms, where the first term '''J(P,&omega;) ''' on the right side is the main part.
[[FILE:mmtoObjectiveFunction2.jpg|none|text-bottom]]
The other terms '''J<sub>C</sub>''' and '''J<sub>R</sub>''' are constraint and regularization terms. '''J<sub>C</sub>(P,&omega;) = &lambda;<sub>0</sub>g(&omega;')''', where '''&omega;' ''' is a subset of '''&omega;''', '''g(&omega;') = 0''' is an [https://en.wikipedia.org/wiki/Constraint_(mathematics) equality constraint], and '''&lambda;<sub>0</sub>''' is a [https://en.wikipedia.org/wiki/Lagrange_multiplier Lagrange multiplier]. '''J<sub>R</sub>(P,&omega;) = &lambda;<sub>1</sub>|&omega;<nowiki>''</nowiki>|''' is the [https://en.wikipedia.org/wiki/Regularization_(mathematics) L1 regularization] term, where '''&lambda;<sub>1</sub>''' is a constant > 0 and '''&omega;<nowiki>''</nowiki>''' is a subset of '''&omega;'''. The main part of the objective function is similar to the H-formula of the Move Adaptation chapter:
[[FILE:mmtoObjectiveFunction3.jpg|none|text-bottom]]
where '''s(p,&omega;)''' is the value identified by the [[Minimax|minimax]] search for position '''p'''. '''T(x) = 1/(1 + exp(ax))''' is a
[https://en.wikipedia.org/wiki/Sigmoid_function sigmoid function] whose slope is controlled by '''a'''; for large '''a''' it approaches the [https://en.wikipedia.org/wiki/Heaviside_step_function Heaviside step function].
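A hedged sketch of this main term, assuming a hypothetical ''search(p, m, w)'' helper that returns the minimax value of move '''m''' in position '''p''' under weights '''w''':
<pre>
import math

def T(x, a=1.0):
    """Sigmoid 1 / (1 + exp(a*x)); a larger slope a approaches a step function."""
    return 1.0 / (1.0 + math.exp(a * x))

def J_main(training_set, weights, search, a=1.0):
    """Main objective term: sum of T(s(p.d, w) - s(p.m, w)) over all
    positions p with oracle move d and all alternative moves m."""
    total = 0.0
    for p, desired in training_set:
        s_desired = search(p, desired, weights)
        for m in p.legal_moves():
            if m != desired:
                # contributes ~1 whenever the alternative outranks the oracle move
                total += T(s_desired - search(p, m, weights), a)
    return total
</pre>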
The iterative optimization process has three steps:
# Perform a [[Minimax|minimax]] [[Search|search]] to identify [[Principal Variation|PV]] [[Leaf Node|leaves]] '''&pi;<sub>&omega;(t)</sub><sup>p.m</sup>''' for all child positions '''p.m''' of position '''p''' in training set '''P''', where '''&omega;(t)''' is the weight vector at the t-th iteration and '''&omega;(0)''' is the initial guess
# Calculate a partial-derivative approximation of the objective function using both '''&pi;<sub>&omega;(t)</sub><sup>p.m</sup>''' and '''&omega;(t)'''. The objective function employs a differentiable approximation of '''T(x)''', as well as a constraint and regularization term
# Obtain a new weight vector '''&omega;(t+1)''' from '''&omega;(t)''' by using a grid-adjacent update guided by the partial derivatives computed in step 2. Go back to step 1, or terminate the optimization when the objective function value converges (see the sketch after this list)
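The outer loop might look as follows; this is a sketch only, where ''collect_pv_leaves'', ''approximate_gradient'' and ''objective'' are hypothetical placeholders for the search, derivative and objective computations described here, and '''h''' is the grid step:
<pre>
def mmto(training_set, w0, h=1, max_iters=1000, tol=1e-6):
    """Iterate the three MMTO steps until the objective value converges."""
    w, prev = list(w0), float("inf")
    for t in range(max_iters):
        pv_leaves = collect_pv_leaves(training_set, w)            # step 1
        grads = approximate_gradient(training_set, pv_leaves, w)  # step 2
        # step 3: grid-adjacent update, each weight moves by at most h,
        # against the sign of its partial derivative
        w = [wi - h * ((g > 0) - (g < 0)) for wi, g in zip(w, grads)]
        value = objective(training_set, w)
        if abs(prev - value) < tol:
            break
        prev = value
    return w
</pre>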
In each iteration, feature weights are updated on the basis of the [https://en.wikipedia.org/wiki/Partial_derivative partial derivatives] of the objective function.
[[FILE:mmtoPartialDifferentation1.jpg|none|text-bottom]]
The '''J<sub>R</sub>''' derivative is treated in an intuitive manner: [https://en.wikipedia.org/wiki/Sign_function sgn]'''(&omega;<sub>i</sub>)&lambda;<sub>1</sub>''' for '''&omega;<sub>i</sub> &#8712; &omega;<nowiki>''</nowiki>''', and '''0''' otherwise.
The partial derivative of the [https://en.wikipedia.org/wiki/Constraint_(mathematics) constraint] term '''J<sub>C</sub>''' is '''0''' for '''&omega;<sub>i</sub> &#x2209; &omega;' '''. Otherwise, the [https://en.wikipedia.org/wiki/Lagrange_multiplier Lagrange multiplier] '''&lambda;<sub>0</sub>''' is set to the [https://en.wikipedia.org/wiki/M-estimator#Median median] of the partial derivatives in order to maintain the constraint '''g(&omega;') = 0''' in each iteration. As a result, '''∆&omega;′<sub>i</sub>''' is '''h''' for '''n''' feature weights, '''−h''' for '''n''' feature weights, and '''0''' for one feature weight, where the number of feature weights in '''&omega;' ''' is '''2n + 1'''.
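A sketch of this constrained update for the subset '''&omega;' ''', where the hypothetical ''partials'' list holds the raw partial derivatives of the '''2n + 1''' weights:
<pre>
from statistics import median

def constrained_update(w_prime, partials, h=1):
    """Set lambda0 to the median partial derivative, so that n weights move
    by +h, n by -h, and one stays put, leaving the constraint sum unchanged."""
    lam0 = median(partials)
    return [wi - h if g > lam0 else wi + h if g < lam0 else wi
            for wi, g in zip(w_prime, partials)]
</pre>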
Since the objective function with the minimax values '''s(p,&omega;)''' is not always [https://en.wikipedia.org/wiki/Differentiable_function differentiable],
it is approximated using the evaluation of the [[Principal Variation|PV]] [[Leaf Node|leaf]]:
[[FILE:mmtoPartialDifferentation.jpg|none|text-bottom]]
where '''T'(x) = d/dx T(x)'''.
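For the sigmoid above, '''T'(x) = −aT(x)(1 − T(x))'''. A sketch of the approximated partial derivative with respect to one weight '''&omega;<sub>i</sub>''', reusing ''evaluate'' and ''T'' from the sketches above and assuming a hypothetical ''pv_leaves'' map from each child move to its PV leaf position:
<pre>
def T_prime(x, a=1.0):
    """d/dx of T(x) = 1/(1 + exp(a*x)), i.e. -a * T(x) * (1 - T(x))."""
    t = T(x, a)
    return -a * t * (1.0 - t)

def partial_J(training_set, pv_leaves, weights, i, a=1.0):
    """Approximate dJ/dw_i by differentiating the PV-leaf evaluations
    instead of the non-differentiable minimax values s(p, w)."""
    total = 0.0
    for p, desired in training_set:
        leaf_d = pv_leaves[(p, desired)]
        for m in p.legal_moves():
            if m == desired:
                continue
            leaf_m = pv_leaves[(p, m)]
            x = evaluate(leaf_d, weights) - evaluate(leaf_m, weights)
            # chain rule: for a linear evaluation, dx/dw_i is the
            # difference of the i-th features of the two PV leaves
            total += T_prime(x, a) * (leaf_d.features()[i] - leaf_m.features()[i])
    return total
</pre>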
=See also=
