Home * Programming * Algorithms * SPSA

SPSA (Simultaneous Perturbation Stochastic Approximation),
a stochastic approximation algorithm devised in the late 80s [1] and 90s by James C. Spall [2]. It is an extension of the Finite Difference Stochastic Approximation (FDSA) algorithm, also known as the Kiefer-Wolfowitz algorithm, introduced in 1952 by Jack Kiefer and Jacob Wolfowitz [3], which in turn was motivated by the publication of the Robbins-Monro algorithm in 1951 [4].

The SPSA algorithm is suited for high-dimensional optimization problems. Given an objective function of a p-dimensional vector of adjustable weights, Theta or Θ, it uses a gradient approximation that requires only N+1 or 2N objective function measurements over all N iterations, regardless of the dimension p of the optimization problem - as opposed to FDSA, which needs p + 1 (one-sided) or 2p (two-sided) objective function measurements or simulations per step. At each iteration, a simultaneous perturbation vector Δ with mutually independent zero-mean random components is generated; a good choice for each component is the Rademacher distribution, i.e. +1 or -1 with probability ½ each. Two feature vectors Θ+ and Θ- are calculated by adding and subtracting the perturbation vector, scaled by the gain sequence ck, to/from the current feature vector Θ, and their objective function measurements are compared. Depending on the outcome, and scaled by the gain sequences ak and ck, the current feature vector is adjusted accordingly. The gain sequences decrease with increasing iterations, converging to 0. The theory pertains to both local optimization and global optimization in the face of multiple local optima [5].
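
In symbols, the standard two-sided form of the gradient estimate and update (a compact sketch in common SPSA notation; the pseudo code below is a specialization of it, with a plus sign because playing strength is maximized) reads:

  \hat{g}_{k,i}(\Theta_k) = \frac{y(\Theta_k + c_k \Delta_k) - y(\Theta_k - c_k \Delta_k)}{2 c_k \Delta_{k,i}}
  \Theta_{k+1} = \Theta_k + a_k \, \hat{g}_k(\Theta_k)
  a_k = \frac{a}{(A + k + 1)^\alpha}, \quad c_k = \frac{c}{(k + 1)^\gamma}

Since each Δk,i is ±1, dividing by Δk,i is the same as multiplying by it, so the whole p-dimensional gradient estimate costs the same two measurements y(Θ+) and y(Θ-) no matter how large p is.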

Automated Tuning

α = 0.602; γ = 0.101;                        // decay exponents of the gain sequences
for (k=0; k < N; k++) {
  ak = a / (k + 1 + A)^α;                    // step-size gain, decaying to 0
  ck = c / (k + 1)^γ;                        // perturbation gain, decaying to 0
  for each component p of Δ                  // Rademacher: Δp = ±1 with probability ½ each
    Δp = 2 * round ( rand() / (RAND_MAX + 1.0) ) - 1.0;
  Θ+ = Θ + ck*Δ;                             // perturb all weights simultaneously
  Θ- = Θ - ck*Δ;
  Θ += ak * match(Θ+, Θ-) / (ck*Δ);          // elementwise division by ck*Δp
}

In computer chess or games, where the objective function reflects the playing strength to be maximized, SPSA can be used for automated tuning of evaluation parameters as well as search parameters. A prominent SPSA instance is Stockfish's tuning method as introduced by Joona Kiiski in 2011 [6], where the objective function is measured once per iteration by playing a pair of games with Θ+ versus Θ-, the function "match" returning a value in the ±2 range, see the pseudo code above. The selection of the coefficients A, a, c, α and γ, which determine the initial values and time decay of the gain sequences ak and ck, is critical to the performance of SPSA. Spall recommends α = 0.602 and γ = 0.101, the lowest values that theoretically guarantee convergence; further see the practical suggestions in Spall's 1998 SPSA implementation paper [7].
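
To make the loop concrete, here is a minimal, self-contained C++ sketch of the scheme above. It is not Stockfish's implementation: the match function below is a hypothetical stand-in that scores a two-game pair on a noisy synthetic objective, and the constants p, N, a, c and A are illustrative choices.

  #include <cmath>
  #include <cstdio>
  #include <random>
  #include <vector>

  static std::mt19937 rng(42);

  // Hypothetical stand-in for playing strength: closeness to a known optimum, plus noise.
  static double strength(const std::vector<double>& theta) {
      double s = 0.0;
      for (double t : theta) s -= (t - 50.0) * (t - 50.0);
      return s + std::normal_distribution<double>(0.0, 25.0)(rng);
  }

  // Stand-in for a pair of games Θ+ versus Θ-: returns a score difference in -2..+2.
  static int match(const std::vector<double>& plus, const std::vector<double>& minus) {
      int score = 0;
      for (int game = 0; game < 2; ++game) {
          double d = strength(plus) - strength(minus);
          if (d > 0) ++score; else if (d < 0) --score;
      }
      return score;
  }

  int main() {
      const int p = 4, N = 10000;                    // illustrative dimension and iteration count
      const double a = 2.0, c = 4.0, A = N / 10.0;   // illustrative gain constants
      const double alpha = 0.602, gamma = 0.101;     // Spall's recommended exponents

      std::vector<double> theta(p, 20.0);            // initial weights, far from the optimum 50
      std::bernoulli_distribution coin(0.5);

      for (int k = 0; k < N; ++k) {
          double ak = a / std::pow(k + 1 + A, alpha);
          double ck = c / std::pow(k + 1, gamma);
          std::vector<double> delta(p), plus(p), minus(p);
          for (int i = 0; i < p; ++i) {
              delta[i] = coin(rng) ? 1.0 : -1.0;     // Rademacher perturbation
              plus[i]  = theta[i] + ck * delta[i];
              minus[i] = theta[i] - ck * delta[i];
          }
          int outcome = match(plus, minus);          // one game pair per iteration
          for (int i = 0; i < p; ++i)                // elementwise SPSA update
              theta[i] += ak * outcome / (ck * delta[i]);
      }
      for (int i = 0; i < p; ++i)
          std::printf("theta[%d] = %.2f\n", i, theta[i]);
      return 0;
  }

With a real engine, match would instead launch the two parameter sets against each other for one game pair and return the score difference, exactly as in the pseudo code.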

RSPSA

Already at the Advances in Computer Games 11 conference at Academia Sinica, Taipei, Taiwan in 2005, Levente Kocsis, Csaba Szepesvári, and Mark Winands introduced SPSA to the game programming community and discussed several methods to enhance its performance, including the use of common random numbers and antithetic variates, a combination of SPSA with RPROP (resilient backpropagation), and the reuse of samples of previous performance evaluations. RPROP, though originally worked out for the training of neural networks, is applicable to any optimization task where the gradient can be computed or approximated [8]. The described RSPSA (Resilient SPSA) was successfully applied to parameter tuning in the domains of Poker and Lines of Action [9].
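
As a rough illustration of the RPROP ingredient, the C++ sketch below adapts a per-parameter step size from the signs of successive (SPSA-approximated) gradient components: the step grows while the sign is stable and shrinks when it flips. This is a generic RPROP-style update with assumed constants, not the exact RSPSA scheme of the paper.

  #include <algorithm>
  #include <vector>

  struct RpropState {
      std::vector<double> step;      // per-parameter step size
      std::vector<int> prevSign;     // sign of the previous gradient estimate, in {-1, 0, +1}
  };

  // signEstimate[i] is the sign of the i-th (approximated) gradient component.
  void rpropUpdate(std::vector<double>& theta,
                   const std::vector<int>& signEstimate, RpropState& st) {
      const double etaPlus = 1.2, etaMinus = 0.5;    // customary RPROP factors
      const double stepMin = 1e-6, stepMax = 50.0;   // assumed bounds
      for (std::size_t i = 0; i < theta.size(); ++i) {
          if (signEstimate[i] * st.prevSign[i] > 0)          // stable sign: accelerate
              st.step[i] = std::min(st.step[i] * etaPlus, stepMax);
          else if (signEstimate[i] * st.prevSign[i] < 0)     // sign flip: back off
              st.step[i] = std::max(st.step[i] * etaMinus, stepMin);
          theta[i] += signEstimate[i] * st.step[i];          // move by sign and step only
          st.prevSign[i] = signEstimate[i];
      }
  }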

See also

Selected Publications

1987 ...

1990 ...

2000 ...

* [[Levente Kocsis]], [[Csaba Szepesvári]], [[Mark Winands]] ('''2005'''). ''[http://link.springer.com/chapter/10.1007/11922155_4 RSPSA: Enhanced Parameter Optimization in Games]''. [[Advances in Computer Games 11]], [http://www.sztaki.hu/~szcsaba/papers/rspsa_acg.pdf pdf], [https://dke.maastrichtuniversity.nl/m.winands/documents/RSPSA.pdf pdf]
* [[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://link.springer.com/article/10.1007/s10994-006-6888-8 Universal Parameter Optimisation in Games Based on SPSA]''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Special Issue on Machine Learning and Games, Vol. 63, No. 3, [http://www.sztaki.hu/~szcsaba/papers/kocsis_acg05.pdf pdf]
* [https://dblp.org/pid/70/382.html Mohammed Shahid Abdulla], [[Shalabh Bhatnagar]] ('''2007'''). ''[https://link.springer.com/article/10.1007/s10626-006-0003-y Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes]''. [https://www.springer.com/journal/10626 Discrete Event Dynamic Systems], Vol. 17, No. 1
* [http://dblp.uni-trier.de/pers/hd/m/Maryak:John_L= John L. Maryak], [http://dblp.uni-trier.de/pers/hd/c/Chin:Daniel_C= Daniel C. Chin] ('''2008'''). ''Global Random Optimization by Simultaneous Perturbation Stochastic Approximation''. [[IEEE#TAC|IEEE Transactions on Automatic Control]], Vol. 53, No. 3, [http://www.jhuapl.edu/spsa/PDF-SPSA/Maryak_Chin_IEEETAC08.pdf pdf]
* [http://dblp.uni-trier.de/pers/hd/s/Song_0001:Qing Qing Song], [[James C. Spall]], [http://dblp.uni-trier.de/pers/hd/s/Soh:Yeng_Chai Yeng Chai Soh], [http://dblp.uni-trier.de/pers/hd/n/Ni:Jie Jie Ni] ('''2008'''). ''Robust Neural Network Tracking Controller Using Simultaneous Perturbation Stochastic Approximation''. [[IEEE#NN|IEEE Transactions on Neural Networks]], Vol. 19, No. 5, [https://pdfs.semanticscholar.org/3f2a/4d69ca8adbbc072d82af58b3c750621d36ab.pdf 2003 pdf] » [[Neural Networks]]

2010 ...

* [[Shalabh Bhatnagar]], [https://dblp.org/pid/31/10493.html H.L. Prasad], [https://scholar.google.co.in/citations?user=Q1YXWpoAAAAJ&hl=en L.A. Prashanth] ('''2013'''). ''[https://link.springer.com/book/10.1007/978-1-4471-4285-0 Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods]''. [https://www.springer.com/series/642 Lecture Notes in Control and Information Sciences], Vol. 434, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
* [[Qi Wang]] ('''2013'''). ''[https://jscholarship.library.jhu.edu/handle/1774.2/36955 Optimization with Discrete Simultaneous Perturbation Stochastic Approximation Using Noisy Loss Function Measurements]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Johns_Hopkins_University Johns Hopkins University], advisor [[James C. Spall]]
* [https://scholar.google.com/citations?user=nqDASHMAAAAJ Pushpendre Rastogi], [http://dblp.uni-trier.de/pers/hd/z/Zhu:Jingyi Jingyi Zhu], [[James C. Spall]] ('''2016'''). ''Efficient implementation of Enhanced Adaptive Simultaneous Perturbation Algorithms''. [http://dblp.uni-trier.de/db/conf/ciss/ciss2016.html#RastogiZS16 CISS 2016], [http://www.cs.jhu.edu/~prastog3/res/rastogi2016efficient.pdf pdf]

Forum Posts

2010 ...

* Re: Stockfish's tuning method by Rémi Coulom, CCC, October 07, 2011

2015 ...

* Re: A plea to someone by Jon Dart, CCC, April 08, 2015
* [https://groups.google.com/d/msg/fishcooking/mbnctfDdcqM/S_rUOa7177AJ Re: SPSA Tuner] by [[Lyudmil Antonov]], [[Computer Chess Forums|FishCooking]], July 20, 2015
* [https://groups.google.com/d/msg/fishcooking/Np13OpEO1_w/xhYulilABgAJ Too small number of games in SPSA] by [[Lyudmil Antonov]], [[Computer Chess Forums|FishCooking]], April 22, 2016
* [https://groups.google.com/d/msg/fishcooking/iyAFOdVx4RE/d4mnJ8lqDgAJ Self-correcting SPSA tuner for chess engines] by [[Ivan Ivec]], [[Computer Chess Forums|FishCooking]], January 04, 2017
* [http://www.talkchess.com/forum/viewtopic.php?t=63632 SPSA problems] by Ralf Müller, [[CCC]], April 02, 2017
* [https://groups.google.com/d/msg/fishcooking/ERsAux5TU6Q/RU2xnF3bDQAJ SPSA and search.cpp?] by [[Nick Pelling]], [[Computer Chess Forums|FishCooking]], January 06, 2019 » [[Stockfish]]

2020 ...

* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=77420 A hybrid of SPSA and local optimization] by [[Niels Abildskov]], [[CCC]], June 01, 2021 » [[Texel's Tuning Method]]

External Links

* [https://gist.github.com/yanatan16/5420795 Simultaneous Perturbation Stochastic Approximation code in python · GitHub]
* [https://youtu.be/kVJQkmYU2VA NIPS 2012 Tutorial - Stochastic Search and Optimization], [[James C. Spall#Video|Video Lecture]] by [[James C. Spall]]
* [https://en.wikipedia.org/wiki/Hr-Bigband hr-Bigband] feat. [[:Category:Richard Bona|Richard Bona]] - Kalabancoro, October, 2019, [https://en.wikipedia.org/wiki/Hessischer_Rundfunk Hessischer Rundfunk], [https://www.youtube.com/watch?v=L3WxMJVI28g YouTube] Video

References

  1. James C. Spall (1987). A Stochastic Approximation Technique for Generating Maximum Likelihood Parameter Estimates. Proceedings of the American Control Conference, Minneapolis, MN, pdf reprint
  2. James C. Spall (1992). Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation. IEEE Transactions on Automatic Control, Vol. 37, No. 3
  3. Jack Kiefer, Jacob Wolfowitz (1952). Stochastic Estimation of the Maximum of a Regression Function. The Annals of Mathematical Statistics, Vol. 23, No. 3
  4. Herbert Robbins, Sutton Monro (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, Vol. 22, No. 3
  5. John L. Maryak, Daniel C. Chin (2008). Global Random Optimization by Simultaneous Perturbation Stochastic Approximation. IEEE Transactions on Automatic Control, Vol. 53, No. 3, pdf
  6. Stockfish's tuning method by Joona Kiiski, CCC, October 07, 2011
  7. James C. Spall (1998). Implementation of the Simultaneous Perturbation Algorithm for Stochastic Optimization. IEEE Transactions on Aerospace and Electronic Systems, Vol. 34, No. 3, pdf
  8. Martin Riedmiller, Heinrich Braun (1993). A direct adaptive method for faster backpropagation learning: The RPROP algorithm. IEEE International Conference On Neural Networks, pdf
  9. Levente Kocsis, Csaba Szepesvári, Mark Winands (2005). RSPSA: Enhanced Parameter Optimization in Games. Advances in Computer Games 11, pdf, pdf
  10. NOMAD - A blackbox optimization software

Up one Level