Difference between revisions of "GPU"

(added Host-Device Latencies section)
 
* [https://www.linkedin.com/in/avi-bleiweiss-456a5644 Avi Bleiweiss] ('''2010'''). ''Playing Zero-Sum Games on the GPU''. [https://en.wikipedia.org/wiki/Nvidia NVIDIA Corporation], [http://www.nvidia.com/object/io_1269574709099.html GPU Technology Conference 2010], [http://www.nvidia.com/content/gtc-2010/pdfs/2207_gtc2010.pdf slides as pdf]
 
* [http://www.esrl.noaa.gov/research/review/bios/mark.govett.html Mark Govett], [[Jacques Middlecoff]], [http://www.cira.colostate.edu/people/view.php?id=297 Tom Henderson] ('''2010'''). ''[http://dl.acm.org/citation.cfm?id=1845128 Running the NIM Next-Generation Weather Model on GPUs]''. [http://www.informatik.uni-trier.de/~ley/db/conf/ccgrid/ccgrid2010.html#GovettMH10 CCGRID 2010]
 
'''2011'''
 
* [http://www.esrl.noaa.gov/research/review/bios/mark.govett.html Mark Govett], [[Jacques Middlecoff]], [http://www.cira.colostate.edu/people/view.php?id=297 Tom Henderson], [http://www.cug.org/5-publications/proceedings_attendee_lists/CUG09CD/S09_Proceedings/pages/authors/11-15Wednesday/12A-Rosinski/Rosinski-paper.html Jim Rosinski], [http://www.linkedin.com/pub/craig-tierney/5/854/956 Craig Tierney] ('''2011'''). ''Parallelization of the NIM Dynamical Core for GPUs''. [https://is.enes.org/documents/Govett.pdf slides as pdf]
 
* [[Ľubomír Lackovič]] ('''2011'''). ''[https://hgpu.org/?p=5772 Parallel Game Tree Search Using GPU]''. Institute of Informatics and Software Engineering, [https://en.wikipedia.org/wiki/Faculty_of_Informatics_and_Information_Technologies Faculty of Informatics and Information Technologies], [https://en.wikipedia.org/wiki/Slovak_University_of_Technology_in_Bratislava Slovak University of Technology in Bratislava], [http://acmbulletin.fiit.stuba.sk/vol3num2/lackovic.pdf pdf]
 
* [[Damian Sulewski]] ('''2011'''). ''Large-Scale Parallel State Space Search Utilizing Graphics Processing Units and Solid State Disks''. Ph.D. thesis, [[University of Dortmund]], [https://eldorado.tu-dortmund.de/dspace/bitstream/2003/29418/1/Dissertation.pdf pdf]
 
* [[Damjan Strnad]], [[Nikola Guid]] ('''2011'''). ''[http://cit.fer.hr/index.php/CIT/article/view/2029 Parallel Alpha-Beta Algorithm on the GPU]''. [http://cit.fer.hr/index.php/CIT CIT. Journal of Computing and Information Technology], Vol. 19, No. 4 » [[Parallel Search]], [[Othello|Reversi]]  
 
'''2012'''
 
* [[Liang Li]], [[Hong Liu]], [[Peiyu Liu]], [[Taoying Liu]], [[Wei Li]], [[Hao Wang]] ('''2012'''). ''[https://www.semanticscholar.org/paper/A-Node-based-Parallel-Game-Tree-Algorithm-Using-Li-Liu/be21d7b9b91957b700aab4ce002e6753b826ff54 A Node-based Parallel Game Tree Algorithm Using GPUs]''. CLUSTER 2012 » [[Parallel Search]]
 
'''2013'''
 
* [[S. Ali Mirsoleimani]], [https://dblp.uni-trier.de/pers/hd/k/Karami:Ali Ali Karami], [https://dblp.uni-trier.de/pers/hd/k/Khunjush:Farshad Farshad Khunjush] ('''2013'''). ''[https://scholar.google.de/citations?view_op=view_citation&hl=en&user=VvkRESgAAAAJ&citation_for_view=VvkRESgAAAAJ:ufrVoPGSRksC A parallel memetic algorithm on GPU to solve the task scheduling problem in heterogeneous environments]''. [http://www.sigevo.org/gecco-2013/program.html GECCO '13]
 
* [https://dblp.uni-trier.de/pers/hd/k/Karami:Ali Ali Karami], [[S. Ali Mirsoleimani]], [https://dblp.uni-trier.de/pers/hd/k/Khunjush:Farshad Farshad Khunjush] ('''2013'''). ''[https://ieeexplore.ieee.org/document/6714232 A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs]''. [https://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6708586 CADS 2013]
 
* [[Diego Rodríguez-Losada]], [[Pablo San Segundo]], [[Miguel Hernando]], [https://dblp.uni-trier.de/pers/hd/p/Puente:Paloma_de_la Paloma de la Puente], [https://dblp.uni-trier.de/pers/hd/v/Valero=Gomez:Alberto Alberto Valero-Gomez] ('''2013'''). ''GPU-Mapping: Robotic Map Building with Graphical Multiprocessors''. [https://dblp.uni-trier.de/db/journals/ram/ram20.html IEEE Robotics & Automation Magazine, Vol. 20, No. 2], [https://www.acin.tuwien.ac.at/fileadmin/acin/v4r/v4r/GPUMap_RAM2013.pdf pdf]
 
'''2014'''
 
* [https://dblp.uni-trier.de/pers/hd/d/Dang:Qingqing Qingqing Dang], [https://dblp.uni-trier.de/pers/hd/y/Yan:Shengen Shengen Yan], [[Ren Wu]] ('''2014'''). ''[https://ieeexplore.ieee.org/document/7097862 A fast integral image generation algorithm on GPUs]''. [https://dblp.uni-trier.de/db/conf/icpads/icpads2014.html ICPADS 2014]
 
* [[S. Ali Mirsoleimani]], [https://dblp.uni-trier.de/pers/hd/k/Karami:Ali Ali Karami], [https://dblp.uni-trier.de/pers/hd/k/Khunjush:Farshad Farshad Khunjush] ('''2014'''). ''[https://link.springer.com/chapter/10.1007/978-3-319-04891-8_12 A Two-Tier Design Space Exploration Algorithm to Construct a GPU Performance Predictor]''. [https://dblp.uni-trier.de/db/conf/arcs/arcs2014.html ARCS 2014], [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], Vol. 8350, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
 
'''2015 ...'''
* [[Peter H. Jin]], [[Kurt Keutzer]] ('''2015'''). ''Convolutional Monte Carlo Rollouts in Go''. [http://arxiv.org/abs/1512.03375 arXiv:1512.03375] » [[Deep Learning]], [[Go]], [[Monte-Carlo Tree Search|MCTS]]
 
* [[Liang Li]], [[Hong Liu]], [[Hao Wang]], [[Taoying Liu]], [[Wei Li]] ('''2015'''). ''[https://ieeexplore.ieee.org/document/6868996 A Parallel Algorithm for Game Tree Search Using GPGPU]''. [[IEEE#TPDS|IEEE Transactions on Parallel and Distributed Systems]], Vol. 26, No. 8 » [[Parallel Search]]
 
'''2016'''
 
* <span id="Astro"></span>[https://www.linkedin.com/in/sean-sheen-b99aba89 Sean Sheen] ('''2016'''). ''[https://digitalcommons.calpoly.edu/theses/1567/ Astro - A Low-Cost, Low-Power Cluster for CPU-GPU Hybrid Computing using the Jetson TK1]''. Master's thesis, [https://en.wikipedia.org/wiki/California_Polytechnic_State_University California Polytechnic State University], [https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=2723&context=theses pdf] <ref>[http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html Jetson TK1 Embedded Development Kit | NVIDIA]</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=61761 Jetson GPU architecture] by [[Dann Corbit]], [[CCC]], October 18, 2016</ref>
 
* [https://scholar.google.com/citations?user=YyD7mwcAAAAJ&hl=en Jingyue Wu], [https://scholar.google.com/citations?user=EJcIByYAAAAJ&hl=en Artem Belevich], [https://scholar.google.com/citations?user=X5WAGdEAAAAJ&hl=en Eli Bendersky], [https://www.linkedin.com/in/mark-heffernan-873b663/ Mark Heffernan], [https://scholar.google.com/citations?user=Guehv9sAAAAJ&hl=en Chris Leary], [https://scholar.google.com/citations?user=fAmfZAYAAAAJ&hl=en Jacques Pienaar], [http://www.broune.com/ Bjarke Roune], [https://scholar.google.com/citations?user=Der7mNMAAAAJ&hl=en Rob Springer], [https://scholar.google.com/citations?user=zvfOH0wAAAAJ&hl=en Xuetian Weng], [https://scholar.google.com/citations?user=s7VCtl8AAAAJ&hl=en Robert Hundt] ('''2016'''). ''[https://dl.acm.org/citation.cfm?id=2854041 gpucc: an open-source GPGPU compiler]''. [https://cgo.org/cgo2016/ CGO 2016]
 
* [[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529 » [[AlphaGo]]
 
'''2017'''
 
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
 
* [[Tristan Cazenave]] ('''2017'''). ''[http://ieeexplore.ieee.org/document/7875402/ Residual Networks for Computer Go]''.  [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. PP, No. 99, [http://www.lamsade.dauphine.fr/~cazenave/papers/resnet.pdf pdf]
 
'''2018'''
 
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419
 
==2010 ...==
 
* [http://www.talkchess.com/forum/viewtopic.php?t=32750 Using the GPU] by [[Louis Zulli]], [[CCC]], February 19, 2010
 
'''2011'''
 
* [http://www.talkchess.com/forum/viewtopic.php?t=38002 GPGPU and computer chess] by Wim Sjoho, [[CCC]], February 09, 2011
 
* [http://www.talkchess.com/forum/viewtopic.php?t=38478 Possible Board Presentation and Move Generation for GPUs?] by [[Srdja Matovic]], [[CCC]], March 19, 2011
 
* [http://www.talkchess.com/forum/viewtopic.php?t=39459 Zeta plays chess on a gpu] by [[Srdja Matovic]], [[CCC]], June 23, 2011 » [[Zeta]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=39606 GPU Search Methods] by [[Joshua Haglund]], [[CCC]], July 04, 2011
 
'''2012'''
 
* [http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=442052&t=41853 Possible Search Algorithms for GPUs?] by [[Srdja Matovic]], [[CCC]], January 07, 2012 <ref>[[Yaron Shoham]], [[Sivan Toledo]] ('''2002'''). ''[https://www.sciencedirect.com/science/article/pii/S0004370202001959 Parallel Randomized Best-First Minimax Search]''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_(journal) Artificial Intelligence], Vol. 137, Nos. 1-2</ref>  <ref>[[Alberto Maria Segre]], [[Sean Forman]], [[Giovanni Resta]], [[Andrew Wildenberg]] ('''2002'''). ''[https://www.sciencedirect.com/science/article/pii/S000437020200228X Nagging: A Scalable Fault-Tolerant Paradigm for Distributed Search]''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence], Vol. 140, Nos. 1-2</ref>
 
* [http://www.talkchess.com/forum/viewtopic.php?t=42590 uct on gpu] by [[Daniel Shawul]], [[CCC]], February 24, 2012 » [[UCT]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=44014 Choosing a GPU platform: AMD and Nvidia] by [[John Hamlen]], [[CCC]], June 10, 2012
 
* [http://www.talkchess.com/forum/viewtopic.php?t=46277 Nvidias K20 with Recursion] by [[Srdja Matovic]], [[CCC]], December 04, 2012 <ref>[http://www.techpowerup.com/173846/Tesla-K20-GPU-Compute-Processor-Specifications-Released.html Tesla K20 GPU Compute Processor Specifications Released | techPowerUp]</ref>
 
'''2013'''
 
* [http://www.talkchess.com/forum/viewtopic.php?t=46974 Kogge Stone, Vector Based] by [[Srdja Matovic]], [[CCC]], January 22, 2013 » [[Kogge-Stone Algorithm]] <ref>[https://en.wikipedia.org/wiki/Parallel_Thread_Execution Parallel Thread Execution from Wikipedia]</ref> <ref>NVIDIA Compute PTX: Parallel Thread Execution, ISA Version 1.4, March 31, 2009, [http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf pdf]</ref>
 
* [http://www.talkchess.com/forum/viewtopic.php?t=47344 GPU chess engine] by Samuel Siltanen, [[CCC]], February 27, 2013
 
'''2015 ...'''
* [http://www.talkchess.com/forum/viewtopic.php?t=61761 Jetson GPU architecture] by [[Dann Corbit]], [[CCC]], October 18, 2016 » [[GPU#Astro|Astro]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61925 Pigeon is now running on the GPU] by [[Stuart Riffle]], [[CCC]], November 02, 2016 » [[Pigeon]]
 
'''2017'''
 
* [http://www.talkchess.com/forum/viewtopic.php?t=63346 Back to the basics, generating moves on gpu in parallel...] by [[Srdja Matovic]], [[CCC]], March 05, 2017 » [[Move Generation]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=64983&start=9 Re: Perft(15): comparison of estimates with Ankan's result] by [[Ankan Banerjee]], [[CCC]], August 26, 2017 » [[Perft#15|Perft(15)]]
 
* [http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=32317 Chess Engine and GPU] by Fishpov , [[Computer Chess Forums|Rybka Forum]], October 09, 2017  
 
* [http://www.talkchess.com/forum/viewtopic.php?t=66025 To TPU or not to TPU...] by [[Srdja Matovic]], [[CCC]], December 16, 2017 » [[Deep Learning]] <ref>[https://en.wikipedia.org/wiki/Tensor_processing_unit Tensor processing unit from Wikipedia]</ref>
 
'''2018'''
 
* [http://www.talkchess.com/forum/viewtopic.php?t=66280 Announcing lczero] by [[Gary Linscott|Gary]], [[CCC]], January 09, 2018 » [[Leela Chess Zero]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=67347 GPU ANN, how to deal with host-device latencies?] by [[Srdja Matovic]], [[CCC]], May 06, 2018 » [[Neural Networks]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=68448 How good is the RTX 2080 Ti for Leela?] by Hai, September 15, 2018 » [[Leela Chess Zero]] <ref>[https://en.wikipedia.org/wiki/GeForce_20_series GeForce 20 series from Wikipedia]</ref>
: [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=68448&start=2 Re: How good is the RTX 2080 Ti for Leela?] by [[Ankan Banerjee]], [[CCC]], September 16, 2018
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=68973 My non-OC RTX 2070 is very fast with Lc0] by [[Kai Laskos]], [[CCC]], November 19, 2018 » [[Leela Chess Zero]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69400 LC0 using 4 x 2080 Ti GPU's on Chess.com tourney?] by M. Ansari, [[CCC]], December 28, 2018 » [[Leela Chess Zero]]
 
'''2019'''
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=69447 Generate EGTB with graphics cards?] by [[Pham Hong Nguyen|Nguyen Pham]], [[CCC]], January 01, 2019 » [[Endgame Tablebases]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69478 LCZero FAQ is missing one important fact] by [[Jouni Uski]], [[CCC]], January 01, 2019 » [[Leela Chess Zero]]
 

Revision as of 12:36, 14 March 2019


GPU (Graphics Processing Unit),
a specialized processor primarily intended for graphics cards, designed to rapidly manipulate and alter memory for fast image processing, usually but not necessarily mapped to a framebuffer of a display. GPUs have more raw computing power than general-purpose CPUs, but require a limited, specialized and massively parallel programming model that does not conform well with the serial nature of alpha-beta when it comes to massively parallel search in chess. Instead, best-first Monte-Carlo Tree Search (MCTS) approaches in conjunction with deep learning have proved a successful way to go on GPU architectures.

GPGPU

There are various frameworks for GPGPU, General Purpose computing on Graphics Processing Units. Apart from language wrappers and mobile devices with special APIs, there are in the main three ways to make use of GPGPU (see the minimal example after this list):

Mapping to an API

Native Compilers

Intermediate Languages
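
As a minimal illustration of the second way, a native compiler toolchain (here CUDA C++ compiled with nvcc; the kernel and variable names are illustrative, not taken from any particular engine): the host allocates device memory, copies data over, launches a kernel that squares an array of integers, and copies the result back.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Kernel: each thread squares one element of the array.
    __global__ void squareKernel(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= data[i];
    }

    int main() {
        const int n = 1024;
        int host[n];
        for (int i = 0; i < n; i++) host[i] = i;

        int *dev;
        cudaMalloc(&dev, n * sizeof(int));                               // global memory on the device
        cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);  // host -> device copy
        squareKernel<<<(n + 255) / 256, 256>>>(dev, n);                  // launch 4 blocks of 256 threads
        cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);  // device -> host copy
        cudaFree(dev);

        printf("%d\n", host[42]);  // prints 1764
        return 0;
    }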

Inside

Modern GPUs consist of up to hundreds of SIMD or vector units, grouped into compute units. Each compute unit processes multiple warps (Nvidia terminology) or wavefronts (AMD terminology) in SIMT fashion, and each warp or wavefront runs n (32 or 64) threads simultaneously.

The Nvidia GeForce GTX 580, for example, runs 32 threads per warp and up to 24,576 threads in total, spread over 16 compute units with a total of 512 cores. [2] The AMD Radeon HD 7970 runs 64 threads per wavefront and up to 81,920 threads in total, spread over 32 compute units with a total of 2048 cores. [3] In practice, register and shared memory sizes limit the total number of threads.
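
In CUDA terms this hierarchy is exposed through built-in index variables; the following sketch (function and variable names are illustrative) shows how a thread derives its global id, its warp and its lane, assuming the 32-wide warps of the Nvidia examples above and a block size that is a multiple of 32 (AMD wavefronts would use 64 instead).

    // Each thread reports which warp it belongs to and its lane inside that warp.
    __global__ void whoAmI(int *warpOut, int *laneOut) {
        int globalId = blockIdx.x * blockDim.x + threadIdx.x;  // unique id over the whole grid
        warpOut[globalId] = globalId / 32;                     // warp index: 32 consecutive threads share it
        laneOut[globalId] = threadIdx.x % 32;                  // lane 0..31; all lanes of a warp run in lockstep
    }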

Memory

The memory hierarchy of a GPU consists mainly of private memory (registers, accessed by a single thread or work-item), local memory (shared by the threads of a block or the work-items of a work-group), constant memory, different types of cache, and global memory. Size, latency and bandwidth vary between vendors and architectures (a kernel sketch touching these memory spaces follows the two device listings below).

Here is the data for the Nvidia GeForce GTX 580 (Fermi) as an example: [4]

  • 128 KiB private memory per compute unit
  • 48 KiB (16 KiB) local memory per compute unit (configurable)
  • 64 KiB constant memory
  • 8 KiB constant cache per compute unit
  • 16 KiB (48 KiB) L1 cache per compute unit (configurable)
  • 768 KiB L2 cache
  • 1.5 GiB to 3 GiB global memory

Here is the data for the AMD Radeon HD 7970 (GCN) as an example: [5]

  • 256 KiB private memory per compute unit
  • 64 KiB local memory per compute unit
  • 64 KiB constant memory
  • 16 KiB constant cache per four compute units
  • 16 KiB L1 cache per compute unit
  • 768 KiB L2 cache
  • 3 GiB to 6 GiB global memory
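
The memory spaces listed above map directly to address-space qualifiers in CUDA; the following sketch (names and sizes are illustrative, and it assumes one block of 64 threads per row with the weights uploaded via cudaMemcpyToSymbol) stages a row from global memory in local/shared memory, combines it with weights held in constant memory, and accumulates in registers.

    #define N 64                                   // row length; one thread block of N threads per row

    __constant__ int weights[N];                   // constant memory, served by the constant cache

    __global__ void weightedRows(const int *in, int *out) {    // in/out reside in global memory
        __shared__ int row[N];                     // local (shared) memory, one copy per thread block
        int acc = 0;                               // private memory: a register of this thread

        row[threadIdx.x] = in[blockIdx.x * N + threadIdx.x];   // stage one row in shared memory
        __syncthreads();                           // make the staged row visible to the whole block

        for (int i = 0; i < N; i++)                // every thread combines the shared row with the weights
            acc += row[i] * weights[i];
        out[blockIdx.x * N + threadIdx.x] = acc;   // write the per-thread result back to global memory
    }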

Instruction Throughput

GPUs are used in HPC environments because of their good FLOPS/Watt ratio. Instruction throughput in general depends on the architecture (like Nvidia's Tesla, Fermi, Kepler, Maxwell or AMD's TeraScale, GCN), the brand (like Nvidia GeForce, Quadro, Tesla or AMD Radeon, Radeon Pro, Radeon Instinct) and the specific model.

  • 32 bit Integer Performance
Depending on architecture and operation, 32-bit integer performance can be lower than 32-bit floating-point or 24-bit integer performance.
  • 64 bit Integer Performance
Current GPU registers and vector ALUs are 32 bits wide, so 64-bit integer operations have to be emulated with sequences of 32-bit instructions.[6] [7] (A sketch of such emulation follows this list.)
  • Mixed Precision Support
Newer architectures like Nvidia Turing and AMD Vega have mixed precision support: Vega doubles the FP16 and quadruples the INT8 throughput,[8] while Turing doubles the FP16 throughput of its FPUs.[9]
  • TensorCores
With the Nvidia Volta series, TensorCores were introduced: fp16*fp16+fp32 matrix-multiply-accumulate units used to accelerate neural networks.[10] Turing's 2nd-gen TensorCores add FP16, INT8 and INT4 optimized computation.[11]
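
As an illustration of the 64-bit emulation mentioned above, here is a 64-bit unsigned addition decomposed into two 32-bit additions with carry; compilers emit the equivalent instruction sequence automatically, so this explicit form is only a sketch.

    // 64-bit unsigned addition composed from 32-bit halves, as a 32-bit ALU has to perform it.
    __device__ void add64(unsigned aLo, unsigned aHi, unsigned bLo, unsigned bHi,
                          unsigned *rLo, unsigned *rHi) {
        unsigned lo = aLo + bLo;         // add the low halves
        unsigned carry = (lo < aLo);     // unsigned overflow of the low half produces the carry
        *rLo = lo;
        *rHi = aHi + bHi + carry;        // add the high halves plus the carry
    }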

Throughput Examples

Nvidia GeForce GTX 580 (Fermi, CC 2.0) - 32 bit integer operations/clock cycle per compute unit [12]

   MAD 16
   MUL 16
   ADD 32
   Bit-shift 16
   Bitwise XOR 32

Max theoretical ADD operation throughput: 32 Ops * 16 CUs * 1544 MHz = 790.528 GigaOps/sec

AMD Radeon HD 7970 (GCN 1.0) - 32 bit integer operations/clock cycle per processing element [13]

   MAD 1/4
   MUL 1/4
   ADD 1
   Bit-shift 1
   Bitwise XOR 1

Max theoretical ADD operation throughput: 1 Op * 2048 PEs * 925 MHz = 1894.4 GigaOps/sec
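
Both figures follow from ops per clock per unit, times number of units, times clock rate; a small host-side helper (illustrative only) reproduces them.

    #include <cstdio>

    // theoretical throughput in GigaOps/s = ops per clock per unit * number of units * clock in MHz / 1000
    static double gigaOps(int opsPerClock, int units, double clockMHz) {
        return opsPerClock * units * clockMHz / 1000.0;
    }

    int main() {
        printf("GTX 580 ADD: %.3f GigaOps/s\n", gigaOps(32, 16, 1544.0));   // 790.528
        printf("HD 7970 ADD: %.1f GigaOps/s\n", gigaOps(1, 2048, 925.0));   // 1894.4
        return 0;
    }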

Host-Device Latencies

One reason GPUs are not used as accelerators for chess engines is the host-device latency, also known as kernel-launch overhead. Nvidia and AMD have not published official numbers, but in practice there is a measurable latency for null kernels, ranging from about 5 microseconds [14] up to hundreds of microseconds [15]. One solution to overcome this limitation is to couple tasks into batches to be executed in one run [16].
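
A sketch of that batching idea, assuming a hypothetical device-side evaluatePosition() and positions packed as eight bitboards each (both are placeholders, not an existing engine's API): one kernel launch scores a whole batch, so the launch overhead is paid once per batch instead of once per position.

    // Placeholder static evaluation; a real engine would evaluate the position here.
    __device__ int evaluatePosition(const unsigned long long *bitboards) {
        return (int)(bitboards[0] & 0xff) - (int)(bitboards[1] & 0xff);
    }

    // One launch evaluates a whole batch: each thread scores one position, so the
    // 5-100 microsecond launch overhead is amortized over thousands of positions.
    // usage: evalBatch<<<(n + 255) / 256, 256>>>(dPositions, dScores, n);
    __global__ void evalBatch(const unsigned long long *positions, int *scores, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            scores[i] = evaluatePosition(&positions[i * 8]);   // assumes 8 bitboards per position
    }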

Deep Learning

GPUs are much better suited than CPUs for implementing and training Convolutional Neural Networks (CNNs), and were therefore also responsible for the deep learning boom. This also affects game-playing programs that combine CNNs with MCTS, as pioneered by Google DeepMind's AlphaGo and AlphaZero entities in Go, Shogi and Chess using TPUs, and by the open source projects Leela Zero, headed by Gian-Carlo Pascutto, for Go and its Leela Chess Zero adaptation.

See also

Publications

2009

2010...

2011

2012

2013

2014

2015 ...

2016

2017

2018

Forum Posts

2005 ...

2010 ...

2011

Re: Possible Board Presentation and Move Generation for GPUs by Steffan Westcott, CCC, March 20, 2011

2012

2013

2015 ...

2017

2018

Re: How good is the RTX 2080 Ti for Leela? by Ankan Banerjee, CCC, September 16, 2018

2019

External Links

OpenCL

CUDA


Deep Learning

Game Programming

GitHub - gcp/leela-zero: Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper

Chess Programming

References

  1. Graphics processing unit - Wikimedia Commons
  2. CUDA C Programming Guide v7.0, Appendix G. COMPUTE CAPABILITIES, Table 12 Technical Specifications per Compute Capability
  3. AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices
  4. CUDA C Programming Guide v7.0, Appendix G.COMPUTE CAPABILITIES
  5. AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices
  6. AMD Vega White Paper
  7. Nvidia Turing White Paper
  8. Vega (GCN 5th generation) from Wikipedia
  9. AnandTech - Nvidia Turing Deep Dive page 4
  10. INSIDE VOLTA
  11. AnandTech - Nvidia Turing Deep Dive page 6
  12. CUDA C Programming Guide v7.0, Chapter 5.4.1. Arithmetic Instructions
  13. AMD_OpenCL_Programming_Optimization_Guide.pdf 3.0beta, Chapter 2.7.1 Instruction Bandwidths
  14. host-device latencies? by Srdja Matovic, Nvidia CUDA ZONE, Feb 28, 2019
  15. host-device latencies? by Srdja Matovic, AMD Developer Community, Feb 28, 2019
  16. Re: GPU ANN, how to deal with host-device latencies? by Milos Stanisavljevic, CCC, May 06, 2018
  17. Jetson TK1 Embedded Development Kit | NVIDIA
  18. Jetson GPU architecture by Dann Corbit, CCC, October 18, 2016
  19. Yaron Shoham, Sivan Toledo (2002). Parallel Randomized Best-First Minimax Search. Artificial Intelligence, Vol. 137, Nos. 1-2
  20. Alberto Maria Segre, Sean Forman, Giovanni Resta, Andrew Wildenberg (2002). Nagging: A Scalable Fault-Tolerant Paradigm for Distributed Search. Artificial Intelligence, Vol. 140, Nos. 1-2
  21. Tesla K20 GPU Compute Processor Specifications Released | techPowerUp
  22. Parallel Thread Execution from Wikipedia
  23. NVIDIA Compute PTX: Parallel Thread Execution, ISA Version 1.4, March 31, 2009, pdf
  24. ankan-ban/perft_gpu · GitHub
  25. Tensor processing unit from Wikipedia
  26. GeForce 20 series from Wikipedia
  27. Re: Generate EGTB with graphics cards? by Graham Jones, CCC, January 01, 2019
  28. Fast perft on GPU (upto 20 Billion nps w/o hashing) by Ankan Banerjee, CCC, June 22, 2013

Up one Level