'''[[Main Page|Home]] * [[Hardware]] * GPU'''

[[FILE:6600GT GPU.jpg|border|right|thumb| [https://en.wikipedia.org/wiki/GeForce_6_series GeForce 6600GT (NV43)] GPU <ref>[https://commons.wikimedia.org/wiki/Graphics_processing_unit Graphics processing unit - Wikimedia Commons]</ref> ]]

'''GPU''' (Graphics Processing Unit),<br/>
a specialized processor primarily intended to rapidly manipulate and alter [[Memory|memory]] for fast [https://en.wikipedia.org/wiki/Image_processing image processing], usually but not necessarily mapped to a [https://en.wikipedia.org/wiki/Framebuffer framebuffer] of a display.
GPUs offer more raw computing power than general purpose [https://en.wikipedia.org/wiki/Central_processing_unit CPUs], but require a restricted, specialized and massively parallel programming model
that does not conform with the serial nature of [[Alpha-Beta|alpha-beta]] when it comes to a massive [[Parallel Search|parallel search]] in chess. Instead, [[Best-First|best-first]] [[Monte-Carlo Tree Search|Monte-Carlo Tree Search]] (MCTS) approaches in conjunction with [[Deep Learning|deep learning]] have proven a successful way to go on GPU architectures.

=GPGPU=
There are various frameworks for [https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units GPGPU], General Purpose computing on Graphics Processing Unit.
Besides language wrappers and mobile devices with special APIs, there are three main ways to make use of GPGPU.

==Mapping to an API==
* [https://en.wikipedia.org/wiki/BrookGPU BrookGPU] (translates to [https://en.wikipedia.org/wiki/OpenGL OpenGL] and [https://en.wikipedia.org/wiki/DirectX DirectX])
* [https://en.wikipedia.org/wiki/C%2B%2B_AMP C++ AMP] (Open standard by [[Microsoft|Microsoft]] that extends [[Cpp|C++]])
* [https://en.wikipedia.org/wiki/DirectCompute DirectCompute] (GPGPU API by Microsoft)
==Native Compilers==
* [https://en.wikipedia.org/wiki/CUDA CUDA] (GPGPU framework by [https://en.wikipedia.org/wiki/Nvidia Nvidia])
* [https://en.wikipedia.org/wiki/OpenCL OpenCL] (Open Compute Language specified by [https://en.wikipedia.org/wiki/Khronos_Group Khronos Group])
==Intermediate Languages==
* [https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture#HSA_Intermediate_Layer HSAIL]
* [https://en.wikipedia.org/wiki/Parallel_Thread_Execution PTX]
* [https://en.wikipedia.org/wiki/Standard_Portable_Intermediate_Representation SPIR]

=Inside=
Modern GPUs consist of up to hundreds of [[SIMD and SWAR Techniques|SIMD]] or [https://en.wikipedia.org/wiki/Vector_processor vector] units, grouped into compute units.
Each compute unit processes multiple [https://en.wikipedia.org/wiki/Thread_block#Warps Warps] (Nvidia term) resp. Wavefronts ([[AMD]] term) in [https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads SIMT] fashion.
Each Warp resp. Wavefront runs n (typically 32 or 64) [[Thread|threads]] simultaneously.
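The SIMT execution model can be illustrated with a small sketch (hypothetical Python, for illustration only): all lanes of a Warp execute the same instruction in lock step, and on a divergent branch both paths are executed, with inactive lanes masked out.

```python
# Illustrative SIMT sketch: one warp of 32 "threads" (lanes) in lock step.
# On a divergent branch, both paths run; a mask disables inactive lanes.
WARP_SIZE = 32

def simt_branch(values):
    """Each lane doubles even inputs and negates odd ones."""
    mask_even = [v % 2 == 0 for v in values]
    results = list(values)
    # Pass 1: lanes where the condition holds execute the 'then' path.
    for lane in range(WARP_SIZE):
        if mask_even[lane]:
            results[lane] = values[lane] * 2
    # Pass 2: the remaining lanes execute the 'else' path.
    for lane in range(WARP_SIZE):
        if not mask_even[lane]:
            results[lane] = -values[lane]
    return results

print(simt_branch(list(range(32)))[:4])  # [0, -1, 4, -3]
```

This is why branch divergence within a Warp costs throughput: both passes occupy the whole Warp, even though each lane only contributes to one of them.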

The Nvidia [https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_500_Series GeForce GTX 580], for example, runs 32 threads per Warp, up to 24,576 threads in total, spread over 16 compute units with a total of 512 cores. <ref>CUDA C Programming Guide v7.0, Appendix G. COMPUTE CAPABILITIES, Table 12 Technical Specifications per Compute Capability</ref>
The AMD [https://en.wikipedia.org/wiki/Radeon_HD_7000_Series#Radeon_HD_7900 Radeon HD 7970] runs 64 threads per Wavefront, up to 81,920 threads in total, spread over 32 compute units with a total of 2048 cores. <ref>AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices</ref> In practice, register and shared memory usage limit these totals.
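The register limit can be sketched as a simple calculation (a hypothetical Python sketch; real hardware allocates registers with per-warp granularity, so actual occupancy is somewhat coarser). The figures assume a Fermi compute unit with a 128 KiB register file and at most 1536 resident threads (24576 / 16 compute units):

```python
# Hypothetical occupancy sketch: how register usage caps resident threads.
# Fermi compute unit: 128 KiB register file = 32768 32-bit registers,
# hardware maximum of 1536 resident threads per compute unit.
REGISTERS_PER_CU = 128 * 1024 // 4   # 32768 registers of 32 bit each
MAX_THREADS_PER_CU = 1536

def resident_threads(regs_per_thread):
    # Whichever is smaller: the hardware cap or the register budget.
    return min(MAX_THREADS_PER_CU, REGISTERS_PER_CU // regs_per_thread)

print(resident_threads(20))  # 1536 -> hardware limit binds
print(resident_threads(40))  # 819  -> register file binds
```

A register-hungry kernel thus runs fewer threads per compute unit, reducing the GPU's ability to hide memory latency by switching between Warps.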

=Memory=
The memory hierarchy of a GPU consists mainly of private memory (registers, accessed by a single thread resp. work-item), local memory (shared by the threads of a block resp. the work-items of a work-group), constant memory, different types of cache, and global memory. Size, latency and bandwidth vary between vendors and architectures.

Here is the data for the Nvidia GeForce GTX 580 ([https://en.wikipedia.org/wiki/Fermi_%28microarchitecture%29 Fermi]) as an example: <ref>CUDA C Programming Guide v7.0, Appendix G. COMPUTE CAPABILITIES</ref>
* 128 KiB private memory per compute unit
* 48 KiB (16 KiB) local memory per compute unit (configurable)
* 64 KiB constant memory
* 8 KiB constant cache per compute unit
* 16 KiB (48 KiB) L1 cache per compute unit (configurable)
* 768 KiB L2 cache
* 1.5 GiB to 3 GiB global memory
Here is the data for the AMD Radeon HD 7970 ([https://en.wikipedia.org/wiki/Graphics_Core_Next GCN]) as an example: <ref>AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices</ref>
* 256 KiB private memory per compute unit
* 64 KiB local memory per compute unit
* 64 KiB constant memory
* 16 KiB constant cache per four compute units
* 16 KiB L1 cache per compute unit
* 768 KiB L2 cache
* 3 GiB to 6 GiB global memory

=Integer Throughput=
GPUs are used in [https://en.wikipedia.org/wiki/High-performance_computing HPC] environments because of their good [https://en.wikipedia.org/wiki/FLOP FLOP]/Watt ratio. However, 32 bit integer performance can be lower than 32 bit floating-point or 24 bit integer performance.
The instruction throughput depends on the architecture (like Nvidia's [https://en.wikipedia.org/wiki/Tesla_%28microarchitecture%29 Tesla], [https://en.wikipedia.org/wiki/Fermi_%28microarchitecture%29 Fermi], [https://en.wikipedia.org/wiki/Kepler_%28microarchitecture%29 Kepler], [https://en.wikipedia.org/wiki/Maxwell_%28microarchitecture%29 Maxwell] or AMD's [https://en.wikipedia.org/wiki/TeraScale_%28microarchitecture%29 Terascale], [https://en.wikipedia.org/wiki/Graphics_Core_Next GCN]), the brand (like Nvidia [https://en.wikipedia.org/wiki/GeForce GeForce], [https://en.wikipedia.org/wiki/Nvidia_Quadro Quadro], [https://en.wikipedia.org/wiki/Nvidia_Tesla Tesla] or AMD [https://en.wikipedia.org/wiki/Radeon Radeon], [https://en.wikipedia.org/wiki/AMD_FirePro FirePro], [https://en.wikipedia.org/wiki/AMD_FireStream FireStream]) and the specific model.

As an example, here is the 32 bit integer performance of the Nvidia GeForce GTX 580 (Fermi, CC 2.0) and the AMD Radeon HD 7970 (GCN 1.0):

Nvidia GeForce GTX 580 - 32 bit integer operations/clock cycle per compute unit <ref>CUDA C Programming Guide v7.0, Chapter 5.4.1. Arithmetic Instructions</ref>
* MAD 16
* MUL 16
* ADD 32
* Bit-shift 16
* Bitwise XOR 32
Max theoretical ADD operation throughput: 32 Ops * 16 CUs * 1544 MHz = 790.528 GigaOps/sec

AMD Radeon HD 7970 - 32 bit integer operations/clock cycle per processing element <ref>AMD_OpenCL_Programming_Optimization_Guide.pdf 3.0beta, Chapter 2.7.1 Instruction Bandwidths</ref>
* MAD 1/4
* MUL 1/4
* ADD 1
* Bit-shift 1
* Bitwise XOR 1
Max theoretical ADD operation throughput: 1 Op * 2048 PEs * 925 MHz = 1894.4 GigaOps/sec
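The peak-throughput figures above follow from one formula: operations per clock per unit, times number of units, times clock frequency. A small Python sketch of the calculation:

```python
# Peak integer throughput model: ops/clock per unit x units x clock.
def peak_gigaops(ops_per_clock, units, clock_mhz):
    """Theoretical maximum throughput in GigaOps/sec."""
    return ops_per_clock * units * clock_mhz / 1000.0

# GTX 580: 32 ADD ops/clock per compute unit, 16 CUs, 1544 MHz shader clock
print(peak_gigaops(32, 16, 1544))   # 790.528
# HD 7970: 1 ADD op/clock per processing element, 2048 PEs, 925 MHz
print(peak_gigaops(1, 2048, 925))   # 1894.4
```

Note that the two vendors count differently: Nvidia states ops per clock per compute unit, AMD per processing element, which is why the per-unit numbers above differ by orders of magnitude while the totals are comparable.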

=Deep Learning=
GPUs are much better suited than CPUs to implement and train [[Neural Networks#Convolutional|Convolutional Neural Networks]] (CNN), and were therefore largely responsible for the [[Deep Learning|deep learning]] boom.
This also affected game playing programs combining CNN with [[Monte-Carlo Tree Search|MCTS]], as pioneered by [[Google]] [[DeepMind|DeepMind's]] [[AlphaGo]] and [[AlphaZero]] entities in [[Go]], [[Shogi]] and [[Chess]] using [https://en.wikipedia.org/wiki/Tensor_processing_unit TPUs], and by the open source projects [[Leela Zero]], headed by [[Gian-Carlo Pascutto]] for [[Go]], and its [[Leela Chess Zero]] adaption.

=See also=
* [[Deep Learning]]
** [[AlphaGo]]
** [[AlphaZero]]
** [[Neural Networks#Convolutional|Convolutional Neural Networks]]
** [[Leela Zero]]
** [[Leela Chess Zero]]
* [[FPGA]]
* [[Monte-Carlo Tree Search]]
** [[MCαβ]]
** [[UCT]]
* [[Parallel Search]]
* [[Perft#15|Perft(15)]]
* [[SIMD and SWAR Techniques]]
* [[Thread]]

=Publications=
==2009==
* [[Ren Wu]], [http://www.cedar.buffalo.edu/~binzhang/ Bin Zhang], [http://www.hpl.hp.com/people/meichun_hsu/ Meichun Hsu] ('''2009'''). ''[http://portal.acm.org/citation.cfm?id=1531668 Clustering billions of data points using GPUs]''. [http://www.computingfrontiers.org/2009/ ACM International Conference on Computing Frontiers]
* [http://www.esrl.noaa.gov/research/review/bios/mark.govett.html Mark Govett], [http://www.esrl.noaa.gov/gsd/media/tierney.html Craig Tierney], [[Jacques Middlecoff]], [http://www.cira.colostate.edu/people/view.php?id=297 Tom Henderson] ('''2009'''). ''Using Graphical Processing Units (GPUs) for Next Generation Weather and Climate Prediction Models''. [http://www.cisl.ucar.edu/dir/CAS2K9/ CAS2K9 Workshop], [http://www.cisl.ucar.edu/dir/CAS2K9/Presentations/govett.pdf pdf]
==2010 ...==
* [https://www.linkedin.com/in/avi-bleiweiss-456a5644 Avi Bleiweiss] ('''2010'''). ''Playing Zero-Sum Games on the GPU''. [https://en.wikipedia.org/wiki/Nvidia NVIDIA Corporation], [http://www.nvidia.com/object/io_1269574709099.html GPU Technology Conference 2010], [http://www.nvidia.com/content/gtc-2010/pdfs/2207_gtc2010.pdf slides as pdf]
* [http://www.esrl.noaa.gov/research/review/bios/mark.govett.html Mark Govett], [[Jacques Middlecoff]], [http://www.cira.colostate.edu/people/view.php?id=297 Tom Henderson] ('''2010'''). ''[http://dl.acm.org/citation.cfm?id=1845128 Running the NIM Next-Generation Weather Model on GPUs]''. [http://www.informatik.uni-trier.de/~ley/db/conf/ccgrid/ccgrid2010.html#GovettMH10 CCGRID 2010]
* [http://www.esrl.noaa.gov/research/review/bios/mark.govett.html Mark Govett], [[Jacques Middlecoff]], [http://www.cira.colostate.edu/people/view.php?id=297 Tom Henderson], [http://www.cug.org/5-publications/proceedings_attendee_lists/CUG09CD/S09_Proceedings/pages/authors/11-15Wednesday/12A-Rosinski/Rosinski-paper.html Jim Rosinski], [http://www.linkedin.com/pub/craig-tierney/5/854/956 Craig Tierney] ('''2011'''). ''Parallelization of the NIM Dynamical Core for GPUs''. [https://is.enes.org/documents/Govett.pdf slides as pdf]
* [[Ľubomír Lackovič]] ('''2011'''). ''[https://hgpu.org/?p=5772 Parallel Game Tree Search Using GPU]''. Institute of Informatics and Software Engineering, [https://en.wikipedia.org/wiki/Faculty_of_Informatics_and_Information_Technologies Faculty of Informatics and Information Technologies], [https://en.wikipedia.org/wiki/Slovak_University_of_Technology_in_Bratislava Slovak University of Technology in Bratislava], [http://acmbulletin.fiit.stuba.sk/vol3num2/lackovic.pdf pdf]
* [[Dan Anthony Feliciano Alcantara]] ('''2011'''). ''Efficient Hash Tables on the GPU''. Ph. D. thesis, [https://en.wikipedia.org/wiki/University_of_California,_Davis University of California, Davis], [http://idav.ucdavis.edu/~dfalcant//downloads/dissertation.pdf pdf] » [[Hash Table]]
* [[Damian Sulewski]] ('''2011'''). ''Large-Scale Parallel State Space Search Utilizing Graphics Processing Units and Solid State Disks''. Ph.D. thesis, [[University of Dortmund]], [https://eldorado.tu-dortmund.de/dspace/bitstream/2003/29418/1/Dissertation.pdf pdf]
* [[Damjan Strnad]], [[Nikola Guid]] ('''2011'''). ''[http://cit.fer.hr/index.php/CIT/article/view/2029 Parallel Alpha-Beta Algorithm on the GPU]''. [http://cit.fer.hr/index.php/CIT CIT. Journal of Computing and Information Technology], Vol. 19, No. 4 » [[Parallel Search]], [[Othello|Reversi]]
* [[Liang Li]], [[Hong Liu]], [[Peiyu Liu]], [[Taoying Liu]], [[Wei Li]], [[Hao Wang]] ('''2012'''). ''[https://www.semanticscholar.org/paper/A-Node-based-Parallel-Game-Tree-Algorithm-Using-Li-Liu/be21d7b9b91957b700aab4ce002e6753b826ff54 A Node-based Parallel Game Tree Algorithm Using GPUs]''. CLUSTER 2012 » [[Parallel Search]]
* [[S. Ali Mirsoleimani]], [http://dblp.uni-trier.de/pers/hd/k/Karami:Ali Ali Karami], [http://dblp.uni-trier.de/pers/hd/k/Khunjush:Farshad Farshad Khunjush] ('''2013'''). ''[https://scholar.google.de/citations?view_op=view_citation&hl=en&user=VvkRESgAAAAJ&citation_for_view=VvkRESgAAAAJ:ufrVoPGSRksC A parallel memetic algorithm on GPU to solve the task scheduling problem in heterogeneous environments]''. [http://www.sigevo.org/gecco-2013/program.html GECCO '13]
* [https://dblp.uni-trier.de/pers/hd/d/Dang:Qingqing Qingqing Dang], [https://dblp.uni-trier.de/pers/hd/y/Yan:Shengen Shengen Yan], [[Ren Wu]] ('''2014'''). ''[https://ieeexplore.ieee.org/document/7097862 A fast integral image generation algorithm on GPUs]''. [https://dblp.uni-trier.de/db/conf/icpads/icpads2014.html ICPADS 2014]
==2015 ...==
* [[Peter H. Jin]], [[Kurt Keutzer]] ('''2015'''). ''Convolutional Monte Carlo Rollouts in Go''. [http://arxiv.org/abs/1512.03375 arXiv:1512.03375] » [[Deep Learning]], [[Go]], [[Monte-Carlo Tree Search|MCTS]]
* [[Liang Li]], [[Hong Liu]], [[Hao Wang]], [[Taoying Liu]], [[Wei Li]] ('''2015'''). ''[https://ieeexplore.ieee.org/document/6868996 A Parallel Algorithm for Game Tree Search Using GPGPU]''. [[IEEE#TPDS|IEEE Transactions on Parallel and Distributed Systems]], Vol. 26, No. 8 » [[Parallel Search]]
* <span id="Astro"></span>[https://www.linkedin.com/in/sean-sheen-b99aba89 Sean Sheen] ('''2016'''). ''[https://digitalcommons.calpoly.edu/theses/1567/ Astro - A Low-Cost, Low-Power Cluster for CPU-GPU Hybrid Computing using the Jetson TK1]''. Master's thesis, [https://en.wikipedia.org/wiki/California_Polytechnic_State_University California Polytechnic State University], [https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=2723&context=theses pdf] <ref>[http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html Jetson TK1 Embedded Development Kit | NVIDIA]</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=61761 Jetson GPU architecture] by [[Dann Corbit]], [[CCC]], October 18, 2016</ref>
* [[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529 » [[AlphaGo]]
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
* [[Tristan Cazenave]] ('''2017'''). ''[http://ieeexplore.ieee.org/document/7875402/ Residual Networks for Computer Go]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. PP, No. 99, [http://www.lamsade.dauphine.fr/~cazenave/papers/resnet.pdf pdf]
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419

=Forum Posts=
==2005 ...==
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=5480 Hardware assist] by [[Nicolai Czempin]], [[Computer Chess Forums|Winboard Forum]], August 27, 2006
* [http://www.talkchess.com/forum/viewtopic.php?t=22732 Monte carlo on a NVIDIA GPU ?] by [[Marco Costalba]], [[CCC]], August 01, 2008
==2010 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=32750 Using the GPU] by [[Louis Zulli]], [[CCC]], February 19, 2010
* [http://www.talkchess.com/forum/viewtopic.php?t=38002 GPGPU and computer chess] by Wim Sjoho, [[CCC]], February 09, 2011
* [http://www.talkchess.com/forum/viewtopic.php?t=38478 Possible Board Presentation and Move Generation for GPUs?] by [[Srdja Matovic]], [[CCC]], March 19, 2011
* [http://www.talkchess.com/forum/viewtopic.php?t=39459 Zeta plays chess on a gpu] by [[Srdja Matovic]], [[CCC]], June 23, 2011 » [[Zeta]]
* [http://www.talkchess.com/forum/viewtopic.php?t=39606 GPU Search Methods] by [[Joshua Haglund]], [[CCC]], July 04, 2011
* [http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=442052&t=41853 Possible Search Algorithms for GPUs?] by [[Srdja Matovic]], [[CCC]], January 07, 2012 <ref>[[Yaron Shoham]], [[Sivan Toledo]] ('''2001'''). ''Parallel randomized best-first minimax search''. School of Computer Science, [https://en.wikipedia.org/wiki/Tel_Aviv_University Tel-Aviv University], [http://www.tau.ac.il/%7Estoledo/Pubs/rbf-ai-revised.pdf pdf]</ref> <ref>[[Alberto Maria Segre]], [[Sean Forman]], [[Giovanni Resta]], [[Andrew Wildenberg]] ('''2002'''). ''Nagging: A Scalable Fault-Tolerant Paradigm for Distributed Search''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence] 140, [http://jmvidal.cse.sc.edu/library/segre02a.pdf pdf], [http://compepi.cs.uiowa.edu/uploads/Profiles/Segre/nag.pdf pdf]</ref>
* [http://www.talkchess.com/forum/viewtopic.php?t=42590 uct on gpu] by [[Daniel Shawul]], [[CCC]], February 24, 2012 » [[UCT]]
* [http://www.talkchess.com/forum/viewtopic.php?t=43971 Is there such a thing as branchless move generation?] by [[John Hamlen]], [[CCC]], June 07, 2012 » [[Move Generation]]
* [http://www.talkchess.com/forum/viewtopic.php?t=44014 Choosing a GPU platform: AMD and Nvidia] by [[John Hamlen]], [[CCC]], June 10, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=46277 Nvidias K20 with Recursion] by [[Srdja Matovic]], [[CCC]], December 04, 2012 <ref>[http://www.techpowerup.com/173846/Tesla-K20-GPU-Compute-Processor-Specifications-Released.html Tesla K20 GPU Compute Processor Specifications Released | techPowerUp]</ref>
* [http://www.talkchess.com/forum/viewtopic.php?t=46974 Kogge Stone, Vector Based] by [[Srdja Matovic]], [[CCC]], January 22, 2013 » [[Kogge-Stone Algorithm]] <ref>[https://en.wikipedia.org/wiki/Parallel_Thread_Execution Parallel Thread Execution from Wikipedia]</ref> <ref>NVIDIA Compute PTX: Parallel Thread Execution, ISA Version 1.4, March 31, 2009, [http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf pdf]</ref>
* [http://www.talkchess.com/forum/viewtopic.php?t=47344 GPU chess engine] by Samuel Siltanen, [[CCC]], February 27, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=48387 Fast perft on GPU (upto 20 Billion nps w/o hashing)] by [[Ankan Banerjee]], [[CCC]], June 22, 2013 » [[Perft]], [[Kogge-Stone Algorithm]] <ref>[https://github.com/ankan-ban/perft_gpu ankan-ban/perft_gpu · GitHub]</ref>
==2015 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=60386 GPU chess update, local memory...] by [[Srdja Matovic]], [[CCC]], June 06, 2016
* [http://www.talkchess.com/forum/viewtopic.php?t=61761 Jetson GPU architecture] by [[Dann Corbit]], [[CCC]], October 18, 2016 » [[GPU#Astro|Astro]]
* [http://www.talkchess.com/forum/viewtopic.php?t=61925 Pigeon is now running on the GPU] by [[Stuart Riffle]], [[CCC]], November 02, 2016 » [[Pigeon]]
* [http://www.talkchess.com/forum/viewtopic.php?t=63346 Back to the basics, generating moves on gpu in parallel...] by [[Srdja Matovic]], [[CCC]], March 05, 2017 » [[Move Generation]]
* [http://www.talkchess.com/forum/viewtopic.php?t=64983&start=9 Re: Perft(15): comparison of estimates with Ankan's result] by [[Ankan Banerjee]], [[CCC]], August 26, 2017 » [[Perft#15|Perft(15)]]
* [http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=32317 Chess Engine and GPU] by Fishpov, [[Computer Chess Forums|Rybka Forum]], October 09, 2017
* [http://www.talkchess.com/forum/viewtopic.php?t=66025 To TPU or not to TPU...] by [[Srdja Matovic]], [[CCC]], December 16, 2017 » [[Deep Learning]] <ref>[https://en.wikipedia.org/wiki/Tensor_processing_unit Tensor processing unit from Wikipedia]</ref>
* [http://www.talkchess.com/forum/viewtopic.php?t=66280 Announcing lczero] by [[Gary Linscott|Gary]], [[CCC]], January 09, 2018 » [[Leela Chess Zero]]

=External Links=
* [https://en.wikipedia.org/wiki/Graphics_processing_unit Graphics processing unit from Wikipedia]
* [https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture Heterogeneous System Architecture from Wikipedia]
* [https://en.wikipedia.org/wiki/Tensor_processing_unit Tensor processing unit from Wikipedia]
* [https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units General-purpose computing on graphics processing units (GPGPU) from Wikipedia]
* [https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units List of AMD graphics processing units from Wikipedia]
* [https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units List of Nvidia graphics processing units from Wikipedia]
* [https://developer.nvidia.com/ NVIDIA Developer]
* [https://developer.nvidia.com/nvidia-gpu-programming-guide NVIDIA GPU Programming Guide]
==OpenCL==
* [https://en.wikipedia.org/wiki/OpenCL OpenCL from Wikipedia]
* [https://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism Part 1: OpenCL™ – Portable Parallelism - CodeProject]
* [https://www.codeproject.com/Articles/122405/Part-2-OpenCL-Memory-Spaces Part 2: OpenCL™ – Memory Spaces - CodeProject]
==CUDA==
* [https://en.wikipedia.org/wiki/CUDA CUDA from Wikipedia]
* [https://developer.nvidia.com/cuda-zone CUDA Zone | NVIDIA Developer]
* [http://parse.ele.tue.nl/education/cluster2 Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster]
==Deep Learning==
* [https://developer.nvidia.com/deep-learning Deep Learning | NVIDIA Developer] » [[Deep Learning]]
* [https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/ Deep Learning in a Nutshell: Core Concepts] by [http://timdettmers.com/ Tim Dettmers], [https://devblogs.nvidia.com/parallelforall/ Parallel Forall], November 3, 2015
* [https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/ Deep Learning in a Nutshell: History and Training] by [http://timdettmers.com/ Tim Dettmers], [https://devblogs.nvidia.com/parallelforall/ Parallel Forall], December 16, 2015
* [https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-sequence-learning/ Deep Learning in a Nutshell: Sequence Learning] by [http://timdettmers.com/ Tim Dettmers], [https://devblogs.nvidia.com/parallelforall/ Parallel Forall], March 7, 2016
* [https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-reinforcement-learning/ Deep Learning in a Nutshell: Reinforcement Learning] by [http://timdettmers.com/ Tim Dettmers], [https://devblogs.nvidia.com/parallelforall/ Parallel Forall], September 8, 2016
* [https://blog.dominodatalab.com/gpu-computing-and-deep-learning/ Faster deep learning with GPUs and Theano]
* [https://en.wikipedia.org/wiki/Theano_(software) Theano (software) from Wikipedia]
* [https://en.wikipedia.org/wiki/TensorFlow TensorFlow from Wikipedia]
==Game Programming==
* [http://andy-thomason.github.io/lecture_notes/agp/agp_gpgpu_programming.html Advanced game programming | Session 5 - GPGPU programming] by [[Andy Thomason]]
* [https://zero.sjeng.org/ Leela Zero] by [[Gian-Carlo Pascutto]] » [[Leela Zero]]
: [https://github.com/gcp/leela-zero GitHub - gcp/leela-zero: Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper]
==Chess Programming==
* [http://gpuchess.blogspot.com/ GPU Chess Blog]
* [https://chessgpgpu.blogspot.com/ Chess on a GPGPU]
* [https://web.archive.org/web/20140604141114/http://zeta-chess.blogspot.com/ Zeta OpenCL Chess]
* [https://github.com/ankan-ban/perft_gpu ankan-ban/perft_gpu · GitHub] » [[Perft]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=48387 Fast perft on GPU (upto 20 Billion nps w/o hashing)] by [[Ankan Banerjee]], [[CCC]], June 22, 2013</ref>
* [https://github.com/LeelaChessZero LCZero · GitHub] » [[Leela Chess Zero]]

=References=
<references />
'''[[Hardware|Up one Level]]'''