Changes

Jump to: navigation, search

GPU

4,024 bytes added, 24 January
m
2020 ...: link to Chinese gpus
'''GPU''' (Graphics Processing Unit),<br/>
a specialized processor primarily initially intended to for fast [https://en.wikipedia.org/wiki/Image_processing image processing]. GPUs may have more raw computing power than general purpose [https://en.wikipedia.org/wiki/Central_processing_unit CPUs] but need a specialized and parallelized way of programming. [[Leela Chess Zero]] has proven that a [[Best-First|Best-first]] [[Monte-Carlo Tree Search|Monte-Carlo Tree Search]] (MCTS) with [[Deep Learning|deep learning]] methodology will work with GPU architectures.
=History=
In the 1970s and 1980s RAM was expensive and Home Computers used custom graphics chips to operate directly on registers/memory without a dedicated frame buffer resp. texture buffer, like [https://en.wikipedia.org/wiki/Television_Interface_Adaptor TIA]in the [[Atari 8-bit|Atari VCS]] gaming system, [https://en.wikipedia.org/wiki/CTIA_and_GTIA GTIA]+[https://en.wikipedia.org/wiki/ANTIC ANTIC] in the [[Atari 8-bit|Atari 400/800]] series, or [https://en.wikipedia.org/wiki/Original_Chip_Set#Denise Denise]+[https://en.wikipedia.org/wiki/Original_Chip_Set#Agnus Agnus] in the [[Amiga|Commodore Amiga]] series. The 1990s would make 3D graphics and 3D modeling more popular, especially for video games. Cards specifically designed to accelerate 3D math, such as the [https://en.wikipedia.org/wiki/Voodoo2 IMPACT_(computer_graphics) SGI Impact] (1995) in 3D graphics-workstations or [https://en.wikipedia.org/wiki/3dfx#Voodoo_Graphics_PCI 3dfx Voodoo2Voodoo](1996) for playing 3D games on PCs, were used by the video game community to play 3D graphicsemerged. Some game engines could use instead the [[SIMD and SWAR Techniques|SIMD-capabilities]] of CPUs such as the [[Intel]] [[MMX]] instruction set or [[AMD|AMD's]] [[X86#3DNow!|3DNow!]] for [https://en.wikipedia.org/wiki/Real-time_computer_graphics real-time rendering]. Sony's 3D capable chip [https://en.wikipedia.org/wiki/PlayStation_technical_specifications#Graphics_processing_unit_(GPU) GTE] used in the PlayStation (1994) and Nvidia's 2D/3D combi chips like [https://en.wikipedia.org/wiki/NV1 NV1 ] (1995) coined the term GPU for 3D graphics hardware acceleration. With the advent of the [https://en.wikipedia.org/wiki/Unified_shader_model unified shader architecture], like in Nvidia [https://en.wikipedia.org/wiki/Tesla_(microarchitecture) Tesla] (2006), ATI/AMD [https://en.wikipedia.org/wiki/TeraScale_(microarchitecture) TeraScale] (2007) or Intel [https://en.wikipedia.org/wiki/Intel_GMA#GMA_X3000 GMA X3000] (2006), GPGPU frameworks like [https://en.wikipedia.org/wiki/CUDA CUDA] and [[OpenCL|OpenCL]] emerged and gained in popularity.
=GPU in Computer Chess=
There are in main three approaches four ways how to use a GPU for Chesschess:
* As an accelerator in [[Leela_Chess_Zero|Lc0]]: run a neural network for position evaluation on GPU.* Offload the search in [[Zeta|Zeta]]: run a parallel game tree search with move generation and position evaluation on GPU.* As an a hybrid in [http://www.talkchess.com/forum3/viewtopic.php?t=64983&start=4#p729152 perft_gpu]: expand the game tree to a certain degree on CPU and offload to GPU to compute the sub-tree* Neural network training such as [https://github.com/glinscott/nnue-pytorch Stockfish NNUE trainer in Pytorch]<ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75724 Pytorch NNUE training] by [[Gary Linscott]], [[CCC]], November 08, 2020</ref> or [https://github.com/LeelaChessZero/lczero-training Lc0 TensorFlow Training]
=GPU Chess Engines=
* [https://community.amd.com/t5/opencl/bd-p/opencl-discussions AMD OpenCL Developer Community]
* [https://rocmrocmdocs.githubamd.iocom/en/ ROCm Homepagelatest/index.html AMD ROCm™ documentation]* [httphttps://developer.amdmanualzz.com/wordpressdoc/mediao/2013cggy6/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guideamd-opencl-programming-user-revguide-2.7.pdf contents AMD OpenCL Programming Guide]
* [http://developer.amd.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf AMD OpenCL Optimization Guide]
* [https://gpuopen.com/wpamd-content/uploads/2019/08/RDNA_Shader_ISA_5August2019.pdf RDNA Instruction Set]* [https://developer.amd.com/wpisa-content/resourcesdocumentation/Vega_Shader_ISA_28July2017.pdf Vega Instruction SetAMD GPU ISA documentation]
== Apple ==
* [https://docs.nvidia.com/cuda/parallel-thread-execution/index.html Nvidia PTX ISA]
* [https://docs.nvidia.com/cuda/index.html Nvidia CUDA Toolkit Documentation]
* [https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html Nvidia CUDA C++ Programming Guide]
* [https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html Nvidia CUDA C++ Best Practices Guide]
== Further ==
* [https://en.wikipedia.org/wiki/Vulkan#Planned_features Vulkan] (OpenGL sucessor of Khronos Group)* [https://en.wikipedia.org/wiki/DirectCompute DirectCompute] (Microsoft)
* [https://en.wikipedia.org/wiki/C%2B%2B_AMP C++ AMP] (Microsoft)
* [https://en.wikipedia.org/wiki/DirectCompute DirectCompute] (Microsoft)
* [https://en.wikipedia.org/wiki/OpenACC OpenACC] (offload directives)
* [https://en.wikipedia.org/wiki/OpenMP OpenMP] (offload directives)
* 8 KiB constant cache per compute unit
* 16 KiB (48 KiB) L1 cache per compute unit (configurable)
* 768 KiB L2 cachein total
* 1.5 GiB to 3 GiB global memory
AMD Radeon HD 7970 ([https://en.wikipedia.org/wiki/Graphics_Core_Next GCN]) <ref>AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices</ref>
* 16 KiB constant cache per four compute units
* 16 KiB L1 cache per compute unit
* 768 KiB L2 cachein total
* 3 GiB to 6 GiB global memory
==AMD Matrix Cores==
: AMD released 2020 its server-class [https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf CDNA] architecture with Matrix Cores which support MFMA (matrix-fused-multiply-add) operations on various data types like INT8, FP16, BF16, FP32. AMD's CDNA 2 architecture adds FP64 optimized throughput for matrix operations. AMD's RDNA 3 architecture features dedicated AI tensor operation acceleratorsacceleration. AMD's CDNA 3 architecture adds support for FP8 and sparse matrix data (sparsity).
==Intel XMX Cores==
: Intel added XMX, Xe Matrix eXtensions, cores to some of the [https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_unitsIntel_Xe Intel Xe] GPU series, like [https://en.wikipedia.org/wiki/Intel_Arc#Arc_Alchemist Alchemist Arc Alchemist] and [https://www.intel.com/content/www/us/en/products/sku/232876/intel-data-center-gpu-max-1100/specifications.html Intel Data Center GPU seriesMax Series].
=Host-Device Latencies=
* [https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units List of AMD graphics processing units on Wikipedia]
=== CDNA3 === CDNA3 HPC architecture was unveiled in December, 2023. With MI300A APU model (CPU+GPU+HBM) and MI300X GPU model, both with multi-chip modules design. Featuring Matrix Cores with support for a broad type of precision, as INT8, FP8, BF16, FP16, TF32, FP32, FP64, as well as sparse matrix data (sparsity). Supported by AMD's ROCm open software stack for AMD Instinct accelerators.  * [https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf AMD CDNA3 Whitepaper]* [https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf AMD Instinct MI300/CDNA3 Instruction Set Architecture]* [https://www.amd.com/en/developer/resources/rocm-hub.html AMD ROCm Developer Hub] === Navi 3x RDNA 3 RDNA3 === RDNA 3 RDNA3 architecture in Radeon RX 7000 series was announced on November 3, 2022, featuring dedicated AI tensor operation acceleratorsacceleration.
* [https://en.wikipedia.org/wiki/Radeon_RX_7000_series AMD Radeon RX 7000 on Wikipedia]
* [https://developer.amd.com/wp-content/resources/RDNA3_Shader_ISA_December2022.pdf RDNA3 Instruction Set Architecture]
=== CDNA 2 CDNA2 === CDNA 2 CDNA2 architecture in MI200 HPC-GPU with optimized FP64 throughput (matrix and vector), multi-chip-module design and Infinity Fabric was unveiled in November, 2021.
* [https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf AMD CDNA2 Whitepaper]
* [https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf AMD CDNA Whitepaper]
* [https://developer.amd.com/wp-content/resources/CDNA1_Shader_ISA_14December2020.pdf CDNA1 CDNA Instruction Set Architecture]
=== Navi 2x RDNA 2 RDNA2 === [https://en.wikipedia.org/wiki/RDNA_(microarchitecture)#RDNA_2 RDNA 2RDNA2] cards were unveiled on October 28, 2020.
* [https://en.wikipedia.org/wiki/Radeon_RX_6000_series AMD Radeon RX 6000 on Wikipedia]
* [https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf RDNA 2 Instruction Set Architecture]
=== Navi RDNA 1 === [https://en.wikipedia.org/wiki/RDNA_(microarchitecture) RDNA 1] cards were unveiled on July 7, 2019.
* [https://www.amd.com/system/files/documents/rdna-whitepaper.pdf RDNA Whitepaper]
* [https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf Architecture Slide Deck]
* [https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Shader_ISA_5August2019.pdf RDNA Instruction SetArchitecture]
=== Vega GCN 5th gen ===
* [https://www.techpowerup.com/gpu-specs/docs/amd-vega-architecture.pdf Architecture Whitepaper]
* [https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf Vega Instruction SetArchitecture]
=== Polaris GCN 4th gen ===
* [https://www.amd.com/system/files/documents/polaris-whitepaper.pdf Architecture Whitepaper]
* [https://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf GCN3/4 Instruction Set Architecture]
 
=== Southern Islands GCN 1st gen ===
 
Southern Island cards introduced the [https://en.wikipedia.org/wiki/Graphics_Core_Next GCN] architecture in 2012.
 
* [https://en.wikipedia.org/wiki/Radeon_HD_7000_series AMD Radeon HD 7000 on Wikipedia]
* [https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/programmer-references/si_programming_guide_v2.pdf Southern Islands Programming Guide]
* [https://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf Southern Islands Instruction Set Architecture]
== Apple ==
* [https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Gen12 List of Intel Gen12 GPUs on Wikipedia]
* [https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_unitsIntel_Arc#Arc_Alchemist Alchemist Arc Alchemist series on Wikipedia]
==Nvidia==
* [https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units List of Nvidia graphics processing units on Wikipedia]
 
=== Grace Hopper Superchip ===
The Nvidia GH200 Grace Hopper Superchip was unveiled August, 2023 and combines the Nvidia Grace CPU ([[ARM|ARM v9]]) and Nvidia Hopper GPU architectures via NVLink to deliver a CPU+GPU coherent memory model for accelerated AI and HPC applications.
 
* [https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip NVIDIA Grace Hopper Superchip Data Sheet]
* [https://resources.nvidia.com/en-us-grace-cpu/nvidia-grace-hopper NVIDIA Grace Hopper Superchip Architecture Whitepaper]
=== Ada Lovelace Architecture ===
 
The [https://en.wikipedia.org/wiki/Ada_Lovelace_(microarchitecture) Ada Lovelace microarchitecture] was announced on September 20, 2022, featuring 4th-generation Tensor Cores with FP8, FP16, BF16, TF32 and sparsity acceleration.
* [https://images.nvidia.com/aem-dam/Solutions/geforce/ada/nvidia-ada-gpu-architecture.pdf Ada GPU Whitepaper]
* [https://docs.nvidia.com/cuda/ada-tuning-guide/index.html Ada Tuning Guide]
=== Hopper Architecture ===
* [https://resources.nvidia.com/en-us-tensor-core Hopper H100 Whitepaper]
* [https://docs.nvidia.com/cuda/hopper-tuning-guide/index.html Hopper Tuning Guide]
=== Ampere Architecture ===
* [https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf Ampere GA100 Whitepaper]
* [https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf Ampere GA102 Whitepaper]
* [https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html Ampere GPU Architecture Tuning Guide]
=== Turing Architecture ===
[https://en.wikipedia.org/wiki/Turing_(microarchitecture) Turing] cards were first released in 2018. They are the first consumer cores to launch with RTX, for [https://en.wikipedia.org/wiki/Ray_tracing_(graphics) raytracing], features. These are also the first consumer cards to launch with TensorCores used for matrix multiplications to accelerate [[Neural Networks#Convolutional|convolutional neural networks]]. The Turing GTX line of chips do not offer RTX or TensorCores.
* [https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf Architectural Turing Architecture Whitepaper]* [https://docs.nvidia.com/cuda/turing-tuning-guide/index.html Turing Tuning Guide]
=== Volta Architecture ===
[https://en.wikipedia.org/wiki/Volta_(microarchitecture) Volta] cards were released in 2017. They were the first cards to launch with TensorCores, supporting matrix multiplications to accelerate [[Neural Networks#Convolutional|convolutional neural networks]].
* [https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf Volta Architecture Whitepaper]* [https://docs.nvidia.com/cuda/volta-tuning-guide/index.html Volta Tuning Guide]
=== Pascal Architecture ===
[https://en.wikipedia.org/wiki/Pascal_(microarchitecture) Pascal] cards were first released in 2016.
* [https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf Pascal Architecture Whitepaper]* [https://docs.nvidia.com/cuda/pascal-tuning-guide/index.html Pascal Tuning Guide]
=== Maxwell Architecture ===
[https://en.wikipedia.org/wiki/Maxwell(microarchitecture) Maxwell] cards were first released in 2014.
* [https://web.archive.org/web/20170721113746/http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF Maxwell Architecture Whitepaper on archiv.org]* [https://docs.nvidia.com/cuda/maxwell-tuning-guide/index.html Maxwell Tuning Guide]
== PowerVR ==
* [https://talkchess.com/forum3/viewtopic.php?f=2&t=77097 GPU rumors 2021] by [[Srdja Matovic]], [[CCC]], April 16, 2021
* [https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078 Comparison of all known Sliding lookup algorithms <nowiki>[CUDA]</nowiki>] by [[Daniel Infuehr]], [[CCC]], January 08, 2022 » [[Sliding Piece Attacks]]
* [https://talkchess.com/forum3/viewtopic.php?f=7&t=72566&p=955538#p955538 Re: China boosts in silicon...] by [[Srdja Matovic]], [[CCC]], January 13, 2024
=External Links=
422
edits

Navigation menu