Changes

GPU

2,667 bytes added, 24 January
m
2020 ...: link to Chinese gpus
'''GPU''' (Graphics Processing Unit),<br/>
a specialized processor initially intended for fast [https://en.wikipedia.org/wiki/Image_processing image processing]. GPUs may have more raw computing power than general purpose [https://en.wikipedia.org/wiki/Central_processing_unit CPUs] but need a specialized and parallelized way of programming. [[Leela Chess Zero]] has proven that a [[Best-First|Best-first]] [[Monte-Carlo Tree Search|Monte-Carlo Tree Search]] (MCTS) with [[Deep Learning|deep learning]] methodology will work with GPU architectures.
=History=
In the 1970s and 1980s RAM was expensive, and home computers used custom graphics chips that operated directly on registers/memory without a dedicated frame buffer or texture buffer, like [https://en.wikipedia.org/wiki/Television_Interface_Adaptor TIA] in the [[Atari 8-bit|Atari VCS]] gaming system, [https://en.wikipedia.org/wiki/CTIA_and_GTIA GTIA]+[https://en.wikipedia.org/wiki/ANTIC ANTIC] in the [[Atari 8-bit|Atari 400/800]] series, or [https://en.wikipedia.org/wiki/Original_Chip_Set#Denise Denise]+[https://en.wikipedia.org/wiki/Original_Chip_Set#Agnus Agnus] in the [[Amiga|Commodore Amiga]] series. The 1990s made 3D graphics and 3D modeling more popular, especially for video games. Cards specifically designed to accelerate 3D math, such as the [https://en.wikipedia.org/wiki/IMPACT_(computer_graphics) SGI Impact] (1995) in 3D graphics workstations or [https://en.wikipedia.org/wiki/3dfx#Voodoo_Graphics_PCI 3dfx Voodoo] (1996) for playing 3D games on PCs, emerged. Some game engines could instead use the [[SIMD and SWAR Techniques|SIMD-capabilities]] of CPUs such as the [[Intel]] [[MMX]] instruction set or [[AMD|AMD's]] [[X86#3DNow!|3DNow!]] for [https://en.wikipedia.org/wiki/Real-time_computer_graphics real-time rendering]. Sony's 3D capable chip [https://en.wikipedia.org/wiki/PlayStation_technical_specifications#Graphics_processing_unit_(GPU) GTE] used in the PlayStation (1994) and Nvidia's 2D/3D combi chips like [https://en.wikipedia.org/wiki/NV1 NV1] (1995) coined the term GPU for 3D graphics hardware acceleration.
With the advent of the [https://en.wikipedia.org/wiki/Unified_shader_model unified shader architecture], like in Nvidia [https://en.wikipedia.org/wiki/Tesla_(microarchitecture) Tesla] (2006), ATI/AMD [https://en.wikipedia.org/wiki/TeraScale_(microarchitecture) TeraScale] (2007) or Intel [https://en.wikipedia.org/wiki/Intel_GMA#GMA_X3000 GMA X3000] (2006), GPGPU frameworks like [https://en.wikipedia.org/wiki/CUDA CUDA] and [[OpenCL|OpenCL]] emerged and gained in popularity.
=GPU in Computer Chess=
There are four main ways to use a GPU for chess:
* As an accelerator in [[Leela_Chess_Zero|Lc0]]: run a neural network for position evaluation on GPU.
* Offload the search in [[Zeta|Zeta]]: run a parallel game tree search with move generation and position evaluation on GPU.
* As a hybrid in [http://www.talkchess.com/forum3/viewtopic.php?t=64983&start=4#p729152 perft_gpu]: expand the game tree to a certain degree on CPU and offload to GPU to compute the sub-tree.
* Neural network training such as [https://github.com/glinscott/nnue-pytorch Stockfish NNUE trainer in Pytorch]<ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75724 Pytorch NNUE training] by [[Gary Linscott]], [[CCC]], November 08, 2020</ref> or [https://github.com/LeelaChessZero/lczero-training Lc0 TensorFlow Training].
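The hybrid approach can be illustrated with a minimal Python sketch. It is not the perft_gpu implementation: the toy game (<code>toy_children</code>), the function names and the "device" step are all hypothetical, and the device step is an ordinary Python loop standing in for a GPU kernel launch. It only shows the control flow: the host expands the tree to a split depth, and the frontier positions are handed over as one batch of independent sub-tree counts.

```python
from typing import List

# Hypothetical toy game (an assumption for illustration only): a position is an
# integer with (position % 3) + 2 successors. It stands in for a real move generator.
def toy_children(position: int) -> List[int]:
    branching = position % 3 + 2
    return [position * 5 + i + 1 for i in range(branching)]

def perft(position: int, depth: int) -> int:
    """Plain recursive perft: count the leaf nodes at the given depth."""
    if depth == 0:
        return 1
    return sum(perft(child, depth - 1) for child in toy_children(position))

def expand_frontier(root: int, split_depth: int) -> List[int]:
    """Host (CPU) side: expand the tree level by level down to split_depth."""
    frontier = [root]
    for _ in range(split_depth):
        frontier = [child for pos in frontier for child in toy_children(pos)]
    return frontier

def device_batch_perft(positions: List[int], depth: int) -> List[int]:
    """Stand-in for a GPU kernel: one independent work item per frontier position."""
    return [perft(pos, depth) for pos in positions]

def hybrid_perft(root: int, depth: int, split_depth: int) -> int:
    """Expand on the host to split_depth, then sum the batched sub-tree counts."""
    split_depth = min(split_depth, depth)
    frontier = expand_frontier(root, split_depth)
    return sum(device_batch_perft(frontier, depth - split_depth))
```

In a real engine the split depth would be chosen so that the frontier is large enough to keep all compute units busy, while each sub-tree stays small enough for a single work item or work group.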
=GPU Chess Engines=
* [https://community.amd.com/t5/opencl/bd-p/opencl-discussions AMD OpenCL Developer Community]
* [https://rocmdocs.amd.com/en/latest/index.html AMD ROCm™ documentation]
* [https://manualzz.com/doc/o/cggy6/amd-opencl-programming-user-guide-contents AMD OpenCL Programming Guide]
* [http://developer.amd.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf AMD OpenCL Optimization Guide]
* [https://gpuopen.com/amd-isa-documentation/ AMD GPU ISA documentation]
== Apple ==
* [https://docs.nvidia.com/cuda/parallel-thread-execution/index.html Nvidia PTX ISA]
* [https://docs.nvidia.com/cuda/index.html Nvidia CUDA Toolkit Documentation]
* [https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html Nvidia CUDA C++ Programming Guide]
* [https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html Nvidia CUDA C++ Best Practices Guide]
== Further ==
* [https://en.wikipedia.org/wiki/Vulkan#Planned_features Vulkan] (OpenGL successor by the Khronos Group)
* [https://en.wikipedia.org/wiki/C%2B%2B_AMP C++ AMP] (Microsoft)
* [https://en.wikipedia.org/wiki/DirectCompute DirectCompute] (Microsoft)
* [https://en.wikipedia.org/wiki/OpenACC OpenACC] (offload directives)
* [https://en.wikipedia.org/wiki/OpenMP OpenMP] (offload directives)
* 8 KiB constant cache per compute unit
* 16 KiB (48 KiB) L1 cache per compute unit (configurable)
* 768 KiB L2 cache in total
* 1.5 GiB to 3 GiB global memory
AMD Radeon HD 7970 ([https://en.wikipedia.org/wiki/Graphics_Core_Next GCN]) <ref>AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices</ref>
* 16 KiB constant cache per four compute units
* 16 KiB L1 cache per compute unit
* 768 KiB L2 cache in total
* 3 GiB to 6 GiB global memory
==AMD Matrix Cores==
: In 2020 AMD released its server-class [https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf CDNA] architecture with Matrix Cores, which support MFMA (matrix-fused-multiply-add) operations on various data types like INT8, FP16, BF16 and FP32. AMD's CDNA 2 architecture adds FP64-optimized throughput for matrix operations. AMD's RDNA 3 architecture features dedicated AI tensor operation acceleration. AMD's CDNA 3 architecture adds support for FP8 and sparse matrix data (sparsity).
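A matrix fused multiply-add computes D = A·B + C on small matrix tiles. The following Python sketch shows only the arithmetic of one such operation. The function name <code>matrix_fma</code> is hypothetical and is not an AMD API; real Matrix Core instructions work on fixed tile shapes and typically accumulate low-precision inputs in a wider data type, which plain Python floats do not model.

```python
from typing import List

Matrix = List[List[float]]

def matrix_fma(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return D = A * B + C for small dense tiles (illustration only)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Shape checks: A is rows x inner, B is inner x cols, C is rows x cols.
    assert all(len(row) == inner for row in a)
    assert len(c) == rows and all(len(row) == cols for row in c)
    d = [row[:] for row in c]  # start from the accumulator tile C
    for i in range(rows):
        for j in range(cols):
            acc = d[i][j]
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # multiply-accumulate step
            d[i][j] = acc
    return d
```

For example, with <code>a = [[1, 2], [3, 4]]</code>, <code>b = [[5, 6], [7, 8]]</code> and <code>c = [[1, 1], [1, 1]]</code> the result is <code>[[20, 23], [44, 51]]</code>. Neural network inference and training consist largely of such products, which is why dedicated matrix units matter for engines that evaluate positions with a network.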
==Intel XMX Cores==
: Intel added XMX (Xe Matrix eXtensions) cores to some of the [https://en.wikipedia.org/wiki/Intel_Xe Intel Xe] GPU series, like [https://en.wikipedia.org/wiki/Intel_Arc#Alchemist Arc Alchemist] and the [https://www.intel.com/content/www/us/en/products/sku/232876/intel-data-center-gpu-max-1100/specifications.html Intel Data Center GPU Max Series].
=Host-Device Latencies=
* [https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units List of AMD graphics processing units on Wikipedia]
 
=== CDNA3 ===
The CDNA3 HPC architecture was unveiled in December 2023 with the MI300A APU model (CPU+GPU+HBM) and the MI300X GPU model, both with a multi-chip module design. It features Matrix Cores with support for a broad range of precisions, such as INT8, FP8, BF16, FP16, TF32, FP32 and FP64, as well as sparse matrix data (sparsity). It is supported by AMD's ROCm open software stack for AMD Instinct accelerators.
 
* [https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf AMD CDNA3 Whitepaper]
* [https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf AMD Instinct MI300/CDNA3 Instruction Set Architecture]
* [https://www.amd.com/en/developer/resources/rocm-hub.html AMD ROCm Developer Hub]
=== Navi 3x RDNA3 ===
The RDNA3 architecture in the Radeon RX 7000 series was announced on November 3, 2022, featuring dedicated AI tensor operation acceleration.
* [https://en.wikipedia.org/wiki/Radeon_RX_7000_series AMD Radeon RX 7000 on Wikipedia]
* [https://www.amd.com/system/files/documents/rdna-whitepaper.pdf RDNA Whitepaper]
* [https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf Architecture Slide Deck]
* [https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Shader_ISA_5August2019.pdf RDNA Instruction Set Architecture]
=== Vega GCN 5th gen ===
* [https://www.techpowerup.com/gpu-specs/docs/amd-vega-architecture.pdf Architecture Whitepaper]
* [https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf Vega Instruction Set Architecture]
=== Polaris GCN 4th gen ===
* [https://www.amd.com/system/files/documents/polaris-whitepaper.pdf Architecture Whitepaper]
* [https://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf GCN3/4 Instruction Set Architecture]
=== Southern Islands GCN 1st gen ===
* [https://en.wikipedia.org/wiki/Radeon_HD_7000_series AMD Radeon HD 7000 on Wikipedia]
* [https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/programmer-references/si_programming_guide_v2.pdf Southern Islands Programming Guide]
* [https://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf Southern Islands Instruction Set Architecture]
== Apple ==
* [https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Gen12 List of Intel Gen12 GPUs on Wikipedia]
* [https://en.wikipedia.org/wiki/Intel_Arc#Alchemist Arc Alchemist series on Wikipedia]
==Nvidia==
* [https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units List of Nvidia graphics processing units on Wikipedia]
 
=== Grace Hopper Superchip ===
The Nvidia GH200 Grace Hopper Superchip was unveiled in August 2023 and combines the Nvidia Grace CPU ([[ARM|ARM v9]]) and Nvidia Hopper GPU architectures via NVLink to deliver a CPU+GPU coherent memory model for accelerated AI and HPC applications.
 
* [https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip NVIDIA Grace Hopper Superchip Data Sheet]
* [https://resources.nvidia.com/en-us-grace-cpu/nvidia-grace-hopper NVIDIA Grace Hopper Superchip Architecture Whitepaper]
=== Ada Lovelace Architecture ===
 
The [https://en.wikipedia.org/wiki/Ada_Lovelace_(microarchitecture) Ada Lovelace microarchitecture] was announced on September 20, 2022, featuring 4th-generation Tensor Cores with FP8, FP16, BF16, TF32 and sparsity acceleration.
* [https://talkchess.com/forum3/viewtopic.php?f=2&t=77097 GPU rumors 2021] by [[Srdja Matovic]], [[CCC]], April 16, 2021
* [https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078 Comparison of all known Sliding lookup algorithms <nowiki>[CUDA]</nowiki>] by [[Daniel Infuehr]], [[CCC]], January 08, 2022 » [[Sliding Piece Attacks]]
* [https://talkchess.com/forum3/viewtopic.php?f=7&t=72566&p=955538#p955538 Re: China boosts in silicon...] by [[Srdja Matovic]], [[CCC]], January 13, 2024
=External Links=