GPU

4,024 bytes added, 24 January

m

→‎2020 ...: link to Chinese gpus

'''GPU''' (Graphics Processing Unit),<br/>

a specialized processor initially intended for fast [https://en.wikipedia.org/wiki/Image_processing image processing]. GPUs may have more raw computing power than general-purpose [https://en.wikipedia.org/wiki/Central_processing_unit CPUs] but require a specialized, parallelized style of programming. [[Leela Chess Zero]] has shown that a [[Best-First|Best-first]] [[Monte-Carlo Tree Search|Monte-Carlo Tree Search]] (MCTS) with [[Deep Learning|deep learning]] methodology works on GPU architectures.

=History=

In the 1970s and 1980s RAM was expensive, and home computers used custom graphics chips that operated directly on registers or memory without a dedicated frame buffer or texture buffer, like [https://en.wikipedia.org/wiki/Television_Interface_Adaptor TIA] in the [[Atari 8-bit|Atari VCS]] gaming system, [https://en.wikipedia.org/wiki/CTIA_and_GTIA GTIA]+[https://en.wikipedia.org/wiki/ANTIC ANTIC] in the [[Atari 8-bit|Atari 400/800]] series, or [https://en.wikipedia.org/wiki/Original_Chip_Set#Denise Denise]+[https://en.wikipedia.org/wiki/Original_Chip_Set#Agnus Agnus] in the [[Amiga|Commodore Amiga]] series. The 1990s made 3D graphics and 3D modeling more popular, especially for video games. Cards specifically designed to accelerate 3D math emerged, such as [https://en.wikipedia.org/wiki/IMPACT_(computer_graphics) SGI Impact] (1995) in 3D graphics workstations or [https://en.wikipedia.org/wiki/3dfx#Voodoo_Graphics_PCI 3dfx Voodoo] (1996) for playing 3D games on PCs. Some game engines could instead use the [[SIMD and SWAR Techniques|SIMD-capabilities]] of CPUs, such as the [[Intel]] [[MMX]] instruction set or [[AMD|AMD's]] [[X86#3DNow!|3DNow!]], for [https://en.wikipedia.org/wiki/Real-time_computer_graphics real-time rendering]. Sony's 3D capable chip [https://en.wikipedia.org/wiki/PlayStation_technical_specifications#Graphics_processing_unit_(GPU) GTE] used in the PlayStation (1994) and Nvidia's 2D/3D combi chips like [https://en.wikipedia.org/wiki/NV1 NV1] (1995) coined the term GPU for 3D graphics hardware acceleration.
With the advent of the [https://en.wikipedia.org/wiki/Unified_shader_model unified shader architecture], like in Nvidia [https://en.wikipedia.org/wiki/Tesla_(microarchitecture) Tesla] (2006), ATI/AMD [https://en.wikipedia.org/wiki/TeraScale_(microarchitecture) TeraScale] (2007) or Intel [https://en.wikipedia.org/wiki/Intel_GMA#GMA_X3000 GMA X3000] (2006), GPGPU frameworks like [https://en.wikipedia.org/wiki/CUDA CUDA] and [[OpenCL|OpenCL]] emerged and gained in popularity.

=GPU in Computer Chess=

There are four main ways to use a GPU for chess:

* As an accelerator in [[Leela_Chess_Zero|Lc0]]: run a neural network for position evaluation on GPU.

* Offload the search in [[Zeta|Zeta]]: run a parallel game tree search with move generation and position evaluation on GPU.

* As a hybrid in [http://www.talkchess.com/forum3/viewtopic.php?t=64983&start=4#p729152 perft_gpu]: expand the game tree to a certain depth on CPU and offload the sub-trees to GPU for computation.

* Neural network training, such as the [https://github.com/glinscott/nnue-pytorch Stockfish NNUE trainer in Pytorch]<ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75724 Pytorch NNUE training] by [[Gary Linscott]], [[CCC]], November 08, 2020</ref> or [https://github.com/LeelaChessZero/lczero-training Lc0 TensorFlow Training]
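The hybrid approach listed above can be illustrated with a minimal sketch. It is not taken from perft_gpu: a toy subtraction game stands in for chess, and the hypothetical function <code>count_subtrees_batch</code> stands in for a GPU kernel launch, so only the split between host-side expansion and batched sub-tree counting is shown.

```python
# Toy illustration of the hybrid scheme: the host expands the tree to a
# shallow depth, then hands the frontier positions to a batch worker.
# ASSUMPTION: a toy subtraction game replaces chess, and
# count_subtrees_batch() is a hypothetical stand-in for a GPU kernel launch.

def moves(state):
    """Successor states of the toy game: remove 1, 2 or 3 tokens from a pile."""
    return [state - k for k in (1, 2, 3) if state - k >= 0]

def perft(state, depth):
    """Reference count of the positions reached after exactly `depth` plies."""
    if depth == 0:
        return 1
    return sum(perft(child, depth - 1) for child in moves(state))

def expand_on_host(state, depth):
    """Expand `depth` plies on the CPU and return the frontier positions."""
    frontier = [state]
    for _ in range(depth):
        frontier = [child for s in frontier for child in moves(s)]
    return frontier

def count_subtrees_batch(frontier, depth):
    """Stand-in for the device side: one independent work item per position."""
    return [perft(s, depth) for s in frontier]

def hybrid_perft(state, total_depth, host_depth):
    """Search host_depth plies on the host and the remaining plies in a batch."""
    host_depth = min(host_depth, total_depth)
    frontier = expand_on_host(state, host_depth)
    return sum(count_subtrees_batch(frontier, total_depth - host_depth))
```

Because every frontier position is an independent work item, the batch step is the part that would map onto many GPU threads; the host depth controls how many work items are generated.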

=GPU Chess Engines=

* [https://community.amd.com/t5/opencl/bd-p/opencl-discussions AMD OpenCL Developer Community]

* [https://rocmdocs.amd.com/en/latest/index.html AMD ROCm™ documentation]

* [https://manualzz.com/doc/o/cggy6/amd-opencl-programming-user-guide-contents AMD OpenCL Programming Guide]

* [http://developer.amd.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf AMD OpenCL Optimization Guide]

* [https://gpuopen.com/amd-isa-documentation/ AMD GPU ISA documentation]

== Apple ==

* [https://docs.nvidia.com/cuda/parallel-thread-execution/index.html Nvidia PTX ISA]

* [https://docs.nvidia.com/cuda/index.html Nvidia CUDA Toolkit Documentation]

* [https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html Nvidia CUDA C++ Programming Guide]

* [https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html Nvidia CUDA C++ Best Practices Guide]

== Further ==

* [https://en.wikipedia.org/wiki/Vulkan#Planned_features Vulkan] (OpenGL successor of Khronos Group)

* [https://en.wikipedia.org/wiki/C%2B%2B_AMP C++ AMP] (Microsoft)

* [https://en.wikipedia.org/wiki/DirectCompute DirectCompute] (Microsoft)

* [https://en.wikipedia.org/wiki/OpenACC OpenACC] (offload directives)

* [https://en.wikipedia.org/wiki/OpenMP OpenMP] (offload directives)

* 8 KiB constant cache per compute unit

* 16 KiB (48 KiB) L1 cache per compute unit (configurable)

* 768 KiB L2 cache in total

* 1.5 GiB to 3 GiB global memory

AMD Radeon HD 7970 ([https://en.wikipedia.org/wiki/Graphics_Core_Next GCN]) <ref>AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices</ref>

* 16 KiB constant cache per four compute units

* 16 KiB L1 cache per compute unit

* 768 KiB L2 cache in total

* 3 GiB to 6 GiB global memory

==AMD Matrix Cores==

: In 2020 AMD released its server-class [https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf CDNA] architecture with Matrix Cores, which support MFMA (matrix fused multiply-add) operations on various data types such as INT8, FP16, BF16 and FP32. AMD's CDNA 2 architecture adds optimized FP64 throughput for matrix operations. AMD's RDNA 3 architecture features dedicated AI tensor operation acceleration. AMD's CDNA 3 architecture adds support for FP8 and sparse matrix data (sparsity).
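As a rough sketch of what a matrix fused multiply-add computes, the following shows D = A × B + C on small tiles. It describes only the arithmetic, under the assumption that plain nested lists stand in for matrix tiles; the function name <code>mfma</code> is hypothetical, and real Matrix Cores use fixed tile shapes, packed registers and hardware-specific data types.

```python
# ASSUMPTION: plain Python lists stand in for matrix tiles; mfma() is a
# hypothetical helper that only models the arithmetic D = A x B + C.

def mfma(a, b, c):
    """Return D = A x B + C for small row-major tiles given as nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "A columns must match B rows"
    assert len(c) == rows and all(len(row) == cols for row in c)
    d = [row[:] for row in c]  # start from a copy of the accumulator tile C
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                d[i][j] += a[i][k] * b[k][j]  # multiply and accumulate
    return d
```

With integer inputs such as INT8 values the result is exact; for floating-point types the rounding behaviour depends on the hardware and is not modelled here.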

==Intel XMX Cores==

: Intel added XMX (Xe Matrix eXtensions) cores to some of the [https://en.wikipedia.org/wiki/Intel_Xe Intel Xe] GPU series, like [https://en.wikipedia.org/wiki/Intel_Arc#Alchemist Arc Alchemist] and [https://www.intel.com/content/www/us/en/products/sku/232876/intel-data-center-gpu-max-1100/specifications.html Intel Data Center GPU Max Series].

=Host-Device Latencies=

* [https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units List of AMD graphics processing units on Wikipedia]

=== CDNA3 ===

The CDNA3 HPC architecture was unveiled in December 2023 with the MI300A APU model (CPU+GPU+HBM) and the MI300X GPU model, both with a multi-chip module design. It features Matrix Cores with support for a broad range of precisions, such as INT8, FP8, BF16, FP16, TF32, FP32 and FP64, as well as sparse matrix data (sparsity). It is supported by AMD's ROCm open software stack for AMD Instinct accelerators.

* [https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf AMD CDNA3 Whitepaper]

* [https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf AMD Instinct MI300/CDNA3 Instruction Set Architecture]

* [https://www.amd.com/en/developer/resources/rocm-hub.html AMD ROCm Developer Hub]

=== Navi 3x RDNA3 ===

The RDNA3 architecture in the Radeon RX 7000 series was announced on November 3, 2022, featuring dedicated AI tensor operation acceleration.

* [https://en.wikipedia.org/wiki/Radeon_RX_7000_series AMD Radeon RX 7000 on Wikipedia]

* [https://developer.amd.com/wp-content/resources/RDNA3_Shader_ISA_December2022.pdf RDNA3 Instruction Set Architecture]

=== CDNA2 ===

The CDNA2 architecture in the MI200 HPC-GPU, with optimized FP64 throughput (matrix and vector), a multi-chip module design and Infinity Fabric, was unveiled in November 2021.

* [https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf AMD CDNA2 Whitepaper]

* [https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf AMD CDNA Whitepaper]

* [https://developer.amd.com/wp-content/resources/CDNA1_Shader_ISA_14December2020.pdf CDNA Instruction Set Architecture]

=== Navi 2x RDNA2 ===

[https://en.wikipedia.org/wiki/RDNA_(microarchitecture)#RDNA_2 RDNA2] cards were unveiled on October 28, 2020.

* [https://en.wikipedia.org/wiki/Radeon_RX_6000_series AMD Radeon RX 6000 on Wikipedia]

* [https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf RDNA 2 Instruction Set Architecture]

=== Navi RDNA 1 ===

[https://en.wikipedia.org/wiki/RDNA_(microarchitecture) RDNA 1] cards were unveiled on July 7, 2019.

* [https://www.amd.com/system/files/documents/rdna-whitepaper.pdf RDNA Whitepaper]

* [https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf Architecture Slide Deck]

* [https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Shader_ISA_5August2019.pdf RDNA Instruction Set Architecture]

=== Vega GCN 5th gen ===

* [https://www.techpowerup.com/gpu-specs/docs/amd-vega-architecture.pdf Architecture Whitepaper]

* [https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf Vega Instruction Set Architecture]

=== Polaris GCN 4th gen ===

* [https://www.amd.com/system/files/documents/polaris-whitepaper.pdf Architecture Whitepaper]

* [https://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf GCN3/4 Instruction Set Architecture]

=== Southern Islands GCN 1st gen ===

Southern Islands cards introduced the [https://en.wikipedia.org/wiki/Graphics_Core_Next GCN] architecture in 2012.

* [https://en.wikipedia.org/wiki/Radeon_HD_7000_series AMD Radeon HD 7000 on Wikipedia]

* [https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/programmer-references/si_programming_guide_v2.pdf Southern Islands Programming Guide]

* [https://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf Southern Islands Instruction Set Architecture]

== Apple ==

* [https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Gen12 List of Intel Gen12 GPUs on Wikipedia]

* [https://en.wikipedia.org/wiki/Intel_Arc#Alchemist Arc Alchemist series on Wikipedia]

==Nvidia==

* [https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units List of Nvidia graphics processing units on Wikipedia]

=== Grace Hopper Superchip ===

The Nvidia GH200 Grace Hopper Superchip was unveiled in August 2023. It combines the Nvidia Grace CPU ([[ARM|ARM v9]]) and Nvidia Hopper GPU architectures via NVLink to deliver a CPU+GPU coherent memory model for accelerated AI and HPC applications.

* [https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip NVIDIA Grace Hopper Superchip Data Sheet]

* [https://resources.nvidia.com/en-us-grace-cpu/nvidia-grace-hopper NVIDIA Grace Hopper Superchip Architecture Whitepaper]

=== Ada Lovelace Architecture ===

The [https://en.wikipedia.org/wiki/Ada_Lovelace_(microarchitecture) Ada Lovelace microarchitecture] was announced on September 20, 2022, featuring 4th-generation Tensor Cores with FP8, FP16, BF16, TF32 and sparsity acceleration.

* [https://images.nvidia.com/aem-dam/Solutions/geforce/ada/nvidia-ada-gpu-architecture.pdf Ada GPU Whitepaper]

* [https://docs.nvidia.com/cuda/ada-tuning-guide/index.html Ada Tuning Guide]

=== Hopper Architecture ===

* [https://resources.nvidia.com/en-us-tensor-core Hopper H100 Whitepaper]

* [https://docs.nvidia.com/cuda/hopper-tuning-guide/index.html Hopper Tuning Guide]

=== Ampere Architecture ===

* [https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf Ampere GA100 Whitepaper]

* [https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf Ampere GA102 Whitepaper]

* [https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html Ampere GPU Architecture Tuning Guide]

=== Turing Architecture ===

[https://en.wikipedia.org/wiki/Turing_(microarchitecture) Turing] cards were first released in 2018. They are the first consumer cards to launch with RTX features for [https://en.wikipedia.org/wiki/Ray_tracing_(graphics) ray tracing]. They are also the first consumer cards to launch with TensorCores, used for matrix multiplications to accelerate [[Neural Networks#Convolutional|convolutional neural networks]]. The Turing GTX line of chips does not offer RTX or TensorCores.

* [https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf Turing Architecture Whitepaper]

* [https://docs.nvidia.com/cuda/turing-tuning-guide/index.html Turing Tuning Guide]

=== Volta Architecture ===

[https://en.wikipedia.org/wiki/Volta_(microarchitecture) Volta] cards were released in 2017. They were the first cards to launch with TensorCores, supporting matrix multiplications to accelerate [[Neural Networks#Convolutional|convolutional neural networks]].
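How a convolution can be expressed as a matrix multiplication is sketched below. The text above does not say which lowering is used in practice; the sketch assumes the common im2col approach, in which every image patch becomes one row of a matrix that is then multiplied by the flattened kernel. The function names are hypothetical and the code runs on plain lists, single channel, without padding or stride.

```python
# ASSUMPTION: im2col lowering on a single-channel image with nested lists;
# im2col() and conv2d_as_matmul() are hypothetical illustrative helpers.

def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image into one row of a matrix."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]

def conv2d_as_matmul(image, kernel):
    """Valid cross-correlation computed as (patch matrix) x (flattened kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    # Each output pixel is one row of the patch matrix times the kernel vector.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]
```

With several kernels the flattened kernels form the columns of a second matrix, so the whole layer becomes one matrix product, which is the kind of operation matrix units are built for.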

* [https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf Volta Architecture Whitepaper]

* [https://docs.nvidia.com/cuda/volta-tuning-guide/index.html Volta Tuning Guide]

=== Pascal Architecture ===

[https://en.wikipedia.org/wiki/Pascal_(microarchitecture) Pascal] cards were first released in 2016.

* [https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf Pascal Architecture Whitepaper]

* [https://docs.nvidia.com/cuda/pascal-tuning-guide/index.html Pascal Tuning Guide]

=== Maxwell Architecture ===

[https://en.wikipedia.org/wiki/Maxwell_(microarchitecture) Maxwell] cards were first released in 2014.

* [https://web.archive.org/web/20170721113746/http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF Maxwell Architecture Whitepaper on archive.org]

* [https://docs.nvidia.com/cuda/maxwell-tuning-guide/index.html Maxwell Tuning Guide]

== PowerVR ==

* [https://talkchess.com/forum3/viewtopic.php?f=2&t=77097 GPU rumors 2021] by [[Srdja Matovic]], [[CCC]], April 16, 2021

* [https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79078 Comparison of all known Sliding lookup algorithms <nowiki>[CUDA]</nowiki>] by [[Daniel Infuehr]], [[CCC]], January 08, 2022 » [[Sliding Piece Attacks]]

* [https://talkchess.com/forum3/viewtopic.php?f=7&t=72566&p=955538#p955538 Re: China boosts in silicon...] by [[Srdja Matovic]], [[CCC]], January 13, 2024

=External Links=

Smatovic

422

edits
