Changes


Talk:GPU

3,274 bytes added, 14 January


== AMD architectures ==

My own conclusions are:

* TeraScale has a VLIW design.

* GCN has 16-wide SIMD units, each executing a wavefront of 64 threads over 4 cycles.

* RDNA has 32-wide SIMD units, each executing a Wavefront:32 over 1 cycle and a Wavefront:64 over 2 cycles.

* CDNA is an advanced GCN.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 17:22, 14 January 2024 (CET)
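
The cycle counts in the list above follow from dividing the wavefront size by the SIMD width. A minimal sketch of that arithmetic, using only the figures from the list; the function name <code>issue_cycles</code> is made up for illustration and the model ignores latency, pipelining and dual-issue effects:

```python
import math


def issue_cycles(wavefront_size: int, simd_width: int) -> int:
    """Cycles needed to push one instruction of a whole wavefront through the SIMD lanes."""
    return math.ceil(wavefront_size / simd_width)


# (wavefront size, SIMD width) pairs taken from the list above.
configs = {
    "GCN wave64": (64, 16),
    "RDNA wave32": (32, 32),
    "RDNA wave64": (64, 32),
}

for name, (wave, width) in configs.items():
    print(f"{name}: {issue_cycles(wave, width)} cycle(s)")
```

This gives 4 cycles for GCN, and 1 and 2 cycles for the two RDNA modes, matching the list.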

== Nvidia architectures ==

== SIMD + Scalar Unit ==

It seems that on GPU architectures every SIMD unit is paired with one scalar unit, which executes control flow (branches, loops) or special functions the SIMD ALUs are not capable of.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 15:18, 4 January 2023 (CET)
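
A toy model of how such a pairing could handle an if/else, assuming a GCN-style scheme in which the scalar side manages an execution mask and skips a branch when no lane needs it. This is a sketch under that assumption, not a description of any particular vendor's hardware; <code>simd_select</code> and the doubling/increment operations are hypothetical:

```python
def simd_select(values, threshold):
    """Toy model: one scalar unit steering a SIMD unit through an if/else."""
    lanes = len(values)
    # Vector compare: every lane produces one condition bit.
    cond = [v > threshold for v in values]
    result = list(values)

    # "then" side: the scalar unit sets the execution mask from the condition
    # bits and skips the whole block if no lane is active.
    exec_mask = cond
    if any(exec_mask):
        for i in range(lanes):  # on hardware the lanes run in lockstep
            if exec_mask[i]:
                result[i] = values[i] * 2

    # "else" side: the scalar unit inverts the mask and repeats the check.
    exec_mask = [not c for c in cond]
    if any(exec_mask):
        for i in range(lanes):
            if exec_mask[i]:
                result[i] = values[i] + 1

    return result


print(simd_select([1, 5, 2, 8], 3))  # prints [2, 10, 3, 16]
```

In this model both sides of the branch cost SIMD cycles whenever the lanes diverge, and only the mask decides which lanes write a result.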

== embedded CPU controller ==

It is not documented in the whitepapers, but it seems that every discrete GPU has an embedded CPU controller (e.g. Nvidia Falcon) which (speculation) launches the kernels.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 10:36, 22 April 2021 (CEST)

== GPUs and Duncan's taxonomy ==

It is not clear to me how different vendors realize the underlying hardware of the GPU SIMD units in architectures with unified shaders. There is the concept of bit-sliced ALUs, the concept of pipelined vector processors, and the concept of SIMD units with fixed bit-width ALUs. The whitepapers from the different vendors leave room for speculation, as do the differing instruction throughputs for higher and lower precision. What is left to the programmer is to do microbenchmarking and draw their own conclusions.

https://en.wikipedia.org/wiki/Duncan%27s_taxonomy

https://en.wikipedia.org/wiki/Flynn%27s_taxonomy

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 13:58, 16 December 2021 (CET)

== CPW GPU article ==

A suggestion of mine: keep this GPU article as a generalized overview of GPUs, with incremental updates for different frameworks and architectures. GPUs and GPGPU are a moving target, with different platforms offering new feature sets. It is better to open separate articles for things like GPGPU, SIMT, CUDA, ROCm, oneAPI and Metal, or simply to link to Wikipedia for the newest specs and info.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 21:29, 27 April 2021 (CEST)

== GPGPU architectures ==

Regarding GPGPU architectures or frameworks, a link to the architecture whitepaper, the instruction set architecture, the programming guide, and a link to Wikipedia with a list of the concrete models with specs would be nice, if available.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 09:21, 25 October 2021 (CEST)

== Legacy GPGPU ==

This article does not cover legacy, pre-2007 GPGPU methods, that is, how to use pixel, vertex, geometry, tessellation and compute shaders via OpenGL or DirectX for GPGPU. I can imagine it is possible to backport a neural network Lc0 backend to a certain DirectX/OpenGL API, but I doubt it has real contemporary relevance (running Lc0 on an SGI Indy or alike).

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 14:09, 14 November 2022 (CET)

== Alternative Architectures ==

There was for example the IBM PowerXCell 8i, used in the IBM Roadrunner super-computer from 2008, the first heterogeneous petaFLOP system; a smaller version ran in the PlayStation 3:

https://en.wikipedia.org/wiki/Cell_%28processor%29#PowerXCell_8i

There was the Intel Larrabee project from 2010, a lot of simple x64 cores with 512-bit vector units, later released as the Xeon Phi accelerator (AVX-512 arrived with later Xeon Phi generations):

https://en.wikipedia.org/wiki/Larrabee_%28microarchitecture%29

https://en.wikipedia.org/wiki/Xeon_Phi

There is still the NEC SX Aurora (>=2017), a vector processor on a PCIe card, descended from the NEC SX super-computer series as used e.g. in the Earth Simulator super-computer:

https://en.wikipedia.org/wiki/NEC_SX-Aurora_TSUBASA

There is the Chinese Matrix 2000/3000 many-core accelerator (>=2017), used in the Tianhe super-computer:

https://en.wikichip.org/wiki/nudt/matrix-2000

AFAIK, none of the above was used to play computer chess. On the other side:

IBM Deep Blue used ASICs: https://www.chessprogramming.org/Deep_Blue

Hydra used FPGAs: https://www.chessprogramming.org/Hydra

AlphaZero used TPUs: https://www.chessprogramming.org/AlphaZero

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 08:04, 22 September 2023 (CEST)

