Talk:GPU

1,490 bytes added, 21:29, 27 April 2021
CPW GPU article: new section
== AMD architectures ==
 
My own conclusions are:
 
* TeraScale has VLIW design.
* GCN has 16 wide SIMD, executing a Wavefront of 64 threads over 4 cycles.
* RDNA has 32 wide SIMD, executing a Wavefront:32 over 1 cycle and Wavefront:64 over two cycles.
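
The cycle counts in the list above follow from dividing the wavefront size by the SIMD width and rounding up. A minimal sketch of that arithmetic, assuming one SIMD pass per cycle and ignoring latency, dual issue and scheduling; the function name and the table are made up for illustration and are not taken from any vendor documentation:

```python
def cycles_per_wavefront(wavefront_size: int, simd_width: int) -> int:
    """Issue cycles needed to push one wavefront through a SIMD unit.

    Assumption: one pass of `simd_width` lanes per cycle, nothing else modeled.
    """
    if wavefront_size <= 0 or simd_width <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division without floating point.
    return -(-wavefront_size // simd_width)


# (label, wavefront size, SIMD width) as stated in the list above.
AMD_EXAMPLES = [
    ("GCN", 64, 16),
    ("RDNA Wavefront:32", 32, 32),
    ("RDNA Wavefront:64", 64, 32),
]

for label, wavefront_size, simd_width in AMD_EXAMPLES:
    cycles = cycles_per_wavefront(wavefront_size, simd_width)
    print(f"{label}: {cycles} cycle(s)")
```

This only reproduces the numbers given in the list; it says nothing about how the hardware actually schedules the passes.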
 
[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 10:16, 22 April 2021 (CEST)
 
== Nvidia architectures ==
Nevertheless, my own conclusions are:
* Tesla has 8 wide SIMD, executing a Warp of 32 threads over 4 cycles.
* Fermi has 16 wide SIMD, executing a Warp of 32 threads over 2 cycles.
* Kepler is somehow odd, not sure how the compute units are partitioned.
* Maxwell and Pascal have 32 wide SIMD, executing a Warp of 32 threads over 1 cycle.
* Volta and Turing seem to have 16 wide FPU SIMDs, but my own experiments show 32 wide VALU.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 10:17, 22 April 2021 (CEST)

== SIMD + Scalar Unit ==

It seems every SIMD unit has one scalar unit on GPU architectures, executing things like branch conditions or special functions the SIMD ALUs are not capable of.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 20:21, 22 April 2021 (CEST)

== embedded CPU controller ==

It is not documented in the whitepapers, but it seems that every discrete GPU has an embedded CPU controller (e.g. Nvidia Falcon) which (speculation) launches the kernels.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 10:36, 22 April 2021 (CEST)

== CPW GPU article ==

A suggestion of mine: keep this GPU article as a generalized overview of GPUs, with incremental updates for different frameworks and architectures. GPUs and GPGPU are a moving target, with different platforms offering new feature sets, so it is better to open separate articles for things like GPGPU, SIMT, CUDA, ROCm, oneAPI and Metal, or simply link to Wikipedia, which contains the newest specs and info.

[[User:Smatovic|Smatovic]] ([[User talk:Smatovic|talk]]) 21:29, 27 April 2021 (CEST)