Difference between revisions of "Talk:GPU"

From Chessprogramming wiki

Revision as of 16:28, 30 December 2019

Heyho, just a minor thing: the spelling of Nvidia should be unified.

The official name is Nvidia Corporation; on their webpage they refer to themselves as NVIDIA, and the logo is styled nVIDIA, formerly nVidia.

Bests, Srdja

SIMT

SIMT is not only about running multiple threads in a Warp (Nvidia) or Wavefront (AMD); it is more about running multiple waves of Warps or Wavefronts on the same SIMD unit to hide memory latencies.
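To make the latency-hiding point concrete, here is a toy model, not a description of any real scheduler. It assumes each Warp alternates between a fixed number of compute cycles and one memory stall, and that the SIMD unit can switch to another resident Warp for free. The function names and the cycle counts are made up for illustration.

```python
import math


def simd_utilization(resident_warps, compute_cycles, memory_latency):
    """Fraction of cycles the SIMD unit is busy in the toy model.

    Each Warp computes for `compute_cycles`, then stalls for
    `memory_latency` cycles. Other resident Warps fill the stall.
    """
    busy = resident_warps * compute_cycles
    period = compute_cycles + memory_latency
    return min(1.0, busy / period)


def warps_to_hide_latency(compute_cycles, memory_latency):
    """Smallest number of resident Warps that keeps the unit fully busy."""
    return math.ceil((compute_cycles + memory_latency) / compute_cycles)


if __name__ == "__main__":
    # Hypothetical numbers: 4 compute cycles per memory access, 396 cycles latency.
    for warps in (1, 8, 32, 100):
        print(warps, simd_utilization(warps, 4, 396))
    print("Warps needed:", warps_to_hide_latency(4, 396))
```

With a single Warp the unit idles almost all the time in this model; only with many Warps in flight does the stall disappear, which is the part of SIMT the paragraph above is about.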

Nvidia architectures

Afaik Nvidia never officially mentioned SIMD as the hardware architecture in their papers; with Tesla they only referred to it as SIMT.

Nevertheless, my own conclusions are:

Tesla has 8-wide SIMD, executing a Warp of 32 threads in 4 cycles.

Fermi has 16-wide SIMD, executing a Warp of 32 threads in 2 cycles.

Kepler is somewhat odd; I am not sure how the compute units are partitioned.

Maxwell and Pascal have 32-wide SIMD, executing a Warp of 32 threads in 1 cycle.

Volta and Turing seem to have 16-wide FPU SIMDs, but my own experiments show a 32-wide VALU.
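The cycle counts above follow from dividing the Warp size by the SIMD width. A minimal sketch of that arithmetic, using only the widths concluded in this post (these are conclusions from this discussion, not official Nvidia figures; the dictionary and function names are made up for illustration):

```python
import math

WARP_SIZE = 32  # threads per Nvidia Warp

# SIMD widths as concluded in the post above, not official figures.
# Kepler is left out because its partitioning is unclear.
SIMD_WIDTH = {"Tesla": 8, "Fermi": 16, "Maxwell": 32, "Pascal": 32}


def cycles_per_warp(simd_width, warp_size=WARP_SIZE):
    """Cycles needed to push one Warp through a SIMD unit of the given width."""
    return math.ceil(warp_size / simd_width)


if __name__ == "__main__":
    for arch, width in SIMD_WIDTH.items():
        print(f"{arch}: {width}-wide SIMD, {cycles_per_warp(width)} cycle(s) per Warp")
```

By the same arithmetic, a 16-wide FPU SIMD on Volta or Turing would need 2 cycles per Warp, and a 32-wide VALU would need 1.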