GPU

=== Turing Architecture ===
[https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf Architectural Whitepaper]
Turing cards were first released in 2018. They are the first consumer cards to launch with RTX (raytracing) features: RTX instructions traverse an AABB (axis-aligned bounding box) tree more quickly to discover ray intersections with lists of bounding boxes, accelerating raytracing performance. These are also the first consumer cards to launch with Tensor Cores, which execute 4x4 FP16 matrix-multiplication instructions to accelerate convolutional neural networks. A sketch of both operations follows the card list below.
* RTX 2080 Ti
* RTX 2060 Super
* RTX 2060
* GTX 1660 -- low-end GPU without Tensor Cores or RT Cores.
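
A rough C++ sketch of the two operations described above: the ray-versus-AABB slab test that RT Cores repeat while walking the bounding-box tree, and the reference semantics of a 4x4 matrix multiply-accumulate as performed by a Tensor Core. The struct and function names are made up for illustration, and plain floats stand in for FP16 operands; this is not NVIDIA's implementation or API.

<pre>
#include <algorithm>
#include <utility>

// Slab test: does a ray hit an axis-aligned bounding box? This is the
// operation repeated at every node while walking the AABB tree.
struct Ray  { float origin[3]; float invDir[3]; };   // invDir = 1 / direction
struct AABB { float lo[3]; float hi[3]; };

bool rayHitsBox(const Ray& r, const AABB& b) {
    float tEnter = 0.0f, tExit = 1e30f;              // current ray interval
    for (int axis = 0; axis < 3; ++axis) {
        float tNear = (b.lo[axis] - r.origin[axis]) * r.invDir[axis];
        float tFar  = (b.hi[axis] - r.origin[axis]) * r.invDir[axis];
        if (tNear > tFar) std::swap(tNear, tFar);
        tEnter = std::max(tEnter, tNear);
        tExit  = std::min(tExit, tFar);
        if (tEnter > tExit) return false;            // slabs do not overlap
    }
    return true;                                     // ray enters the box
}

// Reference semantics of one Tensor Core step: D = A * B + C on 4x4 tiles.
// Hardware operands are FP16 (with FP16 or FP32 accumulation); plain float
// is used here only for readability.
void mma4x4(const float A[4][4], const float B[4][4],
            const float C[4][4], float D[4][4]) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float acc = C[i][j];
            for (int k = 0; k < 4; ++k)
                acc += A[i][k] * B[k][j];            // multiply-accumulate
            D[i][j] = acc;
        }
}
</pre>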
=== RDNA Architecture ===
[https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf Architecture Slide Deck]
RDNA cards were first released in 2019. RDNA is a major change for AMD cards: the underlying hardware supports both Wave32 and Wave64 gangs of threads. Compute Units have two 32-wide SIMD units, each of which executes 32 threads per clock tick. A Wave64 wavefront executes on a single SIMD unit, but over two clock ticks. Note that Wave32 wavefronts still have 5 cycles of latency before registers can be reused, so a Wave64 executing over two clock ticks will suffer fewer stalls than a Wave32; a toy latency model follows the card list below.
* Radeon 5700 XT
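
A toy latency model of the Wave32/Wave64 trade-off just described, assuming a single wavefront running a chain of dependent vALU instructions alone on one 32-wide SIMD unit, with the 5-cycle register-reuse latency quoted above; the function and the instruction count are illustrative, not AMD figures.

<pre>
#include <algorithm>
#include <cstdio>

// Count cycles one wavefront spends issuing versus stalled while it runs a
// chain of dependent vALU instructions alone on a single 32-wide SIMD unit.
static void chainCycles(const char* name, int instructions,
                        int issueCycles, int reuseLatency) {
    int busy = 0, stalled = 0;
    for (int i = 0; i < instructions; ++i) {
        busy += issueCycles;                        // cycles feeding the SIMD
        if (i + 1 < instructions)                   // wait until the result is
            stalled += std::max(0, reuseLatency - issueCycles);  // reusable
    }
    std::printf("%s: %d busy + %d stalled cycles\n", name, busy, stalled);
}

int main() {
    // 8 dependent instructions, 5-cycle register-reuse latency (as above).
    chainCycles("Wave32", 8, 1, 5);   // issues in 1 cycle, stalls 4 per step
    chainCycles("Wave64", 8, 2, 5);   // issues in 2 cycles, stalls 3 per step
    return 0;
}
</pre>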
=== Vega Architecture ===
[https://www.techpowerup.com/gpu-specs/docs/amd-vega-architecture.pdf Architecture Whitepaper]
Vega cards were first released in 2017. Vega is the last in the line of the GCN architecture: 64 threads per wavefront. Each compute unit contains 4 SIMD units and supports a total of 40 wavefronts per compute unit (a queue of 10 wavefronts per SIMD unit). Each SIMD unit contains 16 vALUs for general compute plus 1 sALU for branching and constant logic. Each SIMD unit executes the same instruction over four clock ticks (16 vALUs x 4 clock ticks == 64 threads per wavefront). Vega specifically added packed FP16 instructions, such as dot product, packed add, and packed multiply. From a programming perspective, these packed FP16 instructions are SIMD-within-SIMD: each SIMD thread can execute its own two-wide FP16 operation, akin to AVX or SSE on the x64 architecture. A lane-level sketch follows the card list below.
* Radeon VII
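
A lane-level sketch of the "SIMD-within-SIMD" idea: each of the 64 lanes holds a 32-bit register packing two FP16 values, and one packed instruction (in the spirit of Vega's v_pk_add_f16 and v_pk_mul_f16) produces both results at once. The type and function names are invented for illustration, with floats standing in for the 16-bit halves.

<pre>
// Two FP16 values packed into one 32-bit lane register; floats are used
// here as stand-ins for the 16-bit halves to keep the sketch readable.
struct Half2 { float lo, hi; };

// Models a packed add: one instruction, two element-wise results per lane.
Half2 pk_add(Half2 a, Half2 b) { return { a.lo + b.lo, a.hi + b.hi }; }

// Models a packed multiply.
Half2 pk_mul(Half2 a, Half2 b) { return { a.lo * b.lo, a.hi * b.hi }; }

// Across a whole wavefront this is 64 lanes x 2 packed elements = 128
// FP16 operations retired by a single packed instruction.
</pre>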
