
GPU

2,590 bytes added, 16:31, 9 August 2019
The challenge for GPU compute languages is to give the programmer the flexibility to exploit memory optimizations at the level of a CUDA block or OpenCL workgroup (up to roughly 1024 threads), while still being able to specify work for the tens of thousands of physical threads running on a typical GPU.
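The relationship between the block (workgroup) level and the whole-GPU level can be sketched in plain Python. This is only an illustration of the index arithmetic, not GPU code: the function names and the launch shape of 40 blocks are assumptions made for this example, and the parameter names merely mirror CUDA's blockIdx, blockDim and threadIdx.

```python
# Illustrative launch shape (assumed values, not taken from any real card).
BLOCK_SIZE = 1024   # threads per CUDA block / OpenCL workgroup
NUM_BLOCKS = 40     # blocks in a one-dimensional grid

def global_thread_id(block_idx, thread_idx, block_dim=BLOCK_SIZE):
    """Flat index of a thread within a one-dimensional grid of blocks."""
    return block_idx * block_dim + thread_idx

def blocks_needed(num_items, block_dim=BLOCK_SIZE):
    """Ceiling division: number of blocks so that every item gets a thread."""
    return (num_items + block_dim - 1) // block_dim

# Total threads described by this launch shape.
total_threads = NUM_BLOCKS * BLOCK_SIZE
```

Threads that share a block index can cooperate through fast block-local memory, while the flat index is what lets a single kernel address work spread across the whole device.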
 
= Architectures and Physical Hardware =
 
Each generation, the manufacturers create a series of cards with a set amount of vRAM and a set number of SIMD cores. The market is split into three categories: server, professional, and consumer. Consumer cards are the cheapest and are targeted primarily at the video game market. Professional cards have better driver support for 3D programs. Finally, server cards provide virtualization services, allowing cloud companies to virtually split their cards between customers.
 
While server and professional cards have more vRAM, consumer cards are best for starting GPU programming.
 
GPUs use high-bandwidth RAM, such as GDDR6 or HBM2. These specialized memory types are designed for the extremely parallel nature of GPUs and can provide 200 GB/s to 1000 GB/s of throughput. In comparison, a typical DDR4 channel provides about 20 GB/s, so a dual-channel desktop will typically have under 50 GB/s of bandwidth to DDR4 main memory.
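The figures above follow from simple arithmetic: peak bandwidth is the transfer rate multiplied by the bus width in bytes. The sketch below works this out for three example configurations. The specific speeds and bus widths (DDR4-2666 on a 64-bit channel, GDDR6 at 14 Gbit/s per pin on a 256-bit bus, HBM2 at 1.75 Gbit/s per pin on a 4096-bit bus) are illustrative assumptions, and the function name is made up for this example.

```python
def peak_bandwidth_gb_per_s(transfers_per_s, bus_width_bits):
    """Peak bandwidth in GB/s = transfers per second * bus width in bytes."""
    return transfers_per_s * (bus_width_bits / 8) / 1e9

# DDR4-2666: 2666 million transfers per second on one 64-bit channel.
ddr4_channel = peak_bandwidth_gb_per_s(2666e6, 64)
ddr4_dual_channel = 2 * ddr4_channel

# GDDR6 at 14 Gbit/s per pin on a 256-bit bus (assumed configuration).
gddr6 = peak_bandwidth_gb_per_s(14e9, 256)

# HBM2 at 1.75 Gbit/s per pin on a 4096-bit bus (assumed configuration).
hbm2 = peak_bandwidth_gb_per_s(1.75e9, 4096)

for name, value in [("DDR4 single channel", ddr4_channel),
                    ("DDR4 dual channel", ddr4_dual_channel),
                    ("GDDR6 256-bit", gddr6),
                    ("HBM2 4096-bit", hbm2)]:
    print(f"{name}: {value:.1f} GB/s")
```

These are theoretical peaks; sustained throughput in real workloads is lower.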
 
== NVidia ==
 
NVidia's consumer line of cards is GeForce, branded with RTX or GTX labels. NVidia's professional line of cards is Quadro. Finally, Tesla cards constitute NVidia's server line.
 
NVidia's "Titan" line of cards is technically a consumer line, but internally the cards use professional or server class chips. As such, a Titan card can cost anywhere from $1000 to $3000.
 
=== Turing Architecture ===
 
Turing cards were first released in 2018. They are the first consumer cards to launch with RTX, or raytracing, features. RTX instructions speed up the traversal of an AABB (axis-aligned bounding box) tree to discover ray intersections with lists of objects. These are also the first consumer cards to launch with Tensor cores, which provide 4x4 FP16 matrix multiplication instructions to accelerate convolutional neural networks.
 
* RTX 2080 Ti
* RTX 2080
* RTX 2070
* RTX 2070 Super
* RTX 2060 Super
* RTX 2060
* GTX 1660
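The kind of AABB tree traversal that RTX hardware accelerates can be sketched in software. The code below is a minimal illustration, not a description of NVidia's implementation: it uses the common "slab" test for a ray against an axis-aligned box, and the dictionary-based node layout (keys "min", "max", "children", "objects") is a hypothetical structure chosen for this example.

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: True if the ray origin + t * direction (t >= 0) hits the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(node, origin, direction, hits):
    """Collect objects from every leaf whose bounding box the ray touches."""
    if not ray_hits_aabb(origin, direction, node["min"], node["max"]):
        return hits  # prune this whole subtree
    hits.extend(node.get("objects", ()))
    for child in node.get("children", ()):
        traverse(child, origin, direction, hits)
    return hits

# Hypothetical two-leaf tree used only for demonstration.
tree = {
    "min": (0, 0, 0), "max": (5, 1, 1),
    "children": [
        {"min": (0, 0, 0), "max": (1, 1, 1), "objects": ["a"]},
        {"min": (4, 0, 0), "max": (5, 1, 1), "objects": ["b"]},
    ],
}
```

The returned objects are only candidates: an exact ray-object intersection test is still needed for each of them. The benefit of the tree is that whole subtrees are skipped when the ray misses their bounding box.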
 
=== Volta Architecture ===
 
Volta cards were first released in 2017. Only Tesla and Titan cards were produced in this generation, constituting the highest end of the market. They were the first cards to launch with Tensor cores, supporting 4x4 FP16 matrix multiplications to accelerate convolutional neural networks.
 
* Tesla V100
* Titan V
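The operation a Tensor core performs on 4x4 tiles is a fused matrix multiply-accumulate, D = A*B + C. The sketch below emulates it in plain Python, rounding the inputs A and B to FP16 using the half-precision format of the standard `struct` module. The function names are made up for this example, and keeping the accumulator C at higher precision is an assumption of the sketch; it says nothing about how the hardware schedules the work.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mma_4x4(a, b, c):
    """Return D = A*B + C for 4x4 matrices, with A and B rounded to FP16."""
    a = [[to_fp16(v) for v in row] for row in a]
    b = [[to_fp16(v) for v in row] for row in b]
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(4))
             for j in range(4)]
            for i in range(4)]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
b = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]

d = mma_4x4(identity, b, zeros)  # multiplying by the identity returns b
```

Larger matrix multiplications, such as those in a convolutional layer, are built by tiling the matrices into such small blocks and accumulating the partial products.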
 
=== Pascal Architecture ===
 
Pascal cards were first released in 2016.
 
* GTX 1080 Ti
* GTX 1080
* GTX 1070 Ti
* GTX 1060
* GTX 1050
* GT 1030
 
== AMD ==
 
=== RDNA 1.0 ===
 
RDNA is a major change for AMD cards: the underlying hardware supports both Wave32 and Wave64 gangs of threads.
 
* 5700 XT
* 5700
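The practical effect of the two wave sizes can be illustrated with a little arithmetic: the same workgroup is split into twice as many gangs under Wave32 as under Wave64. The helper names below are hypothetical and the workgroup sizes are arbitrary examples.

```python
def wavefronts_per_workgroup(workgroup_size, wave_size):
    """Number of wavefronts needed to cover a workgroup (ceiling division)."""
    if wave_size not in (32, 64):
        raise ValueError("this sketch only models Wave32 and Wave64")
    return -(-workgroup_size // wave_size)

def idle_lanes(workgroup_size, wave_size):
    """Lanes left unused in the last, partially filled wavefront."""
    waves = wavefronts_per_workgroup(workgroup_size, wave_size)
    return waves * wave_size - workgroup_size

for size in (100, 256, 1024):
    print(size,
          wavefronts_per_workgroup(size, 32),
          wavefronts_per_workgroup(size, 64))
```

A workgroup of 256 threads, for instance, runs as eight Wave32 gangs or four Wave64 gangs.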
 
=== Vega GCN 5th gen ===
 
* Radeon VII
* Vega64
* Vega56
* MI25
 
=== Polaris GCN 4th gen ===
 
* RX 580
* RX 570
* RX 560
=Inside=
