Changes

GPU

343 bytes added, 13:39, 14 November 2022

→‎Hardware Model

A common scheme on GPUs with unified shader architecture is to run multiple threads in [https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads SIMT] fashion and to keep a multitude of SIMT waves resident on the same [https://en.wikipedia.org/wiki/SIMD SIMD] unit to hide memory latencies. Multiple processing elements (GPU cores) are members of a SIMD unit, multiple SIMD units are coupled to a compute unit, and up to hundreds of compute units are present on a discrete GPU. Depending on the architecture, the actual SIMD units may have different numbers of cores (SIMD8, SIMD16, SIMD32) and different computation abilities: floating-point and/or integer, with a specific bit-width of the FPU/ALU and registers. There is a difference between a vector processor with variable bit-width and SIMD units with fixed bit-width cores. Architecture white papers from different vendors leave room for speculation about the concrete underlying hardware implementation and its concrete classification as a [https://en.wikipedia.org/wiki/Flynn%27s_taxonomy hardware architecture]. Scalar units present in the compute unit perform special functions the SIMD units are not capable of, and MMAC units (matrix-multiply-accumulate units) are used to speed up neural networks further.

{| class="wikitable" style="margin:auto"

|+ Vendor Terminology

|-

! AMD Terminology !! Nvidia Terminology

|-

| Compute Unit || Streaming Multiprocessor

|-

| Stream Core || CUDA Core

|-

| Wavefront || Warp

|}
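The latency hiding described above can be illustrated with a rough, idealized model. The sketch below assumes that every wave alternates between a fixed number of compute cycles and a fixed memory stall, and that the SIMD unit can switch between resident waves at no cost. The function name <code>simd_utilization</code> and the cycle counts are hypothetical and chosen only for illustration; they do not describe any particular GPU.

```python
def simd_utilization(resident_waves: int, compute_cycles: int, memory_latency: int) -> float:
    """Idealized fraction of cycles in which a SIMD unit has a wave ready to issue.

    Assumption (not from any vendor documentation): each wave repeats a period of
    `compute_cycles` busy cycles followed by `memory_latency` stalled cycles, and
    switching between resident waves is free.
    """
    if resident_waves < 1 or compute_cycles < 1 or memory_latency < 0:
        raise ValueError("expected waves >= 1, compute_cycles >= 1, memory_latency >= 0")
    period = compute_cycles + memory_latency
    # Each wave can keep the unit busy for compute_cycles out of every period;
    # additional waves fill the stall time until the unit is saturated.
    return min(1.0, resident_waves * compute_cycles / period)


if __name__ == "__main__":
    # Hypothetical numbers: 4 compute cycles per 28 cycles of memory latency.
    for waves in (1, 2, 4, 8):
        print(waves, simd_utilization(waves, compute_cycles=4, memory_latency=28))
```

Under these assumptions the utilization grows linearly with the number of resident waves until the stall time is fully covered, which is why many waves are kept on the same SIMD unit.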

===Hardware Examples===

* 16 SMs - Streaming Multiprocessors

* organized in 2x16 CUDA cores per SM

* Warp size of 32 threads

AMD Radeon HD 7970 ([https://en.wikipedia.org/wiki/Graphics_Core_Next GCN])<ref>[https://en.wikipedia.org/wiki/Graphics_Core_Next Graphics Core Next on Wikipedia]</ref><ref>[https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Radeon_HD_7000_series Radeon HD 7000 series on Wikipedia]</ref>

* 2048 Stream cores @ 0.925 GHz

* 32 Compute Units

* organized in 4xSIMD16, each SIMT4, per Compute Unit

* Wavefront size of 64 work-items

===Wavefront and Warp===

Generalized, the definition of the Wavefront and Warp size is the number of threads executed in SIMT fashion on a GPU with unified shader architecture.
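The figures listed under Hardware Examples can be cross-checked with simple arithmetic. The following is a minimal sketch; the helper names are hypothetical, and reading "SIMT4" as four passes of a 64-wide wavefront over a 16-lane SIMD unit is an interpretation of the list above, not a vendor statement.

```python
import math


def total_cores(compute_units: int, simd_units_per_cu: int, lanes_per_simd: int) -> int:
    """Number of processing elements: compute units x SIMD units x lanes."""
    return compute_units * simd_units_per_cu * lanes_per_simd


def passes_per_wave(wave_size: int, lanes_per_simd: int) -> int:
    """Passes needed to run one wave on one SIMD unit (assumed interpretation)."""
    return math.ceil(wave_size / lanes_per_simd)


def waves_needed(work_items: int, wave_size: int) -> int:
    """Waves required to cover a number of work-items; the last wave may be partly idle."""
    return math.ceil(work_items / wave_size)


if __name__ == "__main__":
    # AMD example above: 32 Compute Units, 4xSIMD16 each, Wavefront size 64.
    print(total_cores(32, 4, 16))   # matches the 2048 Stream cores listed
    print(passes_per_wave(64, 16))  # 4, read here as "SIMT4"
    # Nvidia example above: 16 SMs, 2x16 CUDA cores each, Warp size 32.
    print(total_cores(16, 2, 16))
    # A hypothetical launch of 1000 work-items on either wave size.
    print(waves_needed(1000, 64), waves_needed(1000, 32))
```

The last line shows why launch sizes are usually chosen as multiples of the Wavefront or Warp size: 1000 work-items do not divide evenly into waves of 64 or 32, so part of the final wave stays idle.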

=Programming Model=

Smatovic

