GPU

1,260 bytes added, 20:35, 9 August 2019
== Grids and NDRange ==
While warps, blocks, wavefronts and workgroups are concepts that the machine executes, Grids and NDRanges are the scope of the problem specified by a programmer. For example, a pixel shader executing over a 1920x1080 screen will have 2,073,600 pixels to process. GPUs are designed such that each of these pixels could get its own thread of execution. Specifying these 2,073,600 work items is the purpose of a CUDA Grid or OpenCL NDRange.
A typical midrange GPU will "only" be able to process tens-of-thousands of threads at a time. In practice, the device driver will cut up a Grid or NDRange (usually consisting of millions of items) into Blocks or Workgroups. These blocks and workgroups will execute with as much parallel processing as the underlying hardware can support (maybe 10,000 at a time on a midrange GPU). The device driver will implicitly iterate these blocks over the entire Grid or NDRange to complete the task the programmer has specified, similar to a for-loop.
CUDA Grids and OpenCL NDRanges can be 1-dimensional, 2-dimensional, or 3-dimensional. 2-dimensional grids are common for screen-space operations such as pixel shaders, while 3-dimensional grids are useful for specifying many operations per pixel (such as a raytracer, which may launch 5000 rays per pixel). The most important note is that Grids and NDRanges may not execute concurrently with each other; some degree of sequential processing may happen. As such, communication across a Grid or NDRange is difficult to achieve (if thread #0 creates a Spinlock or Mutex waiting for thread #1000000 to communicate with it, modern hardware will probably never have the two threads executing concurrently, and the code would deadlock). In practice, the easiest mechanism for Grid- or NDRange-sized synchronization is to wait for the kernel to finish executing and to have the CPU split tasks as appropriate. CPUs and GPUs can easily work as a team to accomplish their tasks.
= Architectures and Physical Hardware =
== AMD ==
=== Navi RDNA 1.0 ===
[https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf Architecture Slide Deck]
RDNA cards were first released in 2019. RDNA is a major change for AMD cards: the underlying hardware supports both Wave32 and Wave64 gangs of threads.
* Radeon 5700 XT
* Radeon 5700
=== Vega GCN 5th gen ===
