=Memory=
The memory hierarchy of a GPU consists mainly of private memory (registers accessed by a single thread, respectively work-item), local memory (shared by the threads of a block, respectively the work-items of a work-group), constant memory, different types of cache, and global memory. Size, latency and bandwidth vary between vendors and architectures.
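As a minimal sketch (not taken from the referenced guides), a CUDA kernel can touch each of these memory spaces explicitly; the kernel name <code>scale</code>, the tile size of 256 and the coefficient table are purely illustrative:
<pre>
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative read-only table placed in constant memory (cached on chip).
__constant__ float coeffs[4] = {1.0f, 2.0f, 3.0f, 4.0f};

// Hypothetical kernel touching each memory space described above.
__global__ void scale(const float *in, float *out, int n) {
    __shared__ float tile[256];                      // local/shared memory of this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // private (register) variable
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;      // global -> shared
    __syncthreads();
    if (i < n)
        out[i] = tile[threadIdx.x] * coeffs[i % 4];  // shared * constant -> global
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = float(i);

    float *in, *out;                                 // buffers in global memory
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemcpy(in, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaMemcpy(host, out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[42] = %f\n", host[42]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
</pre>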
 
Here is the data for the Nvidia GeForce GTX 580 ([https://en.wikipedia.org/wiki/Fermi_%28microarchitecture%29 Fermi]) as an example: <ref>CUDA C Programming Guide v7.0, Appendix G.COMPUTE CAPABILITIES</ref>
* 128 KiB private memory per compute unit
* 48 KiB (16 KiB) local memory per compute unit (configurable)
* 64 KiB constant memory
* 8 KiB constant cache per compute unit
* 16 KiB (48 KiB) L1 cache per compute unit (configurable)
* 768 KiB L2 cache
* 1.5 GiB to 3 GiB global memory
Here is the data for the AMD Radeon HD 7970 ([https://en.wikipedia.org/wiki/Graphics_Core_Next GCN]) as an example: <ref>AMD Accelerated Parallel Processing OpenCL Programming Guide rev2.7, Appendix D Device Parameters, Table D.1 Parameters for 7xxx Devices</ref>
* 256 KiB private memory per compute unit
* 64 KiB local memory per compute unit
* 64 KiB constant memory
* 16 KiB constant cache per four compute units
* 16 KiB L1 cache per compute unit
* 768 KiB L2 cache
* 3 GiB to 6 GiB global memory
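Since these sizes differ between devices, they can also be queried at run time. The following sketch uses the CUDA runtime API (an OpenCL program would use clGetDeviceInfo analogously); device index 0 is assumed:
<pre>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);               // query device 0

    printf("%s\n", prop.name);
    printf("registers per multiprocessor:     %d\n",      prop.regsPerMultiprocessor);
    printf("shared memory per multiprocessor: %zu KiB\n", prop.sharedMemPerMultiprocessor / 1024);
    printf("constant memory:                  %zu KiB\n", prop.totalConstMem / 1024);
    printf("L2 cache:                         %d KiB\n",  prop.l2CacheSize / 1024);
    printf("global memory:                    %zu MiB\n", prop.totalGlobalMem / (1024 * 1024));
    return 0;
}
</pre>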
=Instruction Throughput=
