GPU

==Instruction Throughput==
* 64 bit Integer Performance
: Current GPU [https://en.wikipedia.org/wiki/Processor_register registers] and Vector-[https://en.wikipedia.org/wiki/Arithmetic_logic_unit ALUs] are 32 bit wide and have to emulate 64 bit integer operations, as sketched below.<ref>[https://en.wikichip.org/w/images/a/a1/vega-whitepaper.pdf AMD Vega White Paper]</ref><ref>[https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf Nvidia Turing White Paper]</ref>
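For illustration, a minimal CUDA sketch of what such an emulation amounts to: a 64 bit addition decomposed into two 32 bit ALU operations with a carry between them (on Nvidia hardware the compiler emits equivalent add-with-carry instructions; the function name add64_emulated is hypothetical).

<pre>
// Illustrative only: models how a 64 bit add maps onto 32 bit ALUs.
__device__ unsigned long long add64_emulated(unsigned long long a,
                                             unsigned long long b)
{
    // Split each 64 bit operand into two 32 bit halves,
    // matching the 32 bit register/ALU width.
    unsigned int alo = (unsigned int)a, ahi = (unsigned int)(a >> 32);
    unsigned int blo = (unsigned int)b, bhi = (unsigned int)(b >> 32);

    unsigned int lo    = alo + blo;            // low word add
    unsigned int carry = (lo < alo) ? 1u : 0u; // carry out of low word
    unsigned int hi    = ahi + bhi + carry;    // high word add with carry

    return ((unsigned long long)hi << 32) | lo;
}
</pre>

One 64 bit add thus costs at least two 32 bit ALU instructions, which is why 64 bit integer throughput is only a fraction of the 32 bit rate.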
* Mixed Precision Support
: Newer architectures like Nvidia [https://en.wikipedia.org/wiki/Turing_(microarchitecture) Turing] and AMD [https://en.wikipedia.org/wiki/AMD_RX_Vega_series Vega] have mixed precision support. Vega doubles the [https://en.wikipedia.org/w/index.php?title=FP16&redirect=no FP16] and quadruples the [https://en.wikipedia.org/wiki/Integer_(computer_science)#Common_integral_data_types INT8] throughput.<ref>[https://en.wikipedia.org/wiki/Graphics_Core_Next#fifth Vega (GCN 5th generation) from Wikipedia]</ref> Turing doubles the FP16 throughput of its [https://en.wikipedia.org/wiki/Floating-point_unit FPUs],<ref>[https://www.anandtech.com/show/13282/nvidia-turing-architecture-deep-dive/4 AnandTech - Nvidia Turing Deep Dive page 4]</ref> and its 2nd gen TensorCores add INT8 and INT4 optimized computation.<ref>[https://www.anandtech.com/show/13282/nvidia-turing-architecture-deep-dive/6 AnandTech - Nvidia Turing Deep Dive page 6]</ref>
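As a sketch of how packed math is exposed in CUDA (assuming a device of compute capability 5.3 or higher; the kernel name axpy_half2 is illustrative), two FP16 values share one 32 bit register and a single half2 intrinsic operates on both lanes:

<pre>
#include <cuda_fp16.h>

// y = a*x + y on packed FP16 pairs: each __half2 holds two FP16
// values, and __hfma2 issues one fused multiply-add for both lanes.
__global__ void axpy_half2(int n2, __half2 a, const __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)                          // n2 = number of FP16 pairs
        y[i] = __hfma2(a, x[i], y[i]);   // both lanes in one instruction
}
</pre>

Each thread processes two FP16 values per instruction, which is the mechanism behind the doubled FP16 throughput mentioned above.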
* TensorCores
: With the Nvidia [https://en.wikipedia.org/wiki/Volta_(microarchitecture) Volta] series, TensorCores were introduced. They offer FP16*FP16+FP32 matrix-multiplication-accumulate units, used to accelerate neural networks.<ref>[https://on-demand.gputechconf.com/gtc/2017/presentation/s7798-luke-durant-inside-volta.pdf INSIDE VOLTA]</ref>
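A minimal sketch of the corresponding programming interface, CUDA's WMMA API (requires compute capability 7.0 or higher; the kernel name wmma_tile and the tile layout are illustrative): one warp cooperatively computes a 16x16x16 matrix-multiply-accumulate with FP16 inputs and an FP32 accumulator, i.e. FP16*FP16+FP32.

<pre>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes D = A*B + 0 on a 16x16x16 tile via TensorCores.
__global__ void wmma_tile(const half* a, const half* b, float* d)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);     // zero the FP32 accumulator
    wmma::load_matrix_sync(fa, a, 16);  // load A tile, leading dimension 16
    wmma::load_matrix_sync(fb, b, 16);  // load B tile
    wmma::mma_sync(acc, fa, fb, acc);   // TensorCore multiply-accumulate
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}
</pre>

Launched with a single warp, e.g. wmma_tile<<<1, 32>>>(dA, dB, dD), this one tile operation covers the 16*16*16 = 4096 multiply-accumulates that would otherwise be issued as scalar FMA instructions.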
==Throughput Examples==