GPU

==Integer Instruction Throughput==
* INT8
: Some architectures like AMD [https://en.wikipedia.org/wiki/AMD_RX_Vega_series Vega] or Intel [https://en.wikipedia.org/wiki/Intel_Xe Xe] offer higher throughput with lower precision. They double the [https://en.wikipedia.org/wiki/FP16 FP16] and quadruple the [https://en.wikipedia.org/wiki/Integer_(computer_science)#Common_integral_data_types INT8] throughput.<ref>[https://en.wikipedia.org/wiki/Graphics_Core_Next#fifth Vega (GCN 5th generation) from Wikipedia]</ref><ref>[https://www.servethehome.com/intel-xe-sg1-hp-and-dg1-at-architecture-day-2020/intel-architecture-day-2020-xe-lp-int8-increase/ Xe-LP INT8 from ServeTheHome]</ref>
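: Packed INT8 math of this kind is exposed as dot-product instructions. Below is a minimal CUDA sketch of the idea, using Nvidia's <code>__dp4a</code> intrinsic (SM 6.1 and later); AMD and Intel expose comparable packed dot-product instructions through their own toolchains. Kernel and variable names are illustrative, not from a particular library:
<pre>
// Dot product of two INT8 vectors, four elements packed into each 32-bit word.
// __dp4a(a, b, c) returns c plus the 4-way signed INT8 dot product of a and b.
__global__ void dotInt8(const int *a, const int *b, int *result, int n32) {
    int acc = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n32;
         i += gridDim.x * blockDim.x)
        acc = __dp4a(a[i], b[i], acc);  // 4 INT8 multiply-accumulates per instruction
    atomicAdd(result, acc);             // combine per-thread partial sums
}
</pre>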
==Floating Point Instruction Throughput==
==Tensors==
===Nvidia TensorCores===
: TensorCores were introduced with the Nvidia [https://en.wikipedia.org/wiki/Volta_(microarchitecture) Volta] series. They offer FP16xFP16+FP32 matrix-multiply-accumulate units, used to accelerate neural networks.<ref>[https://on-demand.gputechconf.com/gtc/2017/presentation/s7798-luke-durant-inside-volta.pdf INSIDE VOLTA]</ref> Turing's 2nd gen TensorCores add FP16, INT8 and INT4 optimized computation.<ref>[https://www.anandtech.com/show/13282/nvidia-turing-architecture-deep-dive/6 AnandTech - Nvidia Turing Deep Dive page 6]</ref> Ampere's 3rd gen adds support for BF16, TensorFloat-32 (TF32), FP64 and sparsity acceleration.<ref>[https://en.wikipedia.org/wiki/Ampere_(microarchitecture)#Details Wikipedia - Ampere microarchitecture]</ref>
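: A minimal CUDA sketch of driving TensorCores through the WMMA API (<code>mma.h</code>, SM 7.0 and later): one warp computes a 16x16x16 FP16xFP16+FP32 tile, matching the unit described above. The tile sizes, leading dimensions and layouts here are illustrative assumptions:
<pre>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp multiplies a 16x16 FP16 tile A with a 16x16 FP16 tile B
// and accumulates into a 16x16 FP32 tile C (FP16xFP16+FP32).
__global__ void wmma16x16x16(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> cFrag;

    wmma::fill_fragment(cFrag, 0.0f);            // C = 0
    wmma::load_matrix_sync(aFrag, a, 16);        // leading dimension 16
    wmma::load_matrix_sync(bFrag, b, 16);
    wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);  // C += A x B on TensorCores
    wmma::store_matrix_sync(c, cFrag, 16, wmma::mem_row_major);
}
</pre>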
===AMD Matrix Cores===
: In 2020 AMD released its server-class [https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf CDNA] architecture with Matrix Cores, which support MFMA (matrix-fused-multiply-add) operations on various data types like INT8, FP16, BF16 and FP32.
===Intel XMX Cores===
: Intel added XMX, Xe Matrix eXtensions, units to some of its Xe GPU series, which accelerate matrix operations on lower-precision data types such as INT8 and FP16.