Changes

GPU

1,340 bytes removed, 13:57, 16 December 2021

→‎Instruction Throughput

* FP16

: Some GPGPU architectures offer half-precision (16-bit) floating-point operation throughput with an FP32:FP16 ratio of 1:2.

~~==Tensors==~~

~~===Nvidia TensorCores===~~

: With Nvidia [https://en.wikipedia.org/wiki/Volta_(microarchitecture) Volta] series TensorCores were introduced. They offer FP16xFP16+FP32, matrix-multiplication-accumulate-units, used to accelerate neural networks.<ref>[https://on-demand.gputechconf.com/gtc/2017/presentation/s7798-luke-durant-inside-volta.pdf INSIDE VOLTA]</ref> Turing's 2nd gen TensorCores add FP16, INT8, INT4 optimized computation.<ref>[https://www.anandtech.com/show/13282/nvidia-turing-architecture-deep-dive/6 AnandTech - Nvidia Turing Deep Dive page 6]</ref> Amperes's 3rd gen adds support for BF16, TF32, FP64 and sparsity acceleration.<ref>[https://en.wikipedia.org/wiki/Ampere_(microarchitecture)#Details Wikipedia - Ampere microarchitecture]</ref>

~~===AMD Matrix Cores===~~

: AMD released 2020 its server-class [https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf CDNA] architecture with Matrix Cores which support MFMA (matrix-fused-multiply-add) operations on various data types like INT8, FP16, BF16, FP32. AMD's CDNA 2 architecture adds FP64 optimized throughput for matrix operations.

~~===Intel XMX Cores===~~

~~: Intel plans XMX, Xe Matrix eXtensions, for its upcoming [https://www.anandtech.com/show/15973/the-intel-xelp-gpu-architecture-deep-dive-building-up-from-the-bottom/4 Xe discrete GPU] series.~~

==Throughput Examples==

Smatovic

415

edits

Changes

GPU

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools