GPU Benchmarks for Deep Learning

YoloV5 Inference GPU Benchmarks

GPU	Relative Latency (RTX 8000 = 1)
RTX 8000	1.00
3080	0.94
A100 80GB PCIe	0.73
RTX A6000	0.70
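As a rough illustration of how to read the table, the sketch below converts each relative latency into a speedup over the baseline. It assumes relative latency is defined as a GPU's latency divided by the RTX 8000's latency, so lower values mean faster inference; that definition is inferred from the baseline value of 1 and is not stated on this page. The name `relative_speedup` is ours.

```python
# Relative latencies copied from the table above (RTX 8000 is the baseline).
relative_latency = {
    "RTX 8000": 1.0,
    "3080": 0.94,
    "A100 80GB PCIe": 0.73,
    "RTX A6000": 0.70,
}


def relative_speedup(latency_ratio):
    """Convert a relative latency into a speedup over the baseline GPU.

    Assumes latency_ratio = gpu_latency / baseline_latency.
    """
    return 1.0 / latency_ratio


speedups = {gpu: relative_speedup(r) for gpu, r in relative_latency.items()}

# Print GPUs from fastest to slowest under this assumption.
for gpu, s in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gpu}: {s:.2f}x")
```

Under this assumption, the RTX A6000's relative latency of 0.70 corresponds to roughly a 1.43x speedup over the RTX 8000.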

GPU Benchmark Methodology

To measure the relative effectiveness of GPUs for training neural networks, we’ve chosen training throughput as the measuring stick. Training throughput is the number of samples (e.g. tokens or images) processed per second by the GPU.
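A minimal sketch of how samples per second can be measured is shown below. It is not our benchmark code: `measure_throughput` and `step_fn` are hypothetical names, and the warmup and step counts are arbitrary. On a GPU, `step_fn` would need to block until the device has finished its work (for example by calling `torch.cuda.synchronize()` in PyTorch), otherwise the timer only captures kernel launch time.

```python
import time


def measure_throughput(step_fn, batch_size, warmup_steps=5, timed_steps=20):
    """Return samples processed per second for a single training step function.

    step_fn is a hypothetical callable that runs one training step on one
    batch and returns only once the work has actually finished.
    """
    # Warmup steps are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup_steps):
        step_fn()

    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    elapsed = time.perf_counter() - start

    return batch_size * timed_steps / elapsed


def fake_step():
    # Stand-in for a real training step: pretend each batch takes about 10 ms.
    time.sleep(0.01)


throughput = measure_throughput(fake_step, batch_size=32)
print(f"{throughput:.0f} samples/sec")
```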

Using throughput instead of floating-point operations per second (FLOPS) ties GPU performance directly to the task of training neural networks. Training throughput is strongly correlated with time to solution: with higher training throughput, the GPU can run a dataset through the model more quickly and finish training sooner.
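The link between throughput and time to solution can be sketched with simple arithmetic: if throughput stays roughly constant, training time is the total number of samples processed divided by samples per second. The function name and all numbers below are hypothetical and are not taken from our benchmarks; real runs also spend time on data loading, evaluation, and checkpointing.

```python
def estimated_training_seconds(num_samples, epochs, samples_per_second):
    """Estimate training time assuming constant throughput and no other overhead."""
    return num_samples * epochs / samples_per_second


# Hypothetical workload: 1,000,000 samples, 10 epochs.
slow_gpu = estimated_training_seconds(1_000_000, 10, samples_per_second=2_000)
fast_gpu = estimated_training_seconds(1_000_000, 10, samples_per_second=4_000)

print(f"2,000 samples/sec: {slow_gpu / 3600:.2f} hours")
print(f"4,000 samples/sec: {fast_gpu / 3600:.2f} hours")
```

Doubling throughput halves the estimate, which is why we treat throughput as a proxy for time to solution.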

To maximize training throughput, it’s important to saturate GPU resources with large batch sizes, switch to faster GPUs, or parallelize training across multiple GPUs. It’s also important to test throughput using state-of-the-art (SOTA) model implementations across frameworks, since throughput can be affected by how a model is implemented.
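One way to look for the batch size at which a GPU is saturated is to keep doubling the batch size until throughput stops improving meaningfully. The sketch below illustrates that idea only; it is not the procedure used for our benchmarks. `find_saturating_batch_size`, the 5% gain threshold, and the synthetic `toy_throughput` curve are all assumptions made for the example. In practice `throughput_at` would run a real measurement, and GPU memory would also cap the batch size.

```python
def find_saturating_batch_size(throughput_at, start=1, max_batch=1024, min_gain=0.05):
    """Double the batch size until throughput improves by less than min_gain.

    throughput_at is a hypothetical callable mapping a batch size to a
    measured throughput in samples per second.
    """
    best_bs = start
    best_tp = throughput_at(start)
    bs = start * 2
    while bs <= max_batch:
        tp = throughput_at(bs)
        if tp < best_tp * (1 + min_gain):
            break  # Gains have flattened out; treat the GPU as saturated.
        best_bs, best_tp = bs, tp
        bs *= 2
    return best_bs, best_tp


def toy_throughput(batch_size):
    # Synthetic saturating curve standing in for real measurements.
    return 1000.0 * batch_size / (batch_size + 16)


batch_size, samples_per_sec = find_saturating_batch_size(toy_throughput)
print(f"batch size {batch_size}: {samples_per_sec:.1f} samples/sec")
```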

PyTorch®

We are working on new benchmarks that use the same software versions across all GPUs. First AI's PyTorch® benchmark code is available.

The 2023 benchmarks used NGC's PyTorch® 22.10 Docker image with Ubuntu 20.04, PyTorch® 1.13.0a0+d0d6b1f, CUDA 11.8.0, cuDNN 8.6.0.163, NVIDIA driver 520.61.05, and our of NVIDIA's .

The 2022 benchmarks used NGC's PyTorch® 21.07 Docker image with Ubuntu 20.04, PyTorch® 1.10.0a0+ecc3718, CUDA 11.4.0, cuDNN 8.2.2.26, NVIDIA driver 470, and NVIDIA's optimized model implementations inside the NGC container.
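Because results depend on the software stack, it can help to record the versions in use alongside any benchmark run. The sketch below shows one possible way to do that; `software_report` is a hypothetical helper and not part of our benchmark code. It falls back gracefully when PyTorch is not installed. The NVIDIA driver version is not covered here; it is commonly read from `nvidia-smi` instead.

```python
import platform


def software_report():
    """Collect Python, OS, and (if installed) PyTorch, CUDA, and cuDNN versions."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the basics are reported.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None on CPU-only builds
    report["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    report["gpu"] = (
        torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    )
    return report


report = software_report()
for key, value in report.items():
    print(f"{key}: {value}")
```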

PyTorch® is a registered trademark of The Linux Foundation. https://pytorch.org/