pts-result-gpu-30-05-2024.list
2 x AMD EPYC 7452 32-Core testing with a Supermicro AS-2124GQ-NART H12DSG-Q-CPU6 v1.01 (1.0a BIOS) and NVIDIA A100-SXM4-40GB on CentOS Linux 7 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2407234-NE-PTSRESULT61&rdt&grr
pts-result-gpu-30-05-2024.list
Systems: GPU-run-30-05-2024, pts-config-gpu-30-05-2024

Processor: 2 x AMD EPYC 7452 32-Core @ 2.35GHz (64 Cores)
Motherboard: Supermicro AS-2124GQ-NART H12DSG-Q-CPU6 v1.01 (1.0a BIOS)
Chipset: AMD Starship/Matisse
Memory: 16 x 32 GB DDR4-3200MT/s Samsung M393A4K40DB3-CWE
Disk: 252GB
Graphics: NVIDIA A100-SXM4-40GB
Network: 2 x Intel 10-Gigabit X540-AT2
OS: CentOS Linux 7
Kernel: 5.4.265-1.el7.elrepo.x86_64 (x86_64)
Display Server: X Server
Display Driver: NVIDIA
Vulkan: 1.3.260
Compiler: GCC 4.8.5 20150623 + CUDA 12.3 (GPU-run-30-05-2024); GCC 8.3.0 + CUDA 12.3 (pts-config-gpu-30-05-2024)
File-System: tmpfs
Screen Resolution: 1024x768

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- GPU-run-30-05-2024: --build=x86_64-redhat-linux --disable-libgcj --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-linker-hash-style=gnu --with-tune=generic
- pts-config-gpu-30-05-2024: --disable-multilib --enable-languages=c,c++

Processor Details
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0x830107a

Graphics Details
- BAR1 / Visible vRAM Size: 65536 MiB
- vBIOS Version: 92.00.19.00.13

Python Details
- GPU-run-30-05-2024: Python 2.7.5 + Python 3.6.8
- pts-config-gpu-30-05-2024: Python 2.7.5 + Python 3.9.7

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Vulnerable
- spec_store_bypass: Vulnerable
- spectre_v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
- spectre_v2: Vulnerable IBPB: disabled STIBP: disabled PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Environment Details
- pts-config-gpu-30-05-2024: EXTRA_NVCCFLAGS=-cudart=shared
pts-result-gpu-30-05-2024.list
Systems: GPU-run-30-05-2024, pts-config-gpu-30-05-2024 (one value is listed per test)

ncnn: Vulkan GPU - FastestDet: 18.25
ncnn: Vulkan GPU - vision_transformer: 156.26
ncnn: Vulkan GPU - regnety_400m: 41.64
ncnn: Vulkan GPU - squeezenet_ssd: 37.74
ncnn: Vulkan GPU - yolov4-tiny: 70.32
ncnn: Vulkan GPU - mobilenetv2-yolov3: 39.73
ncnn: Vulkan GPU - resnet50: 58.77
ncnn: Vulkan GPU - alexnet: 17.89
ncnn: Vulkan GPU - resnet18: 28.99
ncnn: Vulkan GPU - vgg16: 124.62
ncnn: Vulkan GPU - googlenet: 38.14
ncnn: Vulkan GPU - blazeface: 8.97
ncnn: Vulkan GPU - efficientnet-b0: 23.16
ncnn: Vulkan GPU - mnasnet: 15.40
ncnn: Vulkan GPU - shufflenet-v2: 21.59
ncnn: Vulkan GPU - mobilenet-v3: 18.30
ncnn: Vulkan GPU - mobilenet-v2: 18.97
ncnn: Vulkan GPU - mobilenet: 39.73
viennacl: CPU BLAS - dGEMM-TT: 76.3
viennacl: CPU BLAS - dGEMM-TN: 85.1
viennacl: CPU BLAS - dGEMM-NT: 68.8
viennacl: CPU BLAS - dGEMM-NN: 74.6
viennacl: CPU BLAS - dGEMV-T: 450
viennacl: CPU BLAS - dGEMV-N: 157.2
viennacl: CPU BLAS - dDOT: 627
viennacl: CPU BLAS - dAXPY: 948
viennacl: CPU BLAS - dCOPY: 742
viennacl: CPU BLAS - sDOT: 418
viennacl: CPU BLAS - sAXPY: 736
viennacl: CPU BLAS - sCOPY: 470
hashcat: MD5: 170625637500
clpeak: Integer Compute INT: 16043.49
rodinia: OpenCL Particle Filter: 3.302
hashcat: TrueCrypt RIPEMD160 + XTS: 3299133
hashcat: SHA-512: 12848366667
clpeak: Double-Precision Double: 7979.31
hashcat: SHA1: 88126433333
mixbench: OpenCL - Double Precision: 7699.84
mixbench: NVIDIA CUDA - Double Precision: 7814.50
hashcat: 7-Zip: 4385933
mixbench: OpenCL - Single Precision: 15389.52
mixbench: NVIDIA CUDA - Half Precision: 44809.08
mixbench: NVIDIA CUDA - Single Precision: 15201.95
clpeak: Global Memory Bandwidth: 1300.10
cl-mem: Copy: 231.7
cl-mem: Write: 1242.6
cl-mem: Read: 780.7
mixbench: OpenCL - Integer: 14801.87
clpeak: Single-Precision Float: 17926.38
mixbench: NVIDIA CUDA - Integer: 12812.49
financebench: Black-Scholes OpenCL: 1.191
shoc: OpenCL - S3D: (no value listed)
NCNN 20230517 - Target: Vulkan GPU - Model: FastestDet - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 18.25 (SE +/- 1.79, N = 9, MIN: 11.3 / MAX: 482.82)
NCNN 20230517 - Target: Vulkan GPU - Model: vision_transformer - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 156.26 (SE +/- 3.66, N = 9, MIN: 137.55 / MAX: 893.71)
NCNN 20230517 - Target: Vulkan GPU - Model: regnety_400m - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 41.64 (SE +/- 2.44, N = 9, MIN: 28.21 / MAX: 229.16)
NCNN 20230517 - Target: Vulkan GPU - Model: squeezenet_ssd - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 37.74 (SE +/- 1.52, N = 9, MIN: 27.11 / MAX: 623.11)
NCNN 20230517 - Target: Vulkan GPU - Model: yolov4-tiny - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 70.32 (SE +/- 2.29, N = 9, MIN: 50.39 / MAX: 562.92)
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenetv2-yolov3 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 39.73 (SE +/- 1.15, N = 9, MIN: 29.4 / MAX: 277.23)
NCNN 20230517 - Target: Vulkan GPU - Model: resnet50 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 58.77 (SE +/- 1.60, N = 9, MIN: 44.97 / MAX: 337.56)
NCNN 20230517 - Target: Vulkan GPU - Model: alexnet - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 17.89 (SE +/- 1.37, N = 9, MIN: 13.81 / MAX: 182.2)
NCNN 20230517 - Target: Vulkan GPU - Model: resnet18 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 28.99 (SE +/- 1.18, N = 9, MIN: 19.8 / MAX: 221.98)
NCNN 20230517 - Target: Vulkan GPU - Model: vgg16 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 124.62 (SE +/- 5.02, N = 9, MIN: 66.23 / MAX: 682.25)
NCNN 20230517 - Target: Vulkan GPU - Model: googlenet - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 38.14 (SE +/- 1.64, N = 9, MIN: 27.74 / MAX: 523.7)
NCNN 20230517 - Target: Vulkan GPU - Model: blazeface - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 8.97 (SE +/- 0.98, N = 9, MIN: 4.17 / MAX: 172.12)
NCNN 20230517 - Target: Vulkan GPU - Model: efficientnet-b0 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 23.16 (SE +/- 2.28, N = 9, MIN: 12.29 / MAX: 221.56)
NCNN 20230517 - Target: Vulkan GPU - Model: mnasnet - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 15.40 (SE +/- 1.11, N = 9, MIN: 8.91 / MAX: 466.02)
NCNN 20230517 - Target: Vulkan GPU - Model: shufflenet-v2 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 21.59 (SE +/- 1.17, N = 9, MIN: 12.52 / MAX: 248.51)
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet-v3 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 18.30 (SE +/- 1.67, N = 9, MIN: 9.64 / MAX: 63.36)
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet-v2 - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 18.97 (SE +/- 3.03, N = 9, MIN: 10.12 / MAX: 542.78)
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 39.73 (SE +/- 1.15, N = 9, MIN: 29.4 / MAX: 277.23)
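
The NCNN entries report a mean latency together with "SE +/-" and N. As a rough way to compare run-to-run noise across models, the sketch below computes the relative standard error and the standard deviation it implies. It assumes that "SE" is the standard error of the mean (standard deviation divided by the square root of N); the function names are hypothetical, and only three of the entries above are copied in for illustration.

```python
import math

# (mean in ms, standard error in ms, number of runs), copied from the NCNN entries above.
ncnn_results = {
    "FastestDet": (18.25, 1.79, 9),
    "vision_transformer": (156.26, 3.66, 9),
    "mobilenet-v2": (18.97, 3.03, 9),
}


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return se / mean * 100.0


def implied_std_dev(se, n):
    """Standard deviation implied by SE = SD / sqrt(N) (an assumption about how SE is defined)."""
    return se * math.sqrt(n)


for model, (mean, se, n) in ncnn_results.items():
    print(
        f"{model}: relative SE {relative_se_percent(mean, se):.1f}%, "
        f"implied SD {implied_std_dev(se, n):.2f} ms"
    )
```

A higher relative standard error suggests a noisier measurement. The large MAX values in the entries above point to occasional slow iterations, which a mean-based summary like this does not separate out.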
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-TT - GFLOPs/s, More Is Better - pts-config-gpu-30-05-2024: 76.3 (SE +/- 0.79, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-TN - GFLOPs/s, More Is Better - pts-config-gpu-30-05-2024: 85.1 (SE +/- 0.59, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-NT - GFLOPs/s, More Is Better - pts-config-gpu-30-05-2024: 68.8 (SE +/- 0.48, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-NN - GFLOPs/s, More Is Better - pts-config-gpu-30-05-2024: 74.6 (SE +/- 0.41, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMV-T - GB/s, More Is Better - pts-config-gpu-30-05-2024: 450 (SE +/- 5.98, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMV-N - GB/s, More Is Better - pts-config-gpu-30-05-2024: 157.2 (SE +/- 6.04, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dDOT - GB/s, More Is Better - pts-config-gpu-30-05-2024: 627 (SE +/- 19.94, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dAXPY - GB/s, More Is Better - pts-config-gpu-30-05-2024: 948 (SE +/- 42.05, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - dCOPY - GB/s, More Is Better - pts-config-gpu-30-05-2024: 742 (SE +/- 36.07, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - sDOT - GB/s, More Is Better - pts-config-gpu-30-05-2024: 418 (SE +/- 1.84, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - sAXPY - GB/s, More Is Better - pts-config-gpu-30-05-2024: 736 (SE +/- 25.06, N = 15)
ViennaCL 1.7.1 - Test: CPU BLAS - sCOPY - GB/s, More Is Better - pts-config-gpu-30-05-2024: 470 (SE +/- 17.64, N = 15)
Hashcat 6.2.4 - Benchmark: MD5 - H/s, More Is Better - pts-config-gpu-30-05-2024: 170625637500 (SE +/- 26429817415.14, N = 16)
clpeak 1.1.2 - OpenCL Test: Integer Compute INT - GIOPS, More Is Better - pts-config-gpu-30-05-2024: 16043.49 (SE +/- 199.00, N = 15)
Rodinia 3.1 - Test: OpenCL Particle Filter - Seconds, Fewer Is Better - pts-config-gpu-30-05-2024: 3.302 (SE +/- 0.027, N = 13). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Hashcat 6.2.4 - Benchmark: TrueCrypt RIPEMD160 + XTS - H/s, More Is Better - pts-config-gpu-30-05-2024: 3299133 (SE +/- 592.55, N = 3)
Hashcat 6.2.4 - Benchmark: SHA-512 - H/s, More Is Better - pts-config-gpu-30-05-2024: 12848366667 (SE +/- 3468589.21, N = 3)
clpeak 1.1.2 - OpenCL Test: Double-Precision Double - GFLOPS, More Is Better - pts-config-gpu-30-05-2024: 7979.31 (SE +/- 119.81, N = 12)
Hashcat 6.2.4 - Benchmark: SHA1 - H/s, More Is Better - pts-config-gpu-30-05-2024: 88126433333 (SE +/- 104285670.69, N = 3)
Mixbench 2020-06-23 - Backend: OpenCL - Benchmark: Double Precision - GFLOPS, More Is Better - pts-config-gpu-30-05-2024: 7699.84 (SE +/- 113.22, N = 15). (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23 - Backend: NVIDIA CUDA - Benchmark: Double Precision - GFLOPS, More Is Better - pts-config-gpu-30-05-2024: 7814.50 (SE +/- 160.94, N = 15). (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Hashcat 6.2.4 - Benchmark: 7-Zip - H/s, More Is Better - pts-config-gpu-30-05-2024: 4385933 (SE +/- 25606.34, N = 3)
Mixbench 2020-06-23 - Backend: OpenCL - Benchmark: Single Precision - GFLOPS, More Is Better - pts-config-gpu-30-05-2024: 15389.52 (SE +/- 262.04, N = 15). (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23 - Backend: NVIDIA CUDA - Benchmark: Half Precision - GFLOPS, More Is Better - pts-config-gpu-30-05-2024: 44809.08 (SE +/- 727.84, N = 15). (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23 - Backend: NVIDIA CUDA - Benchmark: Single Precision - GFLOPS, More Is Better - pts-config-gpu-30-05-2024: 15201.95 (SE +/- 290.53, N = 12). (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
clpeak 1.1.2 - OpenCL Test: Global Memory Bandwidth - GBPS, More Is Better - pts-config-gpu-30-05-2024: 1300.10 (SE +/- 3.13, N = 3)
cl-mem 2017-01-13 - Benchmark: Copy - GB/s, More Is Better - pts-config-gpu-30-05-2024: 231.7 (SE +/- 0.03, N = 3). (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13 - Benchmark: Write - GB/s, More Is Better - pts-config-gpu-30-05-2024: 1242.6 (SE +/- 0.45, N = 3). (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13 - Benchmark: Read - GB/s, More Is Better - pts-config-gpu-30-05-2024: 780.7 (SE +/- 0.06, N = 3). (CC) gcc options: -O2 -flto -lOpenCL
Mixbench 2020-06-23 - Backend: OpenCL - Benchmark: Integer - GIOPS, More Is Better - pts-config-gpu-30-05-2024: 14801.87 (SE +/- 4.35, N = 3). (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
clpeak 1.1.2 - OpenCL Test: Single-Precision Float - GFLOPS, More Is Better - pts-config-gpu-30-05-2024: 17926.38 (SE +/- 165.14, N = 3)
Mixbench 2020-06-23 - Backend: NVIDIA CUDA - Benchmark: Integer - GIOPS, More Is Better - pts-config-gpu-30-05-2024: 12812.49 (SE +/- 5.64, N = 3). (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
FinanceBench 2016-07-25 - Benchmark: Black-Scholes OpenCL - ms, Fewer Is Better - pts-config-gpu-30-05-2024: 1.191 (SE +/- 0.005, N = 3). (CXX) g++ options: -O3 -march=native -fopenmp
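
Mixbench was run with both the OpenCL and NVIDIA CUDA backends for double precision, single precision and integer workloads. A minimal sketch for putting the two backends side by side is to take the ratio of the CUDA result to the OpenCL result for each workload. The values are copied from the Mixbench entries above; the dictionary layout and function name are hypothetical, and half precision is left out because only a CUDA figure is reported for it.

```python
# Mixbench results (GFLOPS for floating point, GIOPS for integer),
# copied from the entries above.
mixbench = {
    "Double Precision": {"OpenCL": 7699.84, "CUDA": 7814.50},
    "Single Precision": {"OpenCL": 15389.52, "CUDA": 15201.95},
    "Integer": {"OpenCL": 14801.87, "CUDA": 12812.49},
}


def cuda_to_opencl_ratio(result):
    """Ratio above 1.0 means the CUDA backend reported the higher throughput."""
    return result["CUDA"] / result["OpenCL"]


for benchmark, result in mixbench.items():
    print(f"{benchmark}: CUDA/OpenCL = {cuda_to_opencl_ratio(result):.3f}")
```

For the double- and single-precision workloads the gap between backends is comparable in size to the reported standard errors, so those ratios are best read as roughly equal. The integer workload is the one case where the two backends clearly differ.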
Phoronix Test Suite v10.8.5