gpuowl cs2 vkfft

AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA GeForce RTX 3080 10GB on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402242-PTS-GPUOWLCS40&grw&sor

System configuration (runs a, b, c)

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16GB DRAM-6000MT/s G Skill F5-6000J3038F16G
Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: NVIDIA GeForce RTX 3080 10GB
Audio: NVIDIA GA102 HD Audio
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 23.10
Kernel: 6.7.0-060700-generic (x86_64)
Desktop: GNOME Shell 45.2
Display Server: X Server 1.21.1.7
Display Driver: NVIDIA 550.40.07
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.74
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate-epp performance (EPP: performance)
- CPU Microcode: 0xa601203

Graphics Details
- BAR1 / Visible vRAM Size: 16384 MiB
- vBIOS Version: 94.02.20.00.07

OpenCL Details
- GPU Compute Cores: 8704

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Vulnerable: Safe RET no microcode
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Result overview (runs a, b, c)

Test                                                                  a         b         c
opencl-benchmark: INT64 Compute                                       3.231     3.222     3.225
opencl-benchmark: INT32 Compute                                       16.861    16.666    16.921
vkfft: FFT + iFFT C2C 1D batched in half precision                    148147    143220    145097
opencl-benchmark: FP32 Compute                                        32.873    32.915    32.797
cs2: 1920 x 1080                                                      308.0     311.4     309.8
cs2: 2560 x 1440                                                      221.9     221.3     221.1
vkfft: FFT + iFFT C2C Bluestein in single precision                   13286     13225     13358
opencl-benchmark: Memory Bandwidth Coalesced Read                     702.72    702.78    702.84
vkfft: FFT + iFFT C2C 1D batched in double precision                  25136     23693     25627
vkfft: FFT + iFFT C2C 1D batched in single precision                  113874    113952    113935
vkfft: FFT + iFFT C2C multidimensional in single precision            46283     47650     48385
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision         3724      3763      3758
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling  116216    116227    116319
opencl-benchmark: INT16 Compute                                       14.565    14.56     14.562
gpuowl: 57885161                                                      723.24    729.39    728.86
opencl-benchmark: INT8 Compute                                        12.108    12.078    12.173
gpuowl: 77936867                                                      532.10    536.19    532.20
opencl-benchmark: Memory Bandwidth Coalesced Write                    721.83    721.79    721.72
gpuowl: 332220523                                                     115.73    116.65    115.78
cs2: 1920 x 1200                                                      291.7     292.7     293.7
opencl-benchmark: FP64 Compute                                        0.528     0.531     0.527
cs2: 3840 x 2160                                                      121.4     121.6     122.8
vkfft: FFT + iFFT R2C / C2R                                           51046     50803     49951

Per-test results

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT64 Compute (TIOPs/s, more is better)
  a: 3.231   c: 3.225   b: 3.222   (SE +/- 0.009, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT32 Compute (TIOPs/s, more is better)
  c: 16.92   a: 16.86   b: 16.67   (SE +/- 0.04, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in half precision (Benchmark Score, more is better)
  a: 148147   c: 145097   b: 143220   (SE +/- 1616.73, N = 15)
  1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP32 Compute (TFLOPs/s, more is better)
  b: 32.92   a: 32.87   c: 32.80   (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Counter-Strike 2 - Resolution: 1920 x 1080 (Frames Per Second, more is better)
  b: 311.4   c: 309.8   a: 308.0   (SE +/- 0.38, N = 3; MIN: 307.3 / MAX: 308.6)

Counter-Strike 2 - Resolution: 2560 x 1440 (Frames Per Second, more is better)
  a: 221.9   b: 221.3   c: 221.1   (SE +/- 0.58, N = 3; MIN: 221.3 / MAX: 223.1)

VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein in single precision (Benchmark Score, more is better)
  c: 13358   a: 13286   b: 13225   (SE +/- 60.40, N = 3)
  1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Read (GB/s, more is better)
  c: 702.84   b: 702.78   a: 702.72   (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in double precision (Benchmark Score, more is better)
  c: 25627   a: 25136   b: 23693   (SE +/- 311.90, N = 11)
  1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision (Benchmark Score, more is better)
  b: 113952   c: 113935   a: 113874   (SE +/- 22.70, N = 3)
  1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C multidimensional in single precision (Benchmark Score, more is better)
  c: 48385   b: 47650   a: 46283   (SE +/- 368.38, N = 15)
  1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein benchmark in double precision (Benchmark Score, more is better)
  b: 3763   c: 3758   a: 3724   (SE +/- 8.17, N = 3)
  1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling (Benchmark Score, more is better)
  c: 116319   b: 116227   a: 116216   (SE +/- 85.34, N = 3)
  1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT16 Compute (TIOPs/s, more is better)
  a: 14.57   c: 14.56   b: 14.56   (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

GpuOwl 7.5 - Exponent: 57885161 (Iterations / Second, more is better)
  b: 729.39   c: 728.86   a: 723.24   (SE +/- 0.17, N = 3)
  1. (CXX) g++ options: -O3 -lgmp -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT8 Compute (TIOPs/s, more is better)
  c: 12.17   a: 12.11   b: 12.08   (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

GpuOwl 7.5 - Exponent: 77936867 (Iterations / Second, more is better)
  b: 536.19   c: 532.20   a: 532.10   (SE +/- 0.09, N = 3)
  1. (CXX) g++ options: -O3 -lgmp -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Write (GB/s, more is better)
  a: 721.83   b: 721.79   c: 721.72   (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

GpuOwl 7.5 - Exponent: 332220523 (Iterations / Second, more is better)
  b: 116.65   c: 115.78   a: 115.73   (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -lgmp -lOpenCL

Counter-Strike 2 - Resolution: 1920 x 1200 (Frames Per Second, more is better)
  c: 293.7   b: 292.7   a: 291.7   (SE +/- 1.27, N = 3; MIN: 289.4 / MAX: 293.8)

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP64 Compute (TFLOPs/s, more is better)
  b: 0.531   a: 0.528   c: 0.527   (SE +/- 0.001, N = 3)
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Counter-Strike 2 - Resolution: 3840 x 2160 (Frames Per Second, more is better)
  c: 122.8   b: 121.6   a: 121.4   (SE +/- 0.29, N = 3; MIN: 120.9 / MAX: 121.9)

VkFFT 1.3.4 - Test: FFT + iFFT R2C / C2R (Benchmark Score, more is better)
  a: 51046   b: 50803   c: 49951   (SE +/- 351.55, N = 15)
  1. (CXX) g++ options: -O3

Phoronix Test Suite v10.8.5