more gpu comp

AMD Ryzen 9 7950X 16-Core testing with a ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA GeForce RTX 4080 16GB on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402252-PTS-MOREGPUC88&grr&sor.

more gpu compProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerDisplay DriverOpenGLOpenCLCompilerFile-SystemScreen Resolution4080acdAMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)AMD Device 14d82 x 16GB DRAM-6000MT/s G Skill F5-6000J3038F16G2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GBNVIDIA GeForce RTX 4080 16GBNVIDIA Device 22bbDELL U2723QEIntel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411Ubuntu 23.106.7.0-060700-generic (x86_64)GNOME Shell 45.2X Server 1.21.1.7NVIDIA 550.40.074.6.0OpenCL 3.0 CUDA 12.4.74GCC 13.2.0ext43840x2160OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa601203Graphics Details- BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.03.0e.00.04OpenCL Details- GPU Compute Cores: 9728Python Details- Python 3.11.6Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Vulnerable: Safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

more gpu compvkfft: FFT + iFFT C2C 1D batched in double precisiongpuowl: 77936867gpuowl: 332220523vkfft: FFT + iFFT C2C Bluestein benchmark in double precisiongpuowl: 57885161vkfft: FFT + iFFT C2C 1D batched in single precisionblender: Barbershop - NVIDIA CUDAvkfft: FFT + iFFT C2C 1D batched in single precision, no reshufflingvkfft: FFT + iFFT C2C Bluestein in single precisionfluidx3d: FP32-FP32libplacebo: gaussianlibplacebo: av1_grain_laplibplacebo: hdr_lutlibplacebo: hdr_peakdetectlibplacebo: polar_nocomputelibplacebo: deband_heavyblender: Barbershop - NVIDIA OptiXvkfft: FFT + iFFT C2C 1D batched in half precisionblender: Pabellon Barcelona - NVIDIA CUDAvkfft: FFT + iFFT C2C multidimensional in single precisionfluidx3d: FP32-FP16Sfluidx3d: FP32-FP16Cblender: Fishy Cat - NVIDIA CUDAvkfft: FFT + iFFT R2C / C2Rblender: Classroom - NVIDIA CUDAblender: Pabellon Barcelona - NVIDIA OptiXopencl-benchmark: Memory Bandwidth Coalesced Writeopencl-benchmark: Memory Bandwidth Coalesced Readopencl-benchmark: INT8 Computeopencl-benchmark: INT16 Computeopencl-benchmark: INT32 Computeopencl-benchmark: INT64 Computeopencl-benchmark: FP32 Computeopencl-benchmark: FP64 Computeblender: Classroom - NVIDIA OptiXblender: Fishy Cat - NVIDIA OptiXblender: BMW27 - NVIDIA CUDAblender: BMW27 - NVIDIA OptiX4080acd27333868.06183.4556291142.8610377361.021051971685238313343.213520.392837.372748.421866.331482.9140.7614057832.3718257694776815.86728014.6111.21604.74652.2620.1223.10326.5664.51651.5450.8310.038.527.845.4332480865.05183.1255991142.8610367761.071050771689538183339.533480.12834.072903.9418661481.2140.72140729684227695776515.857229414.62604.14652.3120.1223.10626.6474.51651.5490.8310.037.77.934.5829102865.05183.0856071142.8610367960.761051061701238363337.953416.932838.242939.61866.691481.7240.7314414632.35706697697777815.836668614.6511.21603.59652.3320.11623.12426.5544.53951.5490.8310.057.677.844.5933512865.05183.0856101142.8610368060.691051131693038373344.453518.332843.732936.071870.991483.340.8814175532.31700897694776315.856213514.6311.16603.94652.320.12523.10326.5674.51451.5570.8310.047.657.934.58OpenBenchmarking.org

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in double precisiondac40807K14K21K28K35K335123248029102273331. (CXX) g++ options: -O3

GpuOwl

Exponent: 77936867

OpenBenchmarking.orgIterations / Second, More Is BetterGpuOwl 7.5Exponent: 779368674080dca2004006008001000868.06865.05865.05865.051. (CXX) g++ options: -O3 -lgmp -lOpenCL

GpuOwl

Exponent: 332220523

OpenBenchmarking.orgIterations / Second, More Is BetterGpuOwl 7.5Exponent: 3322205234080adc4080120160200183.45183.12183.08183.081. (CXX) g++ options: -O3 -lgmp -lOpenCL

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C Bluestein benchmark in double precision4080dca1200240036004800600056295610560755991. (CXX) g++ options: -O3

GpuOwl

Exponent: 57885161

OpenBenchmarking.orgIterations / Second, More Is BetterGpuOwl 7.5Exponent: 57885161dca408020040060080010001142.861142.861142.861142.861. (CXX) g++ options: -O3 -lgmp -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in single precision4080dca20K40K60K80K100K1037731036801036791036771. (CXX) g++ options: -O3

Blender

Blend File: Barbershop - Compute: NVIDIA CUDA

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Barbershop - Compute: NVIDIA CUDAdc4080a142842567060.6960.7661.0261.07

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling4080dca20K40K60K80K100K1051971051131051061050771. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C Bluestein in single precisioncda40804K8K12K16K20K170121693016895168521. (CXX) g++ options: -O3

FluidX3D

Test: FP32-FP32

OpenBenchmarking.orgMLUPs/s, More Is BetterFluidX3D 2.9Test: FP32-FP32dc4080a80016002400320040003837383638313818

Libplacebo

Test: gaussian

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: gaussiand4080ac70014002100280035003344.453343.213339.533337.951. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Libplacebo

Test: av1_grain_lap

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: av1_grain_lap4080dac80016002400320040003520.393518.333480.103416.931. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Libplacebo

Test: hdr_lut

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: hdr_lutdc4080a60012001800240030002843.732838.242837.372834.071. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Libplacebo

Test: hdr_peakdetect

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: hdr_peakdetectcda408060012001800240030002939.602936.072903.942748.421. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Libplacebo

Test: polar_nocompute

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: polar_nocomputedc4080a4008001200160020001870.991866.691866.331866.001. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Libplacebo

Test: deband_heavy

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: deband_heavyd4080ca300600900120015001483.301482.911481.721481.211. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Barbershop - Compute: NVIDIA OptiXac4080d91827364540.7240.7340.7640.88

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in half precisioncda408030K60K90K120K150K1441461417551407291405781. (CXX) g++ options: -O3

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA4080dc81624324032.3032.3132.35

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C multidimensional in single precision4080cda15K30K45K60K75K718257066970089684221. (CXX) g++ options: -O3

FluidX3D

Test: FP32-FP16S

OpenBenchmarking.orgMLUPs/s, More Is BetterFluidX3D 2.9Test: FP32-FP16Scad4080160032004800640080007697769576947694

FluidX3D

Test: FP32-FP16C

OpenBenchmarking.orgMLUPs/s, More Is BetterFluidX3D 2.9Test: FP32-FP16Cc4080ad170034005100680085007778776877657763

Blender

Blend File: Fishy Cat - Compute: NVIDIA CUDA

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Fishy Cat - Compute: NVIDIA CUDA4080cad4812162015.8015.8315.8515.85

VkFFT

Test: FFT + iFFT R2C / C2R

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT R2C / C2Ra4080cd15K30K45K60K75K722946728066686621351. (CXX) g++ options: -O3

Blender

Blend File: Classroom - Compute: NVIDIA CUDA

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Classroom - Compute: NVIDIA CUDA4080adc4812162014.6114.6214.6314.65

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Pabellon Barcelona - Compute: NVIDIA OptiXd4080c369121511.1611.2111.21

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

OpenBenchmarking.orgGB/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: Memory Bandwidth Coalesced Write4080adc130260390520650604.74604.14603.94603.591. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

OpenBenchmarking.orgGB/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: Memory Bandwidth Coalesced Readcad4080140280420560700652.33652.31652.30652.261. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

OpenBenchmarking.orgTIOPs/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: INT8 Computeda4080c51015202520.1320.1220.1220.121. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

OpenBenchmarking.orgTIOPs/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: INT16 Computecad408061218243023.1223.1123.1023.101. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

OpenBenchmarking.orgTIOPs/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: INT32 Computead4080c61218243026.6526.5726.5726.551. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

OpenBenchmarking.orgTIOPs/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: INT64 Computeca4080d1.02132.04263.06394.08525.10654.5394.5164.5164.5141. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

OpenBenchmarking.orgTFLOPs/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: FP32 Computedca4080122436486051.5651.5551.5551.551. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

OpenBenchmarking.orgTFLOPs/s, More Is BetterProjectPhysX OpenCL-Benchmark 1.2Operation: FP64 Computedca40800.18680.37360.56040.74720.9340.830.830.830.831. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Classroom - Compute: NVIDIA OptiX4080adc369121510.0310.0310.0410.05

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: Fishy Cat - Compute: NVIDIA OptiXdca40802468107.657.677.708.52

Blender

Blend File: BMW27 - Compute: NVIDIA CUDA

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: BMW27 - Compute: NVIDIA CUDA4080cad2468107.847.847.937.93

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 4.0Blend File: BMW27 - Compute: NVIDIA OptiXadc40801.22182.44363.66544.88726.1094.584.584.595.43


Phoronix Test Suite v10.8.5