VIENNACL CL BLAS

AMD Ryzen 9 7945HX testing with an Alienware 0DWD2H (1.13.1 BIOS) and NVIDIA GeForce RTX 4090 Laptop GPU 16GB on cachyos rolling via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2409213-EIRI-240307070&rdt&grr .
Only the fields that differ from the preceding system are listed for each later system.

Radeon HD 8790M
  Processor: Intel Core i5-4300M @ 3.30GHz (2 Cores / 4 Threads)
  Motherboard: Dell 0VWNW8 (A26 BIOS)
  Chipset: Intel Xeon E3-1200 v3/4th
  Memory: 8GB
  Disk: 128GB SAMSUNG SSD PM85
  Graphics: AMD Radeon HD 8790M (1250MHz)
  Audio: Intel Xeon E3-1200 v3/4th
  Network: Intel I217-LM + Intel Centrino Ultimate-N 6300
  OS: cachyos rolling
  Kernel: 6.6.2-4-cachyos-lto (x86_64)
  Desktop: GNOME Shell 45.1
  Display Server: X Server 1.21.1.9
  OpenGL: 4.6 Mesa 24.0.0-devel (git-023fa0aa5d) (LLVM 16.0.6 DRM 3.54)
  OpenCL: OpenCL 1.1 Mesa 24.0.0-devel (git-023fa0aa5d)
  Compiler: GCC 13.2.1 20231110 + Clang 16.0.6 + LLVM 16.0.6 + CUDA 12.3
  File-System: xfs
  Screen Resolution: 1920x1080

IntelR HD Graphics 4600 HSW GT2 0x416
  Graphics: Intel HD 4600 HSW GT2 2GB (1250MHz)
  Kernel: 6.7.6-1-cachyos-rt-bore-lto (x86_64)
  Desktop: KDE Plasma 5.27.10
  Display Server: X Server 1.21.1.11
  OpenGL: 4.6 Mesa 24.0.1-arch1.1
  OpenCL: OpenCL 2.0 beignet 1.4 (git-f72309a5)
  Compiler: GCC 13.2.1 20230801 + Clang 16.0.6 + LLVM 16.0.6

Intel HD Graphics 4600 HSW GT2 CLANG70
  Kernel: 6.7.9-1-cachyos-rt-bore-lto (x86_64)
  Desktop: KDE Plasma 6.0.1
  OpenGL: 4.6 Mesa 24.0.2-arch1.2
  Compiler: Clang 17.0.6 + GCC 13.2.1 20230801 + LLVM 17.0.6

nVidia RTX 4090 mobile
  Processor: AMD Ryzen 9 7945HX @ 5.46GHz (16 Cores / 32 Threads)
  Motherboard: Alienware 0DWD2H (1.13.1 BIOS)
  Chipset: AMD Device 14d8
  Memory: 62GB
  Disk: PC SN810 NVMe WDC 2048GB + 4001GB CT4000P3SSD8
  Graphics: NVIDIA GeForce RTX 4090 Laptop GPU 16GB
  Audio: NVIDIA Device 22bb
  Network: Realtek RTL8125 2.5GbE + Qualcomm QCNFA765
  Kernel: 6.11.0-5-cachyos-lto (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server 1.21.1.13
  Display Driver: NVIDIA 560.35.03
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.6.65
  Compiler: GCC 14.2.1 20240910 + Clang 18.1.8 + LLVM 18.1.8 + CUDA 12.6
  File-System: zfs
  Screen Resolution: 2560x1600

Kernel Details
- Radeon HD 8790M: cfg80211.cfg80211_disable_40mhz_24ghz=1 mac80211.minstrel_vht_only=1 - Transparent Huge Pages: always
- nVidia RTX 4090 mobile: Transparent Huge Pages: always

Environment Details
- Radeon HD 8790M: DRI_PRIME=1 NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
- nVidia RTX 4090 mobile: MUTTER_DEBUG_KMS_THREAD_TYPE=user

Compiler Details
- Radeon HD 8790M: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- IntelR HD Graphics 4600 HSW GT2 0x416: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- nVidia RTX 4090 mobile: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Radeon HD 8790M: Scaling Governor: intel_cpufreq performance - CPU Microcode: 0x28
- IntelR HD Graphics 4600 HSW GT2 0x416: Scaling Governor: intel_cpufreq powersave - CPU Microcode: 0x28
- Intel HD Graphics 4600 HSW GT2 CLANG70: Scaling Governor: intel_cpufreq performance - CPU Microcode: 0x28
- nVidia RTX 4090 mobile: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601206

Security Details
- Radeon HD 8790M: gather_data_sampling: Not affected + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: vulnerable + mds: Vulnerable; SMT vulnerable + meltdown: Vulnerable + mmio_stale_data: Unknown: No mitigations + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers + spectre_v2: Vulnerable IBPB: disabled STIBP: disabled PBRSB-eIBRS: Not affected + srbds: Vulnerable + tsx_async_abort: Not affected
- IntelR HD Graphics 4600 HSW GT2 0x416: gather_data_sampling: Not affected + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable + mds: Mitigation of Clear buffers; SMT vulnerable + meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected + srbds: Mitigation of Microcode + tsx_async_abort: Not affected
- Intel HD Graphics 4600 HSW GT2 CLANG70: gather_data_sampling: Not affected + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable + mds: Mitigation of Clear buffers; SMT vulnerable + meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected + srbds: Mitigation of Microcode + tsx_async_abort: Not affected
- nVidia RTX 4090 mobile: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

OpenCL Details
- nVidia RTX 4090 mobile: GPU Compute Cores: 9728

Test                               Radeon HD 8790M   IntelR HD Graphics 4600 HSW GT2 0x416   Intel HD Graphics 4600 HSW GT2 CLANG70   nVidia RTX 4090 mobile
viennacl: OpenCL BLAS - dGEMM-TT   38.5              -                                       -                                        681
viennacl: OpenCL BLAS - dGEMM-TN   37.7              -                                       -                                        666
viennacl: OpenCL BLAS - dGEMM-NT   39.7              -                                       -                                        637
viennacl: OpenCL BLAS - dGEMM-NN   39.0              -                                       -                                        620
viennacl: OpenCL BLAS - dGEMV-T    22.7              -                                       -                                        373
viennacl: OpenCL BLAS - dGEMV-N    34.5              -                                       -                                        196
viennacl: OpenCL BLAS - dDOT       44.7              -                                       -                                        518
viennacl: OpenCL BLAS - dAXPY      40.6              -                                       -                                        521
viennacl: OpenCL BLAS - dCOPY      35.5              -                                       -                                        469
viennacl: OpenCL BLAS - sCOPY      29.5              14.4                                    14.7                                     267
viennacl: OpenCL BLAS - sAXPY      31.1              13.6                                    13.4                                     437
viennacl: OpenCL BLAS - sDOT       27.3              15.6                                    15.1                                     375

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TT (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 38.5 (SE +/- 0.17, N = 3)
  nVidia RTX 4090 mobile: 681.0 (SE +/- 0.54, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 37.7 (SE +/- 0.06, N = 3)
  nVidia RTX 4090 mobile: 666.0 (SE +/- 0.55, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NT (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 39.7 (SE +/- 0.13, N = 3)
  nVidia RTX 4090 mobile: 637.0 (SE +/- 0.46, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NN (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 39.0 (SE +/- 0.07, N = 3)
  nVidia RTX 4090 mobile: 620.0 (SE +/- 0.41, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better)
  Radeon HD 8790M: 22.7 (SE +/- 0.07, N = 3)
  nVidia RTX 4090 mobile: 373.0 (SE +/- 1.41, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better)
  Radeon HD 8790M: 34.5 (SE +/- 0.28, N = 3)
  nVidia RTX 4090 mobile: 196.0 (SE +/- 0.07, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dDOT (GB/s, More Is Better)
  Radeon HD 8790M: 44.7 (SE +/- 0.10, N = 3)
  nVidia RTX 4090 mobile: 518.0 (SE +/- 0.45, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dAXPY (GB/s, More Is Better)
  Radeon HD 8790M: 40.6 (SE +/- 0.03, N = 3)
  nVidia RTX 4090 mobile: 521.0 (SE +/- 0.33, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dCOPY (GB/s, More Is Better)
  Radeon HD 8790M: 35.5 (SE +/- 0.09, N = 3)
  nVidia RTX 4090 mobile: 469.0 (SE +/- 0.39, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - sCOPY (GB/s, More Is Better)
  IntelR HD Graphics 4600 HSW GT2 0x416: 13.6 (SE +/- 0.12, N = 7)
  Intel HD Graphics 4600 HSW GT2 CLANG70: 14.6 (SE +/- 0.28, N = 12)
  Radeon HD 8790M: 29.5 (SE +/- 0.07, N = 3)
  nVidia RTX 4090 mobile: 267.0 (SE +/- 4.49, N = 14)

ViennaCL 1.7.1, Test: OpenCL BLAS - sAXPY (GB/s, More Is Better)
  IntelR HD Graphics 4600 HSW GT2 0x416: 13.9 (SE +/- 0.25, N = 7)
  Intel HD Graphics 4600 HSW GT2 CLANG70: 13.5 (SE +/- 0.03, N = 3)
  Radeon HD 8790M: 31.1 (SE +/- 0.03, N = 3)
  nVidia RTX 4090 mobile: 437.0 (SE +/- 0.54, N = 14)

ViennaCL 1.7.1, Test: OpenCL BLAS - sDOT (GB/s, More Is Better)
  IntelR HD Graphics 4600 HSW GT2 0x416: 15.80 (SE +/- 0.09, N = 15)
  Intel HD Graphics 4600 HSW GT2 CLANG70: 15.00 (SE +/- 0.06, N = 3)
  Radeon HD 8790M: 27.30 (SE +/- 0.03, N = 3)
  nVidia RTX 4090 mobile: 375.00 (SE +/- 1.06, N = 14)

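One way to read the per-test results is as a ratio between two systems. The sketch below is not part of the Phoronix Test Suite output; it copies the reported means for the Radeon HD 8790M and the nVidia RTX 4090 mobile and divides one by the other. The names `RESULTS`, `speedup`, and `speedup_table` are hypothetical helpers introduced here for illustration, and because the two GPUs sit in different host machines, the ratios compare whole systems rather than GPUs in isolation.

```python
# Reported means copied from the per-test results:
# (Radeon HD 8790M, nVidia RTX 4090 mobile).
# dGEMM tests are in GFLOPs/s, all other tests in GB/s.
RESULTS = {
    "dGEMM-TT": (38.5, 681.0),
    "dGEMM-TN": (37.7, 666.0),
    "dGEMM-NT": (39.7, 637.0),
    "dGEMM-NN": (39.0, 620.0),
    "dGEMV-T": (22.7, 373.0),
    "dGEMV-N": (34.5, 196.0),
    "dDOT": (44.7, 518.0),
    "dAXPY": (40.6, 521.0),
    "dCOPY": (35.5, 469.0),
    "sCOPY": (29.5, 267.0),
    "sAXPY": (31.1, 437.0),
    "sDOT": (27.3, 375.0),
}


def speedup(baseline: float, candidate: float) -> float:
    """Ratio candidate / baseline for a 'More Is Better' metric."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline


def speedup_table(results: dict) -> dict:
    """Map each test name to its speedup, rounded to one decimal place."""
    return {
        test: round(speedup(base, cand), 1)
        for test, (base, cand) in results.items()
    }


if __name__ == "__main__":
    for test, ratio in speedup_table(RESULTS).items():
        print(f"{test:10s} {ratio:5.1f}x")
```

Since the unit cancels in each ratio, the GFLOPs/s and GB/s tests can share one table, but the ratios should not be averaged across the two groups without thought.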
Phoronix Test Suite v10.8.5