Debian Linux GCC 8 Benchmark -mindirect-branch=thunk

GCC 8 benchmarking of user-space code with -mindirect-branch=thunk and -mindirect-branch=thunk-inline for retpolines. Tests by Michael Larabel for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/1801161-PTS-DEBIANTE65&rdt&grr

Configurations: -mindirect-branch=thunk, Stock, -mindirect-branch=thunk-inline

Processor:          Intel Core i9-7980XE @ 4.40GHz (18 Cores / 36 Threads)
Motherboard:        ASUS PRIME X299-A (1004 BIOS)
Chipset:            Intel Device 2020
Memory:             16384MB
Disk:               120GB Force MP500
Graphics:           LLVMpipe
Audio:              Realtek ALC1220
Monitor:            Acer B286HK
Network:            Intel Connection
OS:                 Debian 9.3
Kernel:             4.15.0-rc8-retpo-underflow (x86_64) 20180115
Desktop:            GNOME Shell 3.22.3
Display Driver:     modesetting 1.19.2
OpenGL:             3.3 Mesa 13.0.6 Gallium 0.4 (LLVM 3.9 256 bits)
Compiler:           GCC 8.0.1 20180115
File-System:        ext4
Screen Resolution:  3840x2160

Environment Details
- -mindirect-branch=thunk: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk" CFLAGS="-O3 -march=native -mindirect-branch=thunk"
- Stock: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- -mindirect-branch=thunk-inline: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk-inline" CFLAGS="-O3 -march=native -mindirect-branch=thunk-inline"
Compiler Details
- --disable-multilib --enable-checking=release
Disk Details
- -mindirect-branch=thunk, Stock: NONE / data=ordered,errors=remount-ro,relatime,rw
Processor Details
- Scaling Governor: intel_pstate powersave
Python Details
- -mindirect-branch=thunk, Stock: Python 2.7.13 + Python 3.5.3
Security Details
- KPTI Full retpoline with underflow protection Protection

Test                                                  -mindirect-branch=thunk  Stock       -mindirect-branch=thunk-inline
redis: SET                                            1280528.47               1399046.62  1457026.92
redis: GET                                            2187123.33               2222262.58  2160702.13
pgbench: Buffer Test - Heavy Contention - Read Write  11147.89                 11290.86    10540.60
pgbench: Buffer Test - Normal Load - Read Write       11104.86                 11387.09    11460.43
ffmpeg: H.264 HD To NTSC DV                           13.89                    13.29       14.13
bullet: Convex Trimesh                                1.23                     1.08        1.28
bullet: Prim Trimesh                                  1.00                     0.92        1.03
bullet: 1000 Convex                                   4.65                     4.37        4.93
bullet: 1000 Stack                                    4.57                     4.44        4.83
bullet: 3000 Fall                                     4.11                     3.88        6.53
bullet: Raytests                                      2.78                     2.53        2.89
stockfish: Total Time                                 3074                     2904        3228
tscp: AI Chess Performance                            1185357                  1386794     1116421
hpcg:                                                 1.35                     1.38        1.38
hpcc: G-Ffte                                          5.58564                  5.88475     5.56022
hpcc: G-HPL                                           86.04620                 85.93090    85.97547
mpcbench: Multi-Precision Benchmark                   9643                     10013       9830
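
To put the summary figures on a common footing, the percentage difference of each retpoline build relative to Stock can be computed. The following is a minimal Python sketch and not part of the exported result: the values are copied from four rows of the summary, the direction flags follow the "More Is Better" / "Fewer Is Better" labels on the corresponding charts, and the names RESULTS, pct_vs_stock and summarize are hypothetical.

```python
# Values copied from the summary above, in the order
# (-mindirect-branch=thunk, Stock, -mindirect-branch=thunk-inline).
# The boolean records whether a higher value is better for that test.
RESULTS = {
    "redis SET (req/s)": ((1280528.47, 1399046.62, 1457026.92), True),
    "ffmpeg H.264 HD To NTSC DV (s)": ((13.89, 13.29, 14.13), False),
    "stockfish Total Time (ms)": ((3074, 2904, 3228), False),
    "tscp (nodes/s)": ((1185357, 1386794, 1116421), True),
}


def pct_vs_stock(value, stock, higher_is_better):
    """Signed percent change relative to Stock; positive means better than Stock."""
    change = (value - stock) / stock * 100.0
    return change if higher_is_better else -change


def summarize(results):
    """Return {test: (thunk %, thunk-inline %)} rounded to one decimal place."""
    rows = {}
    for name, ((thunk, stock, inline), higher_is_better) in results.items():
        rows[name] = (
            round(pct_vs_stock(thunk, stock, higher_is_better), 1),
            round(pct_vs_stock(inline, stock, higher_is_better), 1),
        )
    return rows


if __name__ == "__main__":
    for name, (thunk_pct, inline_pct) in summarize(RESULTS).items():
        print(f"{name:32s} thunk {thunk_pct:+6.1f}%   thunk-inline {inline_pct:+6.1f}%")
```

A positive figure means the build beat Stock on that test and a negative one means it was slower. Other rows can be added to RESULTS in the same form.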

Redis 3.0.1 - Test: SET
Requests Per Second, More Is Better
  -mindirect-branch=thunk         1280528.47  SE +/- 160590.96, N = 6
  Stock                           1399046.62  SE +/- 34299.85, N = 6
  -mindirect-branch=thunk-inline  1457026.92  SE +/- 2554.75, N = 3
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread

Redis 3.0.1 - Test: GET
Requests Per Second, More Is Better
  -mindirect-branch=thunk         2187123.33  SE +/- 40180.95, N = 6
  Stock                           2222262.58  SE +/- 43689.11, N = 3
  -mindirect-branch=thunk-inline  2160702.13  SE +/- 41655.36, N = 6
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread

PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write
TPS, More Is Better
  -mindirect-branch=thunk         11147.89  SE +/- 326.63, N = 6
  Stock                           11290.86  SE +/- 264.90, N = 6
  -mindirect-branch=thunk-inline  10540.60  SE +/- 43.65, N = 3
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write
TPS, More Is Better
  -mindirect-branch=thunk         11104.86  SE +/- 276.43, N = 6
  Stock                           11387.09  SE +/- 221.41, N = 6
  -mindirect-branch=thunk-inline  11460.43  SE +/- 63.39, N = 3
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm

FFmpeg 3.3.3 - H.264 HD To NTSC DV
Seconds, Fewer Is Better
  -mindirect-branch=thunk         13.89  SE +/- 0.30, N = 6
  Stock                           13.29  SE +/- 0.31, N = 6
  -mindirect-branch=thunk-inline  14.13  SE +/- 0.32, N = 6
  1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lxcb -lxcb-shm -lxcb-xfixes -lxcb-shape -lasound -lm -llzma -lbz2 -pthread -O3 -march=native -std=c11 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

Bullet Physics Engine 2.81 - Test: Convex Trimesh
Seconds, Fewer Is Better
  -mindirect-branch=thunk         1.23  SE +/- 0.02, N = 3
  Stock                           1.08  SE +/- 0.04, N = 3
  -mindirect-branch=thunk-inline  1.28  SE +/- 0.00, N = 3
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: Prim Trimesh
Seconds, Fewer Is Better
  -mindirect-branch=thunk         1.00  SE +/- 0.02, N = 3
  Stock                           0.92  SE +/- 0.03, N = 3
  -mindirect-branch=thunk-inline  1.03  SE +/- 0.02, N = 3
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: 1000 Convex
Seconds, Fewer Is Better
  -mindirect-branch=thunk         4.65  SE +/- 0.11, N = 3
  Stock                           4.37  SE +/- 0.17, N = 3
  -mindirect-branch=thunk-inline  4.93  SE +/- 0.06, N = 3
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: 1000 Stack
Seconds, Fewer Is Better
  -mindirect-branch=thunk         4.57  SE +/- 0.03, N = 3
  Stock                           4.44  SE +/- 0.04, N = 3
  -mindirect-branch=thunk-inline  4.83  SE +/- 0.05, N = 3
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: 3000 Fall
Seconds, Fewer Is Better
  -mindirect-branch=thunk         4.11  SE +/- 0.02, N = 3
  Stock                           3.88  SE +/- 0.02, N = 3
  -mindirect-branch=thunk-inline  6.53  SE +/- 2.39, N = 3
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: Raytests
Seconds, Fewer Is Better
  -mindirect-branch=thunk         2.78  SE +/- 0.04, N = 3
  Stock                           2.53  SE +/- 0.04, N = 3
  -mindirect-branch=thunk-inline  2.89  SE +/- 0.01, N = 3
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Stockfish 2014-11-26 - Total Time
ms, Fewer Is Better
  -mindirect-branch=thunk         3074  SE +/- 44.96, N = 3
  Stock                           2904  SE +/- 6.11, N = 3
  -mindirect-branch=thunk-inline  3228  SE +/- 56.05, N = 3
  1. (CXX) g++ options: -lpthread -O3 -march=native -fno-exceptions -fno-rtti -ansi -pedantic -msse -msse3 -mpopcnt -flto

TSCP 1.81 - AI Chess Performance
Nodes Per Second, More Is Better
  -mindirect-branch=thunk         1185357  SE +/- 10473.03, N = 5
  Stock                           1386794  SE +/- 18404.85, N = 6
  -mindirect-branch=thunk-inline  1116421  SE +/- 17216.49, N = 5
  1. (CC) gcc options: -O3 -march=native

High Performance Conjugate Gradient 3.0
GFLOP/s, More Is Better
  -mindirect-branch=thunk         1.35  SE +/- 0.00, N = 3
  Stock                           1.38  SE +/- 0.01, N = 3
  -mindirect-branch=thunk-inline  1.38  SE +/- 0.01, N = 3

HPC Challenge 1.5.0 - Test / Class: G-Ffte
GFLOP/s, More Is Better
  -mindirect-branch=thunk         5.58564  SE +/- 0.01968, N = 3
  Stock                           5.88475  SE +/- 0.18843, N = 3
  -mindirect-branch=thunk-inline  5.56022  SE +/- 0.02991, N = 3
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. BLAS + Open MPI 2.0.2

HPC Challenge 1.5.0 - Test / Class: G-HPL
GFLOPS, More Is Better
  -mindirect-branch=thunk         86.05  SE +/- 0.10, N = 3
  Stock                           85.93  SE +/- 0.29, N = 3
  -mindirect-branch=thunk-inline  85.98  SE +/- 0.17, N = 3
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. BLAS + Open MPI 2.0.2

GNU MPC 1.1.0 - Multi-Precision Benchmark
Global Score, More Is Better
  -mindirect-branch=thunk         9643   SE +/- 84.52, N = 3
  Stock                           10013  SE +/- 43.72, N = 3
  -mindirect-branch=thunk-inline  9830   SE +/- 75.72, N = 3
  1. (CC) gcc options: -lm -O3 -march=native -MT -MD -MP -MF
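
Each result carries a standard error (SE) from only a few runs, so some gaps between builds may be run-to-run noise. The Python sketch below shows one rough way to check this, using the Bullet "3000 Fall" figures as input: it divides the gap to Stock by the combined standard error of the two means. It assumes the SE values appear in the same order as the means. The cutoff of 2.0 is an illustrative choice, not something the result file defines, and the names FALL_3000, gap_in_se_units and compare_to_stock are hypothetical.

```python
import math

# (mean seconds, standard error) copied from the Bullet "3000 Fall" result.
# Assumption: SE values appear in the same order as the reported means.
FALL_3000 = {
    "-mindirect-branch=thunk": (4.11, 0.02),
    "Stock": (3.88, 0.02),
    "-mindirect-branch=thunk-inline": (6.53, 2.39),
}


def gap_in_se_units(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means divided by their combined standard error."""
    combined = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    if combined == 0.0:
        return 0.0 if mean_a == mean_b else math.inf
    return abs(mean_a - mean_b) / combined


def compare_to_stock(results, threshold=2.0):
    """Map each non-Stock configuration to (gap, gap >= threshold).

    threshold=2.0 is an illustrative cutoff chosen for this sketch.
    """
    stock_mean, stock_se = results["Stock"]
    out = {}
    for name, (mean, se) in results.items():
        if name == "Stock":
            continue
        gap = gap_in_se_units(mean, se, stock_mean, stock_se)
        out[name] = (gap, gap >= threshold)
    return out


if __name__ == "__main__":
    for name, (gap, clear) in compare_to_stock(FALL_3000).items():
        verdict = "clearly separated" if clear else "within run-to-run noise"
        print(f"{name}: {gap:.1f} combined SEs from Stock ({verdict})")
```

With these inputs the thunk-inline mean of 6.53 s sits only about one combined standard error from Stock, because its own SE is 2.39 s over three runs. That figure is better read as a noisy measurement than as a firm slowdown.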
Phoronix Test Suite v10.8.5