Debian Linux GCC 8 Benchmark -mindirect-branch=thunk

GCC 8 benchmarking of user-space with -mindirect-branch=thunk and -mindirect-branch=thunk-inline for retpolines. Tests by Michael Larabel for a future article on Phoronix.com.
HTML result view exported from: https://openbenchmarking.org/result/1801161-PTS-DEBIANTE65&grw&sor .
System Details (shared by Stock, -mindirect-branch=thunk, and -mindirect-branch=thunk-inline)

Processor: Intel Core i9-7980XE @ 4.40GHz (18 Cores / 36 Threads)
Motherboard: ASUS PRIME X299-A (1004 BIOS)
Chipset: Intel Device 2020
Memory: 16384MB
Disk: 120GB Force MP500
Graphics: LLVMpipe
Audio: Realtek ALC1220
Monitor: Acer B286HK
Network: Intel Connection
OS: Debian 9.3
Kernel: 4.15.0-rc8-retpo-underflow (x86_64) 20180115
Desktop: GNOME Shell 3.22.3
Display Driver: modesetting 1.19.2
OpenGL: 3.3 Mesa 13.0.6 Gallium 0.4 (LLVM 3.9 256 bits)
Compiler: GCC 8.0.1 20180115
File-System: ext4
Screen Resolution: 3840x2160

Environment Details
- Stock: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- -mindirect-branch=thunk: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk" CFLAGS="-O3 -march=native -mindirect-branch=thunk"
- -mindirect-branch=thunk-inline: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk-inline" CFLAGS="-O3 -march=native -mindirect-branch=thunk-inline"

Compiler Details
- --disable-multilib --enable-checking=release

Disk Details
- Stock, -mindirect-branch=thunk: NONE / data=ordered,errors=remount-ro,relatime,rw

Processor Details
- Scaling Governor: intel_pstate powersave

Python Details
- Stock, -mindirect-branch=thunk: Python 2.7.13 + Python 3.5.3

Security Details
- KPTI Full retpoline with underflow protection Protection
Results Overview

Test                                                  Stock         -mindirect-branch=thunk   -mindirect-branch=thunk-inline
bullet: Raytests                                      2.53          2.78                      2.89
bullet: 3000 Fall                                     3.88          4.11                      6.53
bullet: 1000 Stack                                    4.44          4.57                      4.83
bullet: 1000 Convex                                   4.37          4.65                      4.93
bullet: Prim Trimesh                                  0.92          1.00                      1.03
bullet: Convex Trimesh                                1.08          1.23                      1.28
tscp: AI Chess Performance                            1386794       1185357                   1116421
hpcc: G-HPL                                           85.93090      86.04620                  85.97547
hpcc: G-Ffte                                          5.88475       5.58564                   5.56022
hpcg:                                                 1.38          1.35                      1.38
stockfish: Total Time                                 2904          3074                      3228
ffmpeg: H.264 HD To NTSC DV                           13.29         13.89                     14.13
redis: GET                                            2222262.58    2187123.33                2160702.13
redis: SET                                            1399046.62    1280528.47                1457026.92
pgbench: Buffer Test - Normal Load - Read Write       11387.09      11104.86                  11460.43
mpcbench: Multi-Precision Benchmark                   10013         9643                      9830
pgbench: Buffer Test - Heavy Contention - Read Write  11290.86      11147.89                  10540.60
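The overview lists raw values only, and the tests mix "Fewer Is Better" and "More Is Better" units, so the builds are easier to compare as a signed percentage relative to Stock. The sketch below is one way to do that; it is not part of the exported result. The helper names (`percent_change`, `regression_pct`) and the choice of a four-test subset are assumptions made for illustration, while the numbers are copied from the overview.

```python
# Subset of the overview values: (direction, stock, thunk, thunk_inline).
# "lower" means Fewer Is Better, "higher" means More Is Better.
RESULTS = {
    "bullet: Raytests": ("lower", 2.53, 2.78, 2.89),
    "tscp: AI Chess Performance": ("higher", 1386794, 1185357, 1116421),
    "hpcc: G-HPL": ("higher", 85.93090, 86.04620, 85.97547),
    "stockfish: Total Time": ("lower", 2904, 3074, 3228),
}


def percent_change(baseline, value):
    """Signed percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def regression_pct(direction, baseline, value):
    """Percent by which a build is worse than the baseline.

    Positive means worse than Stock, negative means better,
    regardless of whether the unit is lower- or higher-is-better.
    """
    change = percent_change(baseline, value)
    return change if direction == "lower" else -change


for name, (direction, stock, thunk, inline) in RESULTS.items():
    print(
        f"{name:28s} "
        f"thunk {regression_pct(direction, stock, thunk):+6.1f}%  "
        f"thunk-inline {regression_pct(direction, stock, inline):+6.1f}%"
    )
```

With these inputs, Raytests comes out roughly 9.9% slower with -mindirect-branch=thunk and TSCP roughly 14.5% lower in nodes per second, while G-HPL is slightly negative (marginally better than Stock, well within its reported standard error).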
Bullet Physics Engine 2.81 - Test: Raytests (Seconds, Fewer Is Better)
  Stock: 2.53 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 2.78 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk-inline: 2.89 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: 3000 Fall (Seconds, Fewer Is Better)
  Stock: 3.88 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk: 4.11 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 6.53 (SE +/- 2.39, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
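The 3000 Fall result for -mindirect-branch=thunk-inline (6.53 seconds) carries a standard error of 2.39 over three runs, far larger than the 0.02 reported for the other two builds, so it is worth checking how much of the gap to Stock is distinguishable from run-to-run noise. The sketch below assumes that "SE" denotes the standard error of the mean (sample standard deviation divided by the square root of N) and that the SE figures are listed in the same order as the build labels; both are assumptions, and the helper names are hypothetical.

```python
import math


def implied_stddev(se, n):
    """Sample standard deviation implied by a standard error of the mean."""
    return se * math.sqrt(n)


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Gap between two means, expressed in units of their combined standard error.

    Values near or below 1 suggest the gap is comparable to the noise;
    this is a rough screen, not a formal significance test.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined


# 3000 Fall values from the result above: (mean seconds, SE), N = 3 each.
stock = (3.88, 0.02)
thunk = (4.11, 0.02)
thunk_inline = (6.53, 2.39)

print("thunk vs Stock:       ", difference_in_se_units(*stock, *thunk))
print("thunk-inline vs Stock:", difference_in_se_units(*stock, *thunk_inline))
print("implied stddev (thunk-inline):", implied_stddev(thunk_inline[1], 3))
```

Under these assumptions the thunk build differs from Stock by about eight combined standard errors, whereas the much larger thunk-inline gap is only about one, which points to an outlier run rather than a reliably measured slowdown of that size.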
Bullet Physics Engine 2.81 - Test: 1000 Stack (Seconds, Fewer Is Better)
  Stock: 4.44 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 4.57 (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk-inline: 4.83 (SE +/- 0.05, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: 1000 Convex (Seconds, Fewer Is Better)
  Stock: 4.37 (SE +/- 0.17, N = 3)
  -mindirect-branch=thunk: 4.65 (SE +/- 0.11, N = 3)
  -mindirect-branch=thunk-inline: 4.93 (SE +/- 0.06, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: Prim Trimesh (Seconds, Fewer Is Better)
  Stock: 0.92 (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk: 1.00 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 1.03 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: Convex Trimesh (Seconds, Fewer Is Better)
  Stock: 1.08 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 1.23 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 1.28 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
  Stock: 1386794 (SE +/- 18404.85, N = 6)
  -mindirect-branch=thunk: 1185357 (SE +/- 10473.03, N = 5)
  -mindirect-branch=thunk-inline: 1116421 (SE +/- 17216.49, N = 5)
  1. (CC) gcc options: -O3 -march=native
HPC Challenge 1.5.0 - Test / Class: G-HPL (GFLOPS, More Is Better)
  -mindirect-branch=thunk: 86.05 (SE +/- 0.10, N = 3)
  -mindirect-branch=thunk-inline: 85.98 (SE +/- 0.17, N = 3)
  Stock: 85.93 (SE +/- 0.29, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. BLAS + Open MPI 2.0.2
HPC Challenge 1.5.0 - Test / Class: G-Ffte (GFLOPS, More Is Better)
  Stock: 5.88475 (SE +/- 0.18843, N = 3)
  -mindirect-branch=thunk: 5.58564 (SE +/- 0.01968, N = 3)
  -mindirect-branch=thunk-inline: 5.56022 (SE +/- 0.02991, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. BLAS + Open MPI 2.0.2
High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
  -mindirect-branch=thunk-inline: 1.38 (SE +/- 0.01, N = 3)
  Stock: 1.38 (SE +/- 0.01, N = 3)
  -mindirect-branch=thunk: 1.35 (SE +/- 0.00, N = 3)
Stockfish 2014-11-26 - Total Time (ms, Fewer Is Better)
  Stock: 2904 (SE +/- 6.11, N = 3)
  -mindirect-branch=thunk: 3074 (SE +/- 44.96, N = 3)
  -mindirect-branch=thunk-inline: 3228 (SE +/- 56.05, N = 3)
  1. (CXX) g++ options: -lpthread -O3 -march=native -fno-exceptions -fno-rtti -ansi -pedantic -msse -msse3 -mpopcnt -flto
FFmpeg 3.3.3 - H.264 HD To NTSC DV (Seconds, Fewer Is Better)
  Stock: 13.29 (SE +/- 0.31, N = 6)
  -mindirect-branch=thunk: 13.89 (SE +/- 0.30, N = 6)
  -mindirect-branch=thunk-inline: 14.13 (SE +/- 0.32, N = 6)
  1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lxcb -lxcb-shm -lxcb-xfixes -lxcb-shape -lasound -lm -llzma -lbz2 -pthread -O3 -march=native -std=c11 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT
Redis 3.0.1 - Test: GET (Requests Per Second, More Is Better)
  Stock: 2222262.58 (SE +/- 43689.11, N = 3)
  -mindirect-branch=thunk: 2187123.33 (SE +/- 40180.95, N = 6)
  -mindirect-branch=thunk-inline: 2160702.13 (SE +/- 41655.36, N = 6)
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread
Redis 3.0.1 - Test: SET (Requests Per Second, More Is Better)
  -mindirect-branch=thunk-inline: 1457026.92 (SE +/- 2554.75, N = 3)
  Stock: 1399046.62 (SE +/- 34299.85, N = 6)
  -mindirect-branch=thunk: 1280528.47 (SE +/- 160590.96, N = 6)
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread
PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write (TPS, More Is Better)
  -mindirect-branch=thunk-inline: 11460.43 (SE +/- 63.39, N = 3)
  Stock: 11387.09 (SE +/- 221.41, N = 6)
  -mindirect-branch=thunk: 11104.86 (SE +/- 276.43, N = 6)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm
GNU MPC 1.1.0 - Multi-Precision Benchmark (Global Score, More Is Better)
  Stock: 10013 (SE +/- 43.72, N = 3)
  -mindirect-branch=thunk-inline: 9830 (SE +/- 75.72, N = 3)
  -mindirect-branch=thunk: 9643 (SE +/- 84.52, N = 3)
  1. (CC) gcc options: -lm -O3 -march=native -MT -MD -MP -MF
PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write (TPS, More Is Better)
  Stock: 11290.86 (SE +/- 264.90, N = 6)
  -mindirect-branch=thunk: 11147.89 (SE +/- 326.63, N = 6)
  -mindirect-branch=thunk-inline: 10540.60 (SE +/- 43.65, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm
Phoronix Test Suite v10.8.5