Debian Linux GCC 8 Benchmark -mindirect-branch=thunk

GCC 8 benchmarking of user-space with -mindirect-branch=thunk and -mindirect-branch=thunk-inline for retpolines. Tests by Michael Larabel for a future article on Phoronix.com.
HTML result view exported from: https://openbenchmarking.org/result/1801161-PTS-DEBIANTE65&grt&sor
System Details (identical for Stock, -mindirect-branch=thunk, and -mindirect-branch=thunk-inline)

Processor: Intel Core i9-7980XE @ 4.40GHz (18 Cores / 36 Threads)
Motherboard: ASUS PRIME X299-A (1004 BIOS)
Chipset: Intel Device 2020
Memory: 16384MB
Disk: 120GB Force MP500
Graphics: LLVMpipe
Audio: Realtek ALC1220
Monitor: Acer B286HK
Network: Intel Connection
OS: Debian 9.3
Kernel: 4.15.0-rc8-retpo-underflow (x86_64) 20180115
Desktop: GNOME Shell 3.22.3
Display Driver: modesetting 1.19.2
OpenGL: 3.3 Mesa 13.0.6 Gallium 0.4 (LLVM 3.9 256 bits)
Compiler: GCC 8.0.1 20180115
File-System: ext4
Screen Resolution: 3840x2160

Environment Details
- Stock: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- -mindirect-branch=thunk: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk" CFLAGS="-O3 -march=native -mindirect-branch=thunk"
- -mindirect-branch=thunk-inline: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk-inline" CFLAGS="-O3 -march=native -mindirect-branch=thunk-inline"

Compiler Details
- --disable-multilib --enable-checking=release

Disk Details
- Stock, -mindirect-branch=thunk: NONE / data=ordered,errors=remount-ro,relatime,rw

Processor Details
- Scaling Governor: intel_pstate powersave

Python Details
- Stock, -mindirect-branch=thunk: Python 2.7.13 + Python 3.5.3

Security Details
- KPTI Full retpoline with underflow protection Protection
Results Overview

Columns: Stock, -mindirect-branch=thunk (thunk), -mindirect-branch=thunk-inline (thunk-inline)

Test                                                  Stock       thunk       thunk-inline
bullet: Raytests                                      2.53        2.78        2.89
bullet: 3000 Fall                                     3.88        4.11        6.53
bullet: 1000 Stack                                    4.44        4.57        4.83
bullet: 1000 Convex                                   4.37        4.65        4.93
bullet: Prim Trimesh                                  0.92        1.00        1.03
bullet: Convex Trimesh                                1.08        1.23        1.28
ffmpeg: H.264 HD To NTSC DV                           13.29       13.89       14.13
mpcbench: Multi-Precision Benchmark                   10013       9643        9830
hpcg:                                                 1.38        1.35        1.38
hpcc: G-HPL                                           85.93090    86.04620    85.97547
hpcc: G-Ffte                                          5.88475     5.58564     5.56022
pgbench: Buffer Test - Normal Load - Read Write       11387.09    11104.86    11460.43
pgbench: Buffer Test - Heavy Contention - Read Write  11290.86    11147.89    10540.60
redis: GET                                            2222262.58  2187123.33  2160702.13
redis: SET                                            1399046.62  1280528.47  1457026.92
stockfish: Total Time                                 2904        3074        3228
tscp: AI Chess Performance                            1386794     1185357     1116421
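The overview lists raw values in mixed units, some where fewer is better (seconds, ms) and some where more is better (TPS, requests or nodes per second). As a rough aid for reading it, the sketch below converts a few rows into a percentage regression relative to Stock. The function name `regression_percent` and the `RESULTS` layout are illustrative assumptions, not part of the Phoronix Test Suite; the numbers are copied from the overview, and the fewer/more-is-better direction of each row is taken from the per-test results.

```python
# Values copied from the results overview:
# (stock, thunk, thunk_inline, higher_is_better)
RESULTS = {
    "bullet: Raytests (s)": (2.53, 2.78, 2.89, False),
    "stockfish: Total Time (ms)": (2904, 3074, 3228, False),
    "tscp: AI Chess Performance (nodes/s)": (1386794, 1185357, 1116421, True),
    "redis: SET (req/s)": (1399046.62, 1280528.47, 1457026.92, True),
}


def regression_percent(stock, variant, higher_is_better):
    """Return how much worse `variant` is than `stock`, in percent.

    Positive means the variant performed worse than Stock,
    negative means it performed better.
    """
    if higher_is_better:
        # Throughput-style metric: a lower value is a regression.
        return (stock - variant) / stock * 100.0
    # Time-style metric: a higher value is a regression.
    return (variant - stock) / stock * 100.0


for name, (stock, thunk, inline, hib) in RESULTS.items():
    print(
        f"{name}: thunk {regression_percent(stock, thunk, hib):+.1f}%, "
        f"thunk-inline {regression_percent(stock, inline, hib):+.1f}%"
    )
```

A single percentage hides run-to-run noise, so it is best read together with the standard errors reported for each test.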
Bullet Physics Engine 2.81 - Test: Raytests
Seconds, Fewer Is Better
  Stock: 2.53 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 2.78 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk-inline: 2.89 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: 3000 Fall
Seconds, Fewer Is Better
  Stock: 3.88 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk: 4.11 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 6.53 (SE +/- 2.39, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: 1000 Stack
Seconds, Fewer Is Better
  Stock: 4.44 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 4.57 (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk-inline: 4.83 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: 1000 Convex
Seconds, Fewer Is Better
  Stock: 4.37 (SE +/- 0.17, N = 3)
  -mindirect-branch=thunk: 4.65 (SE +/- 0.11, N = 3)
  -mindirect-branch=thunk-inline: 4.93 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: Prim Trimesh
Seconds, Fewer Is Better
  Stock: 0.92 (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk: 1.00 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 1.03 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: Convex Trimesh
Seconds, Fewer Is Better
  Stock: 1.08 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 1.23 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 1.28 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
FFmpeg 3.3.3 - H.264 HD To NTSC DV
Seconds, Fewer Is Better
  Stock: 13.29 (SE +/- 0.31, N = 6)
  -mindirect-branch=thunk: 13.89 (SE +/- 0.30, N = 6)
  -mindirect-branch=thunk-inline: 14.13 (SE +/- 0.32, N = 6)
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lxcb -lxcb-shm -lxcb-xfixes -lxcb-shape -lasound -lm -llzma -lbz2 -pthread -O3 -march=native -std=c11 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT
GNU MPC 1.1.0 - Multi-Precision Benchmark
Global Score, More Is Better
  Stock: 10013 (SE +/- 43.72, N = 3)
  -mindirect-branch=thunk: 9643 (SE +/- 84.52, N = 3)
  -mindirect-branch=thunk-inline: 9830 (SE +/- 75.72, N = 3)
1. (CC) gcc options: -lm -O3 -march=native -MT -MD -MP -MF
High Performance Conjugate Gradient 3.0
GFLOP/s, More Is Better
  Stock: 1.38 (SE +/- 0.01, N = 3)
  -mindirect-branch=thunk: 1.35 (SE +/- 0.00, N = 3)
  -mindirect-branch=thunk-inline: 1.38 (SE +/- 0.01, N = 3)
HPC Challenge 1.5.0 - Test / Class: G-HPL
GFLOPS, More Is Better
  Stock: 85.93 (SE +/- 0.29, N = 3)
  -mindirect-branch=thunk: 86.05 (SE +/- 0.10, N = 3)
  -mindirect-branch=thunk-inline: 85.98 (SE +/- 0.17, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
2. BLAS + Open MPI 2.0.2
HPC Challenge 1.5.0 - Test / Class: G-Ffte
GFLOPS, More Is Better
  Stock: 5.88475 (SE +/- 0.18843, N = 3)
  -mindirect-branch=thunk: 5.58564 (SE +/- 0.01968, N = 3)
  -mindirect-branch=thunk-inline: 5.56022 (SE +/- 0.02991, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
2. BLAS + Open MPI 2.0.2
PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write
TPS, More Is Better
  Stock: 11387.09 (SE +/- 221.41, N = 6)
  -mindirect-branch=thunk: 11104.86 (SE +/- 276.43, N = 6)
  -mindirect-branch=thunk-inline: 11460.43 (SE +/- 63.39, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm
PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write
TPS, More Is Better
  Stock: 11290.86 (SE +/- 264.90, N = 6)
  -mindirect-branch=thunk: 11147.89 (SE +/- 326.63, N = 6)
  -mindirect-branch=thunk-inline: 10540.60 (SE +/- 43.65, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm
Redis 3.0.1 - Test: GET
Requests Per Second, More Is Better
  Stock: 2222262.58 (SE +/- 43689.11, N = 3)
  -mindirect-branch=thunk: 2187123.33 (SE +/- 40180.95, N = 6)
  -mindirect-branch=thunk-inline: 2160702.13 (SE +/- 41655.36, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread
Redis 3.0.1 - Test: SET
Requests Per Second, More Is Better
  Stock: 1399046.62 (SE +/- 34299.85, N = 6)
  -mindirect-branch=thunk: 1280528.47 (SE +/- 160590.96, N = 6)
  -mindirect-branch=thunk-inline: 1457026.92 (SE +/- 2554.75, N = 3)
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread
Stockfish 2014-11-26 - Total Time
ms, Fewer Is Better
  Stock: 2904 (SE +/- 6.11, N = 3)
  -mindirect-branch=thunk: 3074 (SE +/- 44.96, N = 3)
  -mindirect-branch=thunk-inline: 3228 (SE +/- 56.05, N = 3)
1. (CXX) g++ options: -lpthread -O3 -march=native -fno-exceptions -fno-rtti -ansi -pedantic -msse -msse3 -mpopcnt -flto
TSCP 1.81 - AI Chess Performance
Nodes Per Second, More Is Better
  Stock: 1386794 (SE +/- 18404.85, N = 6)
  -mindirect-branch=thunk: 1185357 (SE +/- 10473.03, N = 5)
  -mindirect-branch=thunk-inline: 1116421 (SE +/- 17216.49, N = 5)
1. (CC) gcc options: -O3 -march=native
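Each result above is reported with a standard error (SE) and a run count (N), and some differences are small relative to that noise. The sketch below is one rough way to compare two configurations: divide the gap between their means by the combined standard error. The helper name `difference_in_se_units` is a hypothetical illustration, and with only 3 to 6 runs per configuration this is a heuristic, not a formal significance test. The inputs are copied from the Bullet 3000 Fall and TSCP results.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| divided by the combined standard error.

    The combined SE assumes the two sets of runs are independent.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# Bullet 3000 Fall: Stock vs -mindirect-branch=thunk-inline.
# The thunk-inline runs have SE +/- 2.39, so the gap is poorly resolved.
bullet_fall = difference_in_se_units(3.88, 0.02, 6.53, 2.39)

# TSCP: Stock vs -mindirect-branch=thunk-inline.
# Both SEs are small relative to the gap between the means.
tscp = difference_in_se_units(1386794, 18404.85, 1116421, 17216.49)

print(f"bullet 3000 Fall: {bullet_fall:.2f} combined SEs apart")
print(f"tscp: {tscp:.2f} combined SEs apart")
```

A gap of only one or two combined SEs suggests the result needs more runs before drawing conclusions, while a gap many times larger is unlikely to be run-to-run noise.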
Phoronix Test Suite v10.8.5