onnx 119

AMD Ryzen Threadripper 7980X 64-Cores testing with a System76 Thelio Major (FA Z5 BIOS) and AMD Radeon Pro W7900 on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408227-PTS-ONNX119192&grs .
Test System Configuration (identical for runs a, b, c, d, e):

Processor: AMD Ryzen Threadripper 7980X 64-Cores @ 7.79GHz (64 Cores / 128 Threads)
Motherboard: System76 Thelio Major (FA Z5 BIOS)
Chipset: AMD Device 14a4
Memory: 4 x 32GB DDR5-4800MT/s Micron MTC20F1045S1RC48BA2
Disk: 1000GB CT1000T700SSD5
Graphics: AMD Radeon Pro W7900
Audio: AMD Device 14cc
Monitor: DELL P2415Q
Network: Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Intel Wi-Fi 6E
OS: Ubuntu 24.10
Kernel: 6.8.0-31-generic (x86_64)
Desktop: GNOME Shell
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 24.0.9-0ubuntu2 (LLVM 17.0.6 DRM 3.57)
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-F5tscv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-F5tscv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance); CPU Microcode: 0xa108105
Python Details: Python 3.12.5
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Result Summary

(The original HTML export included a flattened overview table of every test result for configurations a through e; the same results are listed per test below.)
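Each per-test result below reports two "SE +/- x, N = y" figures: standard errors of the mean over N benchmark runs, where the standard error is the sample standard deviation divided by the square root of N. A minimal stdlib-only sketch of that calculation (the run values here are hypothetical, not taken from this result file):

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical per-run throughput values (inferences per second) for one test
runs = [204.1, 206.0, 203.3]
print(f"mean = {statistics.mean(runs):.2f}, "
      f"SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```

A small SE relative to the mean (as in most results below) indicates the run-to-run variance is low enough for the a-e differences to be meaningful.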
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 204.48, b: 200.84, c: 178.13, d: 212.81, e: 210.62 (SE +/- 1.28, N = 3; SE +/- 3.47, N = 12)

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 454.72, b: 460.02, c: 442.12, d: 419.42, e: 440.35 (SE +/- 1.43, N = 3; SE +/- 3.83, N = 3)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 53.41, b: 54.58, c: 50.74, d: 54.30, e: 54.42 (SE +/- 0.75, N = 3; SE +/- 0.73, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 38.35, b: 36.97, c: 38.78, d: 39.21, e: 36.68 (SE +/- 0.40, N = 3; SE +/- 0.45, N = 3)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 5.27246, b: 5.12789, c: 5.30486, d: 5.00717, e: 5.11842 (SE +/- 0.03706, N = 3; SE +/- 0.03454, N = 15)

All ONNX Runtime tests compiled with: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 1080p
Frames Per Second (more is better): a: 740.00, b: 740.79, c: 764.00, d: 783.12, e: 749.26 (SE +/- 7.99, N = 4; SE +/- 8.40, N = 4)

All SVT-AV1 tests compiled with: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 100.59, b: 100.68, c: 100.88, d: 105.02, e: 99.41 (SE +/- 0.94, N = 6; SE +/- 1.11, N = 5)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 81.78, b: 81.42, c: 82.50, d: 81.67, e: 78.16 (SE +/- 0.40, N = 3; SE +/- 0.72, N = 15)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 6.76854, b: 6.86262, c: 6.91740, d: 7.10593, e: 7.02121 (SE +/- 0.05485, N = 9; SE +/- 0.07644, N = 5)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 201.10, b: 201.42, c: 194.34, d: 203.84, e: 202.69 (SE +/- 1.01, N = 3; SE +/- 1.52, N = 3)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 28.73, b: 28.88, c: 27.65, d: 27.61, e: 27.76 (SE +/- 0.21, N = 11; SE +/- 0.32, N = 5)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 8.76007, b: 8.71251, c: 9.05619, d: 8.79279, e: 9.08788 (SE +/- 0.01899, N = 3; SE +/- 0.10839, N = 4)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 12.67, b: 12.74, c: 13.18, d: 13.00, e: 12.74 (SE +/- 0.13, N = 3; SE +/- 0.10, N = 15)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 12.60, b: 12.48, c: 12.39, d: 12.61, e: 12.85 (SE +/- 0.05, N = 3; SE +/- 0.08, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 140.96, b: 142.11, c: 141.23, d: 138.16, e: 138.61 (SE +/- 0.88, N = 3; SE +/- 1.08, N = 3)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 44.56, b: 44.46, c: 44.04, d: 45.29, e: 44.11 (SE +/- 0.33, N = 3; SE +/- 0.14, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 105.04, b: 105.79, c: 105.59, d: 105.15, e: 107.72 (SE +/- 0.21, N = 3; SE +/- 0.58, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 1.60987, b: 1.59999, c: 1.61916, d: 1.58526, e: 1.61010 (SE +/- 0.01149, N = 3; SE +/- 0.01116, N = 3)
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p
Frames Per Second (more is better): a: 271.68, b: 269.88, c: 266.01, d: 269.43, e: 268.76 (SE +/- 0.46, N = 3; SE +/- 0.65, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K
Frames Per Second (more is better): a: 231.11, b: 233.25, c: 232.40, d: 228.83, e: 228.97 (SE +/- 1.99, N = 8; SE +/- 2.31, N = 3)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 74.21, b: 74.36, c: 73.12, d: 73.47, e: 74.33 (SE +/- 0.25, N = 3; SE +/- 0.72, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 4K
Frames Per Second (more is better): a: 13.40, b: 13.35, c: 13.53, d: 13.38, e: 13.41 (SE +/- 0.02, N = 3; SE +/- 0.04, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 131.31, b: 130.49, c: 130.77, d: 132.22, e: 131.27 (SE +/- 0.20, N = 3; SE +/- 0.45, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit
Frames Per Second (more is better): a: 10.84, b: 10.83, c: 10.85, d: 10.87, e: 10.97 (SE +/- 0.04, N = 3; SE +/- 0.05, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 2.68248, b: 2.70145, c: 2.69105, d: 2.66966, e: 2.69413 (SE +/- 0.00331, N = 3; SE +/- 0.00886, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 4K
Frames Per Second (more is better): a: 45.61, b: 45.34, c: 45.66, d: 45.17, e: 45.41 (SE +/- 0.18, N = 3; SE +/- 0.22, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 335.50, b: 338.93, c: 339.18, d: 335.86, e: 338.70 (SE +/- 0.69, N = 3; SE +/- 1.56, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 4K
Frames Per Second (more is better): a: 96.21, b: 96.42, c: 96.06, d: 97.05, e: 96.78 (SE +/- 0.26, N = 3; SE +/- 0.79, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 160.71, b: 161.41, c: 162.18, d: 161.86, e: 162.25 (SE +/- 0.25, N = 3; SE +/- 0.37, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 1080p
Frames Per Second (more is better): a: 119.22, b: 119.23, c: 119.77, d: 118.80, e: 119.92 (SE +/- 0.52, N = 3; SE +/- 0.35, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit
Frames Per Second (more is better): a: 1.864, b: 1.857, c: 1.867, d: 1.874, e: 1.867 (SE +/- 0.002, N = 3; SE +/- 0.004, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 1080p
Frames Per Second (more is better): a: 36.61, b: 36.55, c: 36.66, d: 36.71, e: 36.46 (SE +/- 0.03, N = 3; SE +/- 0.01, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit
Frames Per Second (more is better): a: 19.52, b: 19.42, c: 19.40, d: 19.44, e: 19.44 (SE +/- 0.03, N = 3; SE +/- 0.04, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit
Frames Per Second (more is better): a: 7.775, b: 7.801, c: 7.811, d: 7.809, e: 7.765 (SE +/- 0.029, N = 3; SE +/- 0.008, N = 3)
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 22.44, b: 22.49, c: 22.70, d: 22.08, e: 22.67 (SE +/- 0.16, N = 3; SE +/- 0.07, N = 3)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 34.83, b: 34.65, c: 36.16, d: 36.21, e: 36.02 (SE +/- 0.26, N = 11; SE +/- 0.39, N = 5)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 372.79, b: 370.18, c: 371.60, d: 374.58, e: 371.18 (SE +/- 0.46, N = 3; SE +/- 1.22, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 621.23, b: 625.06, c: 617.60, d: 630.81, e: 621.08 (SE +/- 4.43, N = 3; SE +/- 4.33, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 9.94497, b: 9.93654, c: 9.91284, d: 9.52165, e: 10.05920 (SE +/- 0.08968, N = 6; SE +/- 0.10599, N = 5)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 7.61417, b: 7.66242, c: 7.64590, d: 7.56180, e: 7.61659 (SE +/- 0.01157, N = 3; SE +/- 0.02614, N = 3)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 4.97187, b: 4.96437, c: 5.14489, d: 4.90503, e: 4.93285 (SE +/- 0.02497, N = 3; SE +/- 0.03731, N = 3)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 13.47, b: 13.45, c: 13.67, d: 13.61, e: 13.45 (SE +/- 0.04, N = 3; SE +/- 0.13, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 26.08, b: 27.06, c: 25.78, d: 25.50, e: 27.26 (SE +/- 0.27, N = 3; SE +/- 0.33, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 79.34, b: 80.16, c: 80.70, d: 79.29, e: 77.85 (SE +/- 0.34, N = 3; SE +/- 0.51, N = 3)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 257.99, b: 252.34, c: 291.44, d: 288.12, e: 276.92 (SE +/- 7.70, N = 15; SE +/- 6.89, N = 15)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Second (more is better): a: 3.92686, b: 4.00404, c: 3.43115, d: 3.47069, e: 3.61108 (SE +/- 0.12199, N = 15; SE +/- 0.10773, N = 15)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 908.30, b: 913.54, c: 913.49, d: 873.47, e: 889.23 (SE +/- 11.68, N = 15; SE +/- 13.79, N = 15)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Second (more is better): a: 1.10358, b: 1.09825, c: 1.09469, d: 1.14486, e: 1.12456 (SE +/- 0.01462, N = 15; SE +/- 0.01709, N = 15)
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 2.19841, b: 2.17339, c: 2.26126, d: 2.38355, e: 2.27019 (SE +/- 0.00686, N = 3; SE +/- 0.01799, N = 3)

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 4.88925, b: 4.99490, c: 5.61236, d: 4.69742, e: 4.74627 (SE +/- 0.03038, N = 3; SE +/- 0.09195, N = 12)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 78.92, b: 78.57, c: 75.87, d: 76.94, e: 78.47 (SE +/- 0.83, N = 3; SE +/- 0.63, N = 15)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 147.81, b: 145.78, c: 144.56, d: 140.72, e: 142.42 (SE +/- 1.16, N = 9; SE +/- 1.59, N = 5)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 7.09442, b: 7.03694, c: 7.08004, d: 7.23734, e: 7.21415 (SE +/- 0.04465, N = 3; SE +/- 0.05397, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 2.97938, b: 2.94936, c: 2.94690, d: 2.97608, e: 2.95124 (SE +/- 0.00618, N = 3; SE +/- 0.01351, N = 3)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 12.23, b: 12.29, c: 12.12, d: 12.24, e: 12.79 (SE +/- 0.06, N = 3; SE +/- 0.11, N = 15)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 18.73, b: 18.33, c: 19.71, d: 18.41, e: 18.38 (SE +/- 0.27, N = 3; SE +/- 0.25, N = 3)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 114.15, b: 114.83, c: 110.42, d: 113.73, e: 110.03 (SE +/- 0.25, N = 3; SE +/- 1.43, N = 4)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 189.68, b: 195.13, c: 188.50, d: 199.71, e: 195.37 (SE +/- 1.34, N = 3; SE +/- 1.30, N = 15)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard
Inference Time Cost, ms (fewer is better): a: 9.51832, b: 9.45128, c: 9.46853, d: 9.50891, e: 9.28112 (SE +/- 0.01924, N = 3; SE +/- 0.05188, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel
Inference Time Cost, ms (fewer is better): a: 6.21628, b: 6.18911, c: 6.15989, d: 6.17185, e: 6.15756 (SE +/- 0.00960, N = 3; SE +/- 0.01421, N = 3)
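When ranking configurations a through e across a result file like this one, OpenBenchmarking.org-style overall comparisons typically use a geometric mean of per-test scores, so that no single benchmark dominates the summary. A minimal stdlib-only sketch (the ratios below are hypothetical, not computed from this file):

```python
import math

def geometric_mean(values):
    """Geometric mean via the mean of logarithms (numerically stable)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical scores for one configuration across three tests, each
# normalized against a baseline configuration (1.0 = parity with baseline).
ratios = [1.05, 0.97, 1.10]
print(f"overall = {geometric_mean(ratios):.4f}")
```

Using ratios against a common baseline also sidesteps mixing more-is-better (inferences per second, FPS) and fewer-is-better (inference time) metrics: fewer-is-better values are simply inverted before normalizing.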
Phoronix Test Suite v10.8.5