ssss Intel Core i9-14900K testing with a ASUS PRIME Z790-P WIFI (1662 BIOS) and XFX AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408222-PTS-SSSS158023&grr .
ssss Processor Motherboard Chipset Memory Disk Graphics Audio Monitor OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c d e Intel Core i9-14900K @ 5.70GHz (24 Cores / 32 Threads) ASUS PRIME Z790-P WIFI (1662 BIOS) Intel Raptor Lake-S PCH 2 x 16GB DDR5-6000MT/s Corsair CMK32GX5M2B6000C36 Western Digital WD_BLACK SN850X 2000GB XFX AMD Radeon RX 7900 XTX 24GB Realtek ALC897 ASUS VP28U Ubuntu 24.04 6.10.0-061000rc6daily20240706-generic (x86_64) GNOME Shell 46.0 X Server 1.21.1.11 + Wayland 4.6 Mesa 24.2~git2407080600.801ed4~oibaf~n (git-801ed4d 2024-07-08 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57) GCC 13.2.0 ext4 3840x2160 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x129 - Thermald 2.5.6 Python Details - Python 3.12.3 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Mitigation of Clear Register File + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected
ssss svt-av1: Preset 3 - Beauty 4K 10-bit svt-av1: Preset 3 - Bosphorus 4K svt-av1: Preset 5 - Beauty 4K 10-bit onnx: ArcFace ResNet-100 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Parallel svt-av1: Preset 3 - Bosphorus 1080p svt-av1: Preset 8 - Beauty 4K 10-bit onnx: T5 Encoder - CPU - Parallel onnx: T5 Encoder - CPU - Parallel svt-av1: Preset 5 - Bosphorus 4K onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: yolov4 - CPU - Parallel onnx: yolov4 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Parallel svt-av1: Preset 13 - Beauty 4K 10-bit onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Parallel onnx: GPT-2 - CPU - Parallel onnx: GPT-2 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard onnx: GPT-2 - CPU - Standard onnx: GPT-2 - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: bertsquad-12 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Standard onnx: bertsquad-12 - CPU - Standard onnx: bertsquad-12 - CPU - Standard onnx: T5 Encoder - CPU - Standard onnx: T5 Encoder - CPU - Standard onnx: yolov4 - CPU - Standard onnx: yolov4 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Standard svt-av1: Preset 5 - Bosphorus 1080p svt-av1: Preset 8 - Bosphorus 4K svt-av1: Preset 8 - Bosphorus 1080p svt-av1: Preset 13 - Bosphorus 4K svt-av1: Preset 13 - Bosphorus 1080p onnx: ZFNet-512 - CPU - Standard a b c d e 1.330 7.451 5.937 104.5014 9.63175 23.690 8.359 6.90233 144.924 27.269 993.327 1.007155 71.3665 14.0174 5.15165 194.152 17.995 894.328 1.11819 492.383 2.03157 8.44437 118.382 356.477 2.80524 5.66499 176.398 57.6288 17.3545 105.350 9.49206 45.9658 21.7574 4.78134 209.093 58.0153 17.2377 16.7942 59.5411 21.4419 46.6357 3.19209 313.191 1.02774 972.397 3.17050 315.352 10.3968 96.1880 11.4946 86.9976 82.460 58.880 187.970 227.364 688.064 1.324 7.478 5.88 110.428 9.05554 23.741 8.501 7.06699 141.471 27.592 1021.66 0.978795 66.9766 14.9301 4.9909 200.329 18.022 889.875 1.12367 502.753 1.98904 7.96304 125.514 355.747 2.81097 5.65885 176.609 56.4661 17.7094 105.51 9.47765 45.675 21.893 4.78224 209.048 56.6858 17.6406 16.6593 60.0199 22.0917 45.2622 3.22287 310.172 1.03469 965.837 3.18971 313.466 10.4591 95.6044 11.5458 86.6082 83.754 59.44 188.994 228.324 693.231 1.335 7.442 5.919 108.727 9.19721 23.809 8.485 7.33799 136.25 27.388 960.202 1.04145 63.296 15.7981 4.97989 200.771 18.015 886.325 1.12825 495.095 2.0198 8.70901 114.763 357.864 2.79435 5.65828 176.609 61.1198 16.361 105.304 9.4962 45.4239 22.0142 4.78108 209.112 58.3579 17.1352 17.0618 58.6053 21.4036 46.7177 3.19755 312.619 1.02726 972.911 3.16663 315.739 10.5436 94.8399 11.4904 87.0258 82.454 59.241 190.825 227.363 696.738 1.33 7.474 5.991 112.621 8.87924 23.702 8.514 6.8187 146.627 27.584 1000.53 0.999466 66.3103 15.08 5.08514 196.612 17.982 888.389 1.12563 494.738 2.02126 8.28234 120.672 355.91 2.80968 5.67701 176.041 57.2785 17.458 105.366 9.49065 45.2925 22.0779 4.76332 209.89 57.4486 17.4064 16.9005 59.1651 21.9361 45.5834 3.18653 313.696 1.0355 965.116 3.15625 316.774 10.5896 94.4282 11.4619 87.2427 83.754 60.083 187.625 226.982 697.158 1.32 7.446 5.906 103.663 9.64652 23.639 8.568 7.20087 138.84 27.307 1033.24 0.96783 74.0973 13.4955 5.18875 192.686 18.03 889.34 1.12443 482.961 2.07055 8.23308 121.398 357.363 2.79826 5.66489 176.424 55.0761 18.156 105.301 9.49647 46.25 21.6207 4.7765 209.306 57.5235 17.3838 16.609 60.2025 21.6882 46.1047 3.24962 307.619 1.04055 960.341 3.19967 312.486 10.6885 93.5546 11.4375 87.4288 82.218 58.834 190.467 227.699 700.094 OpenBenchmarking.org
SVT-AV1 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit a b c d e 0.3004 0.6008 0.9012 1.2016 1.502 SE +/- 0.004, N = 3 1.330 1.324 1.335 1.330 1.320 1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 4K a b c d e 2 4 6 8 10 SE +/- 0.014, N = 3 7.451 7.478 7.442 7.474 7.446 1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit a b c d e 1.348 2.696 4.044 5.392 6.74 SE +/- 0.023, N = 3 5.937 5.880 5.919 5.991 5.906 1. (CXX) g++ options: -march=native -mno-avx
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a b c d e 30 60 90 120 150 SE +/- 2.15, N = 15 104.50 110.43 108.73 112.62 103.66 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a b c d e 3 6 9 12 15 SE +/- 0.21792, N = 15 9.63175 9.05554 9.19721 8.87924 9.64652 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 1080p a b c d e 6 12 18 24 30 SE +/- 0.04, N = 3 23.69 23.74 23.81 23.70 23.64 1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit a b c d e 2 4 6 8 10 SE +/- 0.048, N = 3 8.359 8.501 8.485 8.514 8.568 1. (CXX) g++ options: -march=native -mno-avx
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel a b c d e 2 4 6 8 10 SE +/- 0.05998, N = 8 6.90233 7.06699 7.33799 6.81870 7.20087 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel a b c d e 30 60 90 120 150 SE +/- 1.25, N = 8 144.92 141.47 136.25 146.63 138.84 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 4K a b c d e 6 12 18 24 30 SE +/- 0.03, N = 3 27.27 27.59 27.39 27.58 27.31 1. (CXX) g++ options: -march=native -mno-avx
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a b c d e 200 400 600 800 1000 SE +/- 10.36, N = 5 993.33 1021.66 960.20 1000.53 1033.24 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a b c d e 0.2343 0.4686 0.7029 0.9372 1.1715 SE +/- 0.010507, N = 5 1.007155 0.978795 1.041450 0.999466 0.967830 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel a b c d e 16 32 48 64 80 SE +/- 0.73, N = 5 71.37 66.98 63.30 66.31 74.10 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel a b c d e 4 8 12 16 20 SE +/- 0.14, N = 5 14.02 14.93 15.80 15.08 13.50 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a b c d e 1.1675 2.335 3.5025 4.67 5.8375 SE +/- 0.05176, N = 5 5.15165 4.99090 4.97989 5.08514 5.18875 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a b c d e 40 80 120 160 200 SE +/- 1.93, N = 5 194.15 200.33 200.77 196.61 192.69 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit a b c d e 4 8 12 16 20 SE +/- 0.03, N = 3 18.00 18.02 18.02 17.98 18.03 1. (CXX) g++ options: -march=native -mno-avx
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d e 200 400 600 800 1000 SE +/- 3.76, N = 3 894.33 889.88 886.33 888.39 889.34 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d e 0.2539 0.5078 0.7617 1.0156 1.2695 SE +/- 0.00471, N = 3 1.11819 1.12367 1.12825 1.12563 1.12443 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a b c d e 110 220 330 440 550 SE +/- 6.17, N = 3 492.38 502.75 495.10 494.74 482.96 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a b c d e 0.4659 0.9318 1.3977 1.8636 2.3295 SE +/- 0.02543, N = 3 2.03157 1.98904 2.01980 2.02126 2.07055 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel a b c d e 2 4 6 8 10 SE +/- 0.08654, N = 3 8.44437 7.96304 8.70901 8.28234 8.23308 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel a b c d e 30 60 90 120 150 SE +/- 1.21, N = 3 118.38 125.51 114.76 120.67 121.40 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a b c d e 80 160 240 320 400 SE +/- 0.76, N = 3 356.48 355.75 357.86 355.91 357.36 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a b c d e 0.6325 1.265 1.8975 2.53 3.1625 SE +/- 0.00600, N = 3 2.80524 2.81097 2.79435 2.80968 2.79826 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d e 1.2773 2.5546 3.8319 5.1092 6.3865 SE +/- 0.01074, N = 3 5.66499 5.65885 5.65828 5.67701 5.66489 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d e 40 80 120 160 200 SE +/- 0.34, N = 3 176.40 176.61 176.61 176.04 176.42 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel a b c d e 14 28 42 56 70 SE +/- 0.49, N = 3 57.63 56.47 61.12 57.28 55.08 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel a b c d e 4 8 12 16 20 SE +/- 0.15, N = 3 17.35 17.71 16.36 17.46 18.16 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a b c d e 20 40 60 80 100 SE +/- 0.04, N = 3 105.35 105.51 105.30 105.37 105.30 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a b c d e 3 6 9 12 15 SE +/- 0.00338, N = 3 9.49206 9.47765 9.49620 9.49065 9.49647 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a b c d e 10 20 30 40 50 SE +/- 0.37, N = 3 45.97 45.68 45.42 45.29 46.25 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a b c d e 5 10 15 20 25 SE +/- 0.18, N = 3 21.76 21.89 22.01 22.08 21.62 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard a b c d e 1.076 2.152 3.228 4.304 5.38 SE +/- 0.00534, N = 3 4.78134 4.78224 4.78108 4.76332 4.77650 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard a b c d e 50 100 150 200 250 SE +/- 0.23, N = 3 209.09 209.05 209.11 209.89 209.31 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard a b c d e 13 26 39 52 65 SE +/- 0.36, N = 3 58.02 56.69 58.36 57.45 57.52 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard a b c d e 4 8 12 16 20 SE +/- 0.11, N = 3 17.24 17.64 17.14 17.41 17.38 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a b c d e 4 8 12 16 20 SE +/- 0.08, N = 3 16.79 16.66 17.06 16.90 16.61 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a b c d e 13 26 39 52 65 SE +/- 0.28, N = 3 59.54 60.02 58.61 59.17 60.20 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a b c d e 5 10 15 20 25 SE +/- 0.08, N = 3 21.44 22.09 21.40 21.94 21.69 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a b c d e 11 22 33 44 55 SE +/- 0.18, N = 3 46.64 45.26 46.72 45.58 46.10 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a b c d e 0.7312 1.4624 2.1936 2.9248 3.656 SE +/- 0.02636, N = 3 3.19209 3.22287 3.19755 3.18653 3.24962 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a b c d e 70 140 210 280 350 SE +/- 2.57, N = 3 313.19 310.17 312.62 313.70 307.62 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a b c d e 0.2341 0.4682 0.7023 0.9364 1.1705 SE +/- 0.00581, N = 3 1.02774 1.03469 1.02726 1.03550 1.04055 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a b c d e 200 400 600 800 1000 SE +/- 5.46, N = 3 972.40 965.84 972.91 965.12 960.34 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a b c d e 0.7199 1.4398 2.1597 2.8796 3.5995 SE +/- 0.00266, N = 3 3.17050 3.18971 3.16663 3.15625 3.19967 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a b c d e 70 140 210 280 350 SE +/- 0.27, N = 3 315.35 313.47 315.74 316.77 312.49 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d e 3 6 9 12 15 SE +/- 0.08, N = 3 10.40 10.46 10.54 10.59 10.69 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d e 20 40 60 80 100 SE +/- 0.71, N = 3 96.19 95.60 94.84 94.43 93.55 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a b c d e 3 6 9 12 15 SE +/- 0.05, N = 3 11.49 11.55 11.49 11.46 11.44 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a b c d e 20 40 60 80 100 SE +/- 0.39, N = 3 87.00 86.61 87.03 87.24 87.43 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 1080p a b c d e 20 40 60 80 100 SE +/- 0.23, N = 3 82.46 83.75 82.45 83.75 82.22 1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 4K a b c d e 13 26 39 52 65 SE +/- 0.23, N = 3 58.88 59.44 59.24 60.08 58.83 1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 1080p a b c d e 40 80 120 160 200 SE +/- 1.66, N = 3 187.97 188.99 190.83 187.63 190.47 1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 4K a b c d e 50 100 150 200 250 SE +/- 0.72, N = 3 227.36 228.32 227.36 226.98 227.70 1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 1080p a b c d e 150 300 450 600 750 SE +/- 5.44, N = 3 688.06 693.23 696.74 697.16 700.09 1. (CXX) g++ options: -march=native -mno-avx
Phoronix Test Suite v10.8.5