ssss
Intel Core i9-14900K testing with an ASUS PRIME Z790-P WIFI (1662 BIOS) and XFX AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408222-PTS-SSSS158023&grt .
ssss
Runs: a, b, c, d, e

Processor: Intel Core i9-14900K @ 5.70GHz (24 Cores / 32 Threads)
Motherboard: ASUS PRIME Z790-P WIFI (1662 BIOS)
Chipset: Intel Raptor Lake-S PCH
Memory: 2 x 16GB DDR5-6000MT/s Corsair CMK32GX5M2B6000C36
Disk: Western Digital WD_BLACK SN850X 2000GB
Graphics: XFX AMD Radeon RX 7900 XTX 24GB
Audio: Realtek ALC897
Monitor: ASUS VP28U
OS: Ubuntu 24.04
Kernel: 6.10.0-061000rc6daily20240706-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server 1.21.1.11 + Wayland
OpenGL: 4.6 Mesa 24.2~git2407080600.801ed4~oibaf~n (git-801ed4d 2024-07-08 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x129 - Thermald 2.5.6
Python Details - Python 3.12.3
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Mitigation of Clear Register File + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected
ssss
Units: Inferences/s = Inferences Per Second (more is better); ms = Inference Time Cost in ms (fewer is better); Frames/s = Frames Per Second (more is better).

Test | Unit | a | b | c | d | e
onnx: GPT-2 - CPU - Parallel | Inferences/s | 118.382 | 125.514 | 114.763 | 120.672 | 121.398
onnx: GPT-2 - CPU - Parallel | ms | 8.44437 | 7.96304 | 8.70901 | 8.28234 | 8.23308
onnx: GPT-2 - CPU - Standard | Inferences/s | 176.398 | 176.609 | 176.609 | 176.041 | 176.424
onnx: GPT-2 - CPU - Standard | ms | 5.66499 | 5.65885 | 5.65828 | 5.67701 | 5.66489
onnx: yolov4 - CPU - Parallel | Inferences/s | 14.0174 | 14.9301 | 15.7981 | 15.08 | 13.4955
onnx: yolov4 - CPU - Parallel | ms | 71.3665 | 66.9766 | 63.296 | 66.3103 | 74.0973
onnx: yolov4 - CPU - Standard | Inferences/s | 17.2377 | 17.6406 | 17.1352 | 17.4064 | 17.3838
onnx: yolov4 - CPU - Standard | ms | 58.0153 | 56.6858 | 58.3579 | 57.4486 | 57.5235
onnx: T5 Encoder - CPU - Parallel | Inferences/s | 144.924 | 141.471 | 136.25 | 146.627 | 138.84
onnx: T5 Encoder - CPU - Parallel | ms | 6.90233 | 7.06699 | 7.33799 | 6.8187 | 7.20087
onnx: T5 Encoder - CPU - Standard | Inferences/s | 209.093 | 209.048 | 209.112 | 209.89 | 209.306
onnx: T5 Encoder - CPU - Standard | ms | 4.78134 | 4.78224 | 4.78108 | 4.76332 | 4.7765
onnx: bertsquad-12 - CPU - Parallel | Inferences/s | 17.3545 | 17.7094 | 16.361 | 17.458 | 18.156
onnx: bertsquad-12 - CPU - Parallel | ms | 57.6288 | 56.4661 | 61.1198 | 57.2785 | 55.0761
onnx: bertsquad-12 - CPU - Standard | Inferences/s | 21.7574 | 21.893 | 22.0142 | 22.0779 | 21.6207
onnx: bertsquad-12 - CPU - Standard | ms | 45.9658 | 45.675 | 45.4239 | 45.2925 | 46.25
onnx: CaffeNet 12-int8 - CPU - Parallel | Inferences/s | 313.191 | 310.172 | 312.619 | 313.696 | 307.619
onnx: CaffeNet 12-int8 - CPU - Parallel | ms | 3.19209 | 3.22287 | 3.19755 | 3.18653 | 3.24962
onnx: CaffeNet 12-int8 - CPU - Standard | Inferences/s | 972.397 | 965.837 | 972.911 | 965.116 | 960.341
onnx: CaffeNet 12-int8 - CPU - Standard | ms | 1.02774 | 1.03469 | 1.02726 | 1.0355 | 1.04055
onnx: fcn-resnet101-11 - CPU - Parallel | Inferences/s | 2.03157 | 1.98904 | 2.0198 | 2.02126 | 2.07055
onnx: fcn-resnet101-11 - CPU - Parallel | ms | 492.383 | 502.753 | 495.095 | 494.738 | 482.961
onnx: fcn-resnet101-11 - CPU - Standard | Inferences/s | 2.80524 | 2.81097 | 2.79435 | 2.80968 | 2.79826
onnx: fcn-resnet101-11 - CPU - Standard | ms | 356.477 | 355.747 | 357.864 | 355.91 | 357.363
onnx: ArcFace ResNet-100 - CPU - Parallel | Inferences/s | 9.63175 | 9.05554 | 9.19721 | 8.87924 | 9.64652
onnx: ArcFace ResNet-100 - CPU - Parallel | ms | 104.5014 | 110.428 | 108.727 | 112.621 | 103.663
onnx: ArcFace ResNet-100 - CPU - Standard | Inferences/s | 9.49206 | 9.47765 | 9.4962 | 9.49065 | 9.49647
onnx: ArcFace ResNet-100 - CPU - Standard | ms | 105.350 | 105.51 | 105.304 | 105.366 | 105.301
onnx: ResNet50 v1-12-int8 - CPU - Parallel | Inferences/s | 194.152 | 200.329 | 200.771 | 196.612 | 192.686
onnx: ResNet50 v1-12-int8 - CPU - Parallel | ms | 5.15165 | 4.9909 | 4.97989 | 5.08514 | 5.18875
onnx: ResNet50 v1-12-int8 - CPU - Standard | Inferences/s | 315.352 | 313.466 | 315.739 | 316.774 | 312.486
onnx: ResNet50 v1-12-int8 - CPU - Standard | ms | 3.17050 | 3.18971 | 3.16663 | 3.15625 | 3.19967
onnx: super-resolution-10 - CPU - Parallel | Inferences/s | 96.1880 | 95.6044 | 94.8399 | 94.4282 | 93.5546
onnx: super-resolution-10 - CPU - Parallel | ms | 10.3968 | 10.4591 | 10.5436 | 10.5896 | 10.6885
onnx: super-resolution-10 - CPU - Standard | Inferences/s | 86.9976 | 86.6082 | 87.0258 | 87.2427 | 87.4288
onnx: super-resolution-10 - CPU - Standard | ms | 11.4946 | 11.5458 | 11.4904 | 11.4619 | 11.4375
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel | Inferences/s | 1.007155 | 0.978795 | 1.04145 | 0.999466 | 0.96783
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel | ms | 993.327 | 1021.66 | 960.202 | 1000.53 | 1033.24
onnx: ResNet101_DUC_HDC-12 - CPU - Standard | Inferences/s | 1.11819 | 1.12367 | 1.12825 | 1.12563 | 1.12443
onnx: ResNet101_DUC_HDC-12 - CPU - Standard | ms | 894.328 | 889.875 | 886.325 | 888.389 | 889.34
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel | Inferences/s | 46.6357 | 45.2622 | 46.7177 | 45.5834 | 46.1047
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel | ms | 21.4419 | 22.0917 | 21.4036 | 21.9361 | 21.6882
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard | Inferences/s | 59.5411 | 60.0199 | 58.6053 | 59.1651 | 60.2025
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard | ms | 16.7942 | 16.6593 | 17.0618 | 16.9005 | 16.609
svt-av1: Preset 3 - Bosphorus 4K | Frames/s | 7.451 | 7.478 | 7.442 | 7.474 | 7.446
svt-av1: Preset 5 - Bosphorus 4K | Frames/s | 27.269 | 27.592 | 27.388 | 27.584 | 27.307
svt-av1: Preset 8 - Bosphorus 4K | Frames/s | 58.880 | 59.44 | 59.241 | 60.083 | 58.834
svt-av1: Preset 13 - Bosphorus 4K | Frames/s | 227.364 | 228.324 | 227.363 | 226.982 | 227.699
svt-av1: Preset 3 - Bosphorus 1080p | Frames/s | 23.690 | 23.741 | 23.809 | 23.702 | 23.639
svt-av1: Preset 5 - Bosphorus 1080p | Frames/s | 82.460 | 83.754 | 82.454 | 83.754 | 82.218
svt-av1: Preset 8 - Bosphorus 1080p | Frames/s | 187.970 | 188.994 | 190.825 | 187.625 | 190.467
svt-av1: Preset 13 - Bosphorus 1080p | Frames/s | 688.064 | 693.231 | 696.738 | 697.158 | 700.094
svt-av1: Preset 3 - Beauty 4K 10-bit | Frames/s | 1.330 | 1.324 | 1.335 | 1.33 | 1.32
svt-av1: Preset 5 - Beauty 4K 10-bit | Frames/s | 5.937 | 5.88 | 5.919 | 5.991 | 5.906
svt-av1: Preset 8 - Beauty 4K 10-bit | Frames/s | 8.359 | 8.501 | 8.485 | 8.514 | 8.568
svt-av1: Preset 13 - Beauty 4K 10-bit | Frames/s | 17.995 | 18.022 | 18.015 | 17.982 | 18.03
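The per-run columns make it possible to gauge how much the five runs (a to e) disagree on a given test. The following is a minimal Python sketch and not part of the Phoronix Test Suite export: the helper name `relative_spread` and the choice of (max - min) / mean as the spread measure are assumptions made for illustration, and the values are the GPT-2 inferences-per-second figures from the results above.

```python
from statistics import mean

# Inferences per second for runs a-e, copied from the GPT-2 results.
gpt2_parallel = [118.382, 125.514, 114.763, 120.672, 121.398]
gpt2_standard = [176.398, 176.609, 176.609, 176.041, 176.424]


def relative_spread(values):
    """Return (max - min) / mean as a percentage (hypothetical helper)."""
    return (max(values) - min(values)) / mean(values) * 100.0


if __name__ == "__main__":
    for name, runs in (("Parallel", gpt2_parallel), ("Standard", gpt2_standard)):
        print(f"GPT-2 {name}: mean {mean(runs):.2f}, "
              f"spread {relative_spread(runs):.2f}%")
```

On these numbers the Parallel executor varies noticeably more from run to run than the Standard executor.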
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 118.38, b: 125.51, c: 114.76, d: 120.67, e: 121.40. SE +/- 1.21, N = 3
    1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 8.44437, b: 7.96304, c: 8.70901, d: 8.28234, e: 8.23308. SE +/- 0.08654, N = 3
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 176.40, b: 176.61, c: 176.61, d: 176.04, e: 176.42. SE +/- 0.34, N = 3
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 5.66499, b: 5.65885, c: 5.65828, d: 5.67701, e: 5.66489. SE +/- 0.01074, N = 3
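Each ONNX Runtime test is reported twice, once as throughput (inferences per second) and once as inference time cost (ms). The two figures look like approximate reciprocals of each other, and the short Python sketch below checks that on the GPT-2 values for run a. This is only an observation about the numbers: how the test profile derives each metric is not stated in this export, and `implied_throughput` and `relative_gap` are hypothetical helper names.

```python
def implied_throughput(latency_ms):
    """Inferences per second implied by a per-inference latency in ms."""
    return 1000.0 / latency_ms


def relative_gap(reported_ips, latency_ms):
    """Relative difference between reported and implied throughput."""
    return abs(implied_throughput(latency_ms) - reported_ips) / reported_ips


# Run "a" values for GPT-2: (reported inferences/s, reported ms).
run_a = {
    "Parallel": (118.382, 8.44437),
    "Standard": (176.398, 5.66499),
}

if __name__ == "__main__":
    for executor, (ips, ms) in run_a.items():
        print(f"{executor}: reported {ips:.3f}/s, "
              f"implied {implied_throughput(ms):.3f}/s, "
              f"gap {relative_gap(ips, ms):.3%}")
```

The small remaining gap suggests the two metrics are not computed from one another by a single division, but the export does not say how they are aggregated.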
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 14.02, b: 14.93, c: 15.80, d: 15.08, e: 13.50. SE +/- 0.14, N = 5
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 71.37, b: 66.98, c: 63.30, d: 66.31, e: 74.10. SE +/- 0.73, N = 5
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 17.24, b: 17.64, c: 17.14, d: 17.41, e: 17.38. SE +/- 0.11, N = 3
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 58.02, b: 56.69, c: 58.36, d: 57.45, e: 57.52. SE +/- 0.36, N = 3
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 144.92, b: 141.47, c: 136.25, d: 146.63, e: 138.84. SE +/- 1.25, N = 8
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 6.90233, b: 7.06699, c: 7.33799, d: 6.81870, e: 7.20087. SE +/- 0.05998, N = 8
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 209.09, b: 209.05, c: 209.11, d: 209.89, e: 209.31. SE +/- 0.23, N = 3
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 4.78134, b: 4.78224, c: 4.78108, d: 4.76332, e: 4.77650. SE +/- 0.00534, N = 3
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 17.35, b: 17.71, c: 16.36, d: 17.46, e: 18.16. SE +/- 0.15, N = 3
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 57.63, b: 56.47, c: 61.12, d: 57.28, e: 55.08. SE +/- 0.49, N = 3
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 21.76, b: 21.89, c: 22.01, d: 22.08, e: 21.62. SE +/- 0.18, N = 3
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 45.97, b: 45.68, c: 45.42, d: 45.29, e: 46.25. SE +/- 0.37, N = 3
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 313.19, b: 310.17, c: 312.62, d: 313.70, e: 307.62. SE +/- 2.57, N = 3
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 3.19209, b: 3.22287, c: 3.19755, d: 3.18653, e: 3.24962. SE +/- 0.02636, N = 3
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 972.40, b: 965.84, c: 972.91, d: 965.12, e: 960.34. SE +/- 5.46, N = 3
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 1.02774, b: 1.03469, c: 1.02726, d: 1.03550, e: 1.04055. SE +/- 0.00581, N = 3
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 2.03157, b: 1.98904, c: 2.01980, d: 2.02126, e: 2.07055. SE +/- 0.02543, N = 3
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 492.38, b: 502.75, c: 495.10, d: 494.74, e: 482.96. SE +/- 6.17, N = 3
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 2.80524, b: 2.81097, c: 2.79435, d: 2.80968, e: 2.79826. SE +/- 0.00600, N = 3
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 356.48, b: 355.75, c: 357.86, d: 355.91, e: 357.36. SE +/- 0.76, N = 3
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 9.63175, b: 9.05554, c: 9.19721, d: 8.87924, e: 9.64652. SE +/- 0.21792, N = 15
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 104.50, b: 110.43, c: 108.73, d: 112.62, e: 103.66. SE +/- 2.15, N = 15
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 9.49206, b: 9.47765, c: 9.49620, d: 9.49065, e: 9.49647. SE +/- 0.00338, N = 3
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 105.35, b: 105.51, c: 105.30, d: 105.37, e: 105.30. SE +/- 0.04, N = 3
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 194.15, b: 200.33, c: 200.77, d: 196.61, e: 192.69. SE +/- 1.93, N = 5
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 5.15165, b: 4.99090, c: 4.97989, d: 5.08514, e: 5.18875. SE +/- 0.05176, N = 5
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 315.35, b: 313.47, c: 315.74, d: 316.77, e: 312.49. SE +/- 0.27, N = 3
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 3.17050, b: 3.18971, c: 3.16663, d: 3.15625, e: 3.19967. SE +/- 0.00266, N = 3
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 96.19, b: 95.60, c: 94.84, d: 94.43, e: 93.55. SE +/- 0.71, N = 3
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 10.40, b: 10.46, c: 10.54, d: 10.59, e: 10.69. SE +/- 0.08, N = 3
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 87.00, b: 86.61, c: 87.03, d: 87.24, e: 87.43. SE +/- 0.39, N = 3
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 11.49, b: 11.55, c: 11.49, d: 11.46, e: 11.44. SE +/- 0.05, N = 3
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 1.007155, b: 0.978795, c: 1.041450, d: 0.999466, e: 0.967830. SE +/- 0.010507, N = 5
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 993.33, b: 1021.66, c: 960.20, d: 1000.53, e: 1033.24. SE +/- 10.36, N = 5
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 1.11819, b: 1.12367, c: 1.12825, d: 1.12563, e: 1.12443. SE +/- 0.00471, N = 3
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 894.33, b: 889.88, c: 886.33, d: 888.39, e: 889.34. SE +/- 3.76, N = 3
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better
    a: 46.64, b: 45.26, c: 46.72, d: 45.58, e: 46.10. SE +/- 0.18, N = 3
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better
    a: 21.44, b: 22.09, c: 21.40, d: 21.94, e: 21.69. SE +/- 0.08, N = 3
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better
    a: 59.54, b: 60.02, c: 58.61, d: 59.17, e: 60.20. SE +/- 0.28, N = 3
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better
    a: 16.79, b: 16.66, c: 17.06, d: 16.90, e: 16.61. SE +/- 0.08, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 4K - Frames Per Second, More Is Better
    a: 7.451, b: 7.478, c: 7.442, d: 7.474, e: 7.446. SE +/- 0.014, N = 3
    1. (CXX) g++ options: -march=native -mno-avx
SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 4K - Frames Per Second, More Is Better
    a: 27.27, b: 27.59, c: 27.39, d: 27.58, e: 27.31. SE +/- 0.03, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 4K - Frames Per Second, More Is Better
    a: 58.88, b: 59.44, c: 59.24, d: 60.08, e: 58.83. SE +/- 0.23, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K - Frames Per Second, More Is Better
    a: 227.36, b: 228.32, c: 227.36, d: 226.98, e: 227.70. SE +/- 0.72, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 1080p - Frames Per Second, More Is Better
    a: 23.69, b: 23.74, c: 23.81, d: 23.70, e: 23.64. SE +/- 0.04, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 1080p - Frames Per Second, More Is Better
    a: 82.46, b: 83.75, c: 82.45, d: 83.75, e: 82.22. SE +/- 0.23, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p - Frames Per Second, More Is Better
    a: 187.97, b: 188.99, c: 190.83, d: 187.63, e: 190.47. SE +/- 1.66, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 1080p - Frames Per Second, More Is Better
    a: 688.06, b: 693.23, c: 696.74, d: 697.16, e: 700.09. SE +/- 5.44, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit - Frames Per Second, More Is Better
    a: 1.330, b: 1.324, c: 1.335, d: 1.330, e: 1.320. SE +/- 0.004, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit - Frames Per Second, More Is Better
    a: 5.937, b: 5.880, c: 5.919, d: 5.991, e: 5.906. SE +/- 0.023, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit - Frames Per Second, More Is Better
    a: 8.359, b: 8.501, c: 8.485, d: 8.514, e: 8.568. SE +/- 0.048, N = 3
SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit - Frames Per Second, More Is Better
    a: 18.00, b: 18.02, c: 18.02, d: 17.98, e: 18.03. SE +/- 0.03, N = 3
Phoronix Test Suite v10.8.5