ssss: Intel Core i9-14900K testing with an ASUS PRIME Z790-P WIFI (1662 BIOS) and XFX AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408222-PTS-SSSS158023&grs&rdt.
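To reproduce or compare against this result set locally, the OpenBenchmarking result ID can be fed back into the Phoronix Test Suite. A minimal sketch, assuming phoronix-test-suite is installed and the result above is still published; the subprocess wrapper is only for illustration and the same command can be run directly in a shell:

    # Hypothetical reproduction sketch: ask the Phoronix Test Suite to fetch the
    # published result file and run the same test selection for a side-by-side
    # comparison. Requires an installed phoronix-test-suite and network access.
    import subprocess

    subprocess.run(
        ["phoronix-test-suite", "benchmark", "2408222-PTS-SSSS158023"],
        check=True,  # raise if the suite reports a failure
    )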
System configuration (identical for runs a, b, c, d, and e):

Processor: Intel Core i9-14900K @ 5.70GHz (24 Cores / 32 Threads)
Motherboard: ASUS PRIME Z790-P WIFI (1662 BIOS)
Chipset: Intel Raptor Lake-S PCH
Memory: 2 x 16GB DDR5-6000MT/s Corsair CMK32GX5M2B6000C36
Disk: Western Digital WD_BLACK SN850X 2000GB
Graphics: XFX AMD Radeon RX 7900 XTX 24GB
Audio: Realtek ALC897
Monitor: ASUS VP28U
OS: Ubuntu 24.04
Kernel: 6.10.0-061000rc6daily20240706-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server 1.21.1.11 + Wayland
OpenGL: 4.6 Mesa 24.2~git2407080600.801ed4~oibaf~n (git-801ed4d 2024-07-08 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x129 - Thermald 2.5.6
Python Details: Python 3.12.3
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Mitigation of Clear Register File + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected
Result overview, runs a / b / c / d / e. ONNX Runtime values are Inferences Per Second unless marked "(ms)", which denotes Inference Time Cost in milliseconds; SVT-AV1 values are Frames Per Second. Per-test details with standard errors follow below.

onnx: yolov4 - CPU - Parallel: 14.0174 / 14.9301 / 15.7981 / 15.08 / 13.4955
onnx: bertsquad-12 - CPU - Parallel: 17.3545 / 17.7094 / 16.361 / 17.458 / 18.156
onnx: GPT-2 - CPU - Parallel: 118.382 / 125.514 / 114.763 / 120.672 / 121.398
onnx: T5 Encoder - CPU - Parallel: 144.924 / 141.471 / 136.25 / 146.627 / 138.84
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel: 1.007155 / 0.978795 / 1.04145 / 0.999466 / 0.96783
onnx: ResNet50 v1-12-int8 - CPU - Parallel: 194.152 / 200.329 / 200.771 / 196.612 / 192.686
onnx: fcn-resnet101-11 - CPU - Parallel: 2.03157 / 1.98904 / 2.0198 / 2.02126 / 2.07055
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel: 46.6357 / 45.2622 / 46.7177 / 45.5834 / 46.1047
onnx: yolov4 - CPU - Standard: 17.2377 / 17.6406 / 17.1352 / 17.4064 / 17.3838
onnx: super-resolution-10 - CPU - Parallel: 96.1880 / 95.6044 / 94.8399 / 94.4282 / 93.5546
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard: 59.5411 / 60.0199 / 58.6053 / 59.1651 / 60.2025
svt-av1: Preset 8 - Beauty 4K 10-bit: 8.359 / 8.501 / 8.485 / 8.514 / 8.568
svt-av1: Preset 8 - Bosphorus 4K: 58.880 / 59.44 / 59.241 / 60.083 / 58.834
onnx: bertsquad-12 - CPU - Standard: 21.7574 / 21.893 / 22.0142 / 22.0779 / 21.6207
onnx: CaffeNet 12-int8 - CPU - Parallel: 313.191 / 310.172 / 312.619 / 313.696 / 307.619
svt-av1: Preset 5 - Beauty 4K 10-bit: 5.937 / 5.88 / 5.919 / 5.991 / 5.906
svt-av1: Preset 5 - Bosphorus 1080p: 82.460 / 83.754 / 82.454 / 83.754 / 82.218
svt-av1: Preset 13 - Bosphorus 1080p: 688.064 / 693.231 / 696.738 / 697.158 / 700.094
svt-av1: Preset 8 - Bosphorus 1080p: 187.970 / 188.994 / 190.825 / 187.625 / 190.467
onnx: ResNet50 v1-12-int8 - CPU - Standard: 315.352 / 313.466 / 315.739 / 316.774 / 312.486
onnx: CaffeNet 12-int8 - CPU - Standard: 972.397 / 965.837 / 972.911 / 965.116 / 960.341
svt-av1: Preset 5 - Bosphorus 4K: 27.269 / 27.592 / 27.388 / 27.584 / 27.307
svt-av1: Preset 3 - Beauty 4K 10-bit: 1.330 / 1.324 / 1.335 / 1.33 / 1.32
onnx: super-resolution-10 - CPU - Standard: 86.9976 / 86.6082 / 87.0258 / 87.2427 / 87.4288
onnx: ResNet101_DUC_HDC-12 - CPU - Standard: 1.11819 / 1.12367 / 1.12825 / 1.12563 / 1.12443
svt-av1: Preset 3 - Bosphorus 1080p: 23.690 / 23.741 / 23.809 / 23.702 / 23.639
onnx: fcn-resnet101-11 - CPU - Standard: 2.80524 / 2.81097 / 2.79435 / 2.80968 / 2.79826
svt-av1: Preset 13 - Bosphorus 4K: 227.364 / 228.324 / 227.363 / 226.982 / 227.699
svt-av1: Preset 3 - Bosphorus 4K: 7.451 / 7.478 / 7.442 / 7.474 / 7.446
onnx: T5 Encoder - CPU - Standard: 209.093 / 209.048 / 209.112 / 209.89 / 209.306
onnx: GPT-2 - CPU - Standard: 176.398 / 176.609 / 176.609 / 176.041 / 176.424
svt-av1: Preset 13 - Beauty 4K 10-bit: 17.995 / 18.022 / 18.015 / 17.982 / 18.03
onnx: ArcFace ResNet-100 - CPU - Standard: 9.49206 / 9.47765 / 9.4962 / 9.49065 / 9.49647
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (ms): 16.7942 / 16.6593 / 17.0618 / 16.9005 / 16.609
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel (ms): 21.4419 / 22.0917 / 21.4036 / 21.9361 / 21.6882
onnx: ResNet101_DUC_HDC-12 - CPU - Standard (ms): 894.328 / 889.875 / 886.325 / 888.389 / 889.34
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel (ms): 993.327 / 1021.66 / 960.202 / 1000.53 / 1033.24
onnx: super-resolution-10 - CPU - Standard (ms): 11.4946 / 11.5458 / 11.4904 / 11.4619 / 11.4375
onnx: super-resolution-10 - CPU - Parallel (ms): 10.3968 / 10.4591 / 10.5436 / 10.5896 / 10.6885
onnx: ResNet50 v1-12-int8 - CPU - Standard (ms): 3.17050 / 3.18971 / 3.16663 / 3.15625 / 3.19967
onnx: ResNet50 v1-12-int8 - CPU - Parallel (ms): 5.15165 / 4.9909 / 4.97989 / 5.08514 / 5.18875
onnx: ArcFace ResNet-100 - CPU - Standard (ms): 105.350 / 105.51 / 105.304 / 105.366 / 105.301
onnx: ArcFace ResNet-100 - CPU - Parallel (ms): 104.5014 / 110.428 / 108.727 / 112.621 / 103.663
onnx: ArcFace ResNet-100 - CPU - Parallel: 9.63175 / 9.05554 / 9.19721 / 8.87924 / 9.64652
onnx: fcn-resnet101-11 - CPU - Standard (ms): 356.477 / 355.747 / 357.864 / 355.91 / 357.363
onnx: fcn-resnet101-11 - CPU - Parallel (ms): 492.383 / 502.753 / 495.095 / 494.738 / 482.961
onnx: CaffeNet 12-int8 - CPU - Standard (ms): 1.02774 / 1.03469 / 1.02726 / 1.0355 / 1.04055
onnx: CaffeNet 12-int8 - CPU - Parallel (ms): 3.19209 / 3.22287 / 3.19755 / 3.18653 / 3.24962
onnx: bertsquad-12 - CPU - Standard (ms): 45.9658 / 45.675 / 45.4239 / 45.2925 / 46.25
onnx: bertsquad-12 - CPU - Parallel (ms): 57.6288 / 56.4661 / 61.1198 / 57.2785 / 55.0761
onnx: T5 Encoder - CPU - Standard (ms): 4.78134 / 4.78224 / 4.78108 / 4.76332 / 4.7765
onnx: T5 Encoder - CPU - Parallel (ms): 6.90233 / 7.06699 / 7.33799 / 6.8187 / 7.20087
onnx: yolov4 - CPU - Standard (ms): 58.0153 / 56.6858 / 58.3579 / 57.4486 / 57.5235
onnx: yolov4 - CPU - Parallel (ms): 71.3665 / 66.9766 / 63.296 / 66.3103 / 74.0973
onnx: GPT-2 - CPU - Standard (ms): 5.66499 / 5.65885 / 5.65828 / 5.67701 / 5.66489
onnx: GPT-2 - CPU - Parallel (ms): 8.44437 / 7.96304 / 8.70901 / 8.28234 / 8.23308
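Each per-test section below reports the averaged result for runs a through e along with an "SE +/-" figure and the trial count N. A small sketch of how a standard error of the mean is conventionally derived from N trial runs; the trial values below are placeholders, not taken from this result file:

    # Standard error of the mean (SEM) over N trials: sample std dev / sqrt(N).
    # The trial values are made up purely for illustration.
    import statistics

    trials = [14.02, 13.95, 14.30, 15.10, 13.75]   # hypothetical N = 5 runs of one test
    n = len(trials)
    sem = statistics.stdev(trials) / n ** 0.5
    print(f"mean = {statistics.mean(trials):.2f}, SE +/- {sem:.2f}, N = {n}")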
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.14, N = 5
a: 14.02  b: 14.93  c: 15.80  d: 15.08  e: 13.50
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
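The "Executor: Parallel" and "Executor: Standard" variants used throughout the ONNX Runtime results correspond to the runtime's two graph execution modes. A minimal sketch of how that setting is exposed through the onnxruntime Python API; the model path is a placeholder, and the benchmark itself is the C++ build whose g++ options are listed with each result:

    # Illustrative only: selecting ONNX Runtime's sequential ("Standard") vs
    # parallel ("Parallel") graph execution mode on the CPU execution provider.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL     # "Parallel" executor
    # opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL # "Standard" executor

    session = ort.InferenceSession(
        "yolov4.onnx",                       # placeholder model path
        sess_options=opts,
        providers=["CPUExecutionProvider"],  # CPU-only, as in these results
    )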
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.15, N = 3
a: 17.35  b: 17.71  c: 16.36  d: 17.46  e: 18.16
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 1.21, N = 3
a: 118.38  b: 125.51  c: 114.76  d: 120.67  e: 121.40
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 1.25, N = 8
a: 144.92  b: 141.47  c: 136.25  d: 146.63  e: 138.84
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.010507, N = 5
a: 1.007155  b: 0.978795  c: 1.041450  d: 0.999466  e: 0.967830
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 1.93, N = 5
a: 194.15  b: 200.33  c: 200.77  d: 196.61  e: 192.69
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.02543, N = 3
a: 2.03157  b: 1.98904  c: 2.01980  d: 2.02126  e: 2.07055
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.18, N = 3
a: 46.64  b: 45.26  c: 46.72  d: 45.58  e: 46.10
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.11, N = 3
a: 17.24  b: 17.64  c: 17.14  d: 17.41  e: 17.38
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.71, N = 3
a: 96.19  b: 95.60  c: 94.84  d: 94.43  e: 93.55
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.28, N = 3
a: 59.54  b: 60.02  c: 58.61  d: 59.17  e: 60.20
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.048, N = 3
a: 8.359  b: 8.501  c: 8.485  d: 8.514  e: 8.568
1. (CXX) g++ options: -march=native -mno-avx
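The SVT-AV1 "Encoder Mode: Preset N" results sweep the encoder's speed/efficiency presets (higher presets such as 13 are fastest; lower presets such as 3 are much slower but compress better) over the named input clips. A hedged sketch of an equivalent standalone invocation of the reference SvtAv1EncApp encoder, with placeholder file names; the benchmark's own harness and source clips are managed by the Phoronix Test Suite:

    # Illustrative invocation of the SVT-AV1 reference encoder at preset 8.
    # Input/output file names are placeholders; "--preset" selects the
    # speed-vs-compression trade-off measured across these results.
    import subprocess

    subprocess.run([
        "SvtAv1EncApp",
        "-i", "Beauty_3840x2160_10bit.y4m",  # placeholder source clip
        "-b", "beauty_preset8.ivf",          # output AV1 bitstream
        "--preset", "8",
    ], check=True)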
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 4K
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.23, N = 3
a: 58.88  b: 59.44  c: 59.24  d: 60.08  e: 58.83
1. (CXX) g++ options: -march=native -mno-avx

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.18, N = 3
a: 21.76  b: 21.89  c: 22.01  d: 22.08  e: 21.62
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 2.57, N = 3
a: 313.19  b: 310.17  c: 312.62  d: 313.70  e: 307.62
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.023, N = 3
a: 5.937  b: 5.880  c: 5.919  d: 5.991  e: 5.906
1. (CXX) g++ options: -march=native -mno-avx

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 1080p
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.23, N = 3
a: 82.46  b: 83.75  c: 82.45  d: 83.75  e: 82.22
1. (CXX) g++ options: -march=native -mno-avx

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 1080p
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 5.44, N = 3
a: 688.06  b: 693.23  c: 696.74  d: 697.16  e: 700.09
1. (CXX) g++ options: -march=native -mno-avx

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 1.66, N = 3
a: 187.97  b: 188.99  c: 190.83  d: 187.63  e: 190.47
1. (CXX) g++ options: -march=native -mno-avx

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.27, N = 3
a: 315.35  b: 313.47  c: 315.74  d: 316.77  e: 312.49
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 5.46, N = 3
a: 972.40  b: 965.84  c: 972.91  d: 965.12  e: 960.34
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 4K
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.03, N = 3
a: 27.27  b: 27.59  c: 27.39  d: 27.58  e: 27.31
1. (CXX) g++ options: -march=native -mno-avx

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.004, N = 3
a: 1.330  b: 1.324  c: 1.335  d: 1.330  e: 1.320
1. (CXX) g++ options: -march=native -mno-avx

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.39, N = 3
a: 87.00  b: 86.61  c: 87.03  d: 87.24  e: 87.43
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.00471, N = 3
a: 1.11819  b: 1.12367  c: 1.12825  d: 1.12563  e: 1.12443
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 1080p
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.04, N = 3
a: 23.69  b: 23.74  c: 23.81  d: 23.70  e: 23.64
1. (CXX) g++ options: -march=native -mno-avx

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.00600, N = 3
a: 2.80524  b: 2.81097  c: 2.79435  d: 2.80968  e: 2.79826
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.72, N = 3
a: 227.36  b: 228.32  c: 227.36  d: 226.98  e: 227.70
1. (CXX) g++ options: -march=native -mno-avx

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 4K
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.014, N = 3
a: 7.451  b: 7.478  c: 7.442  d: 7.474  e: 7.446
1. (CXX) g++ options: -march=native -mno-avx

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.23, N = 3
a: 209.09  b: 209.05  c: 209.11  d: 209.89  e: 209.31
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.34, N = 3
a: 176.40  b: 176.61  c: 176.61  d: 176.04  e: 176.42
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit
OpenBenchmarking.org Frames Per Second, More Is Better
SE +/- 0.03, N = 3
a: 18.00  b: 18.02  c: 18.02  d: 17.98  e: 18.03
1. (CXX) g++ options: -march=native -mno-avx

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.00338, N = 3
a: 9.49206  b: 9.47765  c: 9.49620  d: 9.49065  e: 9.49647
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.08, N = 3
a: 16.79  b: 16.66  c: 17.06  d: 16.90  e: 16.61
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.08, N = 3
a: 21.44  b: 22.09  c: 21.40  d: 21.94  e: 21.69
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 3.76, N = 3
a: 894.33  b: 889.88  c: 886.33  d: 888.39  e: 889.34
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 10.36, N = 5
a: 993.33  b: 1021.66  c: 960.20  d: 1000.53  e: 1033.24
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.05, N = 3
a: 11.49  b: 11.55  c: 11.49  d: 11.46  e: 11.44
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.08, N = 3
a: 10.40  b: 10.46  c: 10.54  d: 10.59  e: 10.69
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.00266, N = 3
a: 3.17050  b: 3.18971  c: 3.16663  d: 3.15625  e: 3.19967
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.05176, N = 5
a: 5.15165  b: 4.99090  c: 4.97989  d: 5.08514  e: 5.18875
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.04, N = 3
a: 105.35  b: 105.51  c: 105.30  d: 105.37  e: 105.30
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 2.15, N = 15
a: 104.50  b: 110.43  c: 108.73  d: 112.62  e: 103.66
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inferences Per Second, More Is Better
SE +/- 0.21792, N = 15
a: 9.63175  b: 9.05554  c: 9.19721  d: 8.87924  e: 9.64652
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.76, N = 3
a: 356.48  b: 355.75  c: 357.86  d: 355.91  e: 357.36
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 6.17, N = 3
a: 492.38  b: 502.75  c: 495.10  d: 494.74  e: 482.96
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.00581, N = 3
a: 1.02774  b: 1.03469  c: 1.02726  d: 1.03550  e: 1.04055
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.02636, N = 3
a: 3.19209  b: 3.22287  c: 3.19755  d: 3.18653  e: 3.24962
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.37, N = 3
a: 45.97  b: 45.68  c: 45.42  d: 45.29  e: 46.25
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.49, N = 3
a: 57.63  b: 56.47  c: 61.12  d: 57.28  e: 55.08
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.00534, N = 3
a: 4.78134  b: 4.78224  c: 4.78108  d: 4.76332  e: 4.77650
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.05998, N = 8
a: 6.90233  b: 7.06699  c: 7.33799  d: 6.81870  e: 7.20087
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.36, N = 3
a: 58.02  b: 56.69  c: 58.36  d: 57.45  e: 57.52
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.73, N = 5
a: 71.37  b: 66.98  c: 63.30  d: 66.31  e: 74.10
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.01074, N = 3
a: 5.66499  b: 5.65885  c: 5.65828  d: 5.67701  e: 5.66489
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel
OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better
SE +/- 0.08654, N = 3
a: 8.44437  b: 7.96304  c: 8.70901  d: 8.28234  e: 8.23308
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Phoronix Test Suite v10.8.5