new satty AMD Ryzen AI 9 365 testing with a ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408252-NE-NEWSATTY701&sro&grs .
new satty Processor Motherboard Chipset Memory Disk Graphics Audio Network OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c d AMD Ryzen AI 9 365 @ 4.31GHz (10 Cores / 20 Threads) ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) AMD Device 1507 4 x 6GB LPDDR5-7500MT/s Micron MT62F1536M32D4DS-026 1024GB MTFDKBA1T0QFM-1BD1AABGB AMD Radeon 512MB AMD Rembrandt Radeon HD Audio MEDIATEK Device 7925 Ubuntu 24.04 6.10.0-phx (x86_64) GNOME Shell 46.0 X Server + Wayland 4.6 Mesa 24.3~git2407280600.a211a5~oibaf~n (git-a211a51 2024-07-28 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57) GCC 13.2.0 ext4 2880x1800 OpenBenchmarking.org Kernel Details - amdgpu.dcdebugmask=0x600 - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced Python Details - Python 3.12.3 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
new satty simdjson: PartialTweets onnx: ArcFace ResNet-100 - CPU - Standard onnx: yolov4 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard svt-av1: Preset 5 - Bosphorus 4K onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: bertsquad-12 - CPU - Standard onnx: super-resolution-10 - CPU - Standard onnx: ZFNet-512 - CPU - Standard svt-av1: Preset 3 - Bosphorus 4K onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: yolov4 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: bertsquad-12 - CPU - Parallel onnx: GPT-2 - CPU - Standard svt-av1: Preset 8 - Bosphorus 4K onnx: fcn-resnet101-11 - CPU - Parallel svt-av1: Preset 8 - Bosphorus 1080p onnx: ZFNet-512 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Parallel svt-av1: Preset 5 - Bosphorus 1080p svt-av1: Preset 13 - Bosphorus 4K svt-av1: Preset 5 - Beauty 4K 10-bit svt-av1: Preset 3 - Bosphorus 1080p onnx: ArcFace ResNet-100 - CPU - Parallel svt-av1: Preset 8 - Beauty 4K 10-bit whisperfile: Small onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard svt-av1: Preset 13 - Bosphorus 1080p whisperfile: Medium whisperfile: Tiny svt-av1: Preset 3 - Beauty 4K 10-bit onnx: GPT-2 - CPU - Parallel simdjson: DistinctUserID onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: T5 Encoder - CPU - Parallel onnx: T5 Encoder - CPU - Standard simdjson: Kostya simdjson: TopTweet simdjson: LargeRand svt-av1: Preset 13 - Beauty 4K 10-bit onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: T5 Encoder - CPU - Standard onnx: T5 Encoder - CPU - Parallel onnx: ZFNet-512 - CPU - Standard onnx: ZFNet-512 - CPU - Parallel onnx: yolov4 - CPU - Standard onnx: yolov4 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: GPT-2 - CPU - Parallel a b c d 8.74 22.927 6.48303 216.667 1.16523 14.781 0.534553 8.60423 80.9921 93.206 3.728 0.465907 4.36533 565.244 70.1183 5.52731 130.748 30.826 0.892606 106.46 42.5418 135.85 74.6918 43.914 113.827 2.573 11.911 11.5819 3.694 259.90777 39.7871 474.771 754.78056 52.71359 0.568 96.8538 7.09 26.2276 118.516 166.699 4.51 6.88 1.25 6.164 25.1311 38.1251 1870.72 2146.34 12.3454 14.2597 4.61373 13.3853 43.6148 86.3383 858.199 1120.31 1.76819 7.35929 116.218 180.913 5.99672 8.43551 10.7266 23.5033 154.246 229.072 7.6417 10.3168 6.74 22.9872 6.23528 222.382 1.15557 13.308 0.533521 8.48368 75.8635 91.4483 3.491 0.445551 4.15031 565.483 69.1277 5.50029 128.263 29.745 0.868291 100.454 41.976 139.674 75.9022 41.951 109.45 2.557 11.437 11.7777 3.606 261.74247 39.4192 455.785 751.4815 52.63452 0.565 95.5885 6.91 26.4566 120.311 167.567 4.42 6.91 1.25 6.158 25.3656 37.795 1874.34 2244.4 13.1798 14.4642 4.49478 13.1725 43.5 84.9029 865.312 1151.68 1.76734 7.1579 117.87 181.802 5.96555 8.30904 10.9323 23.8202 160.374 240.94 7.78951 10.4521 6.59 23.7636 6.92476 226.04 1.22009 14.881 0.526244 8.99805 78.9522 91.3322 3.796 0.456424 4.36559 572.861 68.938 5.8995 127.426 31.636 0.895635 107.482 44.791 144.829 77.3006 44.658 116.143 2.63 12.089 11.6564 3.708 257.37884 39.7086 476.223 722.53519 52.49648 0.573 97.0868 6.9 26.1799 118.211 168.371 4.41 6.94 1.25 6.135 25.1788 38.1944 1900.25 2190.94 12.6641 14.5036 4.42207 12.9344 42.0785 85.7866 819.557 1116.52 1.74432 6.90258 111.131 169.5 5.93691 8.45631 10.9462 22.3234 144.406 229.058 7.84183 10.2922 6.83 19.7127 5.89878 193.668 1.06101 12.942 0.471409 7.96736 71.9372 83.0327 3.410 0.420335 3.95583 519.453 64.0751 5.42253 120.741 29.258 0.834423 101.169 42.5568 139.865 72.5395 42.091 112.326 2.481 11.564 11.1720 3.523 269.17700 38.0561 460.254 752.93123 54.71240 0.550 94.3693 7.04 25.7982 117.354 164.449 4.46 7.02 1.24 6.143 26.2808 38.7603 2122.37 2379.56 13.9008 15.6076 5.16479 13.7852 50.7730 89.5086 942.938 1198.57 1.92558 7.14811 125.545 184.512 6.07861 8.52036 12.0469 23.4969 169.616 252.883 8.27490 10.5900 OpenBenchmarking.org
simdjson Throughput Test: PartialTweets OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: PartialTweets a b c d 2 4 6 8 10 SE +/- 0.02, N = 3 8.74 6.74 6.59 6.83 1. (CXX) g++ options: -O3 -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a b c d 6 12 18 24 30 SE +/- 0.16, N = 15 22.93 22.99 23.76 19.71 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard a b c d 2 4 6 8 10 SE +/- 0.04221, N = 12 6.48303 6.23528 6.92476 5.89878 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a b c d 50 100 150 200 250 SE +/- 1.31, N = 15 216.67 222.38 226.04 193.67 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a b c d 0.2745 0.549 0.8235 1.098 1.3725 SE +/- 0.01056, N = 6 1.16523 1.15557 1.22009 1.06101 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 4K a b c d 4 8 12 16 20 SE +/- 0.15, N = 3 14.78 13.31 14.88 12.94 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d 0.1203 0.2406 0.3609 0.4812 0.6015 SE +/- 0.003579, N = 10 0.534553 0.533521 0.526244 0.471409 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a b c d 3 6 9 12 15 SE +/- 0.09828, N = 3 8.60423 8.48368 8.99805 7.96736 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a b c d 20 40 60 80 100 SE +/- 0.54, N = 3 80.99 75.86 78.95 71.94 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard a b c d 20 40 60 80 100 SE +/- 0.57, N = 12 93.21 91.45 91.33 83.03 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 4K a b c d 0.8541 1.7082 2.5623 3.4164 4.2705 SE +/- 0.037, N = 9 3.728 3.491 3.796 3.410 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a b c d 0.1048 0.2096 0.3144 0.4192 0.524 SE +/- 0.004361, N = 3 0.465907 0.445551 0.456424 0.420335 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel a b c d 0.9823 1.9646 2.9469 3.9292 4.9115 SE +/- 0.05493, N = 3 4.36533 4.15031 4.36559 3.95583 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a b c d 120 240 360 480 600 SE +/- 4.51, N = 12 565.24 565.48 572.86 519.45 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d 16 32 48 64 80 SE +/- 0.67, N = 3 70.12 69.13 68.94 64.08 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel a b c d 1.3274 2.6548 3.9822 5.3096 6.637 SE +/- 0.03573, N = 14 5.52731 5.50029 5.89950 5.42253 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d 30 60 90 120 150 SE +/- 0.43, N = 3 130.75 128.26 127.43 120.74 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 4K a b c d 7 14 21 28 35 SE +/- 0.19, N = 15 30.83 29.75 31.64 29.26 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a b c d 0.2015 0.403 0.6045 0.806 1.0075 SE +/- 0.006409, N = 3 0.892606 0.868291 0.895635 0.834423 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 1080p a b c d 20 40 60 80 100 SE +/- 0.41, N = 3 106.46 100.45 107.48 101.17 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel a b c d 10 20 30 40 50 SE +/- 0.27, N = 3 42.54 41.98 44.79 42.56 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a b c d 30 60 90 120 150 SE +/- 0.82, N = 3 135.85 139.67 144.83 139.87 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a b c d 20 40 60 80 100 SE +/- 0.61, N = 3 74.69 75.90 77.30 72.54 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 1080p a b c d 10 20 30 40 50 SE +/- 0.37, N = 8 43.91 41.95 44.66 42.09 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 4K a b c d 30 60 90 120 150 SE +/- 0.95, N = 3 113.83 109.45 116.14 112.33 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit a b c d 0.5918 1.1836 1.7754 2.3672 2.959 SE +/- 0.016, N = 3 2.573 2.557 2.630 2.481 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 1080p a b c d 3 6 9 12 15 SE +/- 0.05, N = 3 11.91 11.44 12.09 11.56 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a b c d 3 6 9 12 15 SE +/- 0.04, N = 3 11.58 11.78 11.66 11.17 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit a b c d 0.8343 1.6686 2.5029 3.3372 4.1715 SE +/- 0.010, N = 3 3.694 3.606 3.708 3.523 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Whisperfile Model Size: Small OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Small a b c d 60 120 180 240 300 SE +/- 3.09, N = 3 259.91 261.74 257.38 269.18
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a b c d 9 18 27 36 45 SE +/- 0.45, N = 3 39.79 39.42 39.71 38.06 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 1080p a b c d 100 200 300 400 500 SE +/- 3.20, N = 3 474.77 455.79 476.22 460.25 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Whisperfile Model Size: Medium OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Medium a b c d 160 320 480 640 800 SE +/- 2.92, N = 3 754.78 751.48 722.54 752.93
Whisperfile Model Size: Tiny OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Tiny a b c d 12 24 36 48 60 SE +/- 0.38, N = 3 52.71 52.63 52.50 54.71
SVT-AV1 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit a b c d 0.1289 0.2578 0.3867 0.5156 0.6445 SE +/- 0.004, N = 3 0.568 0.565 0.573 0.550 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel a b c d 20 40 60 80 100 SE +/- 0.87, N = 3 96.85 95.59 97.09 94.37 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
simdjson Throughput Test: DistinctUserID OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: DistinctUserID a b c d 2 4 6 8 10 SE +/- 0.08, N = 3 7.09 6.91 6.90 7.04 1. (CXX) g++ options: -O3 -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a b c d 6 12 18 24 30 SE +/- 0.09, N = 3 26.23 26.46 26.18 25.80 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel a b c d 30 60 90 120 150 SE +/- 1.16, N = 3 118.52 120.31 118.21 117.35 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard a b c d 40 80 120 160 200 SE +/- 0.63, N = 3 166.70 167.57 168.37 164.45 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
simdjson Throughput Test: Kostya OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: Kostya a b c d 1.0148 2.0296 3.0444 4.0592 5.074 SE +/- 0.02, N = 3 4.51 4.42 4.41 4.46 1. (CXX) g++ options: -O3 -lrt
simdjson Throughput Test: TopTweet OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: TopTweet a b c d 2 4 6 8 10 SE +/- 0.02, N = 3 6.88 6.91 6.94 7.02 1. (CXX) g++ options: -O3 -lrt
simdjson Throughput Test: LargeRandom OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: LargeRandom a b c d 0.2813 0.5626 0.8439 1.1252 1.4065 SE +/- 0.00, N = 3 1.25 1.25 1.25 1.24 1. (CXX) g++ options: -O3 -lrt
SVT-AV1 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit a b c d 2 4 6 8 10 SE +/- 0.006, N = 3 6.164 6.158 6.135 6.143 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a b c d 6 12 18 24 30 SE +/- 0.31, N = 3 25.13 25.37 25.18 26.28 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a b c d 9 18 27 36 45 SE +/- 0.13, N = 3 38.13 37.80 38.19 38.76 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d 500 1000 1500 2000 2500 SE +/- 15.92, N = 10 1870.72 1874.34 1900.25 2122.37 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a b c d 500 1000 1500 2000 2500 SE +/- 24.44, N = 3 2146.34 2244.40 2190.94 2379.56 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a b c d 4 8 12 16 20 SE +/- 0.11, N = 3 12.35 13.18 12.66 13.90 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d 4 8 12 16 20 SE +/- 0.16, N = 3 14.26 14.46 14.50 15.61 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a b c d 1.1621 2.3242 3.4863 4.6484 5.8105 SE +/- 0.03387, N = 15 4.61373 4.49478 4.42207 5.16479 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a b c d 4 8 12 16 20 SE +/- 0.12, N = 3 13.39 13.17 12.93 13.79 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a b c d 11 22 33 44 55 SE +/- 0.41, N = 15 43.61 43.50 42.08 50.77 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a b c d 20 40 60 80 100 SE +/- 0.32, N = 3 86.34 84.90 85.79 89.51 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a b c d 200 400 600 800 1000 SE +/- 9.14, N = 6 858.20 865.31 819.56 942.94 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a b c d 300 600 900 1200 1500 SE +/- 9.16, N = 3 1120.31 1151.68 1116.52 1198.57 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a b c d 0.4333 0.8666 1.2999 1.7332 2.1665 SE +/- 0.01672, N = 12 1.76819 1.76734 1.74432 1.92558 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a b c d 2 4 6 8 10 SE +/- 0.04235, N = 3 7.35929 7.15790 6.90258 7.14811 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a b c d 30 60 90 120 150 SE +/- 1.55, N = 3 116.22 117.87 111.13 125.55 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel a b c d 40 80 120 160 200 SE +/- 1.19, N = 14 180.91 181.80 169.50 184.51 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard a b c d 2 4 6 8 10 SE +/- 0.02319, N = 3 5.99672 5.96555 5.93691 6.07861 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel a b c d 2 4 6 8 10 SE +/- 0.08432, N = 3 8.43551 8.30904 8.45631 8.52036 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard a b c d 3 6 9 12 15 SE +/- 0.08, N = 12 10.73 10.93 10.95 12.05 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel a b c d 6 12 18 24 30 SE +/- 0.15, N = 3 23.50 23.82 22.32 23.50 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard a b c d 40 80 120 160 200 SE +/- 1.19, N = 12 154.25 160.37 144.41 169.62 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel a b c d 60 120 180 240 300 SE +/- 3.53, N = 3 229.07 240.94 229.06 252.88 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d 2 4 6 8 10 SE +/- 0.02926, N = 3 7.64170 7.78951 7.84183 8.27490 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel a b c d 3 6 9 12 15 SE +/- 0.10, N = 3 10.32 10.45 10.29 10.59 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Phoronix Test Suite v10.8.5