litert-onednn-xnnpack

2 x AMD EPYC 9575F 64-Core testing with an AMD VOLCANO (RVOT1000D BIOS) motherboard and ASPEED graphics on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2410169-NE-LITERTONE51&grr&rdt.

System details (identical for configurations a, aa, and b):

Processor: 2 x AMD EPYC 9575F 64-Core @ 5.01GHz (128 Cores / 256 Threads)
Motherboard: AMD VOLCANO (RVOT1000D BIOS)
Chipset: AMD Device 153a
Memory: 1520GB
Disk: 2 x 3841GB SAMSUNG MZWLO3T8HCLS-00A07
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.04
Kernel: 6.8.12-powercap-1ah-patched (x86_64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details:
- a: Scaling Governor: amd-pstate-epp powersave (EPP: power) - CPU Microcode: 0xb002110
- aa: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xb002110
- b: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xb002110

Security Details:
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- reg_file_data_sampling: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Results summary (lower is better for all tests; LiteRT in Microseconds, XNNPACK in us, oneDNN in ms; configuration a was not run for every test):

Test                                                 a           aa          b
litert: Inception V4                                 65970.9     68014.0     66603.9
litert: Mobilenet Float                              5922.59     5619.35     6222.55
litert: DeepLab V3                                   28110.1     23838.8     22624.7
litert: NASNet Mobile                                180002      1338560     190987
litert: Mobilenet Quant                              -           82918.4     36194.8
litert: Inception ResNet V2                          -           132102      90362.0
litert: SqueezeNet                                   9607.40     9561.25     9771.82
xnnpack: QS8MobileNetV2                              -           17257       25089
xnnpack: FP16MobileNetV3Small                        -           25785       17479
xnnpack: FP16MobileNetV3Large                        -           27319       27079
xnnpack: FP16MobileNetV2                             -           18525       14703
xnnpack: FP16MobileNetV1                             -           7483        10504
xnnpack: FP32MobileNetV3Small                        -           21224       17188
xnnpack: FP32MobileNetV3Large                        -           22775       22324
xnnpack: FP32MobileNetV2                             -           16796       14325
xnnpack: FP32MobileNetV1                             -           7130        7699
onednn: Recurrent Neural Network Training - CPU      417.017     418.838     415.615
onednn: Recurrent Neural Network Inference - CPU     387.459     395.488     389.988
litert: Quantized COCO SSD MobileNet v1              -           10521.1     10308.1
onednn: Deconvolution Batch shapes_1d - CPU          17.8179     15.3918     15.7465
onednn: IP Shapes 1D - CPU                           0.631954    0.632095    0.634881
onednn: IP Shapes 3D - CPU                           0.451791    0.458506    0.453833
onednn: Convolution Batch Shapes Auto - CPU          0.258396    0.259777    0.257735
onednn: Deconvolution Batch shapes_3d - CPU          0.429801    0.425348    0.430174
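As a quick illustration of reading the lower-is-better numbers, here is a short Python sketch comparing the powersave run (a) against one of the performance-governor runs (aa) for a few tests. The values are copied from the summary data; the percentage formula is our own illustration, not part of the Phoronix Test Suite output.

```python
# Compare configuration a (powersave governor) against aa (performance
# governor) for a few lower-is-better results. Units: LiteRT in
# microseconds, oneDNN in milliseconds.
results = {
    "litert: DeepLab V3": (28110.1, 23838.8),
    "litert: Mobilenet Float": (5922.59, 5619.35),
    "onednn: Deconvolution Batch shapes_1d - CPU": (17.8179, 15.3918),
}

for name, (a, aa) in results.items():
    # Positive percentage means the performance governor (aa) was faster.
    change = (a - aa) / a * 100
    print(f"{name}: {change:+.1f}% with the performance governor")
    # e.g. DeepLab V3 works out to +15.2%
```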

LiteRT

Model: Inception V4

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
a: 65970.9 / aa: 68014.0 / b: 66603.9
SE +/- 5520.67, N = 15; SE +/- 746.34, N = 15

LiteRT

Model: Mobilenet Float

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
a: 5922.59 / aa: 5619.35 / b: 6222.55
SE +/- 119.72, N = 13; SE +/- 104.76, N = 15

LiteRT

Model: DeepLab V3

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
a: 28110.1 / aa: 23838.8 / b: 22624.7
SE +/- 3193.55, N = 12; SE +/- 1635.57, N = 15

LiteRT

Model: NASNet Mobile

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
a: 180002 / aa: 1338560 / b: 190987
SE +/- 16545.28, N = 12; SE +/- 16019.04, N = 12

LiteRT

Model: Mobilenet Quant

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
aa: 82918.4 / b: 36194.8
SE +/- 5253.23, N = 15

LiteRT

Model: Inception ResNet V2

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
aa: 132102.0 / b: 90362.0
SE +/- 2643.69, N = 12

LiteRT

Model: SqueezeNet

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
a: 9607.40 / aa: 9561.25 / b: 9771.82
SE +/- 109.45, N = 15; SE +/- 126.71, N = 3

XNNPACK

Model: QS8MobileNetV2

XNNPACK b7b048 - us, Fewer Is Better
aa: 17257 / b: 25089
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV3Small

XNNPACK b7b048 - us, Fewer Is Better
aa: 25785 / b: 17479
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV3Large

XNNPACK b7b048 - us, Fewer Is Better
aa: 27319 / b: 27079
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV2

XNNPACK b7b048 - us, Fewer Is Better
aa: 18525 / b: 14703
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV1

XNNPACK b7b048 - us, Fewer Is Better
aa: 7483 / b: 10504
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV3Small

XNNPACK b7b048 - us, Fewer Is Better
aa: 21224 / b: 17188
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV3Large

XNNPACK b7b048 - us, Fewer Is Better
aa: 22775 / b: 22324
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV2

XNNPACK b7b048 - us, Fewer Is Better
aa: 16796 / b: 14325
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV1

XNNPACK b7b048 - us, Fewer Is Better
aa: 7130 / b: 7699
1. (CXX) g++ options: -O3 -lrt -lm

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

oneDNN 3.6 - ms, Fewer Is Better
a: 417.02 (MIN: 406.83) / aa: 418.84 (MIN: 408.29) / b: 415.62 (MIN: 408.78)
SE +/- 1.82, N = 3; SE +/- 1.16, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

oneDNN 3.6 - ms, Fewer Is Better
a: 387.46 (MIN: 376.31) / aa: 395.49 (MIN: 382.73) / b: 389.99 (MIN: 376.48)
SE +/- 2.03, N = 3; SE +/- 3.25, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread

LiteRT

Model: Quantized COCO SSD MobileNet v1

LiteRT 2024-10-15 - Microseconds, Fewer Is Better
aa: 10521.1 / b: 10308.1
SE +/- 144.17, N = 3

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

oneDNN 3.6 - ms, Fewer Is Better
a: 17.82 (MIN: 14.97) / aa: 15.39 (MIN: 13.31) / b: 15.75 (MIN: 13.64)
SE +/- 0.01, N = 3; SE +/- 0.07, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 1D - Engine: CPU

oneDNN 3.6 - ms, Fewer Is Better
a: 0.631954 (MIN: 0.57) / aa: 0.632095 (MIN: 0.57) / b: 0.634881 (MIN: 0.57)
SE +/- 0.001909, N = 3; SE +/- 0.002596, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Engine: CPU

oneDNN 3.6 - ms, Fewer Is Better
a: 0.451791 (MIN: 0.39) / aa: 0.458506 (MIN: 0.4) / b: 0.453833 (MIN: 0.4)
SE +/- 0.002024, N = 3; SE +/- 0.000988, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

oneDNN 3.6 - ms, Fewer Is Better
a: 0.258396 (MIN: 0.24) / aa: 0.259777 (MIN: 0.25) / b: 0.257735 (MIN: 0.25)
SE +/- 0.000187, N = 3; SE +/- 0.000249, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

oneDNN 3.6 - ms, Fewer Is Better
a: 0.429801 (MIN: 0.41) / aa: 0.425348 (MIN: 0.41) / b: 0.430174 (MIN: 0.41)
SE +/- 0.004175, N = 3; SE +/- 0.005431, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
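The per-test results report "SE +/- s, N = n": the standard error of the mean over n runs. A small Python sketch of turning that into an approximate confidence interval follows; the 1.96 z-value and the pairing of the RNN Training mean with a particular reported SE are our assumptions for illustration, not from the result file, and with only 3 runs a t-distribution critical value would give a wider, more honest interval.

```python
def confidence_interval(mean, se, z=1.96):
    """Approximate 95% confidence interval from a mean and its standard error."""
    return (mean - z * se, mean + z * se)

# Example: a oneDNN RNN Training result of 417.02 ms with a reported
# SE of 1.82 over N = 3 runs (pairing assumed for illustration).
low, high = confidence_interval(417.02, 1.82)
print(f"approx. 95% CI: {low:.2f} .. {high:.2f} ms")
# -> approx. 95% CI: 413.45 .. 420.59 ms
```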


Phoronix Test Suite v10.8.5