m7g.8xlarge

Amazon testing on Ubuntu 22.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2407019-NE-M7G8XLARG55
Run Management

Result Identifier: m7g.8xlarge
Date Run: July 01
Test Duration: 11 Hours, 30 Minutes


m7g.8xlarge system details (OpenBenchmarking.org, Phoronix Test Suite)

Processor: ARMv8 Neoverse-V1 (32 Cores)
Motherboard: Amazon EC2 m7g.8xlarge (1.0 BIOS)
Chipset: Amazon Device 0200
Memory: 128GB
Disk: 537GB Amazon Elastic Block Store
Network: Amazon Elastic
OS: Ubuntu 22.04
Kernel: 6.5.0-1017-aws (aarch64)
Vulkan: 1.3.255
Compiler: GCC 11.4.0
File-System: ext4
System Layer: amazon

System Logs

- Transparent Huge Pages: madvise
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected

Result overview (m7g.8xlarge): summary table of all results for the OpenVINO, ONNX Runtime, Neural Magic DeepSparse, Llama.cpp, OpenCV, oneDNN, Whisper.cpp, and Mlpack tests. Per-test results follow.

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support to analyze the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0, Device: CPU, m7g.8xlarge (FPS, More Is Better)

Face Detection FP16: 3.68 (SE +/- 0.01, N = 3)
Person Detection FP16: 21.17 (SE +/- 0.01, N = 3)
Person Detection FP32: 21.20 (SE +/- 0.02, N = 3)
Vehicle Detection FP16: 298.31 (SE +/- 0.45, N = 3)
Face Detection FP16-INT8: 2.58 (SE +/- 0.00, N = 3)
Face Detection Retail FP16: 903.29 (SE +/- 0.28, N = 3)
Road Segmentation ADAS FP16: 113.70 (SE +/- 0.14, N = 3)
Vehicle Detection FP16-INT8: 51.63 (SE +/- 0.11, N = 3)
Weld Porosity Detection FP16: 543.78 (SE +/- 0.86, N = 3)
Face Detection Retail FP16-INT8: 166.02 (SE +/- 0.13, N = 3)
Road Segmentation ADAS FP16-INT8: 16.21 (SE +/- 0.01, N = 3)
Machine Translation EN To DE FP16: 46.87 (SE +/- 0.02, N = 3)
Weld Porosity Detection FP16-INT8: 332.14 (SE +/- 1.64, N = 3)
Person Vehicle Bike Detection FP16: 314.92 (SE +/- 3.87, N = 3)
Noise Suppression Poconet-Like FP16: 70.67 (SE +/- 0.02, N = 3)
Handwritten English Recognition FP16: 43.01 (SE +/- 0.18, N = 3)
Person Re-Identification Retail FP16: 347.81 (SE +/- 1.11, N = 3)
Age Gender Recognition Retail 0013 FP16: 3808.13 (SE +/- 1.81, N = 3)
Handwritten English Recognition FP16-INT8: 39.62 (SE +/- 0.07, N = 3)
Age Gender Recognition Retail 0013 FP16-INT8: 3886.78 (SE +/- 2.45, N = 3)

(CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
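To compare the FP16 and FP16-INT8 variants of the same model, the FPS figures above can be turned into simple ratios. The following is a minimal sketch: the numbers are copied from the OpenVINO throughput results for this run, while the dictionary layout and the helper name `int8_ratio` are illustrative assumptions and not part of the Phoronix Test Suite.

```python
# FPS values copied from the OpenVINO 2024.0 throughput results above
# (m7g.8xlarge, Device: CPU). Only models with both variants are included.
fps = {
    "Face Detection": {"FP16": 3.68, "FP16-INT8": 2.58},
    "Vehicle Detection": {"FP16": 298.31, "FP16-INT8": 51.63},
    "Face Detection Retail": {"FP16": 903.29, "FP16-INT8": 166.02},
    "Road Segmentation ADAS": {"FP16": 113.70, "FP16-INT8": 16.21},
    "Weld Porosity Detection": {"FP16": 543.78, "FP16-INT8": 332.14},
    "Handwritten English Recognition": {"FP16": 43.01, "FP16-INT8": 39.62},
    "Age Gender Recognition Retail 0013": {"FP16": 3808.13, "FP16-INT8": 3886.78},
}


def int8_ratio(model):
    """Return FP16-INT8 throughput divided by FP16 throughput (hypothetical helper)."""
    values = fps[model]
    return values["FP16-INT8"] / values["FP16"]


ratios = {model: int8_ratio(model) for model in fps}

# Print models from the largest INT8 slowdown to the smallest.
for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{model}: INT8/FP16 = {ratio:.2f}")
```

A ratio below 1 means the FP16-INT8 variant ran slower than the FP16 variant in this run; the ratio describes this result file only and says nothing about other systems.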

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

ONNX Runtime 1.17, Device: CPU, m7g.8xlarge (Inferences Per Second, More Is Better)
Each line is Model - Executor.

GPT-2 - Parallel: 144.42 (SE +/- 0.17, N = 3)
GPT-2 - Standard: 216.38 (SE +/- 0.50, N = 3)
yolov4 - Parallel: 4.59152 (SE +/- 0.03681, N = 3)
yolov4 - Standard: 8.85209 (SE +/- 0.00106, N = 3)
T5 Encoder - Parallel: 229.63 (SE +/- 0.81, N = 3)
T5 Encoder - Standard: 352.80 (SE +/- 2.01, N = 3)
bertsquad-12 - Parallel: 8.97560 (SE +/- 0.05195, N = 3)
bertsquad-12 - Standard: 18.44 (SE +/- 0.02, N = 3)
CaffeNet 12-int8 - Parallel: 385.46 (SE +/- 0.75, N = 3)
CaffeNet 12-int8 - Standard: 929.21 (SE +/- 0.48, N = 3)
fcn-resnet101-11 - Parallel: 1.13658 (SE +/- 0.00545, N = 3)
fcn-resnet101-11 - Standard: 1.42024 (SE +/- 0.00084, N = 3)
ArcFace ResNet-100 - Parallel: 11.34 (SE +/- 0.03, N = 3)
ArcFace ResNet-100 - Standard: 18.15 (SE +/- 0.01, N = 3)
ResNet50 v1-12-int8 - Parallel: 182.88 (SE +/- 0.35, N = 3)
ResNet50 v1-12-int8 - Standard: 254.50 (SE +/- 0.57, N = 3)
super-resolution-10 - Parallel: 78.22 (SE +/- 0.04, N = 3)
super-resolution-10 - Standard: 79.08 (SE +/- 0.09, N = 3)
Faster R-CNN R-50-FPN-int8 - Parallel: 5.76448 (SE +/- 0.00787, N = 3)
Faster R-CNN R-50-FPN-int8 - Standard: 6.26020 (SE +/- 0.05082, N = 3)

(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
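The ONNX Runtime results above pair every model with two executors, so one way to read them is as a per-model ratio of Standard to Parallel throughput. This sketch uses the inferences-per-second values from this run; the variable names are assumptions made for illustration.

```python
# Inferences per second copied from the ONNX Runtime 1.17 results above.
# Each tuple is (Parallel executor, Standard executor).
results = {
    "GPT-2": (144.42, 216.38),
    "yolov4": (4.59152, 8.85209),
    "T5 Encoder": (229.63, 352.80),
    "bertsquad-12": (8.97560, 18.44),
    "CaffeNet 12-int8": (385.46, 929.21),
    "fcn-resnet101-11": (1.13658, 1.42024),
    "ArcFace ResNet-100": (11.34, 18.15),
    "ResNet50 v1-12-int8": (182.88, 254.50),
    "super-resolution-10": (78.22, 79.08),
    "Faster R-CNN R-50-FPN-int8": (5.76448, 6.26020),
}

# Ratio > 1 means the Standard executor delivered more inferences per second.
speedup = {model: standard / parallel
           for model, (parallel, standard) in results.items()}

for model, ratio in sorted(speedup.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: Standard/Parallel = {ratio:.2f}x")
```

In this result file the ratio is above 1 for every listed model, although for super-resolution-10 the two executors are nearly identical.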

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.7, m7g.8xlarge (items/sec, More Is Better)
Each line is Model - Scenario.

NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream: 10.39 (SE +/- 0.08, N = 3)
NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream: 10.33 (SE +/- 0.00, N = 3)
NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream: 404.74 (SE +/- 0.33, N = 3)
NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream: 201.72 (SE +/- 0.22, N = 3)
ResNet-50, Baseline - Asynchronous Multi-Stream: 148.17 (SE +/- 0.05, N = 3)
ResNet-50, Baseline - Synchronous Single-Stream: 108.79 (SE +/- 0.03, N = 3)
ResNet-50, Sparse INT8 - Asynchronous Multi-Stream: 984.15 (SE +/- 0.55, N = 3)
ResNet-50, Sparse INT8 - Synchronous Single-Stream: 412.77 (SE +/- 1.54, N = 3)
Llama2 Chat 7b Quantized - Asynchronous Multi-Stream: 3.6747 (SE +/- 0.0071, N = 3)
Llama2 Chat 7b Quantized - Synchronous Single-Stream: 16.78 (SE +/- 0.01, N = 3)
CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream: 148.09 (SE +/- 0.02, N = 3)
CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream: 108.75 (SE +/- 0.01, N = 3)
CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream: 66.42 (SE +/- 0.02, N = 3)
CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream: 56.35 (SE +/- 0.02, N = 3)
NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream: 96.54 (SE +/- 0.04, N = 3)
NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream: 73.84 (SE +/- 0.02, N = 3)
CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream: 14.32 (SE +/- 0.03, N = 3)
CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream: 12.38 (SE +/- 0.01, N = 3)
BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream: 173.94 (SE +/- 0.23, N = 3)
BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream: 62.98 (SE +/- 0.30, N = 3)
NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream: 10.32 (SE +/- 0.00, N = 3)
NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream: 10.33 (SE +/- 0.00, N = 3)
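For the Synchronous Single-Stream scenario, a rough per-item latency can be estimated from the items/sec figures above. The sketch below assumes that only one request is in flight at a time in that scenario, so mean latency is approximately the reciprocal of throughput; the ms/batch values reported by the benchmark itself may differ slightly. The function name `est_latency_ms` is hypothetical.

```python
# Synchronous Single-Stream throughput (items/sec) copied from the
# DeepSparse 1.7 results above; a subset of models is shown.
single_stream = {
    "NLP Document Classification, oBERT base uncased on IMDB": 10.33,
    "ResNet-50, Baseline": 108.79,
    "ResNet-50, Sparse INT8": 412.77,
    "Llama2 Chat 7b Quantized": 16.78,
    "BERT-Large, NLP Question Answering, Sparse INT8": 62.98,
}


def est_latency_ms(items_per_sec):
    """Estimate mean latency in ms, ASSUMING one request in flight at a time."""
    return 1000.0 / items_per_sec


latency = {model: est_latency_ms(rate) for model, rate in single_stream.items()}

for model, ms in latency.items():
    print(f"{model}: ~{ms:.2f} ms per item")
```

The same reciprocal does not apply to the Asynchronous Multi-Stream scenario, where several requests overlap and throughput no longer determines per-item latency.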

Llama.cpp

Llama.cpp b3067, m7g.8xlarge (Tokens Per Second, More Is Better)

Meta-Llama-3-8B-Instruct-Q8_0.gguf: 22.47 (SE +/- 0.14, N = 3)

(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -lopenblas
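Each result on this page is reported with an "SE +/-" value and a run count N. Assuming SE is the conventional standard error of the mean (sample standard deviation divided by the square root of N), which this page does not state explicitly, the run-to-run spread can be recovered as in the sketch below. It uses the Llama.cpp figures above: 22.47 tokens per second, SE 0.14, N = 3.

```python
import math


def stddev_from_se(se, n):
    """Recover the sample standard deviation, ASSUMING se = stddev / sqrt(n)."""
    return se * math.sqrt(n)


# Llama.cpp b3067, Meta-Llama-3-8B-Instruct-Q8_0.gguf (values from the result above).
mean, se, n = 22.47, 0.14, 3

sd = stddev_from_se(se, n)
rel_se_pct = 100.0 * se / mean  # standard error as a percentage of the mean

print(f"estimated run-to-run standard deviation: {sd:.3f} tokens/sec")
print(f"relative standard error: {rel_se_pct:.2f}%")
```

With only three runs the estimate is coarse, so it is best read as an indication of how repeatable the result was and not as a precise interval.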

OpenVINO


OpenVINO 2024.0, Device: CPU, m7g.8xlarge (ms, Fewer Is Better)

Face Detection FP16: 2151.54 (SE +/- 2.66, N = 3; MIN: 1728.64 / MAX: 4274.63)
Person Detection FP16: 377.31 (SE +/- 0.20, N = 3; MIN: 235.41 / MAX: 510.12)
Person Detection FP32: 376.80 (SE +/- 0.30, N = 3; MIN: 207.53 / MAX: 510.75)
Vehicle Detection FP16: 26.78 (SE +/- 0.04, N = 3; MIN: 23.25 / MAX: 51.98)
Face Detection FP16-INT8: 3042.17 (SE +/- 1.31, N = 3; MIN: 2748.48 / MAX: 4882.42)
Face Detection Retail FP16: 8.84 (SE +/- 0.00, N = 3; MIN: 7.91 / MAX: 16.48)
Road Segmentation ADAS FP16: 70.31 (SE +/- 0.09, N = 3; MIN: 53.97 / MAX: 122.98)
Vehicle Detection FP16-INT8: 154.86 (SE +/- 0.32, N = 3; MIN: 152.55 / MAX: 178.82)
Weld Porosity Detection FP16: 14.70 (SE +/- 0.02, N = 3; MIN: 11.59 / MAX: 168.15)
Face Detection Retail FP16-INT8: 48.17 (SE +/- 0.04, N = 3; MIN: 46.88 / MAX: 54.74)
Road Segmentation ADAS FP16-INT8: 492.91 (SE +/- 0.48, N = 3; MIN: 489.58 / MAX: 526.43)
Machine Translation EN To DE FP16: 170.48 (SE +/- 0.08, N = 3; MIN: 154.07 / MAX: 334.2)
Weld Porosity Detection FP16-INT8: 24.07 (SE +/- 0.12, N = 3; MIN: 22.31 / MAX: 225.53)
Person Vehicle Bike Detection FP16: 25.39 (SE +/- 0.32, N = 3; MIN: 22.51 / MAX: 39.78)
Noise Suppression Poconet-Like FP16: 113.16 (SE +/- 0.04, N = 3; MIN: 110.94 / MAX: 150.98)
Handwritten English Recognition FP16: 185.84 (SE +/- 0.75, N = 3; MIN: 183.08 / MAX: 211.71)
Person Re-Identification Retail FP16: 22.98 (SE +/- 0.07, N = 3; MIN: 16.5 / MAX: 41.02)
Age Gender Recognition Retail 0013 FP16: 2.09 (SE +/- 0.00, N = 3; MIN: 1 / MAX: 26.97)
Handwritten English Recognition FP16-INT8: 201.65 (SE +/- 0.38, N = 3; MIN: 199.37 / MAX: 225.02)

(CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.0Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPUm7g.8xlarge0.46130.92261.38391.84522.3065SE +/- 0.00, N = 32.05MIN: 1.28 / MAX: 23.681. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenCV

This is a benchmark of the OpenCV (Computer Vision) library's built-in performance tests. Learn more via the OpenBenchmarking.org test page.

OpenCV 4.7 - Test: Core (ms, Fewer Is Better)
m7g.8xlarge: 96964 (SE +/- 663.91, N = 3)

OpenCV 4.7 - Test: Video (ms, Fewer Is Better)
m7g.8xlarge: 22760 (SE +/- 92.56, N = 3)

Test: Graph API

m7g.8xlarge: The test quit with a non-zero exit status. E: AbsExact error: G-API output and reference output matrixes are not bitexact equal.

OpenCV 4.7 - Test: Stitching (ms, Fewer Is Better)
m7g.8xlarge: 284480 (SE +/- 284.51, N = 3)

OpenCV 4.7 - Test: Features 2D (ms, Fewer Is Better)
m7g.8xlarge: 54743 (SE +/- 267.56, N = 3)

OpenCV 4.7 - Test: Image Processing (ms, Fewer Is Better)
m7g.8xlarge: 105292 (SE +/- 282.17, N = 3)

OpenCV 4.7 - Test: Object Detection (ms, Fewer Is Better)
m7g.8xlarge: 27261 (SE +/- 75.10, N = 3)

OpenCV 4.7 - Test: DNN - Deep Neural Network (ms, Fewer Is Better)
m7g.8xlarge: 23459 (SE +/- 427.59, N = 15)

1. (CXX) g++ options for the OpenCV results above: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt
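
The SE figures are easier to compare across tests when expressed relative to the mean. The sketch below is not part of the Phoronix Test Suite; it takes the OpenCV 4.7 means, standard errors and run counts reported here (Core 96964 ms, DNN 23459 ms, and so on) and derives the relative standard error, plus the sample standard deviation implied by SE = SD / sqrt(N). The function names are illustrative, and treating the reported SE as SD / sqrt(N) is an assumption about how the figure was computed.

```python
import math

# (mean in ms, standard error, run count N), copied from the OpenCV 4.7 results.
OPENCV_RESULTS = {
    "Core": (96964, 663.91, 3),
    "Video": (22760, 92.56, 3),
    "Stitching": (284480, 284.51, 3),
    "Features 2D": (54743, 267.56, 3),
    "Image Processing": (105292, 282.17, 3),
    "Object Detection": (27261, 75.10, 3),
    "DNN - Deep Neural Network": (23459, 427.59, 15),
}


def relative_se_percent(mean, se):
    """Standard error as a percentage of the mean."""
    return 100.0 * se / mean


def implied_std_dev(se, n):
    """Sample standard deviation implied by SE = SD / sqrt(N) (assumed definition)."""
    return se * math.sqrt(n)


for test, (mean, se, n) in OPENCV_RESULTS.items():
    print(
        f"{test:28s} rel. SE {relative_se_percent(mean, se):5.2f}%  "
        f"implied SD {implied_std_dev(se, n):8.1f} ms  (N = {n})"
    )
```

On these numbers the DNN test has the largest relative standard error (roughly 1.8%) despite its 15 runs, while every other OpenCV test stays below 1%.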

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.4 - Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better)
m7g.8xlarge: 6.50449 (SE +/- 0.00772, N = 3; MIN: 6.42)

oneDNN 3.4 - Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better)
m7g.8xlarge: 4.52619 (SE +/- 0.12704, N = 15; MIN: 4.13)

oneDNN 3.4 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, Fewer Is Better)
m7g.8xlarge: 10.58 (SE +/- 0.02, N = 3; MIN: 10.45)

oneDNN 3.4 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, Fewer Is Better)
m7g.8xlarge: 64.40 (SE +/- 0.08, N = 3; MIN: 64.09)

oneDNN 3.4 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, Fewer Is Better)
m7g.8xlarge: 13.77 (SE +/- 0.01, N = 3; MIN: 13.69)

oneDNN 3.4 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, Fewer Is Better)
m7g.8xlarge: 7798.29 (SE +/- 80.29, N = 3; MIN: 7620.6)

oneDNN 3.4 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, Fewer Is Better)
m7g.8xlarge: 3943.27 (SE +/- 12.89, N = 3; MIN: 3910.4)

1. (CXX) g++ options for the oneDNN results above: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.7 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 1500.50 (SE +/- 0.95, N = 3)

Neural Magic DeepSparse 1.7 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 96.83 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse 1.7 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 39.43 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse 1.7 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 4.9453 (SE +/- 0.0052, N = 3)

Neural Magic DeepSparse 1.7 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 107.48 (SE +/- 0.02, N = 3)

Neural Magic DeepSparse 1.7 - Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 9.1766 (SE +/- 0.0025, N = 3)

Neural Magic DeepSparse 1.7 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 16.19 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse 1.7 - Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 2.4132 (SE +/- 0.0089, N = 3)

Neural Magic DeepSparse 1.7 - Model: Llama2 Chat 7b Quantized - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 4083.77 (SE +/- 7.23, N = 3)

Neural Magic DeepSparse 1.7 - Model: Llama2 Chat 7b Quantized - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 59.57 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse 1.7 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 107.59 (SE +/- 0.04, N = 3)

Neural Magic DeepSparse 1.7 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 9.1805 (SE +/- 0.0009, N = 3)

Neural Magic DeepSparse 1.7 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 239.37 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse 1.7 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 17.73 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse 1.7 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 164.79 (SE +/- 0.02, N = 3)

Neural Magic DeepSparse 1.7 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 13.53 (SE +/- 0.00, N = 3)

Neural Magic DeepSparse 1.7 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 1105.38 (SE +/- 1.76, N = 3)

Neural Magic DeepSparse 1.7 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 80.73 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse 1.7 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 91.62 (SE +/- 0.12, N = 3)

Neural Magic DeepSparse 1.7 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 15.86 (SE +/- 0.08, N = 3)

Neural Magic DeepSparse 1.7 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 1498.92 (SE +/- 0.08, N = 3)

Neural Magic DeepSparse 1.7 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
m7g.8xlarge: 96.81 (SE +/- 0.01, N = 3)
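
The DeepSparse results include ResNet-50 in both a Baseline and a Sparse INT8 variant, so the two can be compared directly. The following sketch (an illustration, not something the test profile produces) divides the Baseline ms/batch by the Sparse INT8 ms/batch for each scenario, using the values reported here. The helper name `latency_ratio` is hypothetical.

```python
# ms/batch figures copied from the DeepSparse 1.7 ResNet-50 results.
RESNET50_MS_PER_BATCH = {
    "Asynchronous Multi-Stream": {"Baseline": 107.48, "Sparse INT8": 16.19},
    "Synchronous Single-Stream": {"Baseline": 9.1766, "Sparse INT8": 2.4132},
}


def latency_ratio(baseline_ms, optimized_ms):
    """Return how many times lower the optimized latency is than the baseline."""
    return baseline_ms / optimized_ms


for scenario, ms in RESNET50_MS_PER_BATCH.items():
    ratio = latency_ratio(ms["Baseline"], ms["Sparse INT8"])
    print(f"{scenario}: Sparse INT8 is {ratio:.2f}x lower in ms/batch")
```

On this instance the ratio works out to roughly 6.6x for the asynchronous multi-stream scenario and roughly 3.8x for the synchronous single-stream scenario. The ratio covers latency only; the variants differ in both sparsity and precision, and the result file carries no accuracy data.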

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-base.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
m7g.8xlarge: 81.61 (SE +/- 0.69, N = 15)

Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
m7g.8xlarge: 179.73 (SE +/- 2.02, N = 4)

Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
m7g.8xlarge: 439.61 (SE +/- 5.49, N = 9)

1. (CXX) g++ options for the Whisper.cpp results above: -O3 -std=c++11 -fPIC -pthread -mcpu=native

Mlpack Benchmark

Mlpack benchmark scripts for machine learning libraries. Learn more via the OpenBenchmarking.org test page.

Mlpack Benchmark - Benchmark: scikit_ica (Seconds, Fewer Is Better)
m7g.8xlarge: 33.72 (SE +/- 0.05, N = 3)

Mlpack Benchmark - Benchmark: scikit_qda (Seconds, Fewer Is Better)
m7g.8xlarge: 20.48 (SE +/- 0.09, N = 3)

Mlpack Benchmark - Benchmark: scikit_svm (Seconds, Fewer Is Better)
m7g.8xlarge: 16.87 (SE +/- 0.02, N = 3)

Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better)
m7g.8xlarge: 1.70 (SE +/- 0.00, N = 3)

126 Results Shown

OpenVINO:
  Face Detection FP16 - CPU
  Person Detection FP16 - CPU
  Person Detection FP32 - CPU
  Vehicle Detection FP16 - CPU
  Face Detection FP16-INT8 - CPU
  Face Detection Retail FP16 - CPU
  Road Segmentation ADAS FP16 - CPU
  Vehicle Detection FP16-INT8 - CPU
  Weld Porosity Detection FP16 - CPU
  Face Detection Retail FP16-INT8 - CPU
  Road Segmentation ADAS FP16-INT8 - CPU
  Machine Translation EN To DE FP16 - CPU
  Weld Porosity Detection FP16-INT8 - CPU
  Person Vehicle Bike Detection FP16 - CPU
  Noise Suppression Poconet-Like FP16 - CPU
  Handwritten English Recognition FP16 - CPU
  Person Re-Identification Retail FP16 - CPU
  Age Gender Recognition Retail 0013 FP16 - CPU
  Handwritten English Recognition FP16-INT8 - CPU
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU
ONNX Runtime:
  GPT-2 - CPU - Parallel
  GPT-2 - CPU - Standard
  yolov4 - CPU - Parallel
  yolov4 - CPU - Standard
  T5 Encoder - CPU - Parallel
  T5 Encoder - CPU - Standard
  bertsquad-12 - CPU - Parallel
  bertsquad-12 - CPU - Standard
  CaffeNet 12-int8 - CPU - Parallel
  CaffeNet 12-int8 - CPU - Standard
  fcn-resnet101-11 - CPU - Parallel
  fcn-resnet101-11 - CPU - Standard
  ArcFace ResNet-100 - CPU - Parallel
  ArcFace ResNet-100 - CPU - Standard
  ResNet50 v1-12-int8 - CPU - Parallel
  ResNet50 v1-12-int8 - CPU - Standard
  super-resolution-10 - CPU - Parallel
  super-resolution-10 - CPU - Standard
  Faster R-CNN R-50-FPN-int8 - CPU - Parallel
  Faster R-CNN R-50-FPN-int8 - CPU - Standard
Neural Magic DeepSparse:
  NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream
  NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream
  ResNet-50, Baseline - Asynchronous Multi-Stream
  ResNet-50, Baseline - Synchronous Single-Stream
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream
  ResNet-50, Sparse INT8 - Synchronous Single-Stream
  Llama2 Chat 7b Quantized - Asynchronous Multi-Stream
  Llama2 Chat 7b Quantized - Synchronous Single-Stream
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream
  CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream
  CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream
  NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream
  CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream
  BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream
  BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream
  NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream
Llama.cpp
OpenVINO:
  Face Detection FP16 - CPU
  Person Detection FP16 - CPU
  Person Detection FP32 - CPU
  Vehicle Detection FP16 - CPU
  Face Detection FP16-INT8 - CPU
  Face Detection Retail FP16 - CPU
  Road Segmentation ADAS FP16 - CPU
  Vehicle Detection FP16-INT8 - CPU
  Weld Porosity Detection FP16 - CPU
  Face Detection Retail FP16-INT8 - CPU
  Road Segmentation ADAS FP16-INT8 - CPU
  Machine Translation EN To DE FP16 - CPU
  Weld Porosity Detection FP16-INT8 - CPU
  Person Vehicle Bike Detection FP16 - CPU
  Noise Suppression Poconet-Like FP16 - CPU
  Handwritten English Recognition FP16 - CPU
  Person Re-Identification Retail FP16 - CPU
  Age Gender Recognition Retail 0013 FP16 - CPU
  Handwritten English Recognition FP16-INT8 - CPU
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU
OpenCV:
  Core
  Video
  Stitching
  Features 2D
  Image Processing
  Object Detection
  DNN - Deep Neural Network
oneDNN:
  IP Shapes 1D - CPU
  IP Shapes 3D - CPU
  Convolution Batch Shapes Auto - CPU
  Deconvolution Batch shapes_1d - CPU
  Deconvolution Batch shapes_3d - CPU
  Recurrent Neural Network Training - CPU
  Recurrent Neural Network Inference - CPU
Neural Magic DeepSparse:
  NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream
  NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream
  ResNet-50, Baseline - Asynchronous Multi-Stream
  ResNet-50, Baseline - Synchronous Single-Stream
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream
  ResNet-50, Sparse INT8 - Synchronous Single-Stream
  Llama2 Chat 7b Quantized - Asynchronous Multi-Stream
  Llama2 Chat 7b Quantized - Synchronous Single-Stream
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream
  CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream
  CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream
  NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream
  CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream
  BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream
  BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream
  NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream
Whisper.cpp:
  ggml-base.en - 2016 State of the Union
  ggml-small.en - 2016 State of the Union
  ggml-medium.en - 2016 State of the Union
Mlpack Benchmark:
  scikit_ica
  scikit_qda
  scikit_svm
  scikit_linearridgeregression