xeon febby

2 x INTEL XEON PLATINUM 8592+ testing with a Quanta Cloud QuantaGrid D54Q-2U S6Q-MB-MPS (3B05.TEL4P1 BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402196-NE-XEONFEBBY11&sor&grw.

xeon febbyProcessorMotherboardChipsetMemoryDiskGraphicsNetworkOSKernelCompilerFile-SystemScreen Resolutionab2 x INTEL XEON PLATINUM 8592+ @ 3.90GHz (128 Cores / 256 Threads)Quanta Cloud QuantaGrid D54Q-2U S6Q-MB-MPS (3B05.TEL4P1 BIOS)Intel Device 1bce1008GB3201GB Micron_7450_MTFDKCB3T2TFSASPEED2 x Intel X710 for 10GBASE-TUbuntu 23.106.6.0-060600-generic (x86_64)GCC 13.2.0ext41024x768OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0x21000161 Python Details- Python 3.11.6Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

xeon febbyquicksilver: CORAL2 P2tensorflow: CPU - 1 - VGG-16tensorflow: CPU - 1 - AlexNettensorflow: CPU - 1 - GoogLeNettensorflow: CPU - 1 - ResNet-50onnx: GPT-2 - CPU - Standardy-cruncher: 500My-cruncher: 1Bonnx: yolov4 - CPU - Standardonnx: T5 Encoder - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardquicksilver: CORAL2 P1onnx: ArcFace ResNet-100 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: super-resolution-10 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardpytorch: CPU - 1 - ResNet-50pytorch: CPU - 1 - ResNet-152pytorch: CPU - 1 - Efficientnet_v2_lllama-cpp: llama-2-7b.Q4_0.ggufllama-cpp: llama-2-13b.Q4_0.ggufllama-cpp: llama-2-70b-chat.Q5_0.ggufllamafile: llava-v1.5-7b-q4 - CPUllamafile: mistral-7b-instruct-v0.2.Q8_0 - CPUllamafile: wizardcoder-python-34b-v1.0.Q6_K - CPUgromacs: MPI CPU - water_GMX50_barequicksilver: CTS2namd: ATPase with 327,506 Atomsnamd: STMV with 1,066,628 Atomsdav1d: Chimera 1080pdav1d: Summer Nature 4Kdav1d: Summer Nature 1080pdav1d: Chimera 1080p 10-bitoidn: RT.hdr_alb_nrm.3840x2160 - CPU-Onlyoidn: RT.ldr_alb_nrm.3840x2160 - CPU-Onlyoidn: RTLightmap.hdr.4096x4096 - CPU-Onlyspeedb: Rand Readspeedb: Update Randspeedb: Read While Writingspeedb: Read Rand Write Randonnx: GPT-2 - CPU - Standardonnx: yolov4 - CPU - Standardonnx: T5 Encoder - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: super-resolution-10 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardab800000012.2539.9818.237.28205.5632.7235.10715.3826465.64715.9646786.2629.90267862500035.9174170.023256.99638.330251.6619.080.420.690.550.430.538.573.7417.91895560005.983081.74622204.3868.4387.78235.425.105.142.466132577451570601694373915204364.8615165.00652.1469562.63611.27117100.9827.84055.880913.8905326.0875841800011.8937.717.627.01208.9932.7615.24917.1686360.77922.6587789.3129.4317882000038.5606170.235247.66537.883251.0019.210.580.450.340.538.813.8618.39893540004.020291.81638202.8368.3686.79239.015.165.002.474905331531564581818711015143314.7811358.24382.7709144.13111.26631106.02325.93195.873694.0371226.395OpenBenchmarking.org

Quicksilver

Input: CORAL2 P2

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CORAL2 P2ba2M4M6M8M10M841800080000001. (CXX) g++ options: -fopenmp -O3 -march=native

TensorFlow

Device: CPU - Batch Size: 1 - Model: VGG-16

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: VGG-16ab369121512.2511.89

TensorFlow

Device: CPU - Batch Size: 1 - Model: AlexNet

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: AlexNetab91827364539.9837.70

TensorFlow

Device: CPU - Batch Size: 1 - Model: GoogLeNet

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: GoogLeNetab4812162018.2317.62

TensorFlow

Device: CPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: ResNet-50ab2468107.287.01

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardba50100150200250208.99205.561. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Y-Cruncher

Pi Digits To Calculate: 500M

OpenBenchmarking.orgSeconds, Fewer Is BetterY-Cruncher 0.8.3Pi Digits To Calculate: 500Mab0.62121.24241.86362.48483.1062.7232.761

Y-Cruncher

Pi Digits To Calculate: 1B

OpenBenchmarking.orgSeconds, Fewer Is BetterY-Cruncher 0.8.3Pi Digits To Calculate: 1Bab1.1812.3623.5434.7245.9055.1075.249

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standardba4812162017.1715.381. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardab100200300400500465.65360.781. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardba51015202522.6615.961. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standardba2004006008001000789.31786.261. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardab36912159.902679.431701. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Quicksilver

Input: CORAL2 P1

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CORAL2 P1ba2M4M6M8M10M882000086250001. (CXX) g++ options: -fopenmp -O3 -march=native

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardba91827364538.5635.921. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardba4080120160200170.24170.021. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardab60120180240300257.00247.671. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardab91827364538.3337.881. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: ResNet-50ab122436486051.6651.00MIN: 21.28 / MAX: 52.92MIN: 24.67 / MAX: 53.71

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-152

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: ResNet-152ba51015202519.2119.08MIN: 6.66 / MAX: 20.29MIN: 2.63 / MAX: 20.35

PyTorch

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_la0.09450.1890.28350.3780.47250.42MIN: 0.23 / MAX: 1.28

Llama.cpp

Model: llama-2-7b.Q4_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-7b.Q4_0.ggufab0.15530.31060.46590.62120.77650.690.581. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Llama.cpp

Model: llama-2-13b.Q4_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-13b.Q4_0.ggufab0.12380.24760.37140.49520.6190.550.451. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Llama.cpp

Model: llama-2-70b-chat.Q5_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-70b-chat.Q5_0.ggufab0.09680.19360.29040.38720.4840.430.341. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Llamafile

Test: llava-v1.5-7b-q4 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: llava-v1.5-7b-q4 - Acceleration: CPUba0.11930.23860.35790.47720.59650.530.53

Llamafile

Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPUba2468108.818.57

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPUba0.86851.7372.60553.4744.34253.863.74

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

OpenBenchmarking.orgNs Per Day, More Is BetterGROMACS 2024Implementation: MPI CPU - Input: water_GMX50_bareba51015202518.4017.921. (CXX) g++ options: -O3 -lm

Quicksilver

Input: CTS2

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CTS2ab2M4M6M8M10M955600093540001. (CXX) g++ options: -fopenmp -O3 -march=native

NAMD

Input: ATPase with 327,506 Atoms

OpenBenchmarking.orgns/day, More Is BetterNAMD 3.0b6Input: ATPase with 327,506 Atomsab1.34622.69244.03865.38486.7315.983084.02029

NAMD

Input: STMV with 1,066,628 Atoms

OpenBenchmarking.orgns/day, More Is BetterNAMD 3.0b6Input: STMV with 1,066,628 Atomsba0.40870.81741.22611.63482.04351.816381.74622

dav1d

Video Input: Chimera 1080p

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Chimera 1080pab4080120160200204.38202.831. (CC) gcc options: -pthread

dav1d

Video Input: Summer Nature 4K

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Summer Nature 4Kab153045607568.4368.361. (CC) gcc options: -pthread

dav1d

Video Input: Summer Nature 1080p

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Summer Nature 1080pab2040608010087.7886.791. (CC) gcc options: -pthread

dav1d

Video Input: Chimera 1080p 10-bit

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Chimera 1080p 10-bitba50100150200250239.01235.421. (CC) gcc options: -pthread

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Onlyba1.1612.3223.4834.6445.8055.165.10

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Onlyab1.15652.3133.46954.6265.78255.145.00

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RTLightmap.hdr.4096x4096 - Device: CPU-Onlyba0.55581.11161.66742.22322.7792.472.46

Speedb

Test: Random Read

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Random Readab130M260M390M520M650M6132577454905331531. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb

Test: Update Random

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Update Randomab30K60K90K120K150K1570601564581. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb

Test: Read While Writing

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Read While Writingba4M8M12M16M20M18187110169437391. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb

Test: Read Random Write Random

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Read Random Write Randomab300K600K900K1200K1500K152043615143311. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardba1.09382.18763.28144.37525.4694.781134.861511. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standardba153045607558.2465.011. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardab0.62351.2471.87052.4943.11752.146952.770911. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardba142842567044.1362.641. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standardba0.2860.5720.8581.1441.431.266311.271171. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardab20406080100100.98106.021. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardba71421283525.9327.841. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardba1.32322.64643.96965.29286.6165.873695.880911. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardab0.90841.81682.72523.63364.5423.890534.037121. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardab61218243026.0926.401. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.5