epyc 9654 AMD March

2 x AMD EPYC 9654 96-Core testing with an AMD Titanite_4G (RTI1004D BIOS) and ASPEED on Ubuntu 23.04 via the Phoronix Test Suite.

a

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-l0Aoyl/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-l0Aoyl/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa101111
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

b

c

Processor: AMD EPYC 9654 96-Core @ 3.71GHz (96 Cores / 192 Threads), Motherboard: AMD Titanite_4G (RTI1004D BIOS), Chipset: AMD Device 14a4, Memory: 768GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe

OS: Ubuntu 23.04, Kernel: 5.19.0-21-generic (x86_64), Desktop: GNOME Shell 43.1, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0, File-System: ext4, Screen Resolution: 1920x1080

d

e

Changed Processor to 2 x AMD EPYC 9654 96-Core @ 3.71GHz (192 Cores / 384 Threads).

Changed Memory to 1520GB.

Google Draco

Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and uses a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

Darmstadt Automotive Parallel Heterogeneous Suite

DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite, with OpenCL / CUDA / OpenMP test cases for evaluating programming models in the context of vehicle autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

[Graphs: per-model ONNX Runtime results, each with Inference Time Cost (ms)]

OpenCV

This is a benchmark of the OpenCV (Computer Vision) library's built-in performance tests. Learn more via the OpenBenchmarking.org test page.

GROMACS

This is a test of the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package using the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

Timed FFmpeg Compilation

This test times how long it takes to build the FFmpeg multimedia library. Learn more via the OpenBenchmarking.org test page.
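The timed-compilation tests on this page all come down to measuring the wall-clock time of a build command. Below is a minimal sketch of that idea, not the Phoronix Test Suite's own implementation. The helper name `time_command` is hypothetical, and the command shown is a trivial stand-in rather than a real FFmpeg build.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return (elapsed_seconds, exit_status).

    Hypothetical helper for illustration; output is discarded so that
    terminal I/O does not dominate the measurement.
    """
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


if __name__ == "__main__":
    # Stand-in command; a real run would pass the build invocation instead.
    elapsed, status = time_command([sys.executable, "-c", "pass"])
    if status != 0:
        print("The command quit with a non-zero exit status.")
    else:
        print(f"Elapsed: {elapsed:.3f} seconds")
```

Returning the exit status alongside the elapsed time lets a caller discard timings from failed runs instead of reporting them as results.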

John The Ripper

This is a benchmark of John The Ripper, which is a password cracker. Learn more via the OpenBenchmarking.org test page.

Timed LLVM Compilation

This test times how long it takes to compile/build the LLVM compiler stack. Learn more via the OpenBenchmarking.org test page.

Zstd Compression

This test measures the time needed to compress/decompress a sample file (silesia.tar) using Zstd (Zstandard) compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.

dav1d

Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.

Video Input: Chimera 1080p

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Video Input: Summer Nature 4K

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Video Input: Summer Nature 1080p

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Video Input: Chimera 1080p 10-bit

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

FFmpeg

This is a benchmark of the FFmpeg multimedia framework. The FFmpeg test profile makes use of a modified version of vbench from Columbia University's Architecture and Design Lab (ARCADE) [http://arcade.cs.columbia.edu/vbench/], a benchmark for video-as-a-service workloads. The test profile offers a range of vbench scenarios based on freely distributable video content, with the option of using the x264 or x265 video encoders for transcoding. Learn more via the OpenBenchmarking.org test page.

Timed Godot Game Engine Compilation

This test times how long it takes to compile the Godot Game Engine. Godot is a popular, open-source, cross-platform 2D/3D game engine; it is built using the SCons build system, targeting the X11 platform. Learn more via the OpenBenchmarking.org test page.

Embree

Build2

This test profile measures the time to bootstrap/install the build2 C++ build toolchain from source. Build2 is a cross-platform build toolchain for C/C++ code and offers Cargo-like features. Learn more via the OpenBenchmarking.org test page.

nginx

This is a benchmark of the lightweight Nginx HTTP(S) web server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.

Connections: 100

a: The test quit with a non-zero exit status.

c: The test quit with a non-zero exit status.

b: The test quit with a non-zero exit status.

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Connections: 200

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Connections: 1000

a: The test quit with a non-zero exit status.

c: The test quit with a non-zero exit status.

b: The test quit with a non-zero exit status.

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Apache HTTP Server

This is a test of the Apache HTTPD web server. This Apache HTTPD web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients. Learn more via the OpenBenchmarking.org test page.

Concurrent Requests: 100

a: The test quit with a non-zero exit status.

c: The test quit with a non-zero exit status.

b: The test quit with a non-zero exit status.

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Concurrent Requests: 200

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

Concurrent Requests: 1000

a: The test quit with a non-zero exit status.

c: The test quit with a non-zero exit status.

b: The test quit with a non-zero exit status.

d: The test quit with a non-zero exit status.

e: The test quit with a non-zero exit status.

OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.

ClickHouse

ClickHouse is an open-source, high performance OLAP data management system. This test profile uses ClickHouse's standard benchmark recommendations per https://clickhouse.com/docs/en/operations/performance-test/ / https://github.com/ClickHouse/ClickBench/tree/main/clickhouse with the 100 million rows web analytics dataset. The reported value is the query processing time using the geometric mean of all separate queries performed as an aggregate. Learn more via the OpenBenchmarking.org test page.
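The aggregate described above, a geometric mean over the per-query processing times, can be sketched as follows. This is an illustrative sketch only: the function is written from the standard definition, not taken from the test profile, and the sample timings are hypothetical values, not results from this run.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of strictly positive numbers.

    Computed as exp(mean(log(x))) so that a long list of timings does
    not overflow or underflow the way a direct product could.
    """
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be strictly positive")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical per-query times in milliseconds (not measured data).
    query_times_ms = [1.0, 4.0, 16.0]
    print(f"Geometric mean: {geometric_mean(query_times_ms):.3f} ms")
```

Compared with an arithmetic mean, the geometric mean keeps a single slow query from dominating the aggregate, which is a common reason to use it when combining timings that span several orders of magnitude.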

Memcached

Memcached is a high-performance, distributed memory object caching system. This Memcached test profile makes use of memtier_benchmark for executing this CPU/memory-focused server benchmark. Learn more via the OpenBenchmarking.org test page.

RocksDB

This is a benchmark of Meta/Facebook's RocksDB as an embeddable persistent key-value store for fast storage based on Google's LevelDB. Learn more via the OpenBenchmarking.org test page.

PostgreSQL

This is a benchmark of PostgreSQL using the integrated pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.

MariaDB

This is a MariaDB MySQL database server benchmark making use of mysqlslap. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript run-time built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

181 Results Shown

Google Draco
SPECFEM3D
Google Draco
SPECFEM3D:
Homogeneous Halfspace
Tomographic Model
Darmstadt Automotive Parallel Heterogeneous Suite:
OpenMP - NDT Mapping
OpenMP - Points2Image
OpenMP - Euclidean Cluster
TensorFlow:
CPU - 16 - AlexNet
CPU - 32 - AlexNet
CPU - 64 - AlexNet
CPU - 256 - AlexNet
CPU - 512 - AlexNet
CPU - 16 - GoogLeNet
CPU - 16 - ResNet-50
CPU - 32 - GoogLeNet
CPU - 32 - ResNet-50
CPU - 64 - GoogLeNet
CPU - 64 - ResNet-50
CPU - 256 - GoogLeNet
CPU - 256 - ResNet-50
CPU - 512 - GoogLeNet
CPU - 512 - ResNet-50
Neural Magic DeepSparse:
NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream:
items/sec
ms/batch
NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Synchronous Single-Stream:
items/sec
ms/batch
NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream:
items/sec
ms/batch
CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
items/sec
ms/batch
CV Detection, YOLOv5s COCO - Synchronous Single-Stream:
items/sec
ms/batch
CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
items/sec
ms/batch
CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream:
items/sec
ms/batch
NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream:
items/sec
ms/batch
CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
items/sec
ms/batch
CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream:
items/sec
ms/batch
NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream:
items/sec
ms/batch
NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream:
items/sec
ms/batch
ONNX Runtime:
GPT-2 - CPU - Parallel
GPT-2 - CPU - Standard
yolov4 - CPU - Parallel
yolov4 - CPU - Standard
bertsquad-12 - CPU - Parallel
bertsquad-12 - CPU - Standard
CaffeNet 12-int8 - CPU - Parallel
CaffeNet 12-int8 - CPU - Standard
fcn-resnet101-11 - CPU - Parallel
fcn-resnet101-11 - CPU - Standard
ArcFace ResNet-100 - CPU - Parallel
ArcFace ResNet-100 - CPU - Standard
SPECFEM3D
ONNX Runtime:
ResNet50 v1-12-int8 - CPU - Parallel
ResNet50 v1-12-int8 - CPU - Standard
super-resolution-10 - CPU - Parallel
super-resolution-10 - CPU - Standard
Faster R-CNN R-50-FPN-int8 - CPU - Parallel
Faster R-CNN R-50-FPN-int8 - CPU - Standard
OpenCV:
Core
Video
Graph API
Stitching
Features 2D
Image Processing
Object Detection
DNN - Deep Neural Network
GROMACS
Timed FFmpeg Compilation
John The Ripper:
bcrypt
WPA PSK
Blowfish
HMAC-SHA512
MD5
Timed LLVM Compilation:
Ninja
Unix Makefiles
Zstd Compression:
8 - Compression Speed
8 - Decompression Speed
12 - Compression Speed
12 - Decompression Speed
19 - Compression Speed
19 - Decompression Speed
8, Long Mode - Compression Speed
8, Long Mode - Decompression Speed
19, Long Mode - Compression Speed
19, Long Mode - Decompression Speed
dav1d:
Chimera 1080p
Summer Nature 4K
Summer Nature 1080p
Chimera 1080p 10-bit
FFmpeg:
libx264 - Live:
Seconds
FPS
libx265 - Live:
Seconds
FPS
libx264 - Upload:
Seconds
FPS
libx265 - Upload:
Seconds
FPS
libx264 - Platform:
Seconds
FPS
libx265 - Platform:
Seconds
FPS
libx264 - Video On Demand:
Seconds
FPS
libx265 - Video On Demand:
Seconds
FPS
Timed Godot Game Engine Compilation
Embree:
Pathtracer - Crown
Pathtracer ISPC - Crown
Pathtracer - Asian Dragon
Pathtracer - Asian Dragon Obj
Pathtracer ISPC - Asian Dragon
Pathtracer ISPC - Asian Dragon Obj
Build2
nginx:
200
500
Apache HTTP Server:
200
500
OpenSSL:
SHA256
SHA512
RSA4096
RSA4096
ChaCha20
AES-128-GCM
AES-256-GCM
ChaCha20-Poly1305
ClickHouse:
100M Rows Hits Dataset, First Run / Cold Cache
100M Rows Hits Dataset, Second Run
100M Rows Hits Dataset, Third Run
Memcached:
1:5
1:10
1:100
RocksDB:
Rand Fill
Rand Read
Update Rand
Seq Fill
Rand Fill Sync
Read While Writing
Read Rand Write Rand
PostgreSQL:
1 - 800 - Read Only
1 - 800 - Read Only - Average Latency
1 - 1000 - Read Only
1 - 1000 - Read Only - Average Latency
1 - 800 - Read Write
1 - 800 - Read Write - Average Latency
1 - 1000 - Read Write
1 - 1000 - Read Write - Average Latency
100 - 800 - Read Only
100 - 800 - Read Only - Average Latency
100 - 1000 - Read Only
100 - 1000 - Read Only - Average Latency
100 - 800 - Read Write
100 - 800 - Read Write - Average Latency
100 - 1000 - Read Write
100 - 1000 - Read Write - Average Latency
MariaDB:
512
1024
2048
4096
8192
SPECFEM3D
Timed Node.js Compilation

a

Testing initiated at 28 March 2023 14:53 by user phoronix.

b

Testing initiated at 28 March 2023 18:31 by user phoronix.

c

Testing initiated at 28 March 2023 21:17 by user phoronix.

d

Testing initiated at 29 March 2023 06:43 by user phoronix.

e

Processor: 2 x AMD EPYC 9654 96-Core @ 3.71GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1004D BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe

Testing initiated at 29 March 2023 10:47 by user phoronix.
