Unpublished-Xelera

Assets for the Podcast

Label: S1E3
Working Title: Ultra-Low Latency Inference at the Network Edge
Audio Stream & Cover Art:
This is the first complete version. Scott still needs to finish the description and TOC, but that will be done shortly.

Description: We still need a description, but it should mention the recently completed STAC-ML™ Markets (Inference) benchmark audit on a stack that includes a STAC-ML™ Pack for Xelera Silva with AMD Alveo™ V80 on an HPE ProLiant DL385 Gen10 Plus v2 server. Highlights from the report:

For the small (GBT_A) and medium (GBT_B) models, 99th percentile latencies were ≤ 1.95µs for every Number of Model Instances (NMI) tested, with worst-case instance throughput > 560K inferences per second at the highest NMIs tested.
For the large (GBT_C) model, the 99th percentile latency was 2.88µs, with worst-case instance throughput of 379K inferences per second.
Maximum latency was ≤ 12.3µs across all models and NMIs tested.

Table of Contents

00:00 Hello & Welcome
00:09 Ron Background
01:04 Felix Background
01:40 Andrea Background
02:30 Xelera Origin Story
06:00 FPGAs are very sticky once you start working with them
07:33 What is Xelera, and what does it do?
07:55 It’s not about FPGAs
08:30 It’s network acceleration for the data center
08:40 We are seeing a massive buildout in data center capacity
09:05 Number one KPI (Key Performance Indicator) is compute capacity
10:31 All that processing is starting to reach its limits
10:50 In cybersecurity and networking, this lack of compute is becoming painful
11:10 The thing every infrastructure buyer or architect needs to consider
11:24 What are the products that Xelera offers, and who do they address?
11:45 First is a classic DPU softNIC
12:05 Strong footprint in the cybersecurity vertical
12:26 The second product is AI acceleration
12:48 We bring Andrea back in to talk about his STAC Research event talk
13:40 When you attend STAC events, you’re in a room with really deeply technical people
14:00 All these people are here to solve one problem: optimize their execution stack
14:57 Tail latency spikes and their impact on trading
15:17 Ron drills into this tail latency issue a bit more
16:12 The cost of latency varies from firm to firm
16:45 Too slow, and you are a price taker, not a price maker
17:13 It’s important to win more than 50.001% of the time
17:24 What we’re selling today is the ability to trade both faster and smarter
20:11 Edited through to this point
… Gap editing is complete through the end; the next phase for this section should be fast.
… Expect to complete on Wednesday.
43:47 Goodbye (needs to be reloaded)

SmartNICs Today
