Assets for the Podcast
Label: S1E3
Working Title: Ultra-Low Latency Inference at the Network Edge
Audio Stream & Cover Art:
This is the first complete version. Scott still needs to finish the description and TOC, but that will be done shortly.
Description: Still to be written, but it should mention the recently completed STAC-ML™ Markets (Inference) benchmark audit on a stack that includes a STAC-ML™ Pack for Xelera Silva with AMD Alveo™ V80 on an HPE ProLiant DL385 Gen10 Plus v2 server. Highlights from the report:
- For the small (GBT_A) and medium (GBT_B) models, 99th percentile latencies were <= 1.95µs for all Numbers of Model Instances (NMI) tested, with worst-case instance throughput > 560K inferences per second at the highest NMIs tested
- For the large (GBT_C) model, the 99th percentile latency was 2.88µs, with worst-case instance throughput of 379K inferences per second
- The maximum latency was <= 12.3µs across all models and NMIs tested
Table of Contents
- 00:00 Hello & Welcome
- 00:09 Ron Background
- 01:04 Felix Background
- 01:40 Andrea Background
- 02:30 Xelera Origin Story
- 06:00 FPGAs are very sticky once you start working with them
- 07:33 What is Xelera, what do they do?
- 07:55 It’s not about FPGAs
- 08:30 It’s network acceleration for the data center
- 08:40 We are seeing a massive buildout in data center capacity
- 09:05 Number one KPI (Key Performance Indicator) is compute capacity
- 10:31 All that processing is starting to reach its limits
- 10:50 In cybersecurity and networking, we see this lack of computing becoming painful
- 11:10 The thing every infrastructure buyer or architect needs to consider
- 11:24 What are the products that Xelera offers, and who do they address?
- 11:45 First is a classic DPU softNIC
- 12:05 Strong footprint in the cybersecurity vertical
- 12:26 Second product is AI Acceleration
- 12:48 We bring Andrea back in to talk about his STAC Research Event Talk
- 13:40 When you attend STAC events, you’re in a room with really deeply technical people
- 14:00 All these people are here to solve one problem: optimize their execution stack
- 14:57 Tail latency spikes are discussed, and the impact on trading
- 15:17 Ron drills into this tail latency issue a bit more
- 16:12 The cost of latency varies from firm to firm
- 16:45 Too slow, and you are a price taker, not a price maker
- 17:13 It’s important to win more than 50.001% of the time
- 17:24 What we’re selling today is the ability to trade both faster and smarter
- 20:11 Edited through to this point
- …. Gap editing is complete through the end; the next phase for this section should be fast
- … Expect to complete on Wednesday.
- 43:47 Need to reload the Goodbye.
