Assets for the Podcast

Label: S1E3
Working Title: Ultra-Low Latency Inference at the Network Edge
Audio Stream & Cover Art:
This is the first complete version. Scott still needs to finish the description and TOC, but that will be done shortly.

Description: We still need a description, but it should mention the recently completed STAC-ML™ Markets (Inference) benchmark audit on a stack that includes a STAC-ML™ Pack for Xelera Silva with AMD Alveo™ V80 on an HPE ProLiant DL385 Gen10 Plus v2 server. Highlights from this report:

  • For the small (GBT_A) and medium (GBT_B) models, 99th-percentile latencies were ≤ 1.95µs for all Numbers of Model Instances (NMI) tested, with worst-case instance throughput > 560K inferences per second at the highest NMIs tested
  • For the large (GBT_C) model, the 99th-percentile latency was 2.88µs, with worst-case instance throughput of 379K inferences per second
  • The maximum latency was ≤ 12.3µs across all models and NMIs tested

Table of Contents

  • 00:00 Hello & Welcome
  • 00:09 Ron Background
  • 01:04 Felix Background
  • 01:40 Andrea Background
  • 02:30 Xelera Origin Story
  • 06:00 FPGAs are very sticky once you start working with them
  • 07:33 What is Xelera, and what do they do?
  • 07:55 It’s not about FPGAs
  • 08:30 It’s network acceleration for the data center
  • 08:40 We are seeing a massive buildout in data center capacity
  • 09:05 Number one KPI (Key Performance Indicator) is compute capacity
  • 10:31 All that processing is starting to reach its limits
  • 10:50 In cybersecurity and networking, we see this lack of computing becoming painful
  • 11:10 The thing every infrastructure buyer or architect needs to consider
  • 11:24 What are the products that Xelera offers, and who do they address?
  • 11:45 First is a classic DPU softNIC
  • 12:05 Strong footprint in the cybersecurity vertical
  • 12:26 Second product is AI Acceleration
  • 12:48 We bring Andrea back in to talk about his STAC Research event talk
  • 13:40 When you attend STAC events, you’re in a room with deeply technical people
  • 14:00 All these people are here to solve one problem: optimize their execution stack
  • 14:57 Tail-latency spikes and their impact on trading
  • 15:17 Ron drills into this tail latency issue a bit more
  • 16:12 The cost of latency varies from firm to firm
  • 16:45 Too slow, and you are a price taker, not a price maker
  • 17:13 It’s important to win more than 50.001% of the time
  • 17:24 What we’re selling today is the ability to trade both faster and smarter
  • 20:11 Edited through this point
  • … Gap editing is complete through the end; the next phase for this section should be fast
  • … Expect to complete on Wednesday
  • 43:47 Goodbye (needs to be reloaded)

LATEST PODCASTS

04.16.26 Alex Stein of Liquid Market Solutions Talks about Network Attached Compute

03.20.26 S1:E1 DDoS Defense with DYNANIC Team

03.01.26 Trailer: Season Preview & Goals

SUPPORTING QUOTE OF THE MONTH

“AMI’s expertise in firmware and infrastructure for cloud and AI is a natural extension of our portfolio, deepening our role in system-level security, manageability, and control.”

Lattice CEO Ford Tamer