d-Matrix says things can only get faster with IO accelerator

d-Matrix has unveiled the next building block in its AI acceleration strategy, in the shape of an IO accelerator it says will deliver ultra-low latency for AI inference.

The firm used the AI Infra Summit to unwrap its d-Matrix JetStream, a custom PCIe IO card, which d-Matrix says will deliver 400Gbps bandwidth and 2 microsecond latency, and can be scaled up within servers and out across multiple nodes.
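A rough way to read those two figures together (a back-of-envelope sketch, with illustrative payload sizes rather than anything d-Matrix has published): for the small, frequent transfers generated by token-by-token inference, the fixed per-message latency can matter as much as raw bandwidth.

```python
# Back-of-envelope: transfer time = fixed latency + payload / bandwidth.
# The 400Gbps and 2us figures are the vendor's claims; the payload sizes are
# illustrative, chosen to show when the fixed latency dominates.

BANDWIDTH_BPS = 400e9   # 400 Gbps claimed link bandwidth
LATENCY_S = 2e-6        # 2 microsecond claimed latency

for payload_bytes in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    wire_time = payload_bytes * 8 / BANDWIDTH_BPS
    total = LATENCY_S + wire_time
    print(f"{payload_bytes >> 10:>6} KiB: wire {wire_time * 1e6:6.1f} us, "
          f"total {total * 1e6:6.1f} us (latency share {LATENCY_S / total:.0%})")
```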

The part is designed to be used with the Corsair Inference Acceleration platform d-Matrix announced late last year, which the company said could support 60,000 tokens per second at 1ms/token for Llama3 8B.
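Read naively (an interpretation, not a vendor statement), those two numbers imply on the order of 60 concurrent generation streams served at that latency:

```python
# Back-of-envelope reading of the published Corsair figures for Llama3 8B.
# Assumes "1ms/token" is per-stream inter-token latency and 60,000 tokens/s is
# aggregate throughput -- an interpretation, not something d-Matrix has stated.

aggregate_tps = 60_000        # claimed aggregate tokens per second
per_token_latency_s = 0.001   # claimed 1 ms per token

per_stream_tps = 1 / per_token_latency_s             # ~1,000 tokens/s per stream
concurrent_streams = aggregate_tps / per_stream_tps  # ~60 streams at that latency

print(f"{per_stream_tps:.0f} tok/s per stream, ~{concurrent_streams:.0f} concurrent streams")
```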

CEO and co-founder Sid Sheth said the business was founded in 2019 specifically to target the inference problem – even as the tech and investment worlds were enraptured by the training and development of ever bigger LLMs.

Now, he said, “In 2025, specifically post-DeepSeek, the narrative has really shifted to not just inference, but commercially viable inference. How does one get a return on investment on all those massive CapEx investments that have happened over the last 10 years?”

This boiled down to two key problems, he continued, the first of which was “the memory and compute bottleneck”, which the company had tried to address with the Corsair platform. It followed that up last month with its 3DIMC technology, which stacks LPDDR5 together with modified SRAM and will find its way into its upcoming Raptor platform.

Now, he said, with the shift to inference, it was clear that users were demanding “extremely fast interaction with an application.” But running models purely out of extremely fast memory was a challenge, he said. “We are capacity limited in that fast memory in a single server, right?”

This meant the IO bottleneck had to be addressed. “So now that we’ve addressed the memory and compute bottleneck, how do we take a single node solution… and scale it out so that we can get access to more of that extremely fast memory and also solve the IO bottleneck.”

Sheth said the company had looked at products from companies like Nvidia and Broadcom, “And the conclusion was that there was no product out there that had the kind of latency advantages that we were looking for.”

The FPGA-based, three-quarter-length card will sit alongside Corsair devices within a server. He showed a slide detailing an architecture with one JetStream next to four Corsairs, the whole quintet in turn hitched to a PCIe switch, which would connect to other nodes.

This would all scale up within a node, he said. “And then through the top of rack switches, we can connect it to spine switches, which is the topmost layer of the Ethernet switches. And with that, we would be able to essentially scale out this solution across multiple racks.”
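As a rough sketch of that topology (node and switch counts here are illustrative, based on the slide as described rather than any d-Matrix specification), the scale-up and scale-out layers could be modelled like this:

```python
# Minimal sketch of the topology Sheth described: four Corsair cards and one
# JetStream IO card behind a PCIe switch per node, with standard Ethernet
# (top-of-rack -> spine) for scale-out across racks. Names and counts are
# illustrative assumptions, not a d-Matrix specification.

def build_node(node_id: int) -> dict:
    """One server: four Corsair accelerators plus a JetStream card on a PCIe switch."""
    return {
        "node": node_id,
        "pcie_switch": {
            "corsairs": [f"corsair-{node_id}-{i}" for i in range(4)],
            "jetstream": f"jetstream-{node_id}",  # uplinks to the top-of-rack switch
        },
    }

def build_rack(rack_id: int, nodes_per_rack: int = 4) -> dict:
    """One rack: nodes hang off a standard Ethernet top-of-rack switch."""
    return {
        "rack": rack_id,
        "tor_switch": f"tor-{rack_id}",
        "nodes": [build_node(rack_id * 100 + n) for n in range(nodes_per_rack)],
    }

# Spine switches connect the top-of-rack layer so the solution spans multiple racks.
fabric = {
    "spine_switches": ["spine-0", "spine-1"],
    "racks": [build_rack(r) for r in range(2)],
}
```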

“This is not new silicon. This is an FPGA,” he explained. “So we developed all the IP in partnership with an external partner, and that IP was incorporated into this FPGA with another partner.”

On the compute acceleration side, the Corsair will be followed by the Raptor product unveiled a couple of weeks ago, with future products incorporating the stacked RAM technology.

In parallel, on the IO track, the JetStream technology will be built into chiplets using SUE/UALink, and will in time incorporate optical IO, “when co-packaged optics are ready.”

VP of product Sree Ganesan added that the company had always intended to scale Corsair across nodes, and that this was where the potential communication overhead really started to kick in.

“So we have to obviously do something about scaling across multiple nodes, to build out to larger models, but then to not take away from the latency advantage that you get on Corsair.”

She said JetStream was compliant with current standards, so customers could simply plug it into existing data centers. “It’s basically using standard Ethernet for that communication, and using the minimal subset of the Ethernet protocols to actually use those switches and just standard top of rack Ethernet switches to connect across multiple racks.”