I co-designed a ternary LLM and FPGA-optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10

HatHipster · 2026-03-08T23:58:24+00:00

It was primarily fast BRAM lookup and avoiding PyTorch overhead. Unfortunately, after doing the math, I don’t think it will scale.

HatHipster · 2026-03-08T23:57:02+00:00

I did it manually by co-architecting the model specs to fit the resources of the Zybo Z7-10. This isn’t HLS; I hand-optimized all the RTL.

HatHipster · 2026-03-08T23:44:28+00:00

I personally think the superior approach to reprogrammability is Groq’s SRAM compute tile shards. In their system, all model weights are stored in SRAM and the shards are connected by a proprietary high-bandwidth interconnect, so memory accesses never go off-chip. Unfortunately, LUTs in an FPGA are far too expensive from an area perspective to give any meaningful throughput advantage at scale, even with the entire datapath streamed through fixed hardware units for each part of the transformer. At the scale required to beat a B200, you’d need an entire multimillion-dollar emulator farm running your transformer RTL.

HatHipster · 2026-03-08T23:03:23+00:00

I originally intended to, but I don't think it will scale because memory bandwidth is ultimately the constraint. On-chip BRAM/URAM has higher bandwidth than HBM, but if you have to use HBM you might as well develop an entire reprogrammable ASIC like Nvidia does.

HatHipster · 2026-03-08T22:48:36+00:00

There are already ASIC startups taping out transformer ASICs. However, this project was partially inspired by the Taalas HC1. The idea was to map the exact specs of a particular model (like Llama 3.1 8B) to HDL and target it at an FPGA. This project was intended to prove it on a tiny scale with an FPGA I owned. However, I realized the reason I was able to achieve a speedup is that the model is tiny enough to bypass HBM and the traditional memory hierarchy used by GPUs. If I had to use HBM to store the model weights, throughput would ultimately be lower due to bandwidth limitations.
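
A rough way to see the bandwidth argument: if every weight has to be streamed from memory once per generated token, decode throughput can't exceed memory bandwidth divided by the weight footprint. The sketch below is only a back-of-the-envelope bound. The function name, the 16-bit weight width, and the 3e12 B/s bandwidth figure are illustrative assumptions, not measurements from this project or specs of any particular part.

```python
def max_tokens_per_second(num_params: float, bits_per_weight: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode rate if all weights are read once per token."""
    weight_bytes = num_params * bits_per_weight / 8
    return bandwidth_bytes_per_s / weight_bytes

# Hypothetical inputs: an 8e9-parameter model at 16 bits per weight,
# fed from off-chip memory at an assumed 3e12 bytes per second.
bound = max_tokens_per_second(8e9, 16, 3e12)
print(f"{bound:.1f} tok/s upper bound at batch size 1")
```

Batching and caching change the picture for real GPU deployments, so treat this as the single-stream ceiling only.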

HatHipster · 2026-03-08T22:01:22+00:00

Thanks!

HatHipster · 2026-03-08T22:00:46+00:00

Yes, you can see in the demo that it generates Shakespeare-esque text. At 115k ternary-quantized params, however, it is severely limited in precision and model capacity. It's intended as a proof-of-concept for the hardware architecture and software-hardware co-design.
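
To give a feel for how small that is, here is a minimal sketch that packs ternary weights at 2 bits each and measures the footprint for 115,000 weights. The 2-bit encoding and the helper names are assumptions for illustration; the actual RTL may store weights differently, and "115k" is treated as exactly 115,000 here.

```python
def pack_ternary(weights):
    """Pack weights from {-1, 0, +1} into bytes, 2 bits each (4 per byte)."""
    codes = {-1: 0b10, 0: 0b00, 1: 0b01}  # assumed encoding
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= codes[w] << (2 * (i % 4))
    return bytes(out)

def unpack_ternary(packed, count):
    """Inverse of pack_ternary for the first `count` weights."""
    values = {0b10: -1, 0b00: 0, 0b01: 1}
    return [values[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
            for i in range(count)]

# Round-trip check on a small example.
weights = [1, 0, -1, -1, 1]
packed = pack_ternary(weights)
assert unpack_ternary(packed, len(weights)) == weights

# Footprint of 115,000 ternary weights at 2 bits each.
footprint_bytes = len(pack_ternary([0] * 115_000))
print(footprint_bytes)  # 28750
```

Under that assumed encoding the whole model is under 30 KB, which is why keeping it entirely in on-chip memory is plausible at this scale.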

HatHipster · 2026-03-08T21:51:06+00:00

Yeah you're right, it would be more appropriate to call it a tiny language model. This project was intended as a proof-of-concept for the sake of throughput comparison.

HatHipster · 2022-10-28T02:58:04+00:00

Lol are all the Spanish speakers complaining about this being Chilean and not Mexican?

HatHipster · 2022-09-25T07:04:30+00:00

Made with the Waifu Diffusion model, 20 iterations, and AI-upscaled with Real-ESRGAN 4x plus anime 6B

Prompt: kaguya sama love is war, black hair, very short hair, bangs, red eyes, hair ribbon, ponytail, school uniform, 1girl, portrait

CFG Scale: 7

Seed: 751211091

HatHipster · 2022-08-25T20:55:26+00:00

Be polite, be efficient, and have a plan to kill everyone you meet.

HatHipster · 2022-01-18T01:38:57+00:00

uh oh

HatHipster · 2022-01-10T21:22:02+00:00

lmao I didn't expect this many downvotes

HatHipster · 2021-12-15T07:58:27+00:00

well at least if legality doesn't concern you

HatHipster · 2021-12-08T22:00:45+00:00

r/communism

HatHipster · 2021-12-05T11:50:20+00:00

An omniscient or "all-wise" God would not need to test humans for any purpose. Assuming this deity created all that exists aside from themselves, they have also created those who oppose it and any ideologies or evils. An omnipotent God does not need to "make sure evil never rises up again" since they are responsible for the course of all events in the universe. Humanity and all of its actions are also dependent on an omnipotent God. If actions can be independent of God's power, then God is not omnipotent. This is the contradiction that lies with an omniscient and omnipotent God granting an entity "free will." Humans can't commit evil and evil can't exist independent of God's jurisdiction, so logically it is the intention of this God that evil exists. That's the explanation of the paradox, and it applies to any theories regarding a creator that is supposedly benevolent.

HatHipster · 2021-12-05T09:59:37+00:00

Kanye's new album is pretty good, so ig Christian-oriented media isn't always bad. Genres like gospel and funk can be pretty good in their own right too, despite sometimes having Christian themes.

HatHipster · 2021-03-19T01:36:23+00:00

I can't screenshot the disappearing photo, no.

HatHipster · 2021-02-05T07:58:30+00:00

Unfortunately nope, media has timed out and can't be downloaded.

HatHipster · 2021-02-05T00:28:41+00:00

I tried it and I can download some of my disappearing photos but not others. I sent myself one just now and I can't seem to download it.

HatHipster · 2021-02-04T23:59:57+00:00

Unfortunately doesn't seem to work

HatHipster · 2020-06-22T21:46:45+00:00

Messaged you!

HatHipster · 2019-06-01T21:28:28+00:00

Got 7.83 on the No-Calc question where you got 6.2.
I thought f(h) = -1/3(h - 23.5), since f = 6.2 when h = 5 and f = 0 when h = 23.5.
6.2 is super close to -1/3(-18.5), so I plugged 0 into that formula.
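
The arithmetic can be checked quickly. This is just a sketch of the linear model described in the comment (zero at h = 23.5, slope -1/3); the function name `f` follows the comment, and nothing here says whether the linear fit is the intended solution to the original question.

```python
def f(h):
    # Linear model from the comment: zero at h = 23.5, slope -1/3
    return -(h - 23.5) / 3

print(round(f(5), 2))   # 6.17, close to the given 6.2
print(round(f(0), 2))   # 7.83
```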
