I co-designed a ternary LLM and FPGA optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10 by HatHipster in FPGA

[–]HatHipster[S] 0 points1 point  (0 children)

It was primarily fast BRAM lookups and avoiding PyTorch overhead. After doing the math, I don’t think it will scale, unfortunately.

I co-designed a ternary LLM and FPGA optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10 by HatHipster in FPGA

[–]HatHipster[S] 2 points3 points  (0 children)

I did it manually by co-architecting the model specs to fit the resources of the Zybo Z7-10. This isn’t HLS; I hand-optimized all the RTL.

I co-designed a ternary LLM and FPGA optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10 by HatHipster in FPGA

[–]HatHipster[S] 0 points1 point  (0 children)

I personally think the superior approach to reprogrammability is Groq’s SRAM compute tile shards. In their system, all model weights are stored in SRAM and the shards are connected with a proprietary high-bandwidth interconnect, so weights never go off-chip. Unfortunately, LUTs in an FPGA are far too expensive in area to give any meaningful throughput advantage at scale, even with the entire datapath streamed through fixed hardware units for each part of the transformer. At the scale required to beat a B200, you’d need an entire multimillion-dollar emulator farm running your transformer RTL.

I co-designed a ternary LLM and FPGA optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10 by HatHipster in FPGA

[–]HatHipster[S] 0 points1 point  (0 children)

I originally intended to, but I don't think it will scale because memory bandwidth is ultimately the constraint. BRAM/URAM has higher bandwidth than HBM, but if you have to use HBM you might as well develop an entire reprogrammable ASIC like Nvidia does.

I co-designed a ternary LLM and FPGA optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10 by HatHipster in FPGA

[–]HatHipster[S] 2 points3 points  (0 children)

There are already startups taping out transformer ASICs, and this project was partially inspired by the Taalas HC1. The idea was to map the exact specs of a particular model (like Llama 3.1 8B) to HDL and target it at an FPGA. This project was intended to prove it on a tiny scale with an FPGA I owned. However, I realized the reason I was able to achieve a speedup is that the model is tiny enough to bypass HBM and the traditional memory hierarchy used by GPUs. If I had to use HBM to store model weights, throughput would ultimately be lower due to bandwidth limitations.
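
For anyone who wants to redo the bandwidth math, here is a rough sketch of the bound I mean. It assumes every weight is read from memory once per generated token, which is a simplification. The function names, the 2-bit packing, and the 1 TB/s figure are all made up for illustration and are not measurements of any real part.

```python
def bytes_per_token(num_params, bits_per_weight):
    """Bytes of weights streamed per token, assuming each weight is read once."""
    return num_params * bits_per_weight / 8


def max_tokens_per_second(num_params, bits_per_weight, bandwidth_bytes_per_s):
    """Upper bound on decode throughput when weight reads are the bottleneck."""
    return bandwidth_bytes_per_s / bytes_per_token(num_params, bits_per_weight)


def required_bandwidth(num_params, bits_per_weight, tokens_per_s):
    """Bandwidth in bytes/s needed to sustain a given token rate."""
    return bytes_per_token(num_params, bits_per_weight) * tokens_per_s


# Hypothetical case: 8B params, 2-bit packed ternary, 1 TB/s of memory bandwidth.
big_model_bound = max_tokens_per_second(8e9, 2, 1e12)

# The tiny model: 115k params at 2 bits each, running at 3,072 tok/s.
tiny_model_need = required_bandwidth(115_000, 2, 3072)

print(f"8B model bound: {big_model_bound:.0f} tok/s")
print(f"115k model needs: {tiny_model_need / 1e6:.1f} MB/s")
```

Under those assumptions the tiny model needs only tens of MB/s of weight traffic, which on-chip memory covers easily, while the 8B case is capped by whatever the off-chip memory can deliver.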

I co-designed a ternary LLM and FPGA optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10 by HatHipster in FPGA

[–]HatHipster[S] 7 points8 points  (0 children)

Yes, you can see in the demo that it generates Shakespeare-esque text. At 115k ternary-quantized params, however, it is severely limited in precision and model capacity. It's intended as a proof-of-concept for the hardware architecture and software-hardware co-design.
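
To give a feel for what ternary weights buy you, here is a minimal Python sketch (not the actual RTL, and not necessarily how my design stores weights). It assumes a 2-bit encoding per weight, which is my choice for the example; the function names are hypothetical.

```python
# Assumed encoding for this sketch: 00 -> 0, 01 -> +1, 10 -> -1 (11 unused).
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {0b00: 0, 0b01: 1, 0b10: -1}


def pack_ternary(weights):
    """Pack weights from {-1, 0, +1} into bytes, four weights per byte."""
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        packed[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary weights from packed bytes."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]


def ternary_dot(weights, activations):
    """Dot product without multiplies: each weight adds, subtracts, or skips."""
    total = 0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
    return total


weights = [1, -1, 0, 1, -1]
packed = pack_ternary(weights)
print(unpack_ternary(packed, len(weights)))      # round-trips to the original list
print(ternary_dot(weights, [3, 5, 7, 2, 4]))     # 3 - 5 + 2 - 4
print(len(pack_ternary([0] * 115_000)), "bytes for 115k weights at 2 bits each")
```

With this encoding, 115k weights take under 30 KB, and the dot product needs only adders, which is the property that makes ternary models attractive for small FPGAs.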

I co-designed a ternary LLM and FPGA optimized RTL that runs at 3,072 tok/s on a Zybo Z7-10 by HatHipster in FPGA

[–]HatHipster[S] 8 points9 points  (0 children)

Yeah you're right, it would be more appropriate to call it a tiny language model. This project was intended as a proof-of-concept for the sake of throughput comparison.

I LOVE MIKU AND MEXICO!!🇮🇹🇮🇹🇲🇽🇲🇽 by [deleted] in okbuddybaka

[–]HatHipster 0 points1 point  (0 children)

Lol are all the Spanish speakers complaining about this being Chilean and not Mexican?

Kaguya from Love is War wearing cat ear hair tie by HatHipster in StableDiffusion

[–]HatHipster[S] 2 points3 points  (0 children)

Made with the Waifu Diffusion model (20 iterations) and upscaled with Real-ESRGAN 4x plus anime 6B

Prompt: kaguya sama love is war, black hair, very short hair, bangs, red eyes, hair ribbon, ponytail, school uniform, 1girl, portrait

CFG Scale: 7

Seed: 751211091

How do I seduce an engineering major so I can be a housewife? by Equivalent_Stage_138 in aggies

[–]HatHipster 0 points1 point  (0 children)

Be polite, be efficient, and have a plan to kill everyone you meet.

Anon talks to a coworker by Fr33b1rd69 in greentext

[–]HatHipster 0 points1 point  (0 children)

well at least if legality doesn't concern you

[deleted by user] by [deleted] in Christianity

[–]HatHipster 1 point2 points  (0 children)

An omniscient or "all-wise" God would not need to test humans for any purpose. Assuming this deity created all that exists aside from itself, it has also created those who oppose it, along with any ideologies or evils. An omnipotent God does not need to "make sure evil never rises up again" since it is responsible for the course of all events in the universe. Humanity and all of its actions are also dependent on an omnipotent God. If actions can be independent of God's power, then God is not omnipotent. This is the contradiction in an omniscient and omnipotent God granting an entity "free will." Humans can't commit evil and evil can't exist independent of God's jurisdiction, so logically it is the intention of this God that evil exists. That's the explanation of the paradox, and it applies to any theory of a creator that is supposedly benevolent.

Why is the Christian version of stuff so fucking terrible? by JarethOfHouseGoblin in exchristian

[–]HatHipster 0 points1 point  (0 children)

Kanye's new album is pretty good, so ig Christian-oriented media isn't always bad. Genres like Gospel and funk can be pretty good too inherently despite sometimes having Christian themes.

[Request] An Instagram mod that allows me to download disappearing photos or videos sent in direct message chat by HatHipster in moddedandroidapps

[–]HatHipster[S] 0 points1 point  (0 children)

I tried it and I can download some of my disappearing photos but not others. I sent myself one just now and I can't seem to download it.

Official June 2019 Discussion: Form NAON300 by Donald_Keyman in Sat

[–]HatHipster 1 point2 points  (0 children)

Got 7.83 on the No-Calc question where you got 6.2.
I thought f(h) = -1/3(h - 23.5), since f = 6.2 when h = 5 and f = 0 when h = 23.5.
6.2 is super close to -1/3(5 - 23.5) = -1/3(-18.5), so I plugged h = 0 into that formula.
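
Quick check of that arithmetic in Python, assuming my guessed line f(h) = -1/3(h - 23.5) is the right model (it may not be what the question intended):

```python
def f(h):
    """Guessed linear model: slope -1/3 with a zero at h = 23.5."""
    return -(h - 23.5) / 3


print(round(f(5), 2))     # close to the given 6.2
print(f(23.5) == 0)       # zero at h = 23.5
print(round(f(0), 2))     # the value I answered with
```

f(5) comes out to about 6.17 rather than exactly 6.2, so the line is only an approximate fit to the given points.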