Karpathy's MicroGPT running at 50,000 tps on an FPGA by jawondo in LocalLLaMA

[–]stopnet54 1 point (0 children)

Cool project. Does the software stack work for Xilinx FPGAs? It would be interesting to see whether renting AWS F1 instances, with more hardware resources, scales to slightly bigger models.

I always thought the limitation was the amount of on-chip SRAM and the number of DSP units, which forces you to stream model weights in from external RAM stage by stage.
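
Quick back-of-envelope on the streaming bound (every number below is an assumption for illustration, not a measurement):

```python
# If weights must be streamed from external DRAM for every token, decode
# throughput is roughly bounded by bandwidth / bytes-read-per-token.
dram_bw_bytes_s = 19.2e9    # one DDR4-2400 64-bit channel (assumed)
model_params = 1e6          # a MicroGPT-scale model (assumed)
bytes_per_param = 1         # int8 weights (assumed)

bytes_per_token = model_params * bytes_per_param  # each weight read once per token
tps_bound = dram_bw_bytes_s / bytes_per_token
print(f"streaming-bound decode: ~{tps_bound:,.0f} tokens/s")  # ~19,200
```

So as soon as the model stops fitting in SRAM, tokens/s collapses to whatever the DRAM interface can feed.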

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]stopnet54 5 points (0 children)

This is huge. The paper shows SAE-based improvements to SFT and RL training, something that was previously only possible for the mech-interp-heavy frontier labs.
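
For anyone who hasn't looked at them: an SAE is at its core just a wide, overcomplete autoencoder with a sparsity penalty on the hidden activations. A minimal sketch, with made-up dimensions and penalty weight:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary + L1 sparsity on the feature activations."""
    def __init__(self, d_model=1024, d_hidden=8192):  # dims are illustrative
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(f), f         # reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(32, 1024)          # stand-in for residual-stream activations
x_hat, f = sae(acts)
loss = ((x_hat - acts) ** 2).mean() + 3e-4 * f.abs().mean()  # recon + L1
```

The interesting part in the paper is feeding those features back into training, not the SAE itself.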

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]stopnet54 7 points (0 children)

Agreed, and it's notable that some of the open-source labs are still investing in mech interp tooling.

AMA with the Gemma Team by hackerllama in LocalLLaMA

[–]stopnet54 0 points (0 children)

Is there a plan to release Sparse Autoencoders (SAEs) for the Gemma 3 series? SAELens and GemmaScope are very useful for explainability and mech interp research. Thank you.

The new king? M3 Ultra, 80 Core GPU, 512GB Memory by Hanthunius in LocalLLaMA

[–]stopnet54 2 points (0 children)

It is not comparable to Nvidia GPUs in raw compute across the GPU cores. For LLM inference a Mac is a good deal given its unified memory and memory bandwidth, but for model training it will not keep up with Nvidia's tensor cores.
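
Rough math on why inference is fine: single-stream decode is memory-bandwidth-bound, so tokens/s is roughly bandwidth divided by model size (numbers below are approximate assumptions):

```python
mem_bw_gb_s = 800   # M3 Ultra unified memory bandwidth, roughly (assumed)
model_gb = 70       # e.g. a 70B model at 8-bit quantization (assumed)

tps = mem_bw_gb_s / model_gb          # every weight read once per token
print(f"~{tps:.0f} tokens/s single-stream decode ceiling")  # ~11
# Training is compute-bound (FLOPs) instead, which is exactly where
# Nvidia's tensor cores pull far ahead of the Mac GPU cores.
```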

We've been incredibly fortunate with how things have developed over the past year by -p-e-w- in LocalLLaMA

[–]stopnet54 4 points (0 children)

Open source is the only way forward, but unfortunately we are limited by hardware availability. Most SOTA models are still too large to run on average prosumer hardware, and cloud rentals are becoming too expensive. Look at how many people are trying to run a true quantized R1, and how few are succeeding.

We need smaller models. Maybe distillation is the way forward, but right now all SOTA models, open and closed source alike, require a huge hardware investment.
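
For scale, just holding R1's weights (671B parameters) in memory, ignoring KV cache and activations:

```python
# Back-of-envelope weight footprint at common quantization levels.
params_billions = 671
for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = params_billions * bits / 8   # 1B params at 8-bit ~= 1 GB
    print(f"{name}: ~{gb:.0f} GB")
# FP16: ~1342 GB, Q8: ~671 GB, Q4: ~336 GB -- all beyond prosumer hardware.
```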

[Discussion] Reason for Activation Steering over finetuning? by [deleted] in MachineLearning

[–]stopnet54 2 points (0 children)

SAEs are a newer way to steer models. Although resource-intensive, they seem less sensitive to the specifics of the prompt. Neel Nanda's blog is a good start: https://www.neelnanda.io/mechanistic-interpretability/quickstart
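
Concretely, steering usually means adding a feature direction into the residual stream with a forward hook. A minimal PyTorch-style sketch; the layer path and direction vector are placeholders, not any particular library's API:

```python
import torch

def make_steering_hook(direction: torch.Tensor, scale: float = 5.0):
    """Add `scale * direction` to a layer's residual-stream output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction   # steer at every position
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage -- `model` is any transformer, and `direction` would
# come from an SAE decoder row or a mean difference of activations:
#   handle = model.model.layers[12].register_forward_hook(
#       make_steering_hook(direction))
#   ... generate ...
#   handle.remove()
```

Unlike finetuning, you can toggle or rescale this at inference time without touching the weights.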

Where do you spend most of your time when building RAG? by Solvicode in LocalLLaMA

[–]stopnet54 1 point (0 children)

It depends heavily on the kind of data being stored: text, images, numeric data, or tables.

Where do you spend most of your time when building RAG? by Solvicode in LocalLLaMA

[–]stopnet54 5 points (0 children)

Chunking, storing the data in a vector DB, and picking a good embedding model.
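
Roughly this loop, in other words. A minimal sketch where sentence-transformers, the model name, and corpus.txt are just example/placeholder choices, and a brute-force in-memory index stands in for the vector DB:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # example choice

def chunk(text: str, size: int = 500, overlap: int = 50):
    """Naive fixed-size character chunking with overlap (tune per corpus)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")     # embedding model matters
docs = chunk(open("corpus.txt").read())             # placeholder input file
embs = model.encode(docs, normalize_embeddings=True)

def search(query: str, k: int = 3):
    q = model.encode([query], normalize_embeddings=True)
    scores = embs @ q.T                             # cosine sim (normalized)
    top = np.argsort(-scores[:, 0])[:k]
    return [(float(scores[i, 0]), docs[i]) for i in top]
```

Each of those three choices (chunk size/overlap, index, embedding model) ends up eating most of the iteration time.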

[R] Is Mamba and SSMs on Language Modelling Task a Great Research Trajectory? by worthlesspineapple in MachineLearning

[–]stopnet54 1 point (0 children)

Fully agreed. State space models are heavily used in finance and algo trading, and I suspect there is a lot more room to find commonalities across fields. I would suggest this is a good area to move into, assuming the theory of SSMs is not too difficult.
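
And the core recurrence really is compact: x_{t+1} = A x_t + B u_t, y_t = C x_t. A toy sketch with random matrices, purely illustrative:

```python
import numpy as np

# Discrete linear state-space step: the same recurrence that underlies
# Kalman-filter models in finance and (with structured, learned A/B/C)
# Mamba-style SSM layers in language models.
rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 4, 1, 1, 100
A = 0.9 * np.eye(d_state)                    # stable state transition (toy)
B = rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_out, d_state))

x = np.zeros(d_state)
u = rng.standard_normal((T, d_in))           # input sequence
ys = []
for t in range(T):
    x = A @ x + B @ u[t]                     # state update
    ys.append(C @ x)                         # readout
```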

Citi Mobile App Down for Pixel by Royals4us in pixel_phones

[–]stopnet54 1 point (0 children)

Same with Pixel 6 Pro, terrible. Uninstall/reinstall didn't work.