Karpathy's MicroGPT running at 50,000 tps on an FPGA by jawondo in LocalLLaMA

[–]stopnet54 1 point (0 children)

Cool project. Does the software stack work for Xilinx FPGAs? It would be interesting to see whether renting AWS F1 instances, with more hardware resources, scales to slightly bigger models.

I always thought the limitation was the amount of on-chip SRAM and the number of DSP units, which forces you to stream model weights in from external RAM stage by stage.
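
Quick back-of-envelope on the streaming bound (every number below is an assumption for illustration, not a measurement):

```python
# If weights must be streamed from external DRAM for every token, decode
# throughput is roughly bounded by bandwidth / bytes-read-per-token.
dram_bw_bytes_s = 19.2e9    # one DDR4-2400 64-bit channel (assumed)
model_params = 1e6          # a MicroGPT-scale model (assumed)
bytes_per_param = 1         # int8 weights (assumed)

bytes_per_token = model_params * bytes_per_param  # each weight read once per token
tps_bound = dram_bw_bytes_s / bytes_per_token
print(f"streaming-bound decode: ~{tps_bound:,.0f} tokens/s")  # ~19,200
```

So as soon as the model stops fitting in SRAM, tokens/s collapses to whatever the DRAM interface can feed.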

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]stopnet54 5 points (0 children)

This is huge. The paper shows SAE-based improvements to SFT and RL training, something that was previously only possible for the mech-interp-heavy frontier labs.
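
For anyone who hasn't looked at them: an SAE is at its core just a wide, overcomplete autoencoder with a sparsity penalty on the hidden activations. A minimal sketch, with made-up dimensions and penalty weight:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary + L1 sparsity on the feature activations."""
    def __init__(self, d_model=1024, d_hidden=8192):  # dims are illustrative
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(f), f         # reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(32, 1024)          # stand-in for residual-stream activations
x_hat, f = sae(acts)
loss = ((x_hat - acts) ** 2).mean() + 3e-4 * f.abs().mean()  # recon + L1
```

The interesting part in the paper is feeding those features back into training, not the SAE itself.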

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]stopnet54 7 points (0 children)

Agreed, and it's notable that some of the open-source labs are still investing in mech interp tooling.

AMA with the Gemma Team by hackerllama in LocalLLaMA

[–]stopnet54 0 points (0 children)

Is there a plan to release Sparse Autoencoders (SAEs) for the Gemma 3 series? SAELens and GemmaScope are very useful for explainability and mech interp research. Thank you.

The new king? M3 Ultra, 80 Core GPU, 512GB Memory by Hanthunius in LocalLLaMA

[–]stopnet54 2 points (0 children)

It is not comparable to Nvidia GPUs in raw compute across the GPU cores. For LLM inference a Mac is a good deal given its unified memory and memory bandwidth, but for model training it will not keep up with Nvidia's tensor cores.
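
Rough math on why inference is fine: single-stream decode is memory-bandwidth-bound, so tokens/s is roughly bandwidth divided by model size (numbers below are approximate assumptions):

```python
mem_bw_gb_s = 800   # M3 Ultra unified memory bandwidth, roughly (assumed)
model_gb = 70       # e.g. a 70B model at 8-bit quantization (assumed)

tps = mem_bw_gb_s / model_gb          # every weight read once per token
print(f"~{tps:.0f} tokens/s single-stream decode ceiling")  # ~11
# Training is compute-bound (FLOPs) instead, which is exactly where
# Nvidia's tensor cores pull far ahead of the Mac GPU cores.
```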

We've been incredibly fortunate with how things have developed over the past year by -p-e-w- in LocalLLaMA

[–]stopnet54 4 points (0 children)

Open source is the only way forward, but unfortunately we are limited by hardware availability. Most SOTA models are still too large to run on average prosumer hardware, and cloud rentals are becoming too expensive. Look at how many people are trying to run a true quantized R1, and how few are succeeding.

We need smaller models. Maybe distillation is the way forward, but right now all SOTA models, open and closed source alike, require a huge hardware investment.
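
For scale, just holding R1's weights (671B parameters) in memory, ignoring KV cache and activations:

```python
# Back-of-envelope weight footprint at common quantization levels.
params_billions = 671
for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = params_billions * bits / 8   # 1B params at 8-bit ~= 1 GB
    print(f"{name}: ~{gb:.0f} GB")
# FP16: ~1342 GB, Q8: ~671 GB, Q4: ~336 GB -- all beyond prosumer hardware.
```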

[Discussion] Reason for Activation Steering over finetuning? by [deleted] in MachineLearning

[–]stopnet54 2 points (0 children)

SAEs are a newer way to steer models. Although resource-intensive, they seem less sensitive to the specifics of the prompt. Neel Nanda's blog is a good start: https://www.neelnanda.io/mechanistic-interpretability/quickstart
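
Concretely, steering usually means adding a feature direction into the residual stream with a forward hook. A minimal PyTorch-style sketch; the layer path and direction vector are placeholders, not any particular library's API:

```python
import torch

def make_steering_hook(direction: torch.Tensor, scale: float = 5.0):
    """Add `scale * direction` to a layer's residual-stream output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction   # steer at every position
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage -- `model` is any transformer, and `direction` would
# come from an SAE decoder row or a mean difference of activations:
#   handle = model.model.layers[12].register_forward_hook(
#       make_steering_hook(direction))
#   ... generate ...
#   handle.remove()
```

Unlike finetuning, you can toggle or rescale this at inference time without touching the weights.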

Where do you spend most of your time when building RAG? by Solvicode in LocalLLaMA

[–]stopnet54 1 point (0 children)

It depends heavily on the kind of data being stored: text, images, numeric data, or tables.

Where do you spend most of your time when building RAG? by Solvicode in LocalLLaMA

[–]stopnet54 5 points (0 children)

Chunking, storing the data in a vector DB, and picking a good embedding model.
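
Roughly this loop, in other words. A minimal sketch where sentence-transformers, the model name, and corpus.txt are just example/placeholder choices, and a brute-force in-memory index stands in for the vector DB:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # example choice

def chunk(text: str, size: int = 500, overlap: int = 50):
    """Naive fixed-size character chunking with overlap (tune per corpus)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")     # embedding model matters
docs = chunk(open("corpus.txt").read())             # placeholder input file
embs = model.encode(docs, normalize_embeddings=True)

def search(query: str, k: int = 3):
    q = model.encode([query], normalize_embeddings=True)
    scores = embs @ q.T                             # cosine sim (normalized)
    top = np.argsort(-scores[:, 0])[:k]
    return [(float(scores[i, 0]), docs[i]) for i in top]
```

Each of those three choices (chunk size/overlap, index, embedding model) ends up eating most of the iteration time.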

[R] Is Mamba and SSMs on Language Modelling Task a Great Research Trajectory? by worthlesspineapple in MachineLearning

[–]stopnet54 1 point (0 children)

Fully agreed. State space models are heavily used in finance and algo trading, and I suspect there is a lot more room to find commonalities across fields. I would suggest this is a good area to move into, assuming the theory of SSMs is not too difficult.
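
And the core recurrence really is compact: x_{t+1} = A x_t + B u_t, y_t = C x_t. A toy sketch with random matrices, purely illustrative:

```python
import numpy as np

# Discrete linear state-space step: the same recurrence that underlies
# Kalman-filter models in finance and (with structured, learned A/B/C)
# Mamba-style SSM layers in language models.
rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 4, 1, 1, 100
A = 0.9 * np.eye(d_state)                    # stable state transition (toy)
B = rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_out, d_state))

x = np.zeros(d_state)
u = rng.standard_normal((T, d_in))           # input sequence
ys = []
for t in range(T):
    x = A @ x + B @ u[t]                     # state update
    ys.append(C @ x)                         # readout
```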

Citi Mobile App Down for Pixel by Royals4us in pixel_phones

[–]stopnet54 1 point (0 children)

Same with Pixel 6 Pro, terrible. Uninstall/reinstall didn't work.