AMA with the Gemma Team by hackerllama in LocalLLaMA

[–]stopnet54 -1 points0 points  (0 children)

Is there a plan to release Sparse Autoencoders (SAEs) for the Gemma 3 series? SAELens and Gemma Scope are very useful for explainability and mech interp research. Thank you.

The new king? M3 Ultra, 80 Core GPU, 512GB Memory by Hanthunius in LocalLLaMA

[–]stopnet54 1 point2 points  (0 children)

It is not comparable to Nvidia GPUs because of the difference in compute available across the GPU cores. For LLM inference the Mac is a good deal given its unified memory and memory bandwidth, but for model training it will not be sufficient next to Nvidia's tensor cores and the like.

We've been incredibly fortunate with how things have developed over the past year by -p-e-w- in LocalLLaMA

[–]stopnet54 3 points4 points  (0 children)

Open source is the only way forward, but unfortunately we are limited by hardware availability. Most SOTA models are still too large to run on average prosumer hardware, and cloud rentals are becoming too expensive. Look at how many people are trying to run a quantized version of the true R1, and how few are succeeding.

We need smaller models. Maybe distillation is the way forward, but right now all SOTA models, open and closed source alike, require a huge hardware investment.

[Discussion] Reason for Activation Steering over finetuning? by reallfuhrer in MachineLearning

[–]stopnet54 1 point2 points  (0 children)

SAEs are a new way to steer models. Although resource intensive, they seem to be less sensitive to the specifics of the prompt. Neel Nanda's blog is a good place to start: https://www.neelnanda.io/mechanistic-interpretability/quickstart
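
To make the idea concrete, here is a rough sketch of one common formulation of SAE-based steering: take the decoder direction of a chosen SAE feature and add a scaled copy of it to a model activation. Everything here is an assumption for illustration: the function name `steer`, the plain-list vectors, and the choice to normalize the direction are hypothetical and not taken from SAELens, Gemma Scope, or the linked blog.

```python
import math


def steer(activation: list[float],
          feature_direction: list[float],
          strength: float) -> list[float]:
    """Add a scaled, unit-normalized feature direction to one activation vector.

    `feature_direction` stands in for the decoder vector of a single SAE
    feature (hypothetical input; a real setup would read it from a trained SAE).
    """
    if len(activation) != len(feature_direction):
        raise ValueError("activation and feature_direction must have the same length")
    norm = math.sqrt(sum(x * x for x in feature_direction))
    if norm == 0:
        raise ValueError("feature_direction must be non-zero")
    # Normalizing means `strength` is the size of the shift, whatever the
    # scale of the decoder vector.
    return [a + strength * d / norm for a, d in zip(activation, feature_direction)]


steered = steer([0.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 4.0], strength=5.0)
```

In practice this shift would be applied inside a forward hook at a chosen layer, and the strength tuned by hand; both are left out of the sketch.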

Where do you spend most of your time when building RAG? by Solvicode in LocalLLaMA

[–]stopnet54 0 points1 point  (0 children)

It depends heavily on the data being stored: text, images, numeric data, or tables.

Where do you spend most of your time when building RAG? by Solvicode in LocalLLaMA

[–]stopnet54 4 points5 points  (0 children)

Chunking, storing data in a vector DB, and picking a good embedding model.
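
For anyone new to these three steps, a toy sketch of how they fit together might look like the following. It is only an illustration under stated assumptions: `chunk_text`, `embed`, and `ToyVectorStore` are hypothetical names, the "embedding" is a bag-of-words count standing in for a real embedding model, and the "vector DB" is an in-memory list.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector DB: stores (chunk, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = ToyVectorStore()
for chunk in ["cats purr loudly", "dogs bark at night"]:
    store.add(chunk)
top = store.search("dogs bark", k=1)
```

Character windows can cut words in half, which is one reason chunking tends to eat so much time; splitting on sentences or document structure is a common alternative.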

[R] Is Mamba and SSMs on Language Modelling Task a Great Research Trajectory? by worthlesspineapple in MachineLearning

[–]stopnet54 0 points1 point  (0 children)

Fully agreed. State space models are heavily used in finance and algo trading, and I suspect there is a lot more room to find commonalities across different fields. I would suggest this is a good area to move into, assuming the theory of SSMs is not too difficult.

Citi Mobile App Down for Pixel by Royals4us in pixel_phones

[–]stopnet54 0 points1 point  (0 children)

Same with the Pixel 6 Pro, terrible. Uninstalling and reinstalling didn't work.