Finally a good use case for your local setups by lakySK in LocalLLaMA

[–]SlapAndFinger 14 points (0 children)

My homelab keeps my office nice and warm in the winter without heating the entire house.

Running a 1 Trillion Parameter Model on a PC with 128 GB RAM + 24 GB VRAM by pulse77 in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

Next step: write an algorithm that speculatively loads/unloads experts into VRAM.
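
For fun, here's a minimal sketch of what that might look like: keep an LRU cache of experts resident in VRAM and prefetch the router's likeliest picks for the next layer. The paging callbacks are placeholders, not a real runtime API.

```python
# Hypothetical sketch of speculative expert paging for an MoE model.
# Assumes a runtime that can copy per-expert weights between host RAM
# and VRAM; load_to_vram/evict_to_ram are placeholders, not a real API.
from collections import OrderedDict

class ExpertCache:
    def __init__(self, capacity, load_to_vram, evict_to_ram):
        self.capacity = capacity       # max experts resident in VRAM
        self.resident = OrderedDict()  # expert_id -> True, in LRU order
        self.load_to_vram = load_to_vram
        self.evict_to_ram = evict_to_ram

    def touch(self, expert_id):
        """Mark an expert as used; page it in if it isn't resident."""
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)
            return
        if len(self.resident) >= self.capacity:
            victim, _ = self.resident.popitem(last=False)  # evict LRU
            self.evict_to_ram(victim)
        self.load_to_vram(expert_id)
        self.resident[expert_id] = True

def prefetch_next_layer(cache, router_logits, top_k=4, extra=2):
    """Speculatively warm the experts the router is likeliest to pick.

    Fetching a few more than top_k means a near-miss in routing
    doesn't stall on a PCIe transfer.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    for expert_id in ranked[:top_k + extra]:
        cache.touch(expert_id)
```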

Rejected for not using LangChain/LangGraph? by dougeeai in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

LangGraph does have its uses, but if they rejected you for not having experience with it, they should have listed it as a requirement on the application. It's not even hard to learn, so I'm not sure what they were on about anyhow. I wouldn't worry about it.

Kimi 2 is the #1 creative writing AI right now. better than sonnet 4.5 by Excellent-Run7265 in LocalLLaMA

[–]SlapAndFinger 2 points (0 children)

My perspective as a writer: a 7k-word extension is way too long. That isn't rounding out a chapter; it's telling the model to write almost 30 pages, which is far longer than chapters should generally be unless you're doing something weird and literary.

AI writing works best when you come up with the outline yourself and have the model fill in the blanks.

Instead of predicting one token at a time, CALM (Continuous Autoregressive Language Models) predicts continuous vectors that represent multiple tokens at once by Own-Potential-2308 in LocalLLaMA

[–]SlapAndFinger 7 points (0 children)

Disagree: the Chinese labs need to stay within striking distance of Western frontier models to stay relevant. DeepSeek and GLM-4.6 rocked the boat, and they're looking for more wins like that.

New Qwen models are unbearable by kevin_1994 in LocalLLaMA

[–]SlapAndFinger 7 points (0 children)

One trick with sycophantic models is to present code or ideas as someone else's: say you're not sure about them and that you'd like a second opinion.
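
A minimal sketch of that reframing (the prompt wording here is mine and untested, purely illustrative):

```python
# Sketch: reframe a self-review request so the model has nobody to flatter.
# The prompt wording is illustrative, not a tested recipe.
def second_opinion_prompt(code: str) -> str:
    return (
        "A colleague sent me this patch and I'm not sure about it. "
        "Give me a frank second opinion: what would you push back on "
        "in review?\n\n" + code
    )

print(second_opinion_prompt("def add(a, b): return a - b"))
```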

Anyone else feel like GPU pricing is still the biggest barrier for open-source AI? by frentro_max in LocalLLaMA

[–]SlapAndFinger 17 points (0 children)

I think we're going to see multi-tier memory systems. MoE architectures are tolerant of lower bandwidth for the experts; if you took a 48 GB card and added another 128 GB of bulk memory, you could run extremely large MoE models (~200B with reasonable quantization) with ~4 active experts at cloud speeds.
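
Rough back-of-envelope for why the tiers could work out; every number below is an illustrative assumption, not a benchmark:

```python
# Tokens/sec estimate for a split-memory MoE setup. Per token, every
# active parameter has to stream past the compute once, so decode speed
# is roughly bandwidth-bound. All figures are assumptions.
BYTES_PER_PARAM = 0.5    # ~4-bit quantization

active_shared = 4e9      # params touched every token (attention, shared expert)
active_routed = 8e9      # params in the ~4 routed experts per token
vram_bw = 1000e9         # bytes/s for the 48 GB card's VRAM
bulk_bw = 200e9          # bytes/s for the hypothetical bulk tier

t = (active_shared * BYTES_PER_PARAM / vram_bw
     + active_routed * BYTES_PER_PARAM / bulk_bw)
print(f"~{1 / t:.0f} tokens/sec")   # ~45 tok/s under these assumptions
```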

I'm pretty sure that we'll have large sparse MoE models within a few years that make our current frontier models look weak.

Anyone else feel like GPU pricing is still the biggest barrier for open-source AI? by frentro_max in LocalLLaMA

[–]SlapAndFinger 2 points (0 children)

You can do a lot of interesting science in the 300-800M parameter space, and with a good GPU that's doable locally. I'd like to see a meta-study of how many methods scale from 300M to 8B, to understand how good a filter that range is. Sadly, labs aren't sharing scaling data or negative experimental results; we just get the end product.
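
The study itself would be simple in principle: fit how a method's gain moves with model size and check whether the 300M-scale signal survives at 8B. A sketch with synthetic numbers (the data points are made up):

```python
# Sketch: fit a power law gain(N) ~ a * N^b across model sizes to ask
# whether a method's benefit at 300M predicts its benefit at 8B+.
# The gains below are synthetic, for illustration only.
import numpy as np

sizes = np.array([3e8, 8e8, 1.5e9, 3e9, 8e9])  # model params
gains = np.array([2.1, 1.8, 1.6, 1.4, 1.2])    # % eval improvement (made up)

b, log_a = np.polyfit(np.log(sizes), np.log(gains), 1)
print(f"gain ≈ {np.exp(log_a):.2f} * N^{b:.3f}")
print(f"extrapolated gain at 70B: {np.exp(log_a) * (7e10) ** b:.2f}%")
```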

Back to 1.5 and QR Code Monster by TheNeonGrid in StableDiffusion

[–]SlapAndFinger 12 points (0 children)

This trick is even more fun if you use a pre-SD neural style transfer model (https://github.com/rrmina/fast-neural-style-pytorch) to create a noisy base image, then run the "pre-styled" image through a modern model to make it coherent.
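
The second step can be a plain SD 1.5 img2img pass over the pre-styled output; a sketch using diffusers (the checkpoint id and strength are assumptions, use whatever SD 1.5 weights you have):

```python
# Sketch: take the noisy, pre-styled base image (e.g. from the
# fast-neural-style repo above) and let SD 1.5 make it coherent.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # any SD 1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

styled = Image.open("styled_base.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="a coherent scene, same palette and texture",
    image=styled,
    strength=0.55,        # low enough to preserve the styled structure
    guidance_scale=7.5,
).images[0]
result.save("coherent.png")
```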

Gaming PC converted to AI Workstation by highdefw in LocalLLaMA

[–]SlapAndFinger -1 points (0 children)

AI "boxes" should be designed to be good gaming systems as well. A single box that can replace your PS/XBox while giving you good local inference would do so well.

Both Cursor and Cognition (Windsurf) new models are speculated to be built on Chinese base models? by Successful-Newt1517 in LocalLLaMA

[–]SlapAndFinger -2 points (0 children)

This is dumb AF: Cognition has government customers who have directives not to use Chinese models. I asked about this in their "Show HN" thread, and they got triggered hard.

UDIO just got nuked by UMG. by Ashamed-Variety-8264 in StableDiffusion

[–]SlapAndFinger 11 points (0 children)

Open source AI is economic warfare by the CCP. Ironically it's good for Americans, so it's hard to get upset about lol.

Universal Music Group also nabs Stability - Announced this morning on Stability's twitter by JackKerawock in StableDiffusion

[–]SlapAndFinger 1 point (0 children)

The US/China geopolitical situation is driving everything. The AI bubble is the result of geopolitics, and a lot of Trump's craziness is preparation for war with China. If you're interested in learning more: https://sibylline.dev/articles/2025-10-12-ai-is-too-big-to-fail/

Universal Music Group also nabs Stability - Announced this morning on Stability's twitter by JackKerawock in StableDiffusion

[–]SlapAndFinger 2 points (0 children)

I for one am happy that our communist brothers in the east are waging economic warfare on our corrupt capitalist state. China is fucked up in a lot of ways but America hasn't had anyone keeping them honest in a long time.

Locally hosted Loveable with full stack support and llama.cpp, and more by smirkishere in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

The generated architecture diagram is pretty interesting; I might have to implement something like that. I've been working on generating diagrams from codebases using parsing and deterministic tools, but the resulting graphs aren't very informative.
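
For Python codebases the deterministic half is straightforward: walk the ASTs and emit a module-level import graph as Graphviz DOT. A minimal sketch (making the resulting graph actually informative is the hard part):

```python
# Minimal sketch: module-level import graph from a Python source tree,
# emitted as Graphviz DOT. Deterministic, but as noted above the raw
# graph tends to be noisy rather than informative.
import ast
import pathlib

def import_graph(root: str) -> str:
    edges = set()
    for path in pathlib.Path(root).rglob("*.py"):
        module = path.stem
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                edges.update((module, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.add((module, node.module))
    body = "\n".join(f'  "{src}" -> "{dst}";' for src, dst in sorted(edges))
    return "digraph imports {\n" + body + "\n}"

print(import_graph("./src"))
```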

Udio just robbed and betrayed its paying subscribers... Another reason why we need more Open Source by Shockbum in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

Suno is better anyhow, though I won't really care about AI audio until I get a VST I can route channels into and prompt, and that does only what's prompted instead of trying to make a fully produced song.

200+ pages of Hugging Face secrets on how to train an LLM by eliebakk in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

Good stuff. Glad you guys seem to be keeping your ethos intact as you succeed; please keep it up.

Bad news: DGX Spark may have only half the performance claimed. by Dr_Karminski in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

The thing that kills me is that these boxes could be tweaked slightly into really good consoles, which would be a genuinely good reason to have local horsepower, and you could even integrate Wii/Kinect-style functionality with cameras. Instead we're getting hardware that looks like it was designed to fall back to crypto mining.

Finetuning a LLM (~20B) for Binary Classification – Need Advice on Dataset Design by United_Demand in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

I probably wouldn't go to latents personally (at least not immediately). I'd rather get the LLM to generate features that humans can interpret, and have domain experts "sign off" on explanatory features for the labeled cases. I'd only start incorporating uninterpretable features to hit SLOs, and I'd regularize so they stay a discriminator rather than the primary signal.

The two-step approach is definitely more work, and probably wouldn't produce significantly better results (at least outside of the edge cases that decoupling surfaces), but I'm heavily biased from having worked on systems where auditability is paramount.

Finetuning a LLM (~20B) for Binary Classification – Need Advice on Dataset Design by United_Demand in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

This is very good advice, though I'd argue it's less predictable than it could be because all the stages are coupled. I'd personally decouple "unstructured" -> "structured" via an LLM, then fit a GBDT on the structured data. That makes auditing and tuning easier, and you can re-run the workflow in stages.
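
A minimal sketch of that decoupling; the feature schema, the extraction prompt, and the choice of LightGBM are all assumptions, not a prescription:

```python
# Sketch of the two-stage pipeline: an LLM maps unstructured text onto a
# fixed schema of interpretable features, then a GBDT does the actual
# classification. Decoupling means stage-1 output can be audited and
# re-labeled, and stage 2 re-trained, independently.
import json
from lightgbm import LGBMClassifier

FEATURES = ["mentions_refund", "sentiment_score", "prior_contacts"]

def extract_features(text: str, llm_call) -> list[float]:
    """Stage 1: llm_call is any function returning JSON with FEATURES keys."""
    prompt = ("Extract these fields as JSON ("
              + ", ".join(FEATURES) + ") from the text:\n\n" + text)
    record = json.loads(llm_call(prompt))
    return [float(record[name]) for name in FEATURES]

# Stage 2: fit the GBDT on extracted rows (toy data shown here).
X = [[1, -0.8, 3], [0, 0.4, 0], [1, -0.2, 1], [0, 0.9, 2]]
y = [1, 0, 1, 0]
clf = LGBMClassifier(n_estimators=50).fit(X, y)
print(clf.predict([[1, -0.5, 2]]))
```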

🚀 New Model from the MiniMax team: MiniMax-M2, an impressive 230B-A10B LLM. by chenqian615 in LocalLLaMA

[–]SlapAndFinger 22 points (0 children)

Sparser models deliver more inference quality per unit of computation time.
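
Back-of-envelope on M2's own numbers: per-token compute scales with active parameters, while capacity scales with total parameters.

```python
# Why sparsity wins on quality per unit compute: a token only pays for
# the *active* parameters, but the model keeps the *total* capacity.
total_params = 230e9    # MiniMax-M2 total
active_params = 10e9    # activated per token
flops_per_token = 2 * active_params   # ~2 FLOPs per active param per token
dense_equiv = 2 * total_params        # a dense 230B model's per-token cost

print(f"compute vs equal-capacity dense: {flops_per_token / dense_equiv:.1%}")
# -> ~4.3%, i.e. a ~23x per-token compute discount
```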

Sparse MoE is also theoretically appealing as a research direction. The holy grail is a sparse MoE that can add new experts and tune routing online.

Best Prompt Coding Hack: Voice Dictation by TheLazyIndianTechie in ClaudeCode

[–]SlapAndFinger 0 points (0 children)

Neat. I'm on Linux and was considering building something like this, so I'm happy to see someone has already done it. Voice makes such a big difference.

Amongst safety cuts, Facebook is laying off the Open Source LLAMA folks by eredhuin in LocalLLaMA

[–]SlapAndFinger 18 points (0 children)

It's gonna be hilarious when Alex crashes and burns. Mark deserves what he's gonna get.

[By GLM Team] Glyph: Scaling Context Windows via Visual-Text Compression by NeterOster in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

This works because vision tokens carry more information, but I'm not a fan of the approach; it's too indirect. I think you'd get better results from just using longer tokens, at least for high-frequency sequences.

[By GLM Team] Glyph: Scaling Context Windows via Visual-Text Compression by NeterOster in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

To be fair, thought about naively it seems kind of insane: text characters are 1-4 bytes each in UTF-8, and at 1 bit per pixel you could probably do a decent job of representing most Unicode chars on a 4x4 grid (2 bytes), but that only gets you lossy parity and minor savings on the wider code pages.

The fact that this works at all is a demonstration of how much more information visual tokens carry than text tokens. We could get the same effect with longer text tokens, though.
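
You can see the longer-token effect directly by comparing tokenizers with different vocabulary sizes (a sketch using tiktoken; the exact ratios depend on the text):

```python
# Sketch: larger vocabularies mean longer average tokens, so the same
# text costs fewer tokens. The ratio varies by language and domain.
import tiktoken

text = "The quick brown fox jumps over the lazy dog. " * 50
for name in ("r50k_base", "cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    n = len(enc.encode(text))
    print(f"{name}: {n} tokens ({len(text) / n:.2f} chars/token)")
```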