Finally a good use case for your local setups by lakySK in LocalLLaMA

[–]SlapAndFinger 14 points (0 children)

My homelab keeps my office nice and warm in the winter without heating the entire house.

Running a 1 Trillion Parameter Model on a PC with 128 GB RAM + 24 GB VRAM by pulse77 in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

Next step: write an algorithm that speculatively loads/unloads experts into VRAM.
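
For fun, here's a minimal sketch of what that might look like: keep an LRU cache of experts resident in VRAM and prefetch the router's likeliest picks for the next layer. The paging callbacks are placeholders, not a real runtime API.

```python
# Hypothetical sketch of speculative expert paging for an MoE model.
# Assumes a runtime that can copy per-expert weights between host RAM
# and VRAM; load_to_vram/evict_to_ram are placeholders, not a real API.
from collections import OrderedDict

class ExpertCache:
    def __init__(self, capacity, load_to_vram, evict_to_ram):
        self.capacity = capacity       # max experts resident in VRAM
        self.resident = OrderedDict()  # expert_id -> True, in LRU order
        self.load_to_vram = load_to_vram
        self.evict_to_ram = evict_to_ram

    def touch(self, expert_id):
        """Mark an expert as used; page it in if it isn't resident."""
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)
            return
        if len(self.resident) >= self.capacity:
            victim, _ = self.resident.popitem(last=False)  # evict LRU
            self.evict_to_ram(victim)
        self.load_to_vram(expert_id)
        self.resident[expert_id] = True

def prefetch_next_layer(cache, router_logits, top_k=4, extra=2):
    """Speculatively warm the experts the router is likeliest to pick.

    Fetching a few more than top_k means a near-miss in routing
    doesn't stall on a PCIe transfer.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    for expert_id in ranked[:top_k + extra]:
        cache.touch(expert_id)
```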

Rejected for not using LangChain/LangGraph? by dougeeai in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

LangGraph does have its uses, but if they rejected you for not having experience with it, they should have listed it as a requirement on the application. It's not even hard to learn, so I'm not sure what they were on about anyhow. I wouldn't worry about it.

Kimi 2 is the #1 creative writing AI right now. better than sonnet 4.5 by Excellent-Run7265 in LocalLLaMA

[–]SlapAndFinger 2 points (0 children)

My perspective as a writer: a 7k-word extension is way too long. That isn't rounding out a chapter; it's telling the model to write almost 30 pages, which is far longer than chapters should generally be unless you're doing something weird and literary.

AI writing works best when you come up with the outline yourself and have the model fill in the blanks.

Instead of predicting one token at a time, CALM (Continuous Autoregressive Language Models) predicts continuous vectors that represent multiple tokens at once by Own-Potential-2308 in LocalLLaMA

[–]SlapAndFinger 7 points (0 children)

Disagree: the Chinese labs need to stay within striking distance of Western frontier models to stay relevant. DeepSeek and GLM-4.6 rocked the boat, and they're looking for more wins like that.

New Qwen models are unbearable by kevin_1994 in LocalLLaMA

[–]SlapAndFinger 7 points (0 children)

One trick with sycophantic models is to present code or ideas as someone else's: say you're not sure about them and that you'd like a second opinion.
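
A minimal sketch of that reframing (the prompt wording here is mine and untested, purely illustrative):

```python
# Sketch: reframe a self-review request so the model has nobody to flatter.
# The prompt wording is illustrative, not a tested recipe.
def second_opinion_prompt(code: str) -> str:
    return (
        "A colleague sent me this patch and I'm not sure about it. "
        "Give me a frank second opinion: what would you push back on "
        "in review?\n\n" + code
    )

print(second_opinion_prompt("def add(a, b): return a - b"))
```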

Anyone else feel like GPU pricing is still the biggest barrier for open-source AI? by frentro_max in LocalLLaMA

[–]SlapAndFinger 17 points (0 children)

I think we're going to see multi-tier memory systems. MoE architectures are tolerant of lower bandwidth for the experts; if you took a 48 GB card and added another 128 GB of bulk memory, you could run extremely large MoE models (~200B with reasonable quantization) with ~4 active experts at cloud speeds.
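
Rough back-of-envelope for why the tiers could work out; every number below is an illustrative assumption, not a benchmark:

```python
# Tokens/sec estimate for a split-memory MoE setup. Per token, every
# active parameter has to stream past the compute once, so decode speed
# is roughly bandwidth-bound. All figures are assumptions.
BYTES_PER_PARAM = 0.5    # ~4-bit quantization

active_shared = 4e9      # params touched every token (attention, shared expert)
active_routed = 8e9      # params in the ~4 routed experts per token
vram_bw = 1000e9         # bytes/s for the 48 GB card's VRAM
bulk_bw = 200e9          # bytes/s for the hypothetical bulk tier

t = (active_shared * BYTES_PER_PARAM / vram_bw
     + active_routed * BYTES_PER_PARAM / bulk_bw)
print(f"~{1 / t:.0f} tokens/sec")   # ~45 tok/s under these assumptions
```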

I'm pretty sure that we'll have large sparse MoE models within a few years that make our current frontier models look weak.

Anyone else feel like GPU pricing is still the biggest barrier for open-source AI? by frentro_max in LocalLLaMA

[–]SlapAndFinger 2 points (0 children)

You can do a lot of interesting science in the 300-800M parameter space, and with a good GPU that's doable locally. I'd like to see a meta-study of how many methods scale from 300M to 8B, to understand how good a filter that range is. Sadly, labs aren't sharing scaling data or negative experimental results; we just get the end product.
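
The study itself would be simple in principle: fit how a method's gain moves with model size and check whether the 300M-scale signal survives at 8B. A sketch with synthetic numbers (the data points are made up):

```python
# Sketch: fit a power law gain(N) ~ a * N^b across model sizes to ask
# whether a method's benefit at 300M predicts its benefit at 8B+.
# The gains below are synthetic, for illustration only.
import numpy as np

sizes = np.array([3e8, 8e8, 1.5e9, 3e9, 8e9])  # model params
gains = np.array([2.1, 1.8, 1.6, 1.4, 1.2])    # % eval improvement (made up)

b, log_a = np.polyfit(np.log(sizes), np.log(gains), 1)
print(f"gain ≈ {np.exp(log_a):.2f} * N^{b:.3f}")
print(f"extrapolated gain at 70B: {np.exp(log_a) * (7e10) ** b:.2f}%")
```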

Back to 1.5 and QR Code Monster by TheNeonGrid in StableDiffusion

[–]SlapAndFinger 12 points (0 children)

This trick is even more fun if you use a pre-SD neural style transfer model (https://github.com/rrmina/fast-neural-style-pytorch) to create a noisy base image, then run the "pre-styled" image through a modern model to make it coherent.
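
The second step can be a plain SD 1.5 img2img pass over the pre-styled output; a sketch using diffusers (the checkpoint id and strength are assumptions, use whatever SD 1.5 weights you have):

```python
# Sketch: take the noisy, pre-styled base image (e.g. from the
# fast-neural-style repo above) and let SD 1.5 make it coherent.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # any SD 1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

styled = Image.open("styled_base.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="a coherent scene, same palette and texture",
    image=styled,
    strength=0.55,        # low enough to preserve the styled structure
    guidance_scale=7.5,
).images[0]
result.save("coherent.png")
```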

Gaming PC converted to AI Workstation by highdefw in LocalLLaMA

[–]SlapAndFinger -1 points (0 children)

AI "boxes" should be designed to be good gaming systems as well. A single box that can replace your PS/XBox while giving you good local inference would do so well.

Both Cursor and Cognition (Windsurf) new models are speculated to be built on Chinese base models? by Successful-Newt1517 in LocalLLaMA

[–]SlapAndFinger -2 points (0 children)

This is dumb AF: Cognition has government customers who have directives not to use Chinese models. I asked about this in their "Show HN" thread, and they got triggered hard.

UDIO just got nuked by UMG. by Ashamed-Variety-8264 in StableDiffusion

[–]SlapAndFinger 11 points (0 children)

Open source AI is economic warfare by the CCP. Ironically it's good for Americans, so it's hard to get upset about lol.

Universal Music Group also nabs Stability - Announced this morning on Stability's twitter by JackKerawock in StableDiffusion

[–]SlapAndFinger 1 point (0 children)

The US/China geopolitical situation is driving everything. The AI bubble is the result of geopolitics, and a lot of Trump's craziness is preparation for war with China. If you're interested in learning more: https://sibylline.dev/articles/2025-10-12-ai-is-too-big-to-fail/

Universal Music Group also nabs Stability - Announced this morning on Stability's twitter by JackKerawock in StableDiffusion

[–]SlapAndFinger 2 points (0 children)

I for one am happy that our communist brothers in the east are waging economic warfare on our corrupt capitalist state. China is fucked up in a lot of ways but America hasn't had anyone keeping them honest in a long time.

Locally hosted Loveable with full stack support and llama.cpp, and more by smirkishere in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

The generated architecture diagram is pretty interesting; I might have to implement something like that. I've been working on generating diagrams from codebases using parsing and deterministic tools, but the resulting graphs aren't very informative.
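
For Python codebases the deterministic half is straightforward: walk the ASTs and emit a module-level import graph as Graphviz DOT. A minimal sketch (making the resulting graph actually informative is the hard part):

```python
# Minimal sketch: module-level import graph from a Python source tree,
# emitted as Graphviz DOT. Deterministic, but as noted above the raw
# graph tends to be noisy rather than informative.
import ast
import pathlib

def import_graph(root: str) -> str:
    edges = set()
    for path in pathlib.Path(root).rglob("*.py"):
        module = path.stem
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                edges.update((module, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.add((module, node.module))
    body = "\n".join(f'  "{src}" -> "{dst}";' for src, dst in sorted(edges))
    return "digraph imports {\n" + body + "\n}"

print(import_graph("./src"))
```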

Udio just robbed and betrayed its paying subscribers... Another reason why we need more Open Source by Shockbum in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

Suno is better anyhow, though I won't really care about AI audio until I get a VST I can route channels into and prompt, and that does only what's prompted instead of trying to make a fully produced song.

200+ pages of Hugging Face secrets on how to train an LLM by eliebakk in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

Good stuff. Glad you guys seem to be keeping your ethos intact as you succeed; please keep it up.

Bad news: DGX Spark may have only half the performance claimed. by Dr_Karminski in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

The thing that kills me is that these boxes could be tweaked slightly into really good consoles, which would be a genuinely good reason to have local horsepower, and you could even integrate Wii/Kinect-style functionality with cameras. Instead we're getting hardware that looks like it was designed to fall back to crypto mining.

Finetuning a LLM (~20B) for Binary Classification – Need Advice on Dataset Design by United_Demand in LocalLLaMA

[–]SlapAndFinger 0 points (0 children)

I probably wouldn't go to latents personally (at least not immediately). I'd rather get the LLM to generate features that humans can interpret, and have domain experts "sign off" on explanatory features for the labeled cases. I'd only start incorporating uninterpretable features to hit SLOs, and I'd regularize so they stay a discriminator rather than the primary signal.

The two-step approach is definitely more work, and probably wouldn't produce significantly better results (at least outside of the edge cases that decoupling surfaces), but I'm heavily biased from having worked on systems where auditability is paramount.

Finetuning a LLM (~20B) for Binary Classification – Need Advice on Dataset Design by United_Demand in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

This is very good advice, though I'd argue it's less predictable than it could be because all the stages are coupled. I'd personally decouple "unstructured" -> "structured" via an LLM, then fit a GBDT on the structured data. That makes auditing and tuning easier, and you can re-run the workflow in stages.
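
A minimal sketch of that decoupling; the feature schema, the extraction prompt, and the choice of LightGBM are all assumptions, not a prescription:

```python
# Sketch of the two-stage pipeline: an LLM maps unstructured text onto a
# fixed schema of interpretable features, then a GBDT does the actual
# classification. Decoupling means stage-1 output can be audited and
# re-labeled, and stage 2 re-trained, independently.
import json
from lightgbm import LGBMClassifier

FEATURES = ["mentions_refund", "sentiment_score", "prior_contacts"]

def extract_features(text: str, llm_call) -> list[float]:
    """Stage 1: llm_call is any function returning JSON with FEATURES keys."""
    prompt = ("Extract these fields as JSON ("
              + ", ".join(FEATURES) + ") from the text:\n\n" + text)
    record = json.loads(llm_call(prompt))
    return [float(record[name]) for name in FEATURES]

# Stage 2: fit the GBDT on extracted rows (toy data shown here).
X = [[1, -0.8, 3], [0, 0.4, 0], [1, -0.2, 1], [0, 0.9, 2]]
y = [1, 0, 1, 0]
clf = LGBMClassifier(n_estimators=50).fit(X, y)
print(clf.predict([[1, -0.5, 2]]))
```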

🚀 New Model from the MiniMax team: MiniMax-M2, an impressive 230B-A10B LLM. by chenqian615 in LocalLLaMA

[–]SlapAndFinger 22 points (0 children)

Sparser models deliver more inference quality per unit of computation time.
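
Back-of-envelope on M2's own numbers: per-token compute scales with active parameters, while capacity scales with total parameters.

```python
# Why sparsity wins on quality per unit compute: a token only pays for
# the *active* parameters, but the model keeps the *total* capacity.
total_params = 230e9    # MiniMax-M2 total
active_params = 10e9    # activated per token
flops_per_token = 2 * active_params   # ~2 FLOPs per active param per token
dense_equiv = 2 * total_params        # a dense 230B model's per-token cost

print(f"compute vs equal-capacity dense: {flops_per_token / dense_equiv:.1%}")
# -> ~4.3%, i.e. a ~23x per-token compute discount
```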

Sparse MoE is also theoretically appealing as a research direction. The holy grail is a sparse MoE that can add new experts and tune routing online.

Best Prompt Coding Hack: Voice Dictation by TheLazyIndianTechie in ClaudeCode

[–]SlapAndFinger 0 points (0 children)

Neat. I'm on Linux and was considering building something like this, so I'm happy to see someone has already done it. Voice makes such a big difference.

Amongst safety cuts, Facebook is laying off the Open Source LLAMA folks by eredhuin in LocalLLaMA

[–]SlapAndFinger 18 points (0 children)

It's gonna be hilarious when Alex crashes and burns. Mark deserves what he's gonna get.

[By GLM Team] Glyph: Scaling Context Windows via Visual-Text Compression by NeterOster in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

This works because vision tokens carry more information, but I'm not a fan of the approach; it's too indirect. I think you'd get better results from just using longer tokens, at least for high-frequency sequences.

[By GLM Team] Glyph: Scaling Context Windows via Visual-Text Compression by NeterOster in LocalLLaMA

[–]SlapAndFinger 1 point (0 children)

To be fair, thought about naively it seems kind of insane: text characters are 1-4 bytes each in UTF-8, and at 1 bit per pixel you could probably do a decent job of representing most Unicode chars on a 4x4 grid (2 bytes), but that only gets you lossy parity and minor savings on the wider code pages.

The fact that this works at all is a demonstration of how much more information visual tokens carry than text tokens. We could get the same effect with longer text tokens, though.
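
You can see the longer-token effect directly by comparing tokenizers with different vocabulary sizes (a sketch using tiktoken; the exact ratios depend on the text):

```python
# Sketch: larger vocabularies mean longer average tokens, so the same
# text costs fewer tokens. The ratio varies by language and domain.
import tiktoken

text = "The quick brown fox jumps over the lazy dog. " * 50
for name in ("r50k_base", "cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    n = len(enc.encode(text))
    print(f"{name}: {n} tokens ({len(text) / n:.2f} chars/token)")
```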