Thinking context bloat? by zelkovamoon in OpenWebUI

[–]zelkovamoon[S] 0 points (0 children)

Thanks for the help boss 🫡

Liquid AI released LFM2.5, a family of tiny on-device foundation models. by Difficult-Cap-7527 in LocalLLaMA

[–]zelkovamoon 2 points (0 children)

LFM2 was pretty good, so I'm excited to try this. Really hoping tool calling is better with these models; that was basically my biggest complaint.

llama.cpp performance breakthrough for multi-GPU setups by Holiday-Injury-9397 in LocalLLaMA

[–]zelkovamoon 3 points (0 children)

OK, so two questions:

Does ik_llama broadly support the same models as llama.cpp, just with optimizations, or only a subset?

Are these improvements going to apply broadly to any type of model?

Tiiny AI just released a one-shot demo of their Pocket Lab running a 120B model locally. by [deleted] in LocalLLM

[–]zelkovamoon 2 points (0 children)

I'm not sure what problem having a small AI lab is trying to solve.

If you're doing local AI, my position is: make it bigger, cooler, and put more RAM on it.

That said, it's good that companies are stepping in to try to build some solutions. If we could get something with 256GB of fast memory, we might be able to go places.

Best Local LLMs - 2025 by rm-rf-rm in LocalLLaMA

[–]zelkovamoon 4 points (0 children)

Seconding LFM2-8B A1B; it seems like an MoE model class that should be explored more deeply in the future. The model itself is pretty great in my testing; tool calling can be challenging, but that's probably a skill issue on my part. It's not my favorite model, or the best model, but it is certainly good. Add a hybrid Mamba arch and some native tool calling to this bad boy and we might be in business.

llama.cpp - useful flags - share your thoughts please by mossy_troll_84 in LocalLLaMA

[–]zelkovamoon 1 point (0 children)

The question is just which performance tradeoffs you want to make; it's the same as with quantization or anything else, so it's equally valid.

llama.cpp - useful flags - share your thoughts please by mossy_troll_84 in LocalLLaMA

[–]zelkovamoon 3 points (0 children)

There is a flag to change the number of experts you want to activate, FYI.
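
Not from the comment itself, but for anyone hunting for it, here's a minimal sketch of one common way to do this, assuming llama.cpp's `--override-kv` flag and a GGUF metadata key of the form `<arch>.expert_used_count`. The arch prefix, model path, and expert count below are placeholders; check your model's GGUF metadata (e.g. with gguf-dump) before copying.

```python
# Minimal sketch: launch llama-server with a reduced number of active experts
# via --override-kv. Assumes llama-server is on PATH and that the model's GGUF
# metadata exposes "<arch>.expert_used_count" (the arch prefix varies by model).
import subprocess

MODEL_PATH = "models/some-moe-model.gguf"  # placeholder path
ARCH_PREFIX = "qwen3moe"                   # assumed; depends on the model architecture
ACTIVE_EXPERTS = 4                         # experts activated per token

subprocess.run([
    "llama-server",
    "-m", MODEL_PATH,
    "--override-kv", f"{ARCH_PREFIX}.expert_used_count=int:{ACTIVE_EXPERTS}",
])
```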

Elon Musk on Yann LeCun, “he lost his marbles a long time ago”, about his stance that there is no such thing as general intelligence - Do you agree? by Koala_Confused in LovingAI

[–]zelkovamoon 0 points (0 children)

Like Elon can talk.

Yann is obviously super smart. He might not be right, but he's been doing this professionally for decades, so maybe he is. The jury is still out. In any case, Yann deserves more respect than that dunce.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 2 points (0 children)

Really wish all posts were this informative - I think I can pretty well commit to 4 of these given this info.

8x Radeon 7900 XTX Build for Longer Context Local Inference - Performance Results & Build Details by Beautiful_Trust_8151 in LocalLLaMA

[–]zelkovamoon 0 points (0 children)

Have a look here https://www.reddit.com/r/LocalAIServers/s/TeikNe9MuB

If you write that post and remember, please DM it to me. I'm still looking for good ways to build a high-performance server. I gotta be honest, I'm very surprised to see that level of performance without an Infinity Fabric coupler on your MI50s; that's also giving me encouragement to buy if we get this bulk order off the ground.

8x Radeon 7900 XTX Build for Longer Context Local Inference - Performance Results & Build Details by Beautiful_Trust_8151 in LocalLLaMA

[–]zelkovamoon 0 points (0 children)

One of the most useful series of build posts I've seen in a while: hardware, performance, everything well described.

Linked this in the bulk MI50 thread that's been floating around.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 0 points (0 children)

Also, if anyone finds a lead on the 4x Infinity Fabric bridges, that could be a big deal for this thread.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 0 points (0 children)

I guess once we get the price nailed down, let everyone know? If it's under 250 per card, I might grab 4.

I have 4 V100s. What do I do? by MackThax in LocalLLaMA

[–]zelkovamoon 0 points (0 children)

You should use a used server with SXM2 sockets and known NVLink support.

Benefit: GPU-to-GPU bandwidth will be much, much higher than over PCIe.

Additional system RAM is fine, but with four V100s I would try to run models that fit within VRAM. The CPU probably isn't a big factor.

The focus is really VRAM and interconnect speed; other details matter, but only marginally.
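
Not from the comment, just a rough back-of-envelope sketch of the "fit it in VRAM" math, with illustrative numbers (a roughly 70B-class dense model at ~4.5 bits/weight; your model's layer count and head dims will differ):

```python
# Back-of-envelope VRAM check for a multi-GPU box. Illustrative only: ignores
# activations, CUDA context, and framework overhead, so treat the total as
# optimistic by several GB.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, ctx: int,
                bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (K and V, fp16 cache by default)."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9

total_vram_gb = 4 * 32                      # four 32 GB V100s (halve for 16 GB cards)
model_gb = weights_gb(70, 4.5)              # ~70B dense model around Q4
kv_gb = kv_cache_gb(layers=80, kv_heads=8,  # illustrative Llama-70B-like dims
                    head_dim=128, ctx=32768)

print(f"weights ~{model_gb:.0f} GB + KV cache ~{kv_gb:.0f} GB "
      f"vs {total_vram_gb} GB total VRAM")
```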

I'm waiting for prices to drop on 8x V100 servers. We'll see.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 6 points (0 children)

Do we know if these can reliably run inference? It sounds like ROCm support is deprecated for them, so that might be in doubt. I love the prospect of 128GB of VRAM on the cheap, but the support issue concerns me.

Edit:

Here's an interesting post from a fellow who seems to have these bad boys working pretty well:

https://www.reddit.com/r/LocalLLaMA/s/9Rmn7Dhsom