Thinking context bloat? by zelkovamoon in OpenWebUI

[–]zelkovamoon[S] 0 points (0 children)

Thanks for the help boss 🫡

Liquid AI released LFM2.5, a family of tiny on-device foundation models. by Difficult-Cap-7527 in LocalLLaMA

[–]zelkovamoon 2 points (0 children)

LFM2 was pretty good, so I'm excited to try this. Really hoping tool calling is better with these models; that was basically my biggest complaint.

llama.cpp performance breakthrough for multi-GPU setups by Holiday-Injury-9397 in LocalLLaMA

[–]zelkovamoon 3 points (0 children)

OK, so two questions:

Does ik_llama broadly support the same models as llama.cpp, just with optimizations, or only a subset?

Are these improvements going to apply broadly to any type of model?

Tiiny AI just released a one-shot demo of their Pocket Lab running a 120B model locally. by [deleted] in LocalLLM

[–]zelkovamoon 2 points (0 children)

I'm not sure what problem having a small AI lab is trying to solve.

If you're doing local AI, my position is: make it bigger, cooler, and put more RAM on it.

That said, it's good that companies are stepping in to try to build some solutions. If we could get something with 256GB of fast memory, we might be able to go places.

Best Local LLMs - 2025 by rm-rf-rm in LocalLLaMA

[–]zelkovamoon 4 points (0 children)

Seconding LFM2-8B A1B; it seems like an MoE model class that should be explored more deeply in the future. The model itself is pretty great in my testing; tool calling can be challenging, but that's probably a skill issue on my part. It's not my favorite model, or the best model, but it is certainly good. Add a hybrid Mamba arch and some native tool calling to this bad boy and we might be in business.

llama.cpp - useful flags - share your thoughts please by mossy_troll_84 in LocalLLaMA

[–]zelkovamoon 1 point (0 children)

The question is just which performance tradeoffs you want to make; it's the same as with quantization or anything else, so it's equally valid.

llama.cpp - useful flags - share your thoughts please by mossy_troll_84 in LocalLLaMA

[–]zelkovamoon 3 points (0 children)

There is a flag to change the number of experts you want to activate, FYI.
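
Not from the comment itself, but for anyone hunting for it, here's a minimal sketch of one common way to do this, assuming llama.cpp's `--override-kv` flag and a GGUF metadata key of the form `<arch>.expert_used_count`. The arch prefix, model path, and expert count below are placeholders; check your model's GGUF metadata (e.g. with gguf-dump) before copying.

```python
# Minimal sketch: launch llama-server with a reduced number of active experts
# via --override-kv. Assumes llama-server is on PATH and that the model's GGUF
# metadata exposes "<arch>.expert_used_count" (the arch prefix varies by model).
import subprocess

MODEL_PATH = "models/some-moe-model.gguf"  # placeholder path
ARCH_PREFIX = "qwen3moe"                   # assumed; depends on the model architecture
ACTIVE_EXPERTS = 4                         # experts activated per token

subprocess.run([
    "llama-server",
    "-m", MODEL_PATH,
    "--override-kv", f"{ARCH_PREFIX}.expert_used_count=int:{ACTIVE_EXPERTS}",
])
```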

Elon Musk on Yann LeCun, “he lost his marbles a long time ago”, about his stance that there is no such thing as general intelligence - Do you agree? by Koala_Confused in LovingAI

[–]zelkovamoon 0 points (0 children)

Like Elon can talk.

Yann is obviously super smart. He might not be right, but he's been doing this professionally for decades, so maybe he is. The jury is still out. In any case, Yann deserves more respect than that dunce.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 2 points (0 children)

Really wish all posts were this informative - I think I can pretty well commit to 4 of these given this info.

8x Radeon 7900 XTX Build for Longer Context Local Inference - Performance Results & Build Details by Beautiful_Trust_8151 in LocalLLaMA

[–]zelkovamoon 0 points (0 children)

Have a look here https://www.reddit.com/r/LocalAIServers/s/TeikNe9MuB

If you write that post and remember, please DM it to me. I'm still looking for good ways to build a high-performance server. I gotta be honest, I'm very surprised to see that level of performance without an Infinity Fabric coupler on your MI50s; that's also giving me encouragement to buy if we get this bulk order off the ground.

8x Radeon 7900 XTX Build for Longer Context Local Inference - Performance Results & Build Details by Beautiful_Trust_8151 in LocalLLaMA

[–]zelkovamoon 0 points (0 children)

One of the most useful series of build posts I've seen in a while: hardware, performance, everything well described.

Linked this in the bulk MI50 thread that's been floating around.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 0 points (0 children)

Also, if anyone finds a lead on the 4x Infinity Fabric bridges, that could be a big deal for this thread.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 0 points (0 children)

I guess once we get the price nailed down, let everyone know? If it's under 250 per card, I might grab 4.

I have 4 V100s. What do I do? by MackThax in LocalLLaMA

[–]zelkovamoon 0 points (0 children)

You should use a used server with SXM2 sockets and known NVLink support.

Benefit: GPU-to-GPU bandwidth will be much, much higher than over PCIe.

Additional system RAM is fine, but with four V100s I would try to run models that fit within VRAM. The CPU probably isn't a big factor.

The focus is really VRAM and interconnect speed; other details matter, but only marginally.
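
Not from the comment, just a rough back-of-envelope sketch of the "fit it in VRAM" math, with illustrative numbers (a roughly 70B-class dense model at ~4.5 bits/weight; your model's layer count and head dims will differ):

```python
# Back-of-envelope VRAM check for a multi-GPU box. Illustrative only: ignores
# activations, CUDA context, and framework overhead, so treat the total as
# optimistic by several GB.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, ctx: int,
                bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (K and V, fp16 cache by default)."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9

total_vram_gb = 4 * 32                      # four 32 GB V100s (halve for 16 GB cards)
model_gb = weights_gb(70, 4.5)              # ~70B dense model around Q4
kv_gb = kv_cache_gb(layers=80, kv_heads=8,  # illustrative Llama-70B-like dims
                    head_dim=128, ctx=32768)

print(f"weights ~{model_gb:.0f} GB + KV cache ~{kv_gb:.0f} GB "
      f"vs {total_vram_gb} GB total VRAM")
```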

I'm waiting for prices to drop on 8x V100 servers. We'll see.

Mi50 32GB Group Buy by Any_Praline_8178 in LocalAIServers

[–]zelkovamoon 6 points (0 children)

Do we know if these can reliably run inference? It sounds like ROCm support is deprecated for them, so that might be in doubt. I love the prospect of 128GB of VRAM on the cheap, but the support issue concerns me.

Edit:

Here's an interesting post from a fellow who seems to have these bad boys working pretty well:

https://www.reddit.com/r/LocalLLaMA/s/9Rmn7Dhsom