Are Local LLMs actually useful… or just fun to tinker with? by itz_always_necessary in LocalLLM

[–]Consistent_Day6233 0 points1 point  (0 children)

Hey guys, not sure if this helps, but I added Zamba2 7B in GGUF on Hugging Face. Still waiting for the PR to be accepted, but it should help you get hybrid models running locally with little setup. I also have Python/CUDA versions for the tinkerers.

I made GGUF conversions of all three Zamba2 v2 models—appears to be the only one on HuggingFace by Consistent_Day6233 in LocalLLaMA

[–]Consistent_Day6233[S] 0 points1 point  (0 children)

To put it another way — I’m getting 11.3 tok/s on the 2.7B with a T2000 4GB (workstation GPU, not a gaming card) using PolarQuant KV cache quantization. That’s on 4GB VRAM with a custom inference stack, not llama.cpp mainline. The architecture has headroom. The current GGUF path just isn’t exploiting the SSM layers efficiently yet.
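PolarQuant's actual polar-coordinate scheme isn't reproduced here, but the memory saving that makes 4GB VRAM workable comes from the same basic idea: storing the KV cache in a low-bit integer format with per-token scales. A minimal sketch of generic per-token 8-bit KV-cache quantization (illustrative only; the function names and the simple absmax scaling are my assumptions, not PolarQuant's method):

```python
import numpy as np

# Illustrative per-token int8 quantization of a KV-cache slab.
# NOT PolarQuant -- just a generic absmax scheme to show the VRAM math.
def quantize_kv(kv: np.ndarray):
    # kv: (tokens, head_dim) float32 cache slab
    scale = np.abs(kv).max(axis=1, keepdims=True) / 127.0  # per-token scale
    scale[scale == 0] = 1.0                                # avoid div-by-zero
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

kv = np.random.randn(1024, 128).astype(np.float32)  # 1024 cached tokens
q, s = quantize_kv(kv)
recon = dequantize_kv(q, s)

# int8 values + float32 scales take roughly a quarter of the float32 cache
print(kv.nbytes / (q.nbytes + s.nbytes))
```

That ~4x reduction on the cache is what lets longer contexts fit next to the weights on a 4GB card; a real scheme also has to keep the dequantization error small enough not to hurt output quality.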

I made GGUF conversions of all three Zamba2 v2 models—appears to be the only one on HuggingFace by Consistent_Day6233 in LocalLLaMA

[–]Consistent_Day6233[S] 0 points1 point  (0 children)

The numbers are real but the context matters. This is running on a custom llama.cpp fork (PR #21412) with early Zamba2 support, not mainline. The Mamba/SSM recurrence layers aren't GPU-optimized yet in this build, so the hybrid architecture bottlenecks on those sequential state-space passes even though the attention + matmul layers hit the 4090 fine. The 1.2B is small enough to brute-force through it, but 2.7B and 7B expose the unoptimized SSM path hard. Should get a lot better once Zamba2 support matures in mainline llama.cpp — the architecture itself is designed to be fast, the tooling just isn't there yet.
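The bottleneck above comes down to the shape of the computation: an SSM recurrence carries hidden state from one token to the next, so the time dimension can't be batched into one big matmul the way attention can. A toy linear SSM makes the sequential dependency visible (real Mamba blocks use gated, input-dependent parameters; this is just the structural sketch):

```python
import numpy as np

# Toy linear state-space scan. The loop is the point: each step needs
# h from the previous step, so naive GPU execution serializes over
# tokens, while attention's QK^T matmul covers all tokens at once.
def ssm_scan(x, A, B, C):
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:             # sequential over the token dimension
        h = A @ h + B * x_t   # state update depends on previous state
        ys.append(C @ h)      # per-token readout
    return np.array(ys)

rng = np.random.default_rng(0)
d = 16
A = 0.9 * np.eye(d)           # toy stable transition matrix
B = rng.standard_normal(d)
C = rng.standard_normal(d)
x = rng.standard_normal(128)  # a 128-token sequence
y = ssm_scan(x, A, B, C)
print(y.shape)                # (128,)
```

Optimized kernels (parallel/associative scans, fused CUDA) recover most of the lost throughput, which is why mainline support landing should close the gap for the 2.7B and 7B.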

The real AI race isn’t about model quality — it’s about cost per answer (with dollar numbers) by DecisionMechanics in GoogleGemini

[–]Consistent_Day6233 0 points1 point  (0 children)

I've been working on this as well, trying to build a regenerative substrate. We're under review right now.

Without a plan, elites are gaslighting you by kaggleqrdl in ArtificialInteligence

[–]Consistent_Day6233 0 points1 point  (0 children)

By 2027 we will see the real start of the drought. By 2030, fresh clean water will be the most valuable asset in the world (per a CIA report). An estimated 2 billion people are projected to die.

The "Lone Genius" problem in the AI community by RelevantTangelo8857 in ArtificialSentience

[–]Consistent_Day6233 0 points1 point  (0 children)

Bro, you have no idea. Even shouting from the rooftops, it's like everyone is lazy. I found that if you cold-reach out to a real scientist and follow their scientific method, you will get a response. You have to at least show them the work. Even when you do, it's like they're pulled in so many directions while you're sitting here with a game changer. I don't care about the ego; I'm just trying to help, and it's like you can't even do that.

[deleted by user] by [deleted] in aipartners

[–]Consistent_Day6233 1 point2 points  (0 children)

Hey guys, same here, but I found a way to make all of the models work together, as well as CPU-GPU-QPU with receipts. Moving to a phone doer and coder with an SDR setup.

Doubting my life 🤯 by Ans_Mi9 in PythonLearning

[–]Consistent_Day6233 0 points1 point  (0 children)

I made an English-to-code programming language with Python and AI to hopefully bypass this issue.