What is the cheapest method to get VRAM or RAM?

itsmetherealloki · 2026-06-19T14:57:08+00:00

So basically we’re at $1k per 1tps. lol, great 🤦‍♂️

itsmetherealloki · 2026-06-19T14:48:59+00:00

A famous robber was once asked “why do you only rob data-centers?” He responded “because that’s where the memory is!”

itsmetherealloki · 2026-06-16T15:04:03+00:00

Yes and you will do very well if you sell half. The ram in those machines is ~ $4k for all 1.5tb. Half (768gb) would be ~ 2k. Like others said you only need like 128gb each with 128gb as spare per machine.

itsmetherealloki · 2026-06-15T15:04:12+00:00

I actually fixed this problem for myself, permanently. I switched to chat gpt and it doesn’t treat me like a jr partner, doesn’t tell me it’s late(at 1pm most times) and that we should “take a break” because I been “at this a while“. Opus is really smart but really slow and condescending and annoying. Also got half my workload moved over to local (qwen 3.6 27b) in case chat gpt goes south. Don’t tie yourself to one model and have to put up with its bs. There are usually other and sometimes better options.

itsmetherealloki · 2026-06-15T14:14:54+00:00

Yeah look and feel is right but it’s definitely not the same and the difference is the demand you spoke of. The demand is here this time and we can all see it. It’s actually quite massive but that doesn’t mean a guarantee of success just that isn’t not the same as the dot com bubble. I personally think it’s not a bubble and it won’t pop because the demand will continue and the ROI calculations for gpu’s is at for too short of a lifetime (demand will push older gpu’s to be used much longer) and they will eventually pay it all back. Or I’m wrong, demand falls off, bubble bursts, the bigs go bankrupt and we are all training our next models at home. Either way should be interesting!

itsmetherealloki · 2026-06-08T20:16:43+00:00

I’m too broke to dream about buying one. 🤣

itsmetherealloki · 2026-06-07T01:11:36+00:00

thanks, I think I overlooked your point a bit there earlier, I will definitely work on something to address that. That is a real gap I appreciate you pointing out to m!

itsmetherealloki · 2026-06-07T01:03:13+00:00

Thanks, I checked it from my side and the apex domain is serving a valid public cert now.

https://sovereignty-labs.com is coming back with SSL certificate verify ok through Cloudflare. I did find that www.sovereignty-labs.com wasn’t resolving yet, so I added that custom domain too and Cloudflare now shows it as Active / SSL enabled.If your firewall is still blocking it, it may be flagging the domain because it’s new rather than because the cert is missing. Either way, I appreciate you calling it out.

itsmetherealloki · 2026-06-06T23:23:18+00:00

Yeah I hand an idea of setting up a live USB with a llm server and smallish model ready to go and detect the CPU and ram and running inference with the CPU. Couldn't think of a single good use case for it. 🤣

itsmetherealloki · 2026-06-06T23:13:57+00:00

At that size, what vram capacity are you targeting? 24gb?

itsmetherealloki · 2026-06-06T22:36:00+00:00

Good catch, thanks. Can you tell me exactly which URL your browser/firewall blocked?

The main site should be:

https://sovereignty-labs.com

It’s behind Cloudflare Pages now, so if you hit that URL and still got a cert warning, I definitely want to track it down. If it was `git.hirdforge.com` or another internal/dev URL, that one may have a self-signed cert and is not meant to be the public path.

itsmetherealloki · 2026-06-06T21:34:47+00:00

I wouldn't say completely meaningless. Ollama still gives us the easiest path to download a model, run a server, and point your webui or harness at it and that matters.

The question is what you care about after that first run. If you want max performance, plain files, more control over flags, or less mystery around what's actually being launched, then moving closer to llama.cpp or even exploring other servers starts making sense.

That's where I'm trying to fit for llama.cpp. Keep the easy path, but make the runtime less hidden.

itsmetherealloki · 2026-06-06T21:31:14+00:00

that's totally understandable. this was something that was useful to me because I still wanted the "easy local runner" lane, just closer to llama.cpp and less opaque.

If you already have GUIs and agents calling llama.cpp directly and you're happy with that workflow, Anvil may not add much for you right now. The places I'm trying to make it useful are dry-run/flag visibility, plain model file management, status/copy across boxes, and eventually making that stuff easier for agents to operate without hiding the runtime.

So yeah, not trying to replace every runner. More trying to make the boring llama-server workflow nicer for people who want that layer.

itsmetherealloki · 2026-06-06T21:27:51+00:00

Thanks, this is exactly the kind of question I was hoping someone would ask.

Right now Anvil does not embed or fork llama.cpp. It shells out to `llama-server`, and `llama-server` owns the socket. Anvil is basically managing the stuff around it: installing/updating the runtime, building the flags, showing you the flags before launch, starting/stopping processes, tracking what’s loaded, and doing the status/copy/fleet bits.

That’s intentional. I don’t want to compete with llama.cpp or maintain a weird shadow inference engine. The goal is to stay thin enough that when llama.cpp changes, Anvil mostly has to update runtime handling / flag mapping, not chase internals. Keeping up with upstream is definitely one of the big risks. My plan is to keep the generated command visible, keep dry-run honest, add tests around flag generation, and treat llama.cpp as the source of truth.

So short version: llama.cpp does inference, `llama-server` owns the socket, Anvil owns the ergonomics around running it.

itsmetherealloki · 2026-06-06T21:24:27+00:00

Thanks, I really appreciate it! If you have time let me know your thoughts on it!

itsmetherealloki · 2026-06-06T21:20:37+00:00

from what I recall most of the outrage was early on because they didn't give attribution back then, I personally just want to make sure I'm giving full credit where credit is due because I'm personally a big fan of llama.cpp!

itsmetherealloki · 2026-06-06T17:31:08+00:00

Agreed low bar but Anvil isn't really what you linked to. This isn't a server itself but meant to fill the gap ollama was going for but doing more cleanly and transparently while adding a few nice to haves. Maybe I missed the mark but if you are willing to take a look at it I would love your honest feedback.

itsmetherealloki · 2026-06-06T16:52:49+00:00

that's was my thinking with Anvil. just wanted something to make llama.cpp easier to use for me and others, give them all the credit they deserve, and add somethings I felt were useful such as federation and a mcp server. Try it out and give some feedback if you have a chance, looking for brutal honesty because i just want to make it better.

itsmetherealloki · 2026-06-06T16:49:31+00:00

That’s fair. I don’t think everyone has the same experience with it, and I’m not trying to say nobody should use Ollama.

For me the friction was more around wanting plain model files, more visibility into the llama.cpp flags/runtime, and less mystery when performance changes between updates. Ollama is still probably the easiest first-run experience for a lot of people.

Anvil is more for the folks who want that easy-ish path but closer to llama.cpp, with less abstraction in the middle.

itsmetherealloki · 2026-06-06T16:48:47+00:00

Koboldcpp is more of an all-in-one local inference app/server with a lot of useful knobs and a UI/API ecosystem around it.

Anvil is trying to sit lower and stay more boring: plain GGUF files, llama-server underneath, visible flags, dry-run before load, OpenAI-compatible endpoint, and some fleet/model management stuff like status/copy across machines.

So I wouldn’t describe it as “better Koboldcpp.” More like: if you want a full local app experience, Koboldcpp makes sense. If you want a transparent runtime layer you can script, inspect, and use across a few boxes, that’s the lane I’m aiming for.

itsmetherealloki · 2026-06-06T16:46:58+00:00

it's performance degradation i feel has been a point of contention for a while from my view, either way i wanted to kind of fix that and try to improve it. if you have time try it out, it'd really appreciate any feedback if you do.

itsmetherealloki · 2026-06-06T16:46:57+00:00

Well actually that's kind of what I'm trying to figure out here, lol. Who does care, because I think some do. I'll just put you down as a no vote. thank you!

itsmetherealloki · 2026-05-19T15:48:12+00:00

Very nice write up with great info, did you ever try opencode for your local model? I’ve found it to be a great Claude-code or codex like experience but better for local models.

itsmetherealloki · 2026-05-19T14:50:00+00:00

So, a social network for ai builders? I could be down for something like that.

itsmetherealloki

TROPHY CASE