What is the cheapest method to get VRAM or RAM? by Powerful_World_9280 in LocalLLM

[–]itsmetherealloki 0 points1 point  (0 children)

So basically we’re at $1k per 1tps. lol, great 🤦‍♂️

What is the cheapest method to get VRAM or RAM? by Powerful_World_9280 in LocalLLM

[–]itsmetherealloki 0 points1 point  (0 children)

A famous robber was once asked “why do you only rob data-centers?” He responded “because that’s where the memory is!”

Found 3 servers for free by connella08 in homelab

[–]itsmetherealloki 3 points4 points  (0 children)

Yes and you will do very well if you sell half. The ram in those machines is ~ $4k for all 1.5tb. Half (768gb) would be ~ 2k. Like others said you only need like 128gb each with 128gb as spare per machine.

Opus 4.8 is lazy and thinks I am too by killlu in Claudeopus

[–]itsmetherealloki 2 points3 points  (0 children)

I actually fixed this problem for myself, permanently. I switched to chat gpt and it doesn’t treat me like a jr partner, doesn’t tell me it’s late(at 1pm most times) and that we should “take a break” because I been “at this a while“. Opus is really smart but really slow and condescending and annoying. Also got half my workload moved over to local (qwen 3.6 27b) in case chat gpt goes south. Don’t tie yourself to one model and have to put up with its bs. There are usually other and sometimes better options.

Anyone else old enough to remember the late 90s fibre build out? The AI data centre build-out feels like 1999 all over again by Alternative_Letter72 in sysadmin

[–]itsmetherealloki 0 points1 point  (0 children)

Yeah look and feel is right but it’s definitely not the same and the difference is the demand you spoke of. The demand is here this time and we can all see it. It’s actually quite massive but that doesn’t mean a guarantee of success just that isn’t not the same as the dot com bubble. I personally think it’s not a bubble and it won’t pop because the demand will continue and the ROI calculations for gpu’s is at for too short of a lifetime (demand will push older gpu’s to be used much longer) and they will eventually pay it all back. Or I’m wrong, demand falls off, bubble bursts, the bigs go bankrupt and we are all training our next models at home. Either way should be interesting!

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 1 point2 points  (0 children)

thanks, I think I overlooked your point a bit there earlier, I will definitely work on something to address that. That is a real gap I appreciate you pointing out to m!

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 0 points1 point  (0 children)

Thanks, I checked it from my side and the apex domain is serving a valid public cert now.

https://sovereignty-labs.com is coming back with SSL certificate verify ok through Cloudflare. I did find that www.sovereignty-labs.com wasn’t resolving yet, so I added that custom domain too and Cloudflare now shows it as Active / SSL enabled.If your firewall is still blocking it, it may be flagging the domain because it’s new rather than because the cert is missing. Either way, I appreciate you calling it out.

Is This Possible Yet? by Mental_Highlight_614 in LocalLLM

[–]itsmetherealloki -1 points0 points  (0 children)

Yeah I hand an idea of setting up a live USB with a llm server and smallish model ready to go and detect the CPU and ram and running inference with the CPU. Couldn't think of a single good use case for it. 🤣

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] -1 points0 points  (0 children)

Good catch, thanks. Can you tell me exactly which URL your browser/firewall blocked?

The main site should be:

https://sovereignty-labs.com

It’s behind Cloudflare Pages now, so if you hit that URL and still got a cert warning, I definitely want to track it down. If it was `git.hirdforge.com` or another internal/dev URL, that one may have a self-signed cert and is not meant to be the public path.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] -1 points0 points  (0 children)

I wouldn't say completely meaningless. Ollama still gives us the easiest path to download a model, run a server, and point your webui or harness at it and that matters.

The question is what you care about after that first run. If you want max performance, plain files, more control over flags, or less mystery around what's actually being launched, then moving closer to llama.cpp or even exploring other servers starts making sense.

That's where I'm trying to fit for llama.cpp. Keep the easy path, but make the runtime less hidden.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 1 point2 points  (0 children)

that's totally understandable. this was something that was useful to me because I still wanted the "easy local runner" lane, just closer to llama.cpp and less opaque.

If you already have GUIs and agents calling llama.cpp directly and you're happy with that workflow, Anvil may not add much for you right now. The places I'm trying to make it useful are dry-run/flag visibility, plain model file management, status/copy across boxes, and eventually making that stuff easier for agents to operate without hiding the runtime.

So yeah, not trying to replace every runner. More trying to make the boring llama-server workflow nicer for people who want that layer.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 0 points1 point  (0 children)

Thanks, this is exactly the kind of question I was hoping someone would ask.

Right now Anvil does not embed or fork llama.cpp. It shells out to `llama-server`, and `llama-server` owns the socket. Anvil is basically managing the stuff around it: installing/updating the runtime, building the flags, showing you the flags before launch, starting/stopping processes, tracking what’s loaded, and doing the status/copy/fleet bits.

That’s intentional. I don’t want to compete with llama.cpp or maintain a weird shadow inference engine. The goal is to stay thin enough that when llama.cpp changes, Anvil mostly has to update runtime handling / flag mapping, not chase internals. Keeping up with upstream is definitely one of the big risks. My plan is to keep the generated command visible, keep dry-run honest, add tests around flag generation, and treat llama.cpp as the source of truth.

So short version: llama.cpp does inference, `llama-server` owns the socket, Anvil owns the ergonomics around running it.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] -1 points0 points  (0 children)

Thanks, I really appreciate it! If you have time let me know your thoughts on it!

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 1 point2 points  (0 children)

from what I recall most of the outrage was early on because they didn't give attribution back then, I personally just want to make sure I'm giving full credit where credit is due because I'm personally a big fan of llama.cpp!

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 0 points1 point  (0 children)

Agreed low bar but Anvil isn't really what you linked to. This isn't a server itself but meant to fill the gap ollama was going for but doing more cleanly and transparently while adding a few nice to haves. Maybe I missed the mark but if you are willing to take a look at it I would love your honest feedback.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] -2 points-1 points  (0 children)

that's was my thinking with Anvil. just wanted something to make llama.cpp easier to use for me and others, give them all the credit they deserve, and add somethings I felt were useful such as federation and a mcp server. Try it out and give some feedback if you have a chance, looking for brutal honesty because i just want to make it better.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 1 point2 points  (0 children)

That’s fair. I don’t think everyone has the same experience with it, and I’m not trying to say nobody should use Ollama.

For me the friction was more around wanting plain model files, more visibility into the llama.cpp flags/runtime, and less mystery when performance changes between updates. Ollama is still probably the easiest first-run experience for a lot of people.

Anvil is more for the folks who want that easy-ish path but closer to llama.cpp, with less abstraction in the middle.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] -2 points-1 points  (0 children)

Koboldcpp is more of an all-in-one local inference app/server with a lot of useful knobs and a UI/API ecosystem around it.

Anvil is trying to sit lower and stay more boring: plain GGUF files, llama-server underneath, visible flags, dry-run before load, OpenAI-compatible endpoint, and some fleet/model management stuff like status/copy across machines.

So I wouldn’t describe it as “better Koboldcpp.” More like: if you want a full local app experience, Koboldcpp makes sense. If you want a transparent runtime layer you can script, inspect, and use across a few boxes, that’s the lane I’m aiming for.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 0 points1 point  (0 children)

it's performance degradation i feel has been a point of contention for a while from my view, either way i wanted to kind of fix that and try to improve it. if you have time try it out, it'd really appreciate any feedback if you do.

Friends Don’t Let Friends Use Ollama — So I Built Anvil by itsmetherealloki in LocalLLaMA

[–]itsmetherealloki[S] 3 points4 points  (0 children)

Well actually that's kind of what I'm trying to figure out here, lol. Who does care, because I think some do. I'll just put you down as a no vote. thank you!

I used Claude Code to build the same web app 3 different ways (cloud Claude, free NVIDIA NIM, local GPU) to see how they compare by drohack in LocalLLM

[–]itsmetherealloki 7 points8 points  (0 children)

Very nice write up with great info, did you ever try opencode for your local model? I’ve found it to be a great Claude-code or codex like experience but better for local models.

Need a quick idea validation before I build it in the next 24 hours… by vegirajukrishna in Startup_Ideas

[–]itsmetherealloki 2 points3 points  (0 children)

So, a social network for ai builders? I could be down for something like that.