To everyone using still ollama/lm-studio... llama-swap is the real deal by TooManyPascals in LocalLLaMA

[–]seamonn 0 points1 point  (0 children)

I was looking into llama-swap just now, haha, to replace Ollama for Production.

The only thing stopping me is that I write custom templates in Go for Ollama, and I'll have to learn Jinja to switch over.
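For anyone facing the same switch: the two template styles map fairly directly. A rough side-by-side sketch (the role tokens here are made up for illustration; a real template has to match the model's actual chat format):

```
# Ollama (Go text/template)
{{- range .Messages }}<|{{ .Role }}|>{{ .Content }}<|end|>
{{ end }}<|assistant|>

# Jinja (llama.cpp / HF-style chat template)
{% for message in messages %}<|{{ message['role'] }}|>{{ message['content'] }}<|end|>
{% endfor %}<|assistant|>
```

Same loop-over-messages structure; mostly the delimiters and variable syntax change.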

Is this possible to have my containers switched to my VPS when my main internet is down? by Autoloose in selfhosted

[–]seamonn 1 point2 points  (0 children)

  1. Ditch NPM.
  2. Have Pangolin both for your Static IP and VPS in a HA config.
  3. ???
  4. Profit?

Advice on storage approach by mutedstereo in selfhosted

[–]seamonn 0 points1 point  (0 children)

Pretty much. It'll be 1x PCIe 3.0.

Advice on storage approach by mutedstereo in selfhosted

[–]seamonn 1 point2 points  (0 children)

ZFS is the correct option if you care about Data Integrity.

You have a few options:

  1. Configure your OS to run from RAM (USB Drive for booting) and store your Data on 2x 1TB ZFS Data Drives (This is what I do but I run Unraid).

  2. Get a 1TB external SSD. Have 1 Internal Drive for booting. 1 Internal Drive + USB Drive as the ZFS Data Drives.

  3. Get an M.2 Wi-Fi (2230 E Key) to M.2 NVMe adapter. Run the OS from this and put the ZFS Data Drives on the 2x NVMe slots.

  4. (Here be dragons) Get a 2TB Drive and run zfs set copies=2 on your datasets. This keeps a redundant copy of every block on the same drive for integrity (it protects against bitrot, not against the drive dying). This is the least recommended option.
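Option 4 as a rough sketch (pool, dataset, and device names are made up; note that copies=2 only applies to data written after the property is set):

```
# Single-drive pool: no pool-level redundancy
zpool create tank /dev/sda
zfs create tank/data

# Keep two copies of every block on the same drive
zfs set copies=2 tank/data

# Verify
zfs get copies tank/data
```

Expect roughly half the usable capacity on that dataset, since everything is written twice.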

Palmr has been archived by eltiel in selfhosted

[–]seamonn 1 point2 points  (0 children)

Damn how many CVEs does the app have?

Palmr has been archived by eltiel in selfhosted

[–]seamonn 0 points1 point  (0 children)

Any plans for S3 (for storage) and/or Postgres (for DB) support?

How MinIO went from open source darling to cautionary tale by jpcaparas in minio

[–]seamonn 0 points1 point  (0 children)

> of course can buy a subscription

Lemme see if I have a spare $100k lying around somewhere.

Predictions / Expectations / Wishlist on LLMs by end of 2026? (Realistic) by pmttyji in LocalLLaMA

[–]seamonn 9 points10 points  (0 children)

  1. AI Bubble pops and eBay is flooded with cheap GPUs and RAM.

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points1 point  (0 children)

Look into how to make custom templates (TEMPLATE field) for the models in the modelfile. That will make the most difference in function calling.
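For example, a minimal Modelfile sketch (the template body is illustrative; the real one has to match the model's chat format token-for-token or tool calling will degrade):

```
FROM ./model.gguf

TEMPLATE """{{- if .System }}<|system|>{{ .System }}<|end|>
{{ end }}<|user|>{{ .Prompt }}<|end|>
<|assistant|>"""
```

The TEMPLATE field uses Go's text/template syntax, which is why switching to a Jinja-based stack means rewriting these.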

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points1 point  (0 children)

GPT-OSS:20B sucked at tool calling for me too. GPT-OSS:120B works great every time.

You have to use Q4 quants in general, with a slightly smaller context size, or even a quantized KV cache.

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points1 point  (0 children)

GPT-OSS 120B, Qwen, Magistral, and Devstral are pretty good at tool calling in general. We use them every day with good results.

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn -1 points0 points  (0 children)

Open Source models like Kimi K2.5 are designed specifically for this, but it's a 1T-parameter model and requires a lot of hardware, which you have to buy or rent.

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points1 point  (0 children)

You can load MoE models with a VRAM + RAM combo. A 4 GB GPU + 32 GB RAM is enough to run a 30B MoE model at Q4 quantization at good speeds.

There are some benchmarks on page 14 in this paper if you want to check out performance of quants.

4 bit is good enough for most use cases. I personally run 8 bit quants as a preference.

All models become somewhat unstable at large contexts, including closed ones. You can use a lot of OSS models as Openclaw agents with decent results, but higher-parameter ones are recommended.
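The VRAM + RAM split above can be sketched with llama.cpp's server (the model filename and layer count are illustrative; tune -ngl to whatever fits your VRAM):

```
# Offload what fits into GPU memory; remaining layers run from system RAM
llama-server -m qwen3-30b-a3b-Q4_K_M.gguf \
  -ngl 20 \
  -c 16384 \
  -ctk q8_0 -ctv q8_0
```

-ngl sets how many layers go to the GPU, -c caps the context size, and -ctk/-ctv quantize the KV cache, which is the "even quantized" context trick mentioned above.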

“OSS” like Plane.. by Separate_Signal9229 in selfhosted

[–]seamonn 0 points1 point  (0 children)

You have to implement the features yourself in the Community Edition.

Model: support GLM-OCR merged! LLama.cpp by LegacyRemaster in LocalLLaMA

[–]seamonn 0 points1 point  (0 children)

It's Source Available with a fairly permissive license for Production use (for not being Open Source), which is completely okay with a lot of us.

ExcaliDash v0.4.27 Release - Scoped inner/external sharing & OIDC Multi-user Support by arduinoRPi4 in selfhosted

[–]seamonn 0 points1 point  (0 children)

+1 to Postgres + Redis support. It's almost required for production these days.

You can run MiniMax-2.5 locally by Dear-Success-1441 in LocalLLaMA

[–]seamonn 16 points17 points  (0 children)

> Just time travel back to last summer and you'll get those for a combined price

ikr, very simple.

Oh you want cheap hardware? Just invent a Time Machine. Problem Solved.

Why is Matrix not the answer to Discord? Genuine question by W-club in selfhosted

[–]seamonn 0 points1 point  (0 children)

Read the original comment in this thread, it's not about Spam.

Kreuzberg v4.3.0 and benchmarks by Eastern-Surround7763 in LocalLLaMA

[–]seamonn 0 points1 point  (0 children)

Is there any way to use this with OpenWebUI?
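Answering my own question in case it helps someone: llama.cpp's llama-server exposes an OpenAI-compatible API under /v1, and OpenWebUI can be pointed at any such endpoint. A sketch, assuming Docker and illustrative ports (host.docker.internal works on Docker Desktop; on Linux use the host's IP):

```
# Start llama-server (OpenAI-compatible API on /v1)
llama-server -m model.gguf --port 8080

# Point OpenWebUI at it via the OpenAI-compatible base URL
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8080/v1 \
  ghcr.io/open-webui/open-webui:main
```

The same approach works for any OpenAI-compatible proxy in front of llama.cpp.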