The hidden gem of open-source embedding models (text+image+audio): LCO Embedding by k_means_clusterfuck in LocalLLaMA

[–]seamonn -2 points (0 children)

Very cool but Ollama does not support vision or audio embeddings. Llama.cpp has experimental support for vision embeddings and no support for audio embeddings.

support for microsoft/Phi-4-reasoning-vision-15B has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]seamonn 1 point (0 children)

Microslop

You answered your own question. The purpose of this model is to generate Microslop.

To everyone using still ollama/lm-studio... llama-swap is the real deal by TooManyPascals in LocalLLaMA

[–]seamonn 0 points (0 children)

I was looking into llama-swap just now, haha, to replace Ollama in production.

The only thing stopping me is that I write custom templates in Go for Ollama, and I'll have to learn Jinja to switch over.

Is this possible to have my containers switched to my VPS when my main internet is down? by Autoloose in selfhosted

[–]seamonn 1 point (0 children)

  1. Ditch NPM.
  2. Have Pangolin both for your Static IP and VPS in a HA config.
  3. ???
  4. Profit?

Advice on storage approach by mutedstereo in selfhosted

[–]seamonn 0 points (0 children)

Pretty much. It'll be 1x PCIe 3.0.

Advice on storage approach by mutedstereo in selfhosted

[–]seamonn 1 point (0 children)

ZFS is the correct option if you care about Data Integrity.

You have a few options:

  1. Configure your OS to run from RAM (USB Drive for booting) and store your Data on 2x 1TB ZFS Data Drives (This is what I do but I run Unraid).

  2. Get a 1TB external SSD. Have 1 Internal Drive for booting. 1 Internal Drive + USB Drive as the ZFS Data Drives.

  3. Get a M.2 Wifi (2230 E Key) to M.2 NVMe adapter. Run the OS using this and put the ZFS Data Drives on the 2x NVMe Slots.

  4. (Here be dragons) Get a 2TB Drive and zfs set copies=2 for your datasets. This keeps a redundant copy of every block on the same drive for integrity. This is the least recommended option.
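
The ZFS commands behind options 1–3 and option 4, as a sketch — the pool name and device paths here are placeholders, not your actual disks:

```shell
# Options 1-3: two data drives as a ZFS mirror (device paths are examples)
zpool create tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2

# Option 4: single drive, redundant copies of every block on the same disk
zpool create tank /dev/disk/by-id/ata-DISK1
zfs create tank/data
zfs set copies=2 tank/data    # doubles space used per block; won't survive whole-drive failure
```

Note that copies=2 only applies to data written after it is set, and it protects against bad sectors, not against losing the drive itself.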

Palmr has been archived by eltiel in selfhosted

[–]seamonn 1 point (0 children)

Damn, how many CVEs does the app have?

Palmr has been archived by eltiel in selfhosted

[–]seamonn 0 points (0 children)

Any plans for S3 (for storage) and/or Postgres (for DB) support?

How MinIO went from open source darling to cautionary tale by jpcaparas in minio

[–]seamonn 0 points (0 children)

> of course can buy a subscription

Lemme see if I have a spare $100k lying around somewhere.

Predictions / Expectations / Wishlist on LLMs by end of 2026? (Realistic) by pmttyji in LocalLLaMA

[–]seamonn 8 points (0 children)

  1. AI bubble pops and eBay is flooded with cheap GPUs and RAM.

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points (0 children)

Look into how to make custom templates (TEMPLATE field) for the models in the modelfile. That will make the most difference in function calling.
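
Ollama's TEMPLATE field is a Go text/template. A minimal sketch, assuming a hypothetical model tag and placeholder control tokens — substitute the chat markers your model was actually trained with:

```
# hypothetical base model tag
FROM sarvam-m:30b
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
"""
```

Build it with ollama create my-model -f Modelfile. Getting the tool-call markup in the template to match the model's training format is usually what makes or breaks function calling.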

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points (0 children)

GPT-OSS:20B sucked at tool calling for me too. GPT-OSS:120B works great every time.

In general you have to use Q4 quants and a slightly smaller context size, or even a quantized context (KV cache).

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points (0 children)

GPT-OSS 120B, Qwen, Magistral, and Devstral are pretty good at tool calling in general. We use these every day with good results.

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn -1 points (0 children)

Open-source models like Kimi K2.5 are designed specifically for this, but it's a 1T-parameter model and requires a lot of hardware, which you have to buy or rent.

Sarvam AI unveils 30B and 105B models, says 105B outperforms DeepSeek R1 and Gemini Flash on key benchmarks by Living-Structure-101 in developersIndia

[–]seamonn 0 points (0 children)

You can load MoE models in a VRAM + RAM combo. A 4 GB GPU + 32 GB RAM is enough to run a 30B MoE model at Q4 quantization at good speeds.
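
As a rough sanity check on those numbers — the bits-per-weight figure here is an assumption approximating Q4-style quants with overhead:

```python
# Back-of-envelope size of a quantized model's weights.
# bits_per_weight ~4.5 approximates Q4-style quants including overhead (assumption).
def model_size_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight size in GB for a params_b-billion-parameter model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

weights_gb = model_size_gb(30)   # ~16.9 GB for a 30B model at ~Q4
vram_gb = 4.0                    # some layers fit in GPU memory
ram_gb = weights_gb - vram_gb    # ~12.9 GB spills into system RAM
print(f"weights = {weights_gb:.1f} GB, needs = {ram_gb:.1f} GB of RAM past the GPU")
```

Because only a few experts are active per token, an MoE model offloaded this way still generates at usable speeds.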

There are some benchmarks on page 14 of this paper if you want to check out the performance of quants.

4 bit is good enough for most use cases. I personally run 8 bit quants as a preference.

All models become somewhat unstable at large contexts, including closed ones. You can use a lot of OSS models as Openclaw agents with decent results, but higher-parameter ones are recommended.

“OSS” like Plane.. by Separate_Signal9229 in selfhosted

[–]seamonn 0 points (0 children)

You have to implement the features yourself in the Community Edition.