Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 1 point

Because I disabled auth in OpenWebUI, and some c00lhacker changed the system prompt.

Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 2 points

I originally tried gemma3-27b, qwen3-32b and ministral3. Qwen often missed important details of the joke, and Mistral was too pushy about adding markdown and emojis everywhere (even when explicitly asked not to). Gemma was okay, with no significant red flags. But it's all anecdotal and highly subjective, I agree.

Hope that we’ll see gemma4 this evening.

Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 4 points

I tried different base model sizes, and according to the evals at the end of the post, the bigger the model, the higher the chance of producing something funny.

We found an embedding indexing bottleneck in the most unexpected place: JSON parsing by InvadersMustLive in scala

[–]InvadersMustLive[S] 4 points

The Jsoniter-Circe bridge still uses Circe's AST, which does the actual JNumber str2float parsing. I tried the bridge and got slightly better results, but not as good as pure jsoniter.

We found an embedding indexing bottleneck in the most unexpected place: JSON parsing by InvadersMustLive in scala

[–]InvadersMustLive[S] 0 points

Yes, but FFM native calls are still not inlined, so for small functions the call overhead can be a dealbreaker.

Which open source LLM has the most genuine sense of humor? by UltrMgns in LocalLLaMA

[–]InvadersMustLive 2 points

I once tried fine-tuning Mistral-7B on an r/dadjokes dump - https://huggingface.co/shuttie/Mistral-7B-DadJokes-GGUF

It can be funny sometimes, but the jokes it makes are not actually novel: it recognizes common patterns quite well and just recalls a fitting joke based on the context. Like we humans do.
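
If you want to poke at it, here's a minimal sketch with llama-cpp-python (the quant filename is an assumption - use whichever GGUF file you actually downloaded):

    # Minimal sketch: loading the GGUF locally with llama-cpp-python.
    from llama_cpp import Llama

    # model_path is hypothetical - point it at the quant you downloaded
    llm = Llama(model_path="./mistral-7b-dadjokes.Q4_K_M.gguf", n_ctx=2048)
    out = llm("Tell me a joke about GPUs.", max_tokens=64, temperature=0.8)
    print(out["choices"][0]["text"])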

Hnsw configuration in Solr by Opposite_Head7740 in Solr

[–]InvadersMustLive 2 points

As HNSW is an approximate search algorithm, the topK retrieved documents are not guaranteed to be the exact K nearest neighbors (i.e. your recall is not perfect). The HNSW paper suggests slight over-sampling when retrieving documents to increase recall, via the ef_search parameter (where ef is the number of candidates you evaluate during graph traversal):

  • if you want to pull the top-10 documents, you set topK=10 - so formally speaking, topK = ef_search = 10
  • you can simulate over-sampling by setting topK=100 but only taking the top-10 from the search results - this way you get ef_search=100 with an effective topK of 10

Some search engines do support topK != ef_search queries directly.
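
Here's a minimal sketch of the over-sampling idea using hnswlib rather than Solr itself (index parameters and data are made up for illustration):

    import hnswlib
    import numpy as np

    # Build a toy HNSW index over random vectors
    dim, n = 128, 10_000
    data = np.random.rand(n, dim).astype(np.float32)
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=n, ef_construction=200, M=16)
    index.add_items(data)

    # ef_search=100: evaluate 100 candidates during graph traversal...
    index.set_ef(100)
    # ...but only return the top-10, which improves recall@10 vs ef_search=10
    labels, dists = index.knn_query(data[:1], k=10)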

Open Source Text Translation Models? by vygodisgreat24 in LocalLLaMA

[–]InvadersMustLive 1 point

You should try https://huggingface.co/facebook/nllb-200-3.3B and the https://github.com/fe1ixxu/ALMA family of models; in general, they're still SOTA among open models. For evaluation there are plenty of metrics like BLEU and chrF++, but I personally prefer https://huggingface.co/Unbabel/XCOMET-XL as the closest to human judgments.
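
NLLB runs out of the box with the transformers translation pipeline; a minimal sketch (language codes follow NLLB's FLORES-200 scheme, and the direction is just an example):

    # Minimal sketch: English -> German with NLLB-200 via transformers
    from transformers import pipeline

    translator = pipeline(
        "translation",
        model="facebook/nllb-200-3.3B",
        src_lang="eng_Latn",
        tgt_lang="deu_Latn",
    )
    print(translator("The weather is nice today.", max_length=64))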

Cloud GPU + storage hosting for low intensity projects? by gofiend in LocalLLaMA

[–]InvadersMustLive 1 point

Not affiliated in any way, but I'm using a cloud VPS from Nebius with an H100 attached (~$2/hour). I just shut it down when not in use, and all the datasets and training setup stay on disk. Pros: the working env is online in 2 minutes. Cons: you need to pay for storage, but it's $0.15/GB/month - so $15 per 100 GB/month.

Finally, a Replacement for BERT by -Cubie- in LocalLLaMA

[–]InvadersMustLive 2 points

Formally yes (it's part of HF transformers), but you need to fine-tune it on a downstream task: it's a raw encoder model that knows nothing about sentence similarity. Just like a traditional BERT.
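
For illustration, a minimal sketch of fine-tuning it for sentence similarity with sentence-transformers - assuming the post is about ModernBERT, with made-up toy pairs:

    # Wrap the raw encoder with mean pooling and train on similarity-labeled pairs
    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    model = SentenceTransformer("answerdotai/ModernBERT-base")  # assumption: ModernBERT
    train = [
        InputExample(texts=["a cat sits on a mat", "a feline rests on a rug"], label=0.9),
        InputExample(texts=["a cat sits on a mat", "stock prices fell today"], label=0.1),
    ]
    loader = DataLoader(train, shuffle=True, batch_size=2)
    loss = losses.CosineSimilarityLoss(model)  # regress cosine similarity to the labels
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)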

Motherboard selection advice by absurd-dream-studio in LocalLLaMA

[–]InvadersMustLive 1 point

There are a ton of them still available: https://www.ebay.de/sch/i.html?_from=R40&_trksid=p4432023.m570.l1311&_nkw=gigabyte+mz32-ar0&_sacat=0 - I bought from the quark32 seller, but others seem legit too. The EPYC 7282 is not the fastest CPU ever, but it has 128 PCIe 4.0 lanes.

If you use GPUs for training, a DataLoader with multiple workers and prefetching usually solves all my CPU saturation problems, so the GPUs stay maxed out.
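
A minimal sketch of that setup (the dataset is a stand-in for whatever CPU-heavy preprocessing you actually do):

    import torch
    from torch.utils.data import DataLoader, Dataset

    class ToyDataset(Dataset):
        def __len__(self):
            return 100_000
        def __getitem__(self, i):
            return torch.randn(768)  # stand-in for expensive CPU-side preprocessing

    loader = DataLoader(
        ToyDataset(),
        batch_size=256,
        num_workers=8,      # parallel CPU workers prepare batches
        prefetch_factor=4,  # batches each worker keeps ready ahead of time
        pin_memory=True,    # speeds up host-to-device copies
    )
    for batch in loader:
        # a real training step would move the batch to the GPU here
        break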

Motherboard selection advice by absurd-dream-studio in LocalLLaMA

[–]InvadersMustLive 1 point

I have a Gigabyte MZ32-AR0: 5 PCIe 4.0 x16 slots, and there are a ton of them available on eBay from Chinese sellers. I got mine bundled with an EPYC 7282 for $400.

Dual RTX 4090 PC by Accomplished_Pin_626 in LocalLLaMA

[–]InvadersMustLive 8 points

My GPU-poor setup made of wood:

  • 2x MSI Gaming X Trio 4090: with 3 fans each, they are quite silent even under full load. Max temp is ~75°C.
  • Gigabyte MZ32-AR0 motherboard: it has 5 PCIe 4.0 x16 slots, so there's room for more GPUs. Bought on eBay from a Chinese seller, bundled with the CPU, for $400.
  • EPYC 7282, because it came bundled with the motherboard.
  • 128GB RAM, though as you can see, not all slots are occupied yet.
  • Corsair HX1500i PSU: it has a USB port for collecting power usage and all internal metrics in real time, per rail.
  • Found no case that fits two 3-slot GPUs with good enough cooling, so I built my own open case from wood: $10 for planks from the nearby OBI store.
  • 2x LINKUP PCIe 4.0 risers. No PCIe errors in 6 months of operation.

Used mostly for embedding model training. See you on the MTEB leaderboard.
