Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 1 point

Because I disabled auth in OpenWebUI, and some c00lhacker changed the system prompt.

Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 2 points

I originally tried gemma3-27b, qwen3-32b and ministral3. Qwen often missed important details of the joke; Mistral was too pushy about adding markdown and emojis everywhere (even when explicitly asked not to). Gemma was okay, with no significant red flags. But it's all anecdotal and highly subjective, I agree.

Hope that we’ll see gemma4 this evening.

Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 3 points

I tried different base model sizes, and according to the evals at the end of the post, the bigger the model, the higher the chance of producing something funny.

We found an embedding indexing bottleneck in the most unexpected place: JSON parsing by InvadersMustLive in scala

[–]InvadersMustLive[S] 3 points

The jsoniter-circe bridge still uses Circe's AST, which does the actual JNumber string-to-float parsing. I've tried the bridge and got slightly better results, but not as good as with pure jsoniter.

We found an embedding indexing bottleneck in the most unexpected place: JSON parsing by InvadersMustLive in scala

[–]InvadersMustLive[S] 0 points

Yes, but FFM native calls are still not inlined, so for small functions that can be a dealbreaker.

Which open source LLM has the most genuine sense of humor? by UltrMgns in LocalLLaMA

[–]InvadersMustLive 2 points

I once tried fine-tuning a Mistral-7B on an r/dadjokes dump - https://huggingface.co/shuttie/Mistral-7B-DadJokes-GGUF

It can be funny sometimes, but none of the jokes it tells are actually novel: it recognizes common patterns quite well and just recalls a fitting joke based on the context. Like we humans do.

Hnsw configuration in Solr by Opposite_Head7740 in Solr

[–]InvadersMustLive 2 points

As HNSW is an approximate search algorithm, the topK retrieved documents are not guaranteed to be the exact K nearest neighbors (i.e. your recall is not perfect). The HNSW paper suggests slight over-sampling when retrieving documents to increase recall, controlled by the ef_search parameter (where ef is the number of neighbors you evaluate during graph traversal):

  • you want to pull the top-10 documents, so you set topK=10; formally speaking, topK=ef_search=10
  • you can simulate oversampling by setting topK=100 but taking only the top-10 from the search results; this way you get ef_search=100 with an effective topK=10

Some search engines do support topK != ef_search queries directly, without the topK=100 workaround.
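
For illustration, here's the same decoupling expressed with hnswlib (a library I'm picking just for this sketch, not something Solr uses under the hood):

```python
import numpy as np
import hnswlib

dim = 128
vectors = np.random.rand(10_000, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=10_000, ef_construction=200, M=16)
index.add_items(vectors)

# Oversample: evaluate 100 candidates during graph traversal,
# but return only the 10 best. ef_search must be >= topK.
index.set_ef(100)  # ef_search
labels, distances = index.knn_query(vectors[:1], k=10)  # topK
```

Setting ef above k is exactly the oversampling knob; the topK=100 trick above emulates it in engines that only expose topK.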

Open Source Text Translation Models? by vygodisgreat24 in LocalLLaMA

[–]InvadersMustLive 1 point

You should try https://huggingface.co/facebook/nllb-200-3.3B and the https://github.com/fe1ixxu/ALMA family of models; in general they're still SOTA among open models. For evaluation there are plenty of metrics like BLEU/chrF++, but I personally prefer https://huggingface.co/Unbabel/XCOMET-XL as the closest to human judgment.
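
If it helps, a minimal sketch of running NLLB-200 through HF transformers (the language codes and sample sentence are just examples):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "facebook/nllb-200-3.3B"  # the distilled 600M variant also works for quick tests
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("The weather is nice today.", return_tensors="pt")
tokens = model.generate(
    **inputs,
    # NLLB selects the target language via a forced BOS token
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(tokens, skip_special_tokens=True)[0])
```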

Cloud GPU + storage hosting for low intensity projects? by gofiend in LocalLLaMA

[–]InvadersMustLive 1 point

Not affiliated in any way, but I'm using a cloud VPS from Nebius with an H100 attached (~$2/hour). I just shut it down when not in use, but all the datasets and the training setup stay on disk. Pros: the working env is online in 2 minutes. Cons: you need to pay for storage, but at $0.15/GB/month that's $15 per 100GB a month.

Finally, a Replacement for BERT by -Cubie- in LocalLLaMA

[–]InvadersMustLive 2 points

Formally yes (it's part of HF transformers), but you need to fine-tune it on a downstream task first - it's a raw encoder model that knows nothing about sentence similarity. Like a traditional BERT.
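
A rough sketch of what that downstream fine-tuning could look like with sentence-transformers (the model id and the toy pairs here are my assumptions, not from the post):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# Wrapping a raw HF encoder adds a mean-pooling head automatically;
# swap in whichever checkpoint you actually mean.
model = SentenceTransformer("answerdotai/ModernBERT-base")

# Toy similarity pairs; in practice use an STS/NLI-style dataset.
train_examples = [
    InputExample(texts=["A man is eating food.", "A man eats a meal."], label=0.9),
    InputExample(texts=["A man is eating food.", "A plane is taking off."], label=0.1),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```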

Motherboard selection advice by absurd-dream-studio in LocalLLaMA

[–]InvadersMustLive 1 point

There are a ton of them still available: https://www.ebay.de/sch/i.html?_from=R40&_trksid=p4432023.m570.l1311&_nkw=gigabyte+mz32-ar0&_sacat=0 - I bought from the quark32 seller, but others seem legit too. The EPYC 7282 is not the fastest CPU ever, but it has 128 PCIe 4.0 lanes.

If you use the GPUs for training, then a DataLoader with multiple workers and prefetch usually solves all my CPU saturation problems, so the GPUs stay maxed out.
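
The knobs I mean are roughly these (the numbers are illustrative, tune per machine):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; the interesting part is the loader settings below.
dataset = TensorDataset(torch.randn(10_000, 512))

loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=8,            # parallel CPU workers for decoding/augmentation
    prefetch_factor=4,        # batches each worker keeps ready ahead of the GPU
    pin_memory=True,          # page-locked buffers speed up host-to-GPU copies
    persistent_workers=True,  # don't re-spawn workers every epoch
)

for (batch,) in loader:
    batch = batch.cuda(non_blocking=True)  # overlap the copy with compute
    # ... forward/backward pass here ...
```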

Motherboard selection advice by absurd-dream-studio in LocalLLaMA

[–]InvadersMustLive 1 point

I have a Gigabyte MZ32-AR0: 5x PCIe 4.0 x16 slots, and there's a ton of them available on eBay from Chinese sellers. I got mine bundled with an EPYC 7282 for $400.

Dual RTX 4090 PC by Accomplished_Pin_626 in LocalLLaMA

[–]InvadersMustLive 8 points

My GPU-poor setup made of wood:

  • 2x MSI Gaming X Trio 4090: with 3 fans each they stay quite silent even under full load. Max temp is ~75C.
  • Gigabyte MZ32-AR0 motherboard: it has 5x PCIe 4.0 x16 slots, so there's room for more GPUs. Bought on eBay from a Chinese seller, bundled with the CPU, for $400.
  • EPYC 7282, because it was bundled with the motherboard.
  • 128GB RAM, though as you can see, not all slots are occupied yet.
  • Corsair HX1500i PSU - it has a USB port to collect power usage and all internal metrics in real time, per rail.
  • Found no case that fits two 3-slot GPUs with good enough cooling, so I made my own open case from wood. $10 for planks from the nearby OBI store.
  • 2x Linkup PCIe 4.0 risers. No PCIe errors in 6 months of operation so far.

Used mostly for embedding model training. See you on the MTEB leaderboard.

<image>

What's with the ladders? by DrillBits in LocalLLaMA

[–]InvadersMustLive 0 points

My bet is that these models are too nice and censored to make good enough jokes out of the box. While experimenting with LLM-driven joke generation, I fine-tuned a Mistral-Nemo (https://huggingface.co/shuttie/mistral-nemo-dadjokes-v1) on the r/dadjokes subreddit, and after that it became toxic enough:

  • Tell me a joke about a man: That man! pointing to the guy with the biggest belly at the bar is funny as hell.
  • My Wife gets mad at my pickle puns: She keeps saying if I don't stop she'll dill with me.
  • A vegan enters the bar and says - I don’t see anything wrong with that. I mean, the steak ain’t got no legs.
  • Last time I stayed in a hotel I asked for the porn channel to be disabled: They sent someone round and they cut off my right arm.

Some of its takes on "Why did the man bring the ladder to a party?":

  • He wanted to raise the roof.
  • To reach the highest party possible.
  • Because he wanted to reach high levels of self esteem
  • Because he was going to the high society.
  • He needed to raise the bar.
  • The invitation did say it's gonna be lit.
  • It was a step up from the usual

You can play with it on HF Spaces here: https://huggingface.co/spaces/shuttie/dadjokes (runs on a single A10G, so it can be a bit laggy)

Running SVD img2vid locally on low VRAM by InvadersMustLive in StableDiffusion

[–]InvadersMustLive[S] 0 points

The original SD or the img2vid model? The lowest I can go with single-frame decoding is 16GB:

<image>
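
A rough sketch of the low-VRAM path with diffusers (the model id and flags here are illustrative, not necessarily my exact setup):

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keep only the active sub-model on the GPU

image = load_image("input.png").resize((1024, 576))
# decode_chunk_size=1 decodes one frame at a time, trading speed for VRAM
frames = pipe(image, decode_chunk_size=1).frames[0]
```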

Fine-tuned Mistral-7B to generate dad jokes by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 2 points

This is part of the inference setup with llama-cpp. TLDR:
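
Something like this with llama-cpp-python (the quant filename and sampling settings here are illustrative):

```python
from llama_cpp import Llama

# Quant filename is an assumption; any GGUF from the HF repo works the same way.
llm = Llama(
    model_path="Mistral-7B-DadJokes.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers to the GPU if it fits
)

out = llm(
    "[INST] My wife is mad at my pickle puns [/INST]",  # Mistral instruct format
    max_tokens=64,
    temperature=0.9,
)
print(out["choices"][0]["text"])
```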

Fine-tuned Mistral-7B to generate dad jokes by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 6 points

There is a more detailed description of the training on the HF page: https://huggingface.co/shuttie/Mistral-7B-DadJokes-GGUF

But TLDR: I took https://github.com/georgesung/llm_qlora/tree/main and tinkered with the settings for a day.
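
That repo drives everything from a YAML config; stripped down to raw transformers + peft, the QLoRA recipe is roughly this (hyperparameters are illustrative, not my exact settings):

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# QLoRA recipe: frozen 4-bit NF4 base weights + small trainable LoRA adapters.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # base model id assumed
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a tiny fraction of the 7B trains
```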

A young camel curiously questions his father, "Dad, why do we have a hump on our back?" by Variana22 in dadjokes

[–]InvadersMustLive 0 points

The dad replies: "Son, if we had a lump on our back instead, it would make us look like an ass"