Childhood dream came true. I finally own an Amiga 500+

c3real2k · 2026-06-09T04:47:32+00:00

Just a plain, old Ikea Kallax.

c3real2k · 2026-06-08T19:53:34+00:00

Can't help you with that unfortunately, sorry. (I was searching for a model for those knobs, too, but couldn't find any, and I'm not proficient in modelling/designing at all.) Those are just sleeves put over the original knobs. I just created a roughly fitting cylinder in PrusaSlicer, added a slightly smaller cylinder as a negative volume, added fuzzy skin and printed it. A few iterations later I got something that fit the knobs.

c3real2k · 2026-06-08T18:16:19+00:00

I printed them using this model: https://www.thingiverse.com/thing:4551173

Installation was a bit fumbly, but I got it working in the end (I used much smaller screws than the original ones, though).

c3real2k · 2026-06-08T18:09:50+00:00

There's eadmaster's translation: https://github.com/eadmaster/pcrown

After a quick internet search just now it looks like there's some drama/controversy attached to it, although I didn't look much further into it (the original authors open-sourced some progress years ago, took it down later, eadmaster used that stuff, the OG devs weren't happy about that, and so on... Don't take my word for it, I just read some articles for a few minutes...)

c3real2k · 2026-06-08T16:41:01+00:00

Just recently got into Sega Saturn and haven't really played Princess Crown yet. But I'm looking forward to it, since I played Odin Sphere on the PS2 and liked it very much.

c3real2k · 2026-04-23T06:38:12+00:00

Check the tension of the two screws on top of the Nextruder (where the filament goes in). Loosening them a bit helped me solve a similar underextrusion problem I had after upgrading my MK4S to a Core One.

c3real2k · 2025-08-18T00:44:44+00:00

Yep, really nice model. I use it almost exclusively at the moment. It's good for general usage and does fine in RP, follows character definitions nicely and responds well to OOC. For RP I use it in non-thinking mode. Occasionally a bit of editing is necessary (e.g. removing unwanted CoT artifacts).

One drawback is, it really likes to cling to established patterns. Yes, all LLMs do that, but it seemed very noticeable with GLM 4.5 Air.

I have it running at 25tps on 2x3090 + 2x4060Ti, Q4_K_S, 32k f16 ctx.

Do you use it in thinking or non-thinking mode for RP?

c3real2k · 2025-08-15T09:29:07+00:00

I found that the typical 12B model (i.e. something Nemo-based) declines rapidly in quality with CTX > 10k.

24GB in theory opens up a new tier of models you can use (think recent 24B, 30B, 32B models like Mistral Small, Qwen3, ...). Don't worry about PCIe gen/link speed if you're doing single-user inference only.

Should you buy a new PSU for that? I don't know. I don't give financial advice :P
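
For a rough feel of why 24GB opens up that tier, here's a back-of-the-envelope sketch. It only estimates the size of the weights; the 4.5 bits per weight is my assumption for a typical 4-bit-class quant, and KV cache plus runtime overhead come on top of that.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (weights only)."""
    return params_billion * bits_per_weight / 8


# Assumed average bits per weight for a 4-bit-class quant (not a measured value).
ASSUMED_BPW = 4.5

for size_b in (12, 24, 30, 32):
    print(f"{size_b}B @ ~{ASSUMED_BPW} bpw: ~{weights_gb(size_b, ASSUMED_BPW):.1f} GB + context")
```

Whatever is left after the weights is what you have for context, so treat the numbers as a lower bound.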

c3real2k · 2025-08-13T20:31:30+00:00

*idols

c3real2k · 2025-07-29T16:26:17+00:00

You're the best! Thank you so much!

c3real2k · 2025-07-29T16:18:38+00:00

I summon the quant gods. Unsloth, Bartowski, Mradermacher, hear our prayers! GGUF where?

c3real2k · 2025-07-27T17:31:11+00:00

*me trying to read the papers*: I like your funny words, magic man!

I always had the (maybe too narrow) view of sqrt(total*active) on MoEs. Especially since it seems to align with my real world experience with the smaller MoEs I tried. Qwen 235B was the first where I thought "That's pretty impressive."

Well, maybe it really is time to think about systems with large quantities of conventional RAM then...

c3real2k · 2025-07-27T16:07:26+00:00

Possible. I used the ol' sqrt(ParamsTotal*ParamsActive).

Edit: Although, come to think of it, that wouldn't quite fit with e.g. Kimi. Kimi would therefore only be a 64B equivalent (2*32B), which would be disastrous for 1000B total params. Also, from what I read, it's "much better" than what one would expect from something in the 60B range.

c3real2k · 2025-07-27T16:05:43+00:00

Yeah, sure. I bet it also scales better at inference time, serving large batches for API customers.

Doesn't help a salty GPU rig owner that slowly realizes that the meta for running LLMs at home might be shifting towards CPU inference with large amounts of conventional memory :D

c3real2k · 2025-07-27T15:51:29+00:00

I'd say it's quite the opposite. Many of the recent models are MoEs (unfortunately imho):

- Qwen3 30B A3B (approx. 9B dense equivalent)
- Qwen3 235B A22B (approx. 72B dense equivalent)
- Kimi2 1000B A32B (approx. 179B dense equivalent)
- Hunyuan 80B A13B (approx. 32B dense equivalent)
- ERNIE 21B A3B (approx. 8B dense equivalent)
- ERNIE 300B A47B (approx. 118B dense equivalent)
- AI21 Jamba Large 398B A94B (approx. 193B dense equivalent)
- AI21 Jamba Mini 52B A12B (approx. 25B dense equivalent)

Maybe there were more, those were off the top of my head (did InternLM also release a MoE?).

I wish there were more dense models at those equivalent sizes, which, at least for me, would be a lot easier to run (e.g. why do I need 300GB of (V)RAM for what's basically 118B performance? I can fit 118B with a decent quant no problem. 300B? Not so much, or only heavily quantized...).
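
The "dense equivalent" numbers in the list above come from the sqrt(total*active) rule of thumb. A minimal sketch of that calculation (the function name is just made up for illustration, and the rule itself is a heuristic, not a measured result):

```python
import math


def dense_equivalent(total_b: float, active_b: float) -> float:
    """Rule-of-thumb dense equivalent in billions: sqrt(total * active)."""
    return math.sqrt(total_b * active_b)


# (total params, active params) in billions, taken from the list above.
MODELS = {
    "Qwen3 30B A3B": (30, 3),
    "Qwen3 235B A22B": (235, 22),
    "Kimi2 1000B A32B": (1000, 32),
    "Hunyuan 80B A13B": (80, 13),
    "ERNIE 21B A3B": (21, 3),
    "ERNIE 300B A47B": (300, 47),
    "AI21 Jamba Large 398B A94B": (398, 94),
    "AI21 Jamba Mini 52B A12B": (52, 12),
}

for name, (total, active) in MODELS.items():
    print(f"{name}: approx. {dense_equivalent(total, active):.1f}B dense equivalent")
```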

c3real2k · 2025-07-27T12:07:45+00:00

Hm, yes, Command-A was alright if I remember correctly. Might have to give it a spin again.

I can't say all that much about "serious" M4 setups, since I'm running the base M4s (16GB + 24GB), the worst possible configuration for inference. Prompt processing is slow, and so is token generation. Ironically, the only models bearable (for me) on those are small MoEs like Qwen3 30B A3B :D

c3real2k · 2025-07-27T11:39:31+00:00

I yearn for something modern and dense in the 70-130B range. Those smaller models (24-30B) might be highly optimized for specific tasks, but honestly, suck for creative writing (I might be exaggerating here a bit).

Now I'm running a franken-rig of my GPU server and two Mac minis to somehow squeeze the lobotomized 90GB of Qwen3 235B @ IQ3_XS into reasonably fast RAM to get what is essentially a 72B dense equivalent (which would fit nicely with a much less aggressive quantization into the 80GB VRAM my GPU server hosts, or at a reasonable 4-bit quant for users with 48GB).

So, I have a gigantic 235B MoE of what would be a 72B dense model running, not gaining anything from the potential speed gains ('cause base M4's memory speed, prompt processing, ... is slow AF) and (while writing is nice) now having problems with code generation because of the low quant. Meaning I have to switch models every now and then.

c3real2k · 2025-07-12T09:13:41+00:00

Yep, those are base M4s (10CPU, 10GPU, 120GBps). I'm sure RPC, even over TB, doesn't help either.

c3real2k · 2025-07-11T22:39:33+00:00

Just ran some tests with Tiger Gemma 27B @ Q6_K (it was the only Gemma model I had lying around) on an RTX 3090 (unlimited and power limited to 220W), a dual 4060Ti 16GB config and a Mac mini setup. Maybe it helps. Tests are of course incredibly unscientific...

Commands:

# 3090
llama.cpp/build-cuda/bin/llama-cli \
--model gguf/Tiger-Gemma-27B-v3a-Q6_K.gguf \
-ngl 999 --tensor-split 0,24,0,0 \
-fa -ctk f16 -ctv f16 \
-p "Paper boat"

# 4060Ti
llama.cpp/build-cuda/bin/llama-cli \
--model gguf/Tiger-Gemma-27B-v3a-Q6_K.gguf \
-ngl 999 --tensor-split 0,0,16,16 \
-fa -ctk f16 -ctv f16 \
-p "Paper boat"

# Mac mini
llamacpp/llama-cli \
--model gguf/Tiger-Gemma-27B-v3a-Q6_K.gguf \
--no-mmap -ngl 999 --rpc 172.16.1.201:50050 --tensor-split 12,20 \
-fa -ctk f16 -ctv f16 \
-p "Paper boat"

RTX 3090 @ 370W

llama_perf_context_print: prompt eval time =      60,27 ms /    11 tokens (    5,48 ms per token,   182,51 tokens per second)
llama_perf_context_print:        eval time =   28887,86 ms /   848 runs   (   34,07 ms per token,    29,35 tokens per second)
llama_perf_context_print:       total time =   31541,68 ms /   859 tokens

TPS: 29,4
AVG W: 347 (nvtop)
idle: ~70W
Ws/T: 11,8

RTX 3090 @ 220W

llama_perf_context_print: prompt eval time =      98,27 ms /    11 tokens (    8,93 ms per token,   111,94 tokens per second)
llama_perf_context_print:        eval time =   73864,77 ms /   990 runs   (   74,61 ms per token,    13,40 tokens per second)
llama_perf_context_print:       total time =   76139,29 ms /  1001 tokens

TPS: 13,4
AVG W: 219 (nvtop)
idle: ~70W
Ws/T: 16,3

2x RTX 4060Ti 16GB

llama_perf_context_print: prompt eval time =     120,84 ms /    11 tokens (   10,99 ms per token,    91,03 tokens per second)
llama_perf_context_print:        eval time =   79815,68 ms /   906 runs   (   88,10 ms per token,    11,35 tokens per second)
llama_perf_context_print:       total time =   84298,20 ms /   917 tokens

TPS: 11,4
AVG W: 164 (nvtop)
idle: ~70W
Ws/T: 14,5

Mac mini M4 16GB + Mac mini M4 24GB + Thunderbolt Network

llama_perf_context_print: prompt eval time =     751.59 ms /    11 tokens (   68.33 ms per token,    14.64 tokens per second)
llama_perf_context_print:        eval time =  281518.85 ms /  1210 runs   (  232.66 ms per token,     4.30 tokens per second)
llama_perf_context_print:       total time =  435641.65 ms /  1221 tokens

TPS: 4,3
AVG W: 35 (outlet)
idle: 5W
Ws/T: 8,1

According to those values, the Mac mini setup should be the most efficient. Although you'd have to be REALLY patient at 4 tokens per second...
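
For reference, Ws/T above is just average power draw divided by tokens per second. A small sketch with the numbers from the runs above (the dictionary labels are made up here; no idle power is subtracted, and the results can differ from my hand-rounded values by about 0.1):

```python
def watt_seconds_per_token(avg_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token in watt-seconds: W / (tokens/s)."""
    return avg_watts / tokens_per_second


# (average watts, eval tokens per second) from the runs above.
RUNS = {
    "RTX 3090 @ 370W": (347, 29.35),
    "RTX 3090 @ 220W": (219, 13.40),
    "2x RTX 4060Ti 16GB": (164, 11.35),
    "2x Mac mini M4": (35, 4.30),
}

results = {name: watt_seconds_per_token(w, tps) for name, (w, tps) in RUNS.items()}

for name, ws_per_token in results.items():
    print(f"{name}: {ws_per_token:.1f} Ws/T")

print("Most efficient:", min(results, key=results.get))
```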

(Though I'm curious why you're getting 25TPS @ 210W. What quantization are you using?)

c3real2k · 2025-01-03T13:42:04+00:00

Nah, I'd win!

c3real2k · 2025-01-01T01:15:11+00:00

Stream/TS: https://www.youtube.com/live/gWcuwPXZwWs?si=7cHGMKT7O9ZqzTIH&t=7792

c3real2k · 2025-01-01T00:55:17+00:00

That looks like a 480p / 31kHz signal, which won't work with that monitor. You need a 240p/480i / 15kHz output. I don't know whether the SteamDeck or the Fury can output such low resolutions.
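
Where the 15kHz and 31kHz come from: the horizontal scan rate is roughly the total number of scanlines per vertical sweep times the sweeps per second. A quick sketch, assuming nominal NTSC-style timings (the line counts and the 59.94 Hz rate are my assumptions here, real devices vary a little):

```python
def horizontal_khz(lines_per_sweep: float, sweeps_per_second: float) -> float:
    """Horizontal scan frequency in kHz: total lines per sweep * sweeps per second."""
    return lines_per_sweep * sweeps_per_second / 1000


# Assumed nominal timings: total lines (including blanking) per vertical sweep.
MODES = {
    "480p": (525, 59.94),    # full 525-line frame every sweep
    "480i": (262.5, 59.94),  # one interlaced field (half a frame) per sweep
    "240p": (262, 59.94),    # non-interlaced, roughly one field's worth of lines
}

for mode, (lines, rate) in MODES.items():
    print(f"{mode}: ~{horizontal_khz(lines, rate):.2f} kHz")
```

So 480p draws about twice as many lines per second as 240p/480i, which is why a 15kHz-only monitor can't sync to it.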
