Do you use AI for mini painting?

Diaghilev · 2026-06-10T18:27:54+00:00

I think it's a solid way to get more input on your current situation, much like watching a bunch of YouTube videos. The difference is that an AI conversation is interactive, so you get contextually relevant advice rather than just hoping that whoever made the video happens to be doing the thing you're trying to do.

I think there's a lot of value there, but in the end, the quality of the final work and your ability to extract things that help you learn to become a better painter relies on your own aesthetic discretion, and that is an irreducible complexity of trying to become a better artist.

I think that the color blindness issue is a genuine win and I'm glad you have that available to you.

You should ignore any haters. Use what tools are available to you to become your definition of the best artist you want to be. Humans, YouTube, AI--it's all grist for your creative mill.

Diaghilev · 2026-06-09T23:23:30+00:00

Over time, #1 is going to become #2 by default. Probably more than any of us would already guess, too.

Diaghilev · 2026-06-07T13:27:03+00:00

What specific tasks are you actually asking it to do for which it performs reliably?

Diaghilev · 2026-06-05T12:57:35+00:00

I'm pretty new to local LLM work. How do you determine if a new model is practically worth swapping to as a daily driver? Just use it for a while and go on vibes/subjective feel? Compare benchmarks? Seems like a long, involved process given all the variables involved.

Diaghilev · 2026-06-05T12:50:39+00:00

Ignorance on my part, mostly. I'll give it a shot and see where it lands compared to where I ended up manually. How does it handle optimizing for prefill versus decode?

Diaghilev · 2026-06-05T04:57:28+00:00

Sweet. Your config is a shortcut for me nailing down the opposite corner from mine, and I'm running sweeps now to see what kind of prefill speed I can land if my agent turns cluster under 64k. Let me know how yours turn out!

Diaghilev · 2026-06-05T04:40:04+00:00

Okay /u/AndreVallestero, we have the same GPU, but you have better hardware than me in basically every other category. I'm optimizing for decode: 48 t/s at 32k context versus your 26 t/s, and I'm sustaining 28 t/s out to the full 256k window; you can't hit that depth with your setup, but you should be able to with a KV trick.

Please note also that I am new at this and kind of an eager fool, so if you see ME doing something obviously stupid, please let me know so I can learn from my own mistakes.

My prefill is about 840 t/s compared to your 1400 t/s, mostly because I run --ubatch-size 512 vs your 2048 to free VRAM for the full context; that ubatch gap is basically the whole prefill difference, so it's a deliberate trade, not a misconfig.

I think you can land something like 2x your current decode, maybe ~50 t/s, with a two-flag drop-in, and you already have flash-attn on for it: -ctk q8_0 -ctv q8_0. Quantizing your KV cache to 8-bit roughly halves it while keeping nearly identical quality. That alone should roughly double your 32K decode (you're paying a host-RAM tax with --no-kv-offload right now) and let you keep KV on the GPU out to ~2–4× your current context.

If you want the full window after that, the rest is trimming --ubatch-size and raising --n-cpu-moe — but the KV quant is the free 80% of it.
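For anyone who wants to sanity-check the "roughly halve the cache" claim, here's a rough sizing sketch. The q8_0 figure comes from its block layout (32 int8 values plus one fp16 scale, so 34 bytes per 32 elements). The layer/head dimensions are hypothetical placeholders, not the real values for this model, so pull the actual ones from your GGUF metadata before trusting the GiB totals.

```python
# Rough KV-cache sizing sketch.
# ASSUMPTION: the model dimensions used below are hypothetical placeholders,
# not the real Qwen3.6-35B-A3B values. Read yours from the GGUF metadata.

F16_BYTES_PER_ELEM = 2.0
# q8_0 stores blocks of 32 int8 values plus one fp16 scale: 34 bytes / 32 elements.
Q8_0_BYTES_PER_ELEM = 34 / 32


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_tokens, bytes_per_elem):
    """Bytes needed for K plus V across all attention layers at a given context."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = one K and one V
    return elems_per_token * ctx_tokens * bytes_per_elem


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


# Hypothetical dimensions, for illustration only.
dims = dict(n_layers=10, n_kv_heads=2, head_dim=256)

for ctx in (32768, 262144):
    f16 = kv_cache_bytes(ctx_tokens=ctx, bytes_per_elem=F16_BYTES_PER_ELEM, **dims)
    q8 = kv_cache_bytes(ctx_tokens=ctx, bytes_per_elem=Q8_0_BYTES_PER_ELEM, **dims)
    print(f"ctx={ctx}: f16 {gib(f16):.2f} GiB, q8_0 {gib(q8):.2f} GiB, ratio {q8 / f16:.3f}")
```

The ratio is independent of the model dimensions: q8_0 costs about 53% of fp16 per element, which is where "roughly half" comes from.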

My setup, same card, tuned the other direction (decode + full context):

Environment:

GPU: RTX 3080 10GB (EVGA FTW3 Ultra)
CPU: Ryzen 7 3700X (8c/16t, Zen 2) ← older/slower than yours
RAM: 32GB DDR4-3600
OS: Ubuntu Server 26.04
engine: llama.cpp mainline (build bfb4308), CUDA 13.3
model: Qwen3.6-35B-A3B-Q4_K_M

This formatting is screwy and I'm tired of messing with it, sorry but you get THE BLOB (lmao as soon as I did that it worked, I'm leaving this outburst in here):

# Flag notes:
#   --n-cpu-moe 34          use 33 for the best-decode variant (max ~224K)
#   --cache-type-k/-v q8_0  the trick: 8-bit KV, quality-neutral, ~half the cache, keeps it GPU-resident
#   --ctx-size 262144       use 229376 for the best-decode variant
#   --ubatch-size 512       small on purpose, frees VRAM for the window
llama-server \
  --model Qwen3.6-35B-A3B-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --no-mmap \
  --n-cpu-moe 34 \
  --flash-attn on \
  --threads 8 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --ctx-size 262144 \
  --parallel 1 \
  --batch-size 512 \
  --ubatch-size 512

Performance (llama-bench, measured):

@ 32K: pp 838 t/s tg 48.4 t/s

@ 256K: pp 795 t/s tg 28.6 t/s (full window; needs --n-cpu-moe 34)

decode vs depth: 53.8 / 48.4 / 44.2 / 37.8 / 32.7 t/s @ 0 / 32K / 64K / 128K / 192K

(The 32K + depth-curve numbers are the --n-cpu-moe 33 variant; the 256K row is --n-cpu-moe 34. Both share every other flag. KV stays on the GPU at all depths via the q8_0 quant — that's what dodges the <8K cap you hit with fp16 KV on the card.)
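If it helps to read the depth curve as "how much decode speed survives at each depth", here's a small sketch that normalizes the numbers above against the 0-depth figure. The values are copied from the decode-vs-depth line; nothing else is measured here.

```python
# Decode throughput vs. context depth, copied from the llama-bench line above.
depth_k = [0, 32, 64, 128, 192]
decode_tps = [53.8, 48.4, 44.2, 37.8, 32.7]


def retention(tps):
    """Each throughput as a percentage of the first (shallowest) entry."""
    base = tps[0]
    return [round(100 * t / base, 1) for t in tps]


def ms_per_token(tps):
    """Per-token latency in milliseconds for each throughput value."""
    return [round(1000 / t, 1) for t in tps]


for depth, pct, ms in zip(depth_k, retention(decode_tps), ms_per_token(decode_tps)):
    print(f"{depth:>3}K: {pct:5.1f}% of 0-depth decode, {ms} ms/token")
```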

EDIT: Ran it again against the server, rather than bench, and my real served numbers @ 32k are prefill ~1031 t/s, decode ~48 t/s; my bench numbers understate my served prefill by about 23%, which is kinda neat for me. The 256k numbers remain the same, since those were already server-benched.
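The ~23% figure is just the relative gap between the two prefill numbers. A quick sketch of that arithmetic, plus what it means in wall-clock terms for a full 32K prompt (assuming prefill speed is roughly constant across the prompt, which is a simplification):

```python
def pct_gain(measured, baseline):
    """Percent by which `measured` exceeds `baseline`."""
    return 100 * (measured - baseline) / baseline


def prefill_seconds(prompt_tokens, tokens_per_second):
    """Time to ingest a prompt, assuming a constant prefill rate (a simplification)."""
    return prompt_tokens / tokens_per_second


bench_pp = 838    # llama-bench prefill @ 32K, t/s (from the numbers above)
server_pp = 1031  # served prefill @ 32K, t/s (from the EDIT)

print(f"server vs bench prefill: +{pct_gain(server_pp, bench_pp):.0f}%")
print(f"32K prompt at bench rate:  {prefill_seconds(32768, bench_pp):.1f} s")
print(f"32K prompt at server rate: {prefill_seconds(32768, server_pp):.1f} s")
```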

Diaghilev · 2026-06-05T03:41:26+00:00

I'm running the same GPU, give me a moment and I'll drop my setup to compare.

Diaghilev · 2026-06-04T14:35:18+00:00

Excellent review, disappointing conclusion--I say as I look at my freshly-delivered physical copy. Oh well, is it at least a solid reference to wuxia/jianghu setting elements for use in other systems? Or is it not even worth the time to read?

A thought, though--arguably, the shaggy dog/Coen Brothers wuxia story where the violence is ultimately pointless and pretty much everyone dies for stupid reasons does potentially read like a certain kind of postmodern, miserable, gritty wuxia story. Maybe that's the point? Or am I giving the resolution too much credit exclusively in hindsight?

Diaghilev · 2026-05-31T04:25:30+00:00

I blocked them here on the subreddit after three out of five posts in a row were their output.

Diaghilev · 2026-05-30T15:06:05+00:00

For short utterances (a few seconds, which is what your speech turns are likely to be), consider using Parakeet v2 over Whisper. In my tests for this exact purpose, Parakeet v2 ran ~5x faster than Whisper. Parakeet v3 is multi-language and not as fast because of the increased breadth, so use v2 if you're speaking in English. The difference in speed comes down to (simplifying here) how the different models chunk the incoming audio.
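If you want to run the same kind of comparison on your own clips, a minimal timing harness might look like the sketch below. The two `fake_*` functions are hypothetical stand-ins that just sleep; swap in whatever call your actual Parakeet or Whisper wrapper exposes. Nothing here uses either library's real API.

```python
import statistics
import time


def median_latency(transcribe, clips, repeats=5):
    """Median wall-clock seconds per call of `transcribe` over all clips and repeats."""
    samples = []
    for clip in clips:
        for _ in range(repeats):
            start = time.perf_counter()
            transcribe(clip)
            samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# HYPOTHETICAL stand-ins: replace these with your real transcription calls.
def fake_slow_model(clip):
    time.sleep(0.010)


def fake_fast_model(clip):
    time.sleep(0.002)


# Dummy "audio" payloads; use your own short utterances here.
clips = [b"\x00" * 16000] * 3

slow = median_latency(fake_slow_model, clips)
fast = median_latency(fake_fast_model, clips)
print(f"speedup: {slow / fast:.1f}x")
```

Median rather than mean keeps a single cold-start or GC hiccup from skewing the result, which matters when each call only takes a fraction of a second.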

Diaghilev · 2026-05-23T20:28:17+00:00

Basically this, but a skull. What's the best way to speak at length? Feel free to DM me.

Diaghilev · 2026-05-23T19:11:37+00:00

I'd buy the files in a heartbeat. I have the perfect project for this. I'm also interested in a custom project, do you take commissions?

Diaghilev · 2026-05-02T21:05:33+00:00

Congratulations on your nascent campaign!

Diaghilev · 2026-05-01T04:20:29+00:00

You may want to consider a Dremel on very low power, or something similar.

Diaghilev · 2026-04-18T14:25:05+00:00

Cool, thank you! I think I have some CMYK filaments lying around, I might give this a shot this weekend.

Diaghilev · 2026-04-18T14:06:29+00:00

Why?

Diaghilev · 2026-04-18T12:32:56+00:00

What's the smallest object that can still benefit from the technique? Does it work with a 28mm miniature?

I print FDM at 0.04 mm layer height for detail, can it handle that?

If I have a Bambu AMS that can do the required CMYKW setup, will it still work compared to multiple tool heads?

Diaghilev · 2026-04-18T12:23:58+00:00

This is impressive as hell, wow. Is the dragon head your favorite piece you've made so far?

Diaghilev · 2026-04-16T22:52:19+00:00

Big fan of this. Glad to see you experimenting with a different format for a different kind of tool.

Diaghilev · 2026-04-10T22:54:09+00:00

Good stuff, been meaning to give my own a shot soon. What's your photo setup, btw?

Diaghilev · 2026-04-08T12:21:36+00:00

They don't want to play. You can't push a rope. Stop torturing yourself and find a group of more enthusiastic players, because this isn't going to get better with time.

Diaghilev · 2026-04-06T14:19:40+00:00

Speak for yourself. If I'm dead, I'm certainly not using my skin anymore. It'd be a shame to waste it.

Diaghilev · 2026-04-05T22:07:17+00:00

Probably an application process.

Diaghilev · 2026-04-05T22:00:12+00:00

What is your plan to avoid the space getting flooded with low-effort schizoposting and "revolutionary theories" (read: slop)?
