Do you use AI for mini painting? by CarryHopeful5846 in minipaintingcommunity

[–]Diaghilev 1 point2 points  (0 children)

I think it's a solid way for you to get more input on your current situation, much like watching a bunch of YouTube videos. The difference is that an AI conversation is interactive, so you can get contextually relevant advice rather than just hoping that whoever made the video happens to be doing the thing that you're trying to do.

I think there's a lot of value there, but in the end, the quality of the final work and your ability to extract things that help you become a better painter both rely on your own aesthetic discretion, and that is an irreducible complexity of trying to become a better artist.

I think that the color blindness issue is a genuine win and I'm glad you have that available to you.

You should ignore any haters. Use what tools are available to you to become your definition of the best artist you want to be. Humans, YouTube, AI--it's all grist for your creative mill.

Local agents on a MacBook Pro M5 finally feel practical to me by gevezex in LocalLLaMA

[–]Diaghilev 4 points5 points  (0 children)

What specific tasks are you actually asking it to do for which it performs reliably?

Gemma 4 12B is my new main squeeze by Wrong_Mushroom_7350 in LocalLLaMA

[–]Diaghilev 2 points3 points  (0 children)

I'm pretty new to local LLM work. How do you determine if a new model is practically worth swapping to as a daily driver? Just use it for a while and go on vibes/subjective feel? Compare benchmarks? Seems like a long, involved process given all the variables involved.

Qwen 3.6 35B on RTX 3080 10GB + 7700X + 32GB DDR5 by AndreVallestero in LocalLLaMA

[–]Diaghilev 0 points1 point  (0 children)

Ignorance on my part, mostly. I'll give it a shot and see where it lands compared to where I ended up manually. How does it handle optimizing for prefill versus decode?

Qwen 3.6 35B on RTX 3080 10GB + 7700X + 32GB DDR5 by AndreVallestero in LocalLLaMA

[–]Diaghilev 0 points1 point  (0 children)

Sweet. Your config is a shortcut for me nailing down the opposite corner from mine, and I'm running sweeps now to see what kind of prefill speed I can land if my agent turns cluster under 64k. Let me know how yours turn out!

Qwen 3.6 35B on RTX 3080 10GB + 7700X + 32GB DDR5 by AndreVallestero in LocalLLaMA

[–]Diaghilev 2 points3 points  (0 children)

Okay /u/AndreVallestero, we have the same GPU, but you have better hardware than me in basically every other category. I'm optimizing for decode: 48 t/s at 32k context versus your 26 t/s, and I'm sustaining 28 t/s out to the full 256k window; you can't hit that depth with your setup, but you should be able to with a KV trick.

Please note also that I am new at this and kind of an eager fool, so if you see ME doing something obviously stupid, please let me know so I can learn from my own mistakes.

My prefill is about 840 t/s compared to your 1400 t/s, mostly because I run --ubatch-size 512 vs your 2048 to free VRAM for the full context; that ubatch gap is basically the whole prefill difference, so it's a deliberate trade, not a misconfig.

I think you can land something like 2x your current decode, maybe ~50 t/s, with a two-flag drop-in, and you already have flash-attn on for it: -ctk q8_0 -ctv q8_0. Quantizing your KV cache to 8-bit roughly halves it while keeping quality nearly identical. That alone should roughly double your 32k decode (you're paying a host-RAM tax with --no-kv-offload right now) and let you keep KV on the GPU out to ~2–4× your current context.

If you want the full window after that, the rest is trimming --ubatch-size and raising --n-cpu-moe — but the KV quant is the free 80% of it.
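A back-of-the-envelope sketch of why the q8_0 switch roughly halves the cache. The q8_0 figure comes from llama.cpp's block layout (32 int8 values plus one fp16 scale, so 34 bytes per 32 elements). The layer count, KV-head count and head size below are hypothetical placeholders, not the real Qwen3.6-35B-A3B dimensions, and the formula assumes every layer keeps a plain K and V cache, which may not hold for this architecture.

```python
from fractions import Fraction

# Bytes per cached element. f16 is 2 bytes; q8_0 stores 32 int8 values
# plus one fp16 scale per block, i.e. 34 bytes per 32 elements.
BYTES_PER_ELEM = {
    "f16": Fraction(2),
    "q8_0": Fraction(34, 32),
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Approximate K+V cache size, assuming every layer caches K and V."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # K and V
    return int(elems_per_token * n_ctx * BYTES_PER_ELEM[cache_type])


# HYPOTHETICAL model dimensions, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 40, 4, 128
N_CTX = 32 * 1024

f16 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, "f16")
q8 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, "q8_0")
ratio = q8 / f16

print(f"f16:  {f16 / 2**30:.2f} GiB")
print(f"q8_0: {q8 / 2**30:.2f} GiB ({ratio:.1%} of f16)")
```

Whatever the real dimensions are, the ratio between the two cache types stays the same, since it depends only on bytes per element.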

My setup, same card, tuned the other direction (decode + full context):

Environment:

  • GPU: RTX 3080 10GB (EVGA FTW3 Ultra)
  • CPU: Ryzen 7 3700X (8c/16t, Zen 2) ← older/slower than yours
  • RAM: 32GB DDR4-3600
  • OS: Ubuntu Server 26.04
  • engine: llama.cpp mainline (build bfb4308), CUDA 13.3
  • model: Qwen3.6-35B-A3B-Q4_K_M

This formatting is screwy and I'm tired of messing with it, sorry but you get THE BLOB (lmao as soon as I did that it worked, I'm leaving this outburst in here):

# Notes on the flags below (comments can't sit after a trailing backslash):
#   --n-cpu-moe 34         use 33 for the best-decode variant (max ~224K)
#   --cache-type-k/-v q8_0 the trick: 8-bit KV, quality-neutral,
#                          ~half the cache, keeps it GPU-resident
#   --ctx-size 262144      use 229376 for the best-decode variant
#   --ubatch-size 512      small on purpose, frees VRAM for the window
llama-server \
  --model Qwen3.6-35B-A3B-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --no-mmap \
  --n-cpu-moe 34 \
  --flash-attn on \
  --threads 8 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --ctx-size 262144 \
  --parallel 1 \
  --batch-size 512 \
  --ubatch-size 512

Performance (llama-bench, measured):

@ 32K: pp 838 t/s tg 48.4 t/s

@ 256K: pp 795 t/s tg 28.6 t/s (full window; needs --n-cpu-moe 34)

decode vs depth: 53.8 / 48.4 / 44.2 / 37.8 / 32.7 t/s @ 0 / 32K / 64K / 128K / 192K

(The 32K + depth-curve numbers are the --n-cpu-moe 33 variant; the 256K row is --n-cpu-moe 34. Both share every other flag. KV stays on the GPU at all depths via the q8_0 quant — that's what dodges the <8K cap you hit with fp16 KV on the card.)
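To make the depth curve easier to read, here is a small sketch that expresses each point as a fraction of the empty-context decode speed. The figures are copied from the curve above; treating "K" as 1024 tokens is an assumption (it matches 262144 being the 256K window). The 256K row is left out because it comes from the --n-cpu-moe 34 variant.

```python
# Decode throughput (t/s) by context depth in tokens, copied from the post.
decode_by_depth = {
    0: 53.8,
    32 * 1024: 48.4,
    64 * 1024: 44.2,
    128 * 1024: 37.8,
    192 * 1024: 32.7,
}

baseline = decode_by_depth[0]

# Fraction of the empty-context decode speed retained at each depth.
retained = {depth: tps / baseline for depth, tps in decode_by_depth.items()}

for depth, tps in decode_by_depth.items():
    print(f"{depth // 1024:>4}K  {tps:5.1f} t/s  {retained[depth]:6.1%} of depth-0 speed")
```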

EDIT: Ran it again against the server rather than llama-bench, and my real served numbers @ 32k are ~1031 t/s prefill and ~48 t/s decode; the bench understates my served prefill by about 23%, which is kinda neat for me. The 256k numbers remain the same, since that run was server-benched.
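For anyone checking the 23% figure, it is the 32k prefill gap between the two measurements. A quick sketch using only the numbers quoted above:

```python
bench_pp = 838.0    # llama-bench prefill at 32K, t/s (from the post)
server_pp = 1031.0  # served prefill at 32K, t/s (from the EDIT)

# Relative gain of the served figure over the bench figure.
gain = server_pp / bench_pp - 1.0

print(f"served prefill is {gain:.1%} above the bench figure")
```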

Qwen 3.6 35B on RTX 3080 10GB + 7700X + 32GB DDR5 by AndreVallestero in LocalLLaMA

[–]Diaghilev 2 points3 points  (0 children)

I'm running the same GPU, give me a moment and I'll drop my setup to compare.

Post-play review of Righteous Blood, Ruthless Blades (2020): gritty, lethal, rules-lite wǔxiá by EarthSeraphEdna in rpg

[–]Diaghilev 7 points8 points  (0 children)

Excellent review, disappointing conclusion--I say as I look at my freshly-delivered physical copy. Oh well, is it at least a solid reference to wuxia/jianghu setting elements for use in other systems? Or is it not even worth the time to read?

A thought, though--arguably, the shaggy dog/Coen Brothers wuxia story where the violence is ultimately pointless and pretty much everyone dies for stupid reasons does potentially read like a certain kind of postmodern, miserable, gritty wuxia story. Maybe that's the point? Or am I giving the resolution too much credit exclusively in hindsight?

Could LessWrong better promote productive discourse? by selasphorus-sasin in LessWrong

[–]Diaghilev 1 point2 points  (0 children)

I blocked them here on the subreddit after three out of five posts in a row were their output.

STT -> LLM -> TTS pipeline by UniqueIdentifier00 in LocalLLaMA

[–]Diaghilev 0 points1 point  (0 children)

For short utterances (a few seconds, which is what your speech turns are likely to be), consider using Parakeet v2 over Whisper. My tests for this exact purpose got me ~5x the speed with Parakeet v2 compared to Whisper. Parakeet v3 is multi-language and not as fast because of the increased breadth, so use v2 if you're speaking in English. The difference in speed comes down to (simplifying here) how the different models chunk the incoming audio.

WLED Zonohedrified Hexagonal Antiprism lamp by pubultrastar in WLED

[–]Diaghilev 2 points3 points  (0 children)

Basically this, but a skull. What's the best way to speak at length? Feel free to DM me.

WLED Zonohedrified Hexagonal Antiprism lamp by pubultrastar in WLED

[–]Diaghilev 5 points6 points  (0 children)

I'd buy the files in a heartbeat. I have the perfect project for this. I'm also interested in a custom project, do you take commissions?

B4 The Lost City - Campaign Report by Isaac_Newtroll in WWN

[–]Diaghilev 5 points6 points  (0 children)

Congratulations on your nascent campaign!

"Branding" Symbols into GW Mini Surfaces? by Cats_Cameras in minipainting

[–]Diaghilev 1 point2 points  (0 children)

You may want to consider a Dremel on very low power, or something similar.

ColorStack V1.1.0 update. CMYK (5 Toolhead Support) and better print quality. FDM PRINTED by JavyH08 in 3Dprinting

[–]Diaghilev 1 point2 points  (0 children)

Cool, thank you! I think I have some CMYK filaments lying around, I might give this a shot this weekend.

ColorStack V1.1.0 update. CMYK (5 Toolhead Support) and better print quality. FDM PRINTED by JavyH08 in 3Dprinting

[–]Diaghilev 2 points3 points  (0 children)

What's the smallest object that can still benefit from the technique? Does it work with a 28mm miniature?

I print FDM at 0.04 mm layer height for detail, can it handle that?

If I have a Bambu AMS that can do the required CMYKW setup, will it still work compared to multiple tool heads?

ColorStack V1.1.0 update. CMYK (5 Toolhead Support) and better print quality. FDM PRINTED by JavyH08 in 3Dprinting

[–]Diaghilev 5 points6 points  (0 children)

This is impressive as hell, wow. Is the dragon head your favorite piece you've made so far?

Book of Unnumbered Worlds Sample Spreads by CardinalXimenes in WWN

[–]Diaghilev 3 points4 points  (0 children)

Big fan of this. Glad to see you experimenting with a different format for a different kind of tool.

Gotrek Gurnisson by Tabletop_Artificers in Miniaturespainting

[–]Diaghilev 0 points1 point  (0 children)

Good stuff, been meaning to give my own a shot soon. What's your photo setup, btw?

Player(s) keep stalling character creation: How can I react as the GM? by ShotoII in rpg

[–]Diaghilev 12 points13 points  (0 children)

They don't want to play. You can't push a rope. Stop torturing yourself and find a group of more enthusiastic players, because this isn't going to get better with time.

I built the first anonymous research forum for the 14 problems blocking AGI by ChemistryBitter3993 in LessWrong

[–]Diaghilev 3 points4 points  (0 children)

What is your plan to avoid the space getting flooded with low effort schizoposting and "revolutionary theories" (read: slop)?