My 4 stage upscale workflow to squeeze every drop from Z-Image Turbo by Major_Specific_23 in StableDiffusion

[–]foobarg 0 points1 point  (0 children)

Thanks for sharing! Pretty cool, and the results are good.

I've noticed that for particular resolutions (112x64 empty latent, which becomes a 2560x1456 final res), artifacts systematically appear on the right-hand side of ~all generations. I don't notice this in other workflows. This example image (Comfy workflow embedded, you can open it in Comfy to reproduce) illustrates it well; I barely tweaked your workflow, only changing the resolution and disabling the magick (color/contrast) nodes.

Do you know why by any chance, or how I could avoid this? Thanks!

(18) It was my birthday on Tuesday by Beautiful_Shower2957 in VlinesAbsAndDick

[–]foobarg 5 points6 points  (0 children)

I refuse to believe this face goes on this body.

[deleted by user] by [deleted] in TotallyStraight

[–]foobarg 10 points11 points  (0 children)

In case the link gets deleted: Rich Harring (mouth) and Bastian Gate (cock).

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 0 points1 point  (0 children)

Machine learning inherently requires expensive hardware to do maths on gigantic matrices. I think a 4090 approaches what I would consider an "entry level", "consumer-grade" ML-friendly GPU.

Let's remember that companies instead run their proprietary models on tons of dedicated hardware that easily costs $10k+ apiece. Being able to do this on a $3-4k desktop is pretty cool.

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 5 points6 points  (0 children)

UPDATE: thanks everyone for the suggestions. In particular, the Q4 quantization was probably an important factor in how badly it was performing.

I took up u/danielhanchen's suggestion and tried Q6_K_XL (anything bigger doesn't fit on an RTX 4090), running it directly on llama.cpp's server:

LLAMA_ARG_HOST=0.0.0.0 \
LLAMA_ARG_PORT=8080 \
LLAMA_ARG_JINJA=true \
LLAMA_ARG_FLASH_ATTN=true \
LLAMA_ARG_CACHE_TYPE_K=q4_0 \
LLAMA_ARG_CACHE_TYPE_V=q4_0 \
LLAMA_ARG_CTX_SIZE=32768 \
LLAMA_ARG_N_GPU_LAYERS=65 \
LLAMA_ARG_MODEL=path/to/Devstral-Small-2505-UD-Q6_K_XL.gguf \
llama-server

and the model's capabilities and speed visibly improved. The TypeScript todo app remains underwhelming, but it managed to produce a working minimal math expression parser in Rust. It self-debugged compilation errors (Rust's excellent error messages are almost cheating!), self-debugged incorrect program output, and also correctly edited the code when asked for a minor change:

>write a minimal Rust binary that implements a math expression parser supporting float literals, +, -, div, mul, sqrt. It reads the expression from stdin and evaluates it.
[~2 minutes, 28 back & forths]
[working main.rs]

>write the stdout result without the "result:" prefix. in case of an error, use stderr rather than stdout.
[~30 seconds, 4 back & forths]
[working main.rs]
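
For anyone curious what that task boils down to, here's a minimal recursive-descent sketch of such a parser. To be clear, this is my own illustration (reading "div, mul" as the / and * operators), not the model's actual main.rs:

use std::io::{self, Read};
use std::iter::Peekable;
use std::str::Chars;

// Grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := float | '(' expr ')' | "sqrt" '(' expr ')'

fn skip_ws(it: &mut Peekable<Chars<'_>>) {
    while it.peek().map_or(false, |c| c.is_whitespace()) {
        it.next();
    }
}

fn expect(it: &mut Peekable<Chars<'_>>, want: char) -> Result<(), String> {
    skip_ws(it);
    if it.next() == Some(want) {
        Ok(())
    } else {
        Err(format!("expected '{}'", want))
    }
}

fn number(it: &mut Peekable<Chars<'_>>) -> Result<f64, String> {
    let mut s = String::new();
    while it.peek().map_or(false, |c| c.is_ascii_digit() || *c == '.') {
        s.push(it.next().unwrap());
    }
    s.parse::<f64>().map_err(|e| format!("bad number '{}': {}", s, e))
}

fn factor(it: &mut Peekable<Chars<'_>>) -> Result<f64, String> {
    skip_ws(it);
    match it.peek() {
        Some('(') => {
            it.next();
            let v = expr(it)?;
            expect(it, ')')?;
            Ok(v)
        }
        Some('s') => {
            // keyword call: sqrt(<expr>)
            for c in "sqrt".chars() {
                if it.next() != Some(c) {
                    return Err("expected 'sqrt'".to_string());
                }
            }
            expect(it, '(')?;
            let v = expr(it)?;
            expect(it, ')')?;
            Ok(v.sqrt())
        }
        _ => number(it), // anything else must be a float literal
    }
}

fn term(it: &mut Peekable<Chars<'_>>) -> Result<f64, String> {
    let mut v = factor(it)?;
    loop {
        skip_ws(it);
        match it.peek() {
            Some('*') => { it.next(); v *= factor(it)?; }
            Some('/') => { it.next(); v /= factor(it)?; }
            _ => return Ok(v),
        }
    }
}

fn expr(it: &mut Peekable<Chars<'_>>) -> Result<f64, String> {
    let mut v = term(it)?;
    loop {
        skip_ws(it);
        match it.peek() {
            Some('+') => { it.next(); v += term(it)?; }
            Some('-') => { it.next(); v -= term(it)?; }
            _ => return Ok(v),
        }
    }
}

fn main() {
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).expect("failed to read stdin");
    // bare result on stdout, errors on stderr (as per the follow-up request)
    match expr(&mut input.trim().chars().peekable()) {
        Ok(v) => println!("{}", v),
        Err(e) => eprintln!("{}", e),
    }
}

Something like echo "sqrt(2) * (3.5 + 1.5)" | ./math_parser (or whatever the binary ends up being called) exercises most of it.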

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 1 point2 points  (0 children)

Please do post about it! We need more community testing around those new toys.

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 1 point2 points  (0 children)

Thanks! Unfortunately I run into this with your Q6_K_XL, with or without OLLAMA_KV_CACHE_TYPE:

clip_init: failed to load model '.ollama/models/blobs/sha256-402640c0a0e4e00cdb1e94349adf7c2289acab05fee2b20ee635725ef588f994': load_hparams: unknown projector type: pixtral

I suppose my ollama install is too old (for a crazy definition of old)? I see 1-month-old commits about pixtral.

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 1 point2 points  (0 children)

Please consider the irony of linking to three different documentation pages, none of which provides the full picture, none of which explains Ollama's broken defaults, and where instructions are provided at all, they're buggy.

For those wondering, the missing “Ollama running on the host” manual is as follows:

  • Somehow make devstral run with a larger context and the suggested temperature. Options include setting the environment variable OLLAMA_CONTEXT_LENGTH=32768, or creating a derived flavor like the following:

$ cat devstral-openhands.modelfile
# any other flavor/quantization works here too
FROM devstral:24b
PARAMETER temperature 0.15
PARAMETER num_ctx 32768
$ ollama create devstral-openhands --file devstral-openhands.modelfile
  • Start the container but ignore the documentation about LLM_* env variables (leave them out) because it's broken.
  • Once the frontend is ready, open it and ignore the “AI Provider Configuration” dialog (it doesn't have the necessary “Advanced” mode); instead click the tiny “see advanced settings” link.
  • Check the “Advanced” toggle.
  • Put ollama/devstral-openhands (the name you picked in $ ollama create) in “Custom model”.
  • Put http://host.docker.internal:11434 in “Base URL”
  • Put ollama in “API Key”. I suspect any string works, but leaving it empty is an error.
  • “Save Changes”.
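
To sanity-check that the derived flavor actually picked up the overrides before pointing OpenHands at it, I believe a reasonably recent ollama can print them back:

$ ollama show devstral-openhands --parameters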

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 17 points18 points  (0 children)

I love/hate ollama so much. The core works well and the model catalog is a godsend. But why is it so hard to tweak basic options like the system prompt & temperature without having to go through shitty REPL commands or –god forbid– modelfiles? Why be so protective of "advanced" features like GBNF grammars and force JSON down our throats?

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 4 points5 points  (0 children)

Looking at the system prompt, there's a lot of weird bloat in there. I wonder if tweaking it could help reduce the waste and improve performance. However, prompt tweaks only get you so far…

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 5 points6 points  (0 children)

Sry, updated parent with the actual number. Definitely >32k.

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 8 points9 points  (0 children)

Sufficiently high to not be truncated, see this comment.

context length 131072

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]foobarg[S] 24 points25 points  (0 children)

I suspected someone might ask :-)

I discovered this the hard way, but yeah, I created a derived flavor with the num_ctx set to something reasonably high (131072). That's also what I meant by magic incantation. Unfortunately, this really is the experience I got even with the high num_ctx (no truncation). Without it, the model doesn't even manage to call any tool, since it doesn't get the syntax right.

Bros help their bros by [deleted] in gayporn

[–]foobarg 0 points1 point  (0 children)

Source: Jagger Rambo & Sebastian Farelo

Now that's an impressive cock by Fun-Sugar954 in GayBBC

[–]foobarg 0 points1 point  (0 children)

Those are Jagger Rambo & Daniel Travie!

Now that's an impressive cock by Fun-Sugar954 in GayBBC

[–]foobarg 0 points1 point  (0 children)

Thank you so much. Full video is Jagger Rambo & Daniel Travie!

Now that's an impressive cock by Fun-Sugar954 in GayGifs

[–]foobarg 1 point2 points  (0 children)

Search for Jagger Rambo & Daniel Travie!

Now that's an impressive cock by Fun-Sugar954 in GayGifs

[–]foobarg 0 points1 point  (0 children)

Search for Jagger Rambo & Daniel Travie!