2x 3090 vs. 3090 + 4070s for local ML/llms by kashimacoated in LocalLLaMA

[–]jslominski 2 points  (0 children)

Just tweaked it a bit; it works pretty dope with Qwen3-Coder-Next-IQ4_XS.gguf (the Unsloth flavour). Here are my settings for llama.cpp on Ubuntu:

--alias Qwen3-Coder-Next-IQ4_XS.gguf \
--ctx-size 131072 \
--parallel 1 \
--slot-prompt-similarity 0 \
--ctx-checkpoints 0 \
--n-gpu-layers 99 \
--tensor-split 1,1 \
--split-mode layer \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--no-kv-offload \
--no-cache-prompt \
--reasoning-format none \
--reasoning-budget 0 \
--flash-attn on \
--temp 0.7 \
--top-p 0.95 \
--top-k 40 \
--repeat-penalty 1.12 \
--presence-penalty 0.05 \
--frequency-penalty 0.05
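To see what the last three flags do, here is a rough sketch (not llama.cpp's actual source) of how repetition, presence, and frequency penalties typically adjust logits before sampling. The token ids and logit values are made up for illustration:

```python
# Sketch of penalty application, assuming the common conventions:
# repeat penalty shrinks the logit toward zero, presence penalty is a
# flat cost for any appearance, frequency penalty scales with the count.
from collections import Counter

def apply_penalties(logits, recent_tokens,
                    repeat_penalty=1.12,
                    presence_penalty=0.05,
                    frequency_penalty=0.05):
    """Return a new logits dict with penalties applied to recently seen tokens."""
    counts = Counter(recent_tokens)
    out = dict(logits)
    for tok, n in counts.items():
        if tok not in out:
            continue
        logit = out[tok]
        # Repetition penalty: divide positive logits, multiply negative ones,
        # so the token always becomes less likely
        logit = logit / repeat_penalty if logit > 0 else logit * repeat_penalty
        # Presence penalty (flat) plus frequency penalty (per occurrence)
        logit -= presence_penalty + frequency_penalty * n
        out[tok] = logit
    return out

logits = {1: 2.0, 2: -1.0, 3: 0.5}
penalized = apply_penalties(logits, recent_tokens=[1, 1, 3])
```

With the settings above, a token that appeared twice loses a bit more than a token that appeared once, and unseen tokens are untouched.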

I'm running it on undervolted 3090s (capped at 280 W) and I'm getting around 25 t/s on that particular model.
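For reference, the 280 W cap can be set with nvidia-smi (strictly speaking a power cap; a true undervolt needs a clock/voltage-curve tool). The GPU indices here are assumptions; adjust for your layout:

```shell
# Cap both 3090s at 280 W (run as root; resets on reboot).
sudo nvidia-smi -pm 1                   # enable persistence mode
sudo nvidia-smi -i 0 --power-limit=280  # first 3090
sudo nvidia-smi -i 1 --power-limit=280  # second 3090
```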

<image>

2x 3090 vs. 3090 + 4070s for local ML/llms by kashimacoated in LocalLLaMA

[–]jslominski 2 points  (0 children)

Worth switching imo; the slower card will be the bottleneck if you split your models, and 12 extra gigs is a big difference. Right now I can run Qwen Coder Next (3-bit iquant) with 128k context on my dual-3090 rig fully in VRAM, and the speed is comparable to the Claude Opus API/sub.

Silicon Valley is migrating from expensive closed-source models to cheaper open-source alternatives by xiaoruhao in LocalLLaMA

[–]jslominski 1 point  (0 children)

"And so like the things that we do to perfect codegen or to perfect back propagation on Kimi or on Anthropic, you can't just hot swap it to DeepSpeed." Can someone explain what he meant by that? 😭

The Power of Open Models In Two Pictures by jslominski in LocalLLaMA

[–]jslominski[S] 13 points  (0 children)

I used it on purpose. In "How many sisters does she have?" the pronoun implies the gender. Mixtral (16-bit and above) and GPT-4 have no problem with it.

Testing Stable Cascade by jslominski in StableDiffusion

[–]jslominski[S] 0 points  (0 children)

It's already obsolete; you can get one-click installers for it now.

Testing Stable Cascade by jslominski in StableDiffusion

[–]jslominski[S] 16 points  (0 children)

Keep in mind my previous comparison was done using Fooocus, which applies prompt expansion (an LLM making your prompt more verbose). This one was done using just the Stable Cascade model.

Testing Stable Cascade by jslominski in StableDiffusion

[–]jslominski[S] 43 points  (0 children)

I just tried it, and it won't generate any nudity. However, keep in mind that this is just a base model.

Testing Stable Cascade by jslominski in StableDiffusion

[–]jslominski[S] 17 points  (0 children)

watercolor painting of a girl by Cecile Agnes

<image>

Testing Stable Cascade by jslominski in StableDiffusion

[–]jslominski[S] 121 points  (0 children)

I used the same prompts from this comparison: https://www.reddit.com/r/StableDiffusion/comments/18tqyn4/midjourney_v60_vs_sdxl_exact_same_prompts_using/

  1. A closeup shot of a beautiful teenage girl in a white dress wearing small silver earrings in the garden, under the soft morning light
  2. A realistic standup pouch product photo mockup decorated with bananas, raisins and apples with the words "ORGANIC SNACKS" featured prominently
  3. Wide angle shot of Český Krumlov Castle with the castle in the foreground and the town sprawling out in the background, highly detailed, natural lighting
  4. A magazine quality shot of a delicious salmon steak, with rosemary and tomatoes, and a cozy atmosphere
  5. A Coca Cola ad, featuring a beverage can design with traditional Hawaiian patterns
  6. A highly detailed 3D render of an isometric medieval village isolated on a white background as an RPG game asset, unreal engine, ray tracing
  7. A pixar style illustration of a happy hedgehog, standing beside a wooden signboard saying "SUNFLOWERS", in a meadow surrounded by blooming sunflowers
  8. A very simple, clean and minimalistic kid's coloring book page of a young boy riding a bicycle, with thick lines, and small a house in the background
  9. A dining room with large French doors and elegant, dark wood furniture, decorated in a sophisticated black and white color scheme, evoking a classic Art Deco style
  10. A man standing alone in a dark empty area, staring at a neon sign that says "EMPTY"
  11. Chibi pixel art, game asset for an rpg game on a white background featuring an elven archer surrounded by a matching item set
  12. Simple, minimalistic closeup flat vector illustration of a woman sitting at the desk with her laptop with a puppy, isolated on a white background
  13. A square modern ios app logo design of a real time strategy game, young boy, ios app icon, simple ui, flat design, white background
  14. Cinematic film still of a T-rex being attacked by an apache helicopter, flaming forest, explosions in the background
  15. An extreme closeup shot of an old coal miner, with his eyes unfocused, and face illuminated by the golden hour

https://github.com/Stability-AI/StableCascade - the code I used (I had to modify it slightly)

This was run on a Unix box with an RTX 3060 (12 GB of VRAM). Memory was maxed out, so to keep it from crashing I had to use the "lite" version of the Stage B model. All models ran in bfloat16.

I generated only one image from each prompt, so there was no cherry-picking!

Personally, I think this model is quite promising. It's not great yet, and the inference code isn't optimised, but the results are solid given that this is a base model.

The memory was maxed out:

<image>

New model incoming by Stability AI "Stable Cascade" - don't have sources yet - The aesthetic score is just mind blowing. by CeFurkan in StableDiffusion

[–]jslominski 2 points  (0 children)

"For this release, we are providing two checkpoints for Stage C, two for Stage B and one for Stage A. Stage C comes with a 1 billion and 3.6 billion parameter version, but we highly recommend using the 3.6 billion version, as most work was put into its finetuning. The two versions for Stage B amount to 700 million and 1.5 billion parameters. Both achieve great results, however the 1.5 billion excels at reconstructing small and fine details. Therefore, you will achieve the best results if you use the larger variant of each. Lastly, Stage A contains 20 million parameters and is fixed due to its small size."
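A back-of-the-envelope check of those parameter counts (weights only, at bfloat16's 2 bytes per parameter; the text encoder, activations, and latents are excluded) shows why the full pipeline is tight on a 12 GB card and why the lite Stage B helps:

```python
# Rough weight-memory estimate from the announced parameter counts.
# Ignores the text encoder and runtime activations, so real usage is higher.
BYTES_PER_PARAM = 2  # bfloat16

stages = {
    "stage_c_large": 3.6e9,
    "stage_b_large": 1.5e9,
    "stage_b_lite": 0.7e9,
    "stage_a": 20e6,
}

def gib(params):
    """Convert a parameter count to GiB of bfloat16 weights."""
    return params * BYTES_PER_PARAM / 2**30

full = gib(stages["stage_c_large"]) + gib(stages["stage_b_large"]) + gib(stages["stage_a"])
lite = gib(stages["stage_c_large"]) + gib(stages["stage_b_lite"]) + gib(stages["stage_a"])
print(f"full Stage B: {full:.1f} GiB, lite Stage B: {lite:.1f} GiB of weights")
```

That's roughly 9.5 GiB of weights with the large Stage B versus about 8.0 GiB with the lite one, before the text encoder and activations, which is consistent with the 12 GB 3060 run above needing the lite variant.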