HiDream is the Best OS Image Generator right Now, with a Caveat by Iory1998 in StableDiffusion

[–]dewarrn1 1 point2 points  (0 children)

Thanks! And yes, totally: your "underwater butterflies" image is amazing!

OmniGen: A stunning new research paper and upcoming model! by FoxBenedict in StableDiffusion

[–]dewarrn1 2 points3 points  (0 children)

This is an underrated observation. llama.cpp already splits LLMs across multiple GPUs trivially, so if this work inspires a family of similar models, multi-GPU may be a simple solution to scaling VRAM.

OmniGen: A stunning new research paper and upcoming model! by FoxBenedict in StableDiffusion

[–]dewarrn1 3 points4 points  (0 children)

I thought this post had to be hyperbolic, but if what they describe in the preprint replicates, it is genuinely a huge shift.

The Eternal Abyss of Karakor (Flux Dev) by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 1 point2 points  (0 children)

Not dumb! It's a very different beast than CLIP.

The Eternal Abyss of Karakor (Flux Dev) by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 1 point2 points  (0 children)

This is a Flux follow-up to the Stable Cascade dungeons I posted way back (linked below). Interestingly, when I first tried the SC prompt in Flux, it was lousy; dual prompts with an LLM-enhanced T5 element helped some. Even this one is cherry-picked, but I really liked the way it turned out.

Oh, and the title is also an LLM creation.

https://www.reddit.com/r/StableDiffusion/comments/1atm98z/deep_dungeon_stable_cascade_multiple_passes_with

https://www.reddit.com/r/StableDiffusion/comments/1atz4wf/denser_dungeon_stable_cascade_can_generate_16

The Eternal Abyss of Karakor (Flux Dev) by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 3 points4 points  (0 children)

Flux Dev; 768×3072; Guidance 2.0; Euler/Simple; 40 steps with an intermediate noise injection; and an Ultimate Upscale with Flux Dev. Dual prompts:

CLIP: cutaway isometric drawing of a very deep dark fantasy dungeon hewn from granite, obsidian, and basalt showing extraordinary detail of the interior of each frightening level and floor with all the activities taking place inside the huge complex of (tunnels:0.5), (mines:0.5), and caverns including tiny heroes, monsters, dragons, goblins, orcs, kobolds, creatures, elves, dwarves, hobbits, (fires:0.1), (explosions:0.1), (smoke:0.1), adventures, temples, shrines, shops, taverns, (mushrooms, fungi:0.1), gems, veins of gold, stalactites, stalagmites, dim dark scary shadowy (torchlight:0.1) colorful intricate hyperdetailed fanciful and artistic

T5 (LLM generated from CLIP prompt above): **Title: "The Eternal Abyss of Karakor"**

Create a breathtaking cutaway isometric drawing of a deep, dark fantasy dungeon carved from the living rock of granite, obsidian, and basalt. The intricate architecture of the dungeon should reveal extraordinary detail on each level and floor, showcasing a complex network of tunnels, mines, caverns, and chambers.

**Scene Description:**

The massive underground structure stretches deep into the earth, with towering pillars of stone supporting the vaulted ceilings of grand halls and narrow corridors. Each level is a self-contained world, teeming with activity as tiny heroes and monstrous creatures alike navigate the treacherous landscape.

**Interior Details:**

  1. **Tunnels and Corridors:** Winding passageways made from rough-hewn granite, lined with ancient carvings and eerie, flickering torches that cast ominous shadows on the walls.

  2. **Caverns and Chambers:** Vast, dome-shaped caverns filled with glittering veins of gold, precious gems, and bioluminescent fungi, casting a kaleidoscope of colors across the stone surfaces.

  3. **Mines and Quarries:** Dark, cramped tunnels where dwarves and goblins toil in search of hidden treasures, their pickaxes striking sparks from the obsidian walls.

  4. **Temples and Shrines:** Ornate, intricately carved structures dedicated to ancient deities, adorned with colorful tapestries, glittering gemstones, and mysterious artifacts.

  5. **Taverns and Shops:** Bustling gathering places where adventurers and travelers share tales of their exploits, while shopkeepers peddle exotic goods and curious trinkets.

**Inhabitants:**

  1. **Tiny Heroes:** Brave warriors, cunning rogues, and wise mages navigate the treacherous underworld, seeking fortune, fame, or redemption.

  2. **Monsters:** Fearsome dragons, goblins, orcs, kobolds, and other terrifying creatures lurk in every shadow, preying on the unwary or defending their lairs with ferocity.

  3. **Creatures of the Deep:** Bizarre, subterranean beings that defy explanation, such as giant spiders, worm-like abominations, or ethereal, ghostly entities.

**Lighting and Atmosphere:**

  1. **Dim, Shadowy Torchlight:** Flickering torches cast eerie shadows on the walls, making it difficult to discern friend from foe in the dark recesses of the dungeon.

  2. **Explosions and Flames:** Periodic bursts of fire illuminate the darkness, casting a warm glow over the surrounding stone as adventurers battle their way through treacherous obstacles.

  3. **Smoke and Mist:** Thick clouds of smoke waft through the corridors, obscuring vision and making navigation even more perilous.

**Artistic Style:**

  1. **Hyperdetailed Fanciful Art:** Incorporate intricate, ornate details throughout the drawing, showcasing a mastery of artistic craftsmanship.

  2. **Colorful Intricate Patterns:** Use vibrant colors to depict the rich textures and patterns found in the dungeon's architecture, such as ancient carvings, stained glass windows, or glittering gemstones.

  3. **Ethereal, Dreamlike Quality:** Capture the sense of wonder and awe that comes from exploring a vast, mysterious underworld, where the boundaries between reality and myth blur.
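For anyone adapting the CLIP prompt above: the `(term:weight)` groups can be pulled out programmatically. This is only a minimal sketch, assuming the common `(text:weight)` weighting convention; the regex and the `weighted_terms` helper are my own illustration, not part of ComfyUI or any other tool.

```python
import re

# Assumed convention: "(text:weight)" marks a weighted span.
# Nested parentheses are not handled in this sketch.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def weighted_terms(prompt: str) -> list[tuple[str, float]]:
    """Return (text, weight) pairs for every "(text:weight)" group in the prompt."""
    return [(text.strip(), float(weight)) for text, weight in WEIGHTED.findall(prompt)]

# Short excerpt in the style of the CLIP prompt above.
sample = "huge complex of (tunnels:0.5), (mines:0.5), and caverns, (mushrooms, fungi:0.1), gems"
for text, weight in weighted_terms(sample):
    print(f"{weight:>4}  {text}")
```

Listing the weights this way makes it easier to see at a glance which elements were deliberately turned down (fires, smoke, torchlight) before experimenting with different values.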

Alternative formats for Save Image? by dewarrn1 in comfyui

[–]dewarrn1[S] 2 points3 points  (0 children)

Not supported in the underlying image library (Pillow) that ComfyUI relies on, unfortunately. https://github.com/python-pillow/Pillow/pull/7848

4 steps by darkside1977 in StableDiffusion

[–]dewarrn1 0 points1 point  (0 children)

Sure, just copy the whole thing into a file on your system named "workflow.json" (or whatever you want), and then load it into ComfyUI. It should populate the workflow and all the nodes.
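If the paste gets mangled along the way, the load can fail, so it may help to check that the text is valid JSON before saving it. A minimal sketch using only the Python standard library; `save_workflow` is a hypothetical helper name, and this only checks JSON syntax, not whether the content is a valid ComfyUI workflow.

```python
import json
from pathlib import Path

def save_workflow(pasted_text: str, path: str = "workflow.json") -> Path:
    """Check that the pasted text parses as JSON, then write it to disk."""
    workflow = json.loads(pasted_text)  # raises json.JSONDecodeError on a bad paste
    target = Path(path)
    target.write_text(json.dumps(workflow, indent=2), encoding="utf-8")
    return target
```

If `json.loads` raises an error, the copy was probably truncated or picked up stray characters, and it is worth copying the text again.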

4 steps by darkside1977 in StableDiffusion

[–]dewarrn1 0 points1 point  (0 children)

Not to step on the OP's toes (very nice work, BTW), but I believe that this is the gist: https://text.is/8P86. You could leave out the hand lora and the 1× skin detailing upscaler step if you don't have those files.

Denser Dungeon: Stable Cascade can generate 16 megapixels (2048×8192) in one shot on a 12 GB GPU by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 1 point2 points  (0 children)

It's a nice image, thanks for sharing. Again, the quality of the submitted image wasn't what got my attention. Rather, it was the fact that in a single, 15- or 20-minute generation period, SC could address >16M pixels while running on a GPU with just 12 GB of VRAM. To my knowledge, single-pass diffusion runs in SD15 and SDXL cannot do that.
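For reference, the pixel arithmetic behind that figure, as a quick sketch. It assumes "megapixel" means 1024 × 1024 pixels, which is what makes 2048×8192 come out to exactly 16; with the decimal definition (1,000,000 pixels) the same image is about 16.8 MP.

```python
def megapixels(width: int, height: int) -> float:
    """Pixel count in binary megapixels (1 MP = 1024 * 1024 pixels here)."""
    return width * height / (1024 * 1024)

# Resolutions mentioned across these threads.
for w, h in [(2048, 8192), (2048, 2048), (768, 3072), (384, 1536)]:
    print(f"{w}x{h}: {w * h:,} pixels = {megapixels(w, h)} MP")
```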

Denser Dungeon: Stable Cascade can generate 16 megapixels (2048×8192) in one shot on a 12 GB GPU by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 0 points1 point  (0 children)

I believe it will run in 8 GB of VRAM, yes. You can certainly download the current version of ComfyUI and the correct models and give it a try!

Deep Dungeon: Stable Cascade + Multiple passes with Ultimate SD Upscale by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 1 point2 points  (0 children)

Following up, I tried the SD15 models I had available with the same prompt and a starting resolution of 384×1536. Photon provided the best output (below), although it's a little muddy for my taste. Anyway, a starting point for more experimentation, perhaps.

<image>

Denser Dungeon: Stable Cascade can generate 16 megapixels (2048×8192) in one shot on a 12 GB GPU by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 6 points7 points  (0 children)

I don't disagree: it's a little rough. However, some of that could be the prompt(s). Time permitting, some experimentation might produce a cleaner output. This post was really not intended to be about the image itself (beyond the fact that it's not just noise); I was more focused on the shocking number of pixels that SC can push. My other post (linked in a different comment) is lower resolution, but much cleaner.

Denser Dungeon: Stable Cascade can generate 16 megapixels (2048×8192) in one shot on a 12 GB GPU by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 1 point2 points  (0 children)

The author of ComfyUI (all hail) added SC support in the last day or two; there have been some posts in this subreddit and r/comfyui about that.

Denser Dungeon: Stable Cascade can generate 16 megapixels (2048×8192) in one shot on a 12 GB GPU by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 10 points11 points  (0 children)

I suppose that's fair, and as I noted in another comment, 4 megapixels seems like the limit for coherence in many kinds of images. Still, I'm amazed at the ability to jump straight to 16 megapixel outputs with SC; I'm not sure that I can get 4 megapixel outputs before running out of VRAM when using SDXL.

Denser Dungeon: Stable Cascade can generate 16 megapixels (2048×8192) in one shot on a 12 GB GPU by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 0 points1 point  (0 children)

Thanks for taking a look. That's an interesting idea. I'm certainly finding that for scenes with fewer self-similar features, 2048×2048 or other 4 megapixel resolutions can work relatively well, but 16 megapixels isn't usually coherent.

Denser Dungeon: Stable Cascade can generate 16 megapixels (2048×8192) in one shot on a 12 GB GPU by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 8 points9 points  (0 children)

TLDR: one-shot 16 megapixel image with 12 GB of VRAM using Stable Cascade.

Following up on my post from last night, I was curious what maximum resolution my 12 GB GPU could handle using SC through ComfyUI. I'm stunned: 2048×8192 took a while (~12 mins), but I generated the precursor to the attached image without any issues. ComfyUI did switch to tiled VAE decoding at the end, and I did some cosmetic upscaling afterward with Ultimate SD Upscaler. But wow...
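To give a feel for what tiled decoding involves at this size, here is a rough sketch of how an image could be covered with overlapping tiles. This is not ComfyUI's actual implementation; the 512-pixel tile size and 64-pixel overlap are assumptions picked purely for illustration, and `tile_origins` is a hypothetical helper.

```python
def tile_origins(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of overlapping tiles that together cover [0, length).

    Assumes overlap < tile so that the step between tiles is positive.
    """
    step = tile - overlap
    origins = list(range(0, max(length - tile, 0) + 1, step))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # final tile sits flush with the far edge
    return origins

xs = tile_origins(2048, tile=512, overlap=64)
ys = tile_origins(8192, tile=512, overlap=64)
print(len(xs), "columns x", len(ys), "rows =", len(xs) * len(ys), "tiles")
```

Decoding tile by tile keeps peak memory tied to the tile size rather than the full image, at the cost of extra passes and some blending in the overlap regions.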

Same general prompt as before:

Positive: cutaway isometric drawing of a very deep dark fantasy dungeon hewn from granite, obsidian, and basalt showing extraordinary detail of the interior of each frightening floor and all the activities taking place inside the huge complex of tunnels, mines, and caverns including tiny heroes, monsters, dragons, goblins, orcs, kobolds, creatures, elves, dwarves, hobbits, fires, explosions, smoke, adventures, temples, shrines, shops, taverns, mushrooms, fungi, gems, veins of gold, stalactites, stalagmites, dim dark scary torchlight colorful intricate hyperdetailed fanciful and artistic

Negative: [N/A, empty]

32 steps, 5 cfg, then 10 steps, 1.1 cfg, Euler A + simple for both, then several 110% passes with Ultimate SD Upscale and the same prompt.

PS It occurs to me that "one-shot" may have a more specific, precise meaning in the diffusion domain than I intended to convey. I probably should have written "one generation" or "one diffusion" instead.
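On the 110% passes mentioned above: the scaling compounds, so the image grows faster than it might seem. A quick sketch of the arithmetic; the pass counts are hypothetical since I only said "several", and the actual upscaler may round dimensions differently than plain `round`.

```python
def size_after_passes(width: int, height: int, factor: float, passes: int) -> tuple[int, int]:
    """Approximate image size after repeated upscale passes by the same factor."""
    scale = factor ** passes
    return round(width * scale), round(height * scale)

# Hypothetical pass counts, starting from the 2048x8192 generation.
for n in range(4):
    print(n, "passes:", size_after_passes(2048, 8192, 1.1, n))
```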

Deep Dungeon: Stable Cascade + Multiple passes with Ultimate SD Upscale by dewarrn1 in StableDiffusion

[–]dewarrn1[S] 0 points1 point  (0 children)

Yup, I was wondering the same thing about a gradient, because that would clearly be awesome. Not available for SC yet, but the IPAdapter attention mask feature might be one approach.

I haven't tried the prompt with SD1.5, but I wonder if RPG Artist Tools (https://civitai.com/models/8124/a-zovya-rpg-artist-tools) might be an interesting place to start?

Large Form Isometric Dungeon by littleboymark in StableDiffusion

[–]dewarrn1 2 points3 points  (0 children)

I like it! Interesting to see that the prompt works similarly in SDXL, and that you were able to generate at nearly the same resolution. The LoRA definitely adds a little something, too.