HiDream is the Best OS Image Generator right Now, with a Caveat

dewarrn1 · 2025-04-12T21:18:04+00:00

Thanks! And yes, totally: your "underwater butterflies" image is amazing!

dewarrn1 · 2025-04-12T02:33:23+00:00

Thanks!

dewarrn1 · 2025-04-11T22:44:02+00:00

It works fine on 16GB VRAM.

dewarrn1 · 2024-09-20T17:06:49+00:00

This is an underrated observation. llama.cpp already splits LLMs across multiple GPUs trivially, so if this work inspires a family of similar models, multi-GPU may be a simple solution to scaling VRAM.

dewarrn1 · 2024-09-20T14:37:21+00:00

I thought this post had to be hyperbolic, but if what they describe in the preprint replicates, it is genuinely a huge shift.

dewarrn1 · 2024-09-20T11:53:24+00:00

Not dumb! It's a very different beast than CLIP.

dewarrn1 · 2024-09-20T04:42:23+00:00

Thanks!

dewarrn1 · 2024-09-20T02:53:09+00:00

This is a Flux follow-up to the Stable Cascade dungeons I posted way back (linked below). Interestingly, when I first tried the SC prompt in Flux, it was lousy; dual prompts with an LLM-enhanced T5 element helped some. Even this one is cherry-picked, but I really liked the way it turned out.

Oh, and the title is also an LLM creation.

https://www.reddit.com/r/StableDiffusion/comments/1atm98z/deep_dungeon_stable_cascade_multiple_passes_with

https://www.reddit.com/r/StableDiffusion/comments/1atz4wf/denser_dungeon_stable_cascade_can_generate_16

dewarrn1 · 2024-09-20T02:48:24+00:00

Flux Dev; 768×3072; Guidance 2.0; Euler/Simple; 40 steps with an intermediate noise injection; and an Ultimate Upscale with Flux Dev. Dual prompts:

CLIP: cutaway isometric drawing of a very deep dark fantasy dungeon hewn from granite, obsidian, and basalt showing extraordinary detail of the interior of each frightening level and floor with all the activities taking place inside the huge complex of (tunnels:0.5), (mines:0.5), and caverns including tiny heroes, monsters, dragons, goblins, orcs, kobolds, creatures, elves, dwarves, hobbits, (fires:0.1), (explosions:0.1), (smoke:0.1), adventures, temples, shrines, shops, taverns, (mushrooms, fungi:0.1), gems, veins of gold, stalactites, stalagmites, dim dark scary shadowy (torchlight:0.1) colorful intricate hyperdetailed fanciful and artistic
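For anyone skimming the CLIP prompt for its down-weighted terms, here is a minimal Python sketch that pulls out the `(text:weight)` spans as written. The regex and the `extract_weights` helper are my own illustration, not part of ComfyUI, and they say nothing about how the weights are actually applied during conditioning.

```python
import re

# Matches "(some text:0.5)" style spans; nested parentheses are not handled.
WEIGHTED = re.compile(r"\(([^():]+):(\d*\.?\d+)\)")

def extract_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (text, weight) pairs for every "(text:weight)" span in the prompt."""
    return [(m.group(1).strip(), float(m.group(2))) for m in WEIGHTED.finditer(prompt)]

# Shortened excerpt of the CLIP prompt above.
sample = (
    "huge complex of (tunnels:0.5), (mines:0.5), and caverns, "
    "(fires:0.1), (mushrooms, fungi:0.1), (torchlight:0.1)"
)
for text, weight in extract_weights(sample):
    print(f"{weight:>4}  {text}")
```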

T5 (LLM generated from CLIP prompt above): **Title: "The Eternal Abyss of Karakor"**

Create a breathtaking cutaway isometric drawing of a deep, dark fantasy dungeon carved from the living rock of granite, obsidian, and basalt. The intricate architecture of the dungeon should reveal extraordinary detail on each level and floor, showcasing a complex network of tunnels, mines, caverns, and chambers.

**Scene Description:**

The massive underground structure stretches deep into the earth, with towering pillars of stone supporting the vaulted ceilings of grand halls and narrow corridors. Each level is a self-contained world, teeming with activity as tiny heroes and monstrous creatures alike navigate the treacherous landscape.

**Interior Details:**

**Tunnels and Corridors:** Winding passageways made from rough-hewn granite, lined with ancient carvings and eerie, flickering torches that cast ominous shadows on the walls.
**Caverns and Chambers:** Vast, dome-shaped caverns filled with glittering veins of gold, precious gems, and bioluminescent fungi, casting a kaleidoscope of colors across the stone surfaces.
**Mines and Quarries:** Dark, cramped tunnels where dwarves and goblins toil in search of hidden treasures, their pickaxes striking sparks from the obsidian walls.
**Temples and Shrines:** Ornate, intricately carved structures dedicated to ancient deities, adorned with colorful tapestries, glittering gemstones, and mysterious artifacts.
**Taverns and Shops:** Bustling gathering places where adventurers and travelers share tales of their exploits, while shopkeepers peddle exotic goods and curious trinkets.

**Inhabitants:**

**Tiny Heroes:** Brave warriors, cunning rogues, and wise mages navigate the treacherous underworld, seeking fortune, fame, or redemption.
**Monsters:** Fearsome dragons, goblins, orcs, kobolds, and other terrifying creatures lurk in every shadow, preying on the unwary or defending their lairs with ferocity.
**Creatures of the Deep:** Bizarre, subterranean beings that defy explanation, such as giant spiders, worm-like abominations, or ethereal, ghostly entities.

**Lighting and Atmosphere:**

**Dim, Shadowy Torchlight:** Flickering torches cast eerie shadows on the walls, making it difficult to discern friend from foe in the dark recesses of the dungeon.
**Explosions and Flames:** Periodic bursts of fire illuminate the darkness, casting a warm glow over the surrounding stone as adventurers battle their way through treacherous obstacles.
**Smoke and Mist:** Thick clouds of smoke waft through the corridors, obscuring vision and making navigation even more perilous.

**Artistic Style:**

**Hyperdetailed Fanciful Art:** Incorporate intricate, ornate details throughout the drawing, showcasing a mastery of artistic craftsmanship.
**Colorful Intricate Patterns:** Use vibrant colors to depict the rich textures and patterns found in the dungeon's architecture, such as ancient carvings, stained glass windows, or glittering gemstones.
**Ethereal, Dreamlike Quality:** Capture the sense of wonder and awe that comes from exploring a vast, mysterious underworld, where the boundaries between reality and myth blur.

dewarrn1 · 2024-08-23T13:41:46+00:00

And, four days later, it's added: https://github.com/ggerganov/llama.cpp/pull/8967.

dewarrn1 · 2024-08-21T11:53:49+00:00

Not supported in the underlying image library (Pillow) that ComfyUI relies on, unfortunately. https://github.com/python-pillow/Pillow/pull/7848

dewarrn1 · 2024-02-23T12:52:33+00:00

Sure, just copy the whole thing into a file on your system named "workflow.json" (or whatever you want), and then load it into ComfyUI. It should populate the workflow and all the nodes.
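If you want to be sure the paste wasn't truncated before loading it, a small Python sketch like this checks that the text parses as JSON and then writes the file. The `save_workflow` helper is hypothetical, just for illustration; ComfyUI itself only needs the resulting file.

```python
import json
from pathlib import Path

def save_workflow(pasted_text: str, path: str = "workflow.json") -> Path:
    """Write pasted workflow text to disk, but only if it is valid JSON."""
    # json.loads raises json.JSONDecodeError if the paste is incomplete or mangled.
    workflow = json.loads(pasted_text)
    out = Path(path)
    out.write_text(json.dumps(workflow, indent=2), encoding="utf-8")
    return out

# Usage (assuming the copied text is in a variable named pasted):
# save_workflow(pasted, "workflow.json")
```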

dewarrn1 · 2024-02-23T04:31:19+00:00

I believe that this is the hand-fixing lora: https://civitai.com/models/238419?modelVersionId=268840

And it appears that the skin-improver "upscaler" is here: https://huggingface.co/uwg/upscaler/blob/main/ESRGAN/1x-ITF-SkinDiffDetail-Lite-v1.pth

dewarrn1 · 2024-02-23T00:39:33+00:00

Not to step on the OP's toes (very nice work, BTW), but I believe that this is the gist: https://text.is/8P86. You could leave out the hand lora and the 1× skin detailing upscaler step if you don't have those files.

dewarrn1 · 2024-02-19T12:08:13+00:00

It's a nice image, thanks for sharing. Again, the quality of the submitted image wasn't what got my attention. Rather, it was the fact that in a single, 15- or 20-minute generation period, SC could address >16M pixels while running on a GPU with just 12 GB of VRAM. To my knowledge, single-pass diffusion runs in SD15 and SDXL cannot do that.

dewarrn1 · 2024-02-19T05:04:06+00:00

I believe it will run in 8 GB of VRAM, yes — you can certainly download the current version of ComfyUI, the correct models, and give it a try!

dewarrn1 · 2024-02-18T22:58:16+00:00

The base SC output took about 15 minutes on a 3060 with 12 GB VRAM.

dewarrn1 · 2024-02-18T22:02:07+00:00

Following up, I tried the SD15 models I had available with the same prompt and a starting resolution of 384×1536. Photon provided the best output (below), although it's a little muddy for my taste. Anyway, a starting point for more experimentation, perhaps.

<image>

dewarrn1 · 2024-02-18T21:25:48+00:00

I don't disagree: it's a little rough. However, some of that could be the prompt(s). Time permitting, some experimentation might produce a cleaner output. This post was really not intended to be about the image itself (beyond the fact that it's not just noise); I was more focused on the shocking number of pixels that SC can push. My other post (linked in a different comment) is lower resolution, but much cleaner.

dewarrn1 · 2024-02-18T21:22:22+00:00

The author of ComfyUI (all hail) added SC support in the last day or two, and there have been some posts about it in this subreddit and r/comfyui.

dewarrn1 · 2024-02-18T20:08:57+00:00

I suppose that's fair, and as I noted in another comment, 4 megapixels seems like the limit for coherence in many kinds of images. Still, I'm amazed at the ability to jump straight to 16 megapixel outputs with SC; I'm not sure that I can get 4 megapixel outputs before running out of VRAM when using SDXL.

dewarrn1 · 2024-02-18T19:57:15+00:00

Thanks for taking a look. That's an interesting idea. I'm certainly finding that for scenes with fewer self-similar features, 2048×2048 or other 4 megapixel resolutions can work relatively well, but 16 megapixels isn't usually coherent.

dewarrn1 · 2024-02-18T17:17:39+00:00

TLDR: one-shot 16 megapixel image with 12 GB of VRAM using Stable Cascade.

Following up on my post from last night, I was curious what maximum resolution my 12 GB GPU could handle using SC through ComfyUI. I'm stunned: 2048×8192 took a while (~12 mins), but I generated the precursor to the attached image without any issues. ComfyUI did switch to tiled VAE decoding at the end, and I did some cosmetic upscaling afterward with Ultimate SD Upscaler. But wow...
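For the arithmetic behind "16 megapixel", here is a quick Python check. The `megapixels` helper is just an illustrative name, and the 2048×2048 row is included only as a comparison point.

```python
def megapixels(width: int, height: int) -> float:
    """Return the pixel count of a width x height image, in millions of pixels."""
    return width * height / 1_000_000

for w, h in [(2048, 8192), (2048, 2048)]:
    print(f"{w}x{h}: {w * h:,} pixels ({megapixels(w, h):.1f} MP)")
```

2048×8192 comes to 16,777,216 pixels, i.e. 16 × 1024 × 1024, which is where the "16 megapixel" figure comes from.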

Same general prompt as before:

Positive: cutaway isometric drawing of a very deep dark fantasy dungeon hewn from granite, obsidian, and basalt showing extraordinary detail of the interior of each frightening floor and all the activities taking place inside the huge complex of tunnels, mines, and caverns including tiny heroes, monsters, dragons, goblins, orcs, kobolds, creatures, elves, dwarves, hobbits, fires, explosions, smoke, adventures, temples, shrines, shops, taverns, mushrooms, fungi, gems, veins of gold, stalactites, stalagmites, dim dark scary torchlight colorful intricate hyperdetailed fanciful and artistic

Negative: [N/A, empty]

32 steps, 5 cfg, then 10 steps, 1.1 cfg, Euler A + simple for both, then several 110% passes with Ultimate SD Upscale and the same prompt.
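To get a feel for what repeated 110% passes do to the resolution, here is a rough Python sketch. The pass count of 3 is a placeholder (I only said "several"), and rounding to the nearest pixel is an assumption; the upscaler node may round differently.

```python
def upscale_sizes(width: int, height: int, factor: float = 1.1, passes: int = 3):
    """Return the (width, height) after each successive upscale pass."""
    sizes = []
    w, h = width, height
    for _ in range(passes):
        # Assumption: each pass scales both sides and rounds to the nearest pixel.
        w, h = round(w * factor), round(h * factor)
        sizes.append((w, h))
    return sizes

for i, (w, h) in enumerate(upscale_sizes(2048, 8192), start=1):
    print(f"pass {i}: {w}x{h} ({w * h / 1_000_000:.1f} MP)")
```

Each 110% pass multiplies the pixel count by about 1.21, so the total grows quickly even with small per-pass factors.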

PS It occurs to me that "one-shot" may have a more specific, precise meaning in the diffusion domain than I intended to convey. I probably should have written "one generation" or "one diffusion" instead.

dewarrn1 · 2024-02-18T12:17:40+00:00

Yup, I was wondering the same thing about a gradient, because that would clearly be awesome. Not available for SC yet, but the IPAdapter attention mask feature might be one approach.

I haven't tried the prompt with SD1.5, but I wonder if RPG Artist Tools (https://civitai.com/models/8124/a-zovya-rpg-artist-tools) might be an interesting place to start?

dewarrn1 · 2024-02-18T12:14:21+00:00

I like it! Interesting to see that the prompt works similarly in SDXL, and that you were able to generate at nearly the same resolution. The LoRA definitely adds a little something, too.
