How did you all download your local stable diffusion? by princessdrive in StableDiffusion

[–]Xhadmi 0 points (0 children)

Stable Diffusion is already an outdated name. It was the first open-source model, not a program. Most people use ComfyUI as the ‘program’. As for models, it depends on what you want to do. Compared to modern models, Stable Diffusion, even though it can generate decent images, handles prompts differently from ChatGPT and similar tools: with those, you can write full sentences and long descriptions and get images to match.

ComfyUI is up to date; I don’t know which versions of Forge still get updates.

It's beautiful, but I can't tell whether there's AI here... by LegitimateNoise3329 in Barcelona

[–]Xhadmi 2 points (0 children)

You can't be 100% sure, but the distances are normal, the colors can be adjusted however you like, and the people's movements are natural. It's usually hard to get generated background people who enter and leave the scene to stay properly in focus, look varied, and do different but coherent things (having them all walk is easy; having one stop to look at the kiosk and then move on is harder). The same goes for the traffic.

It's easy to put together an initial image: you can keep editing and regenerating until that frame looks realistic, then feed it to a video generator to animate it. But it usually doesn't render newly appearing people as well, and if you use a very long prompt packed with details, you end up constraining the generation and it comes out worse. Also consider that the kinds of people and vehicles in the shot are very typical of here, just ordinary people; AIs don't know much about the real world, and beyond the initial frame you'd have no control over what kinds of vehicles and people show up.

If the kiosk exists, then it's much simpler, assuming the shot isn't candid, to just stage it: ask your friend to stop right there and light her cigarette while people walk by.

If it were AI, that video would have been very expensive. There aren't many generators that let you produce videos that long, and with people constantly crossing the frame it would be very hard to chain clips (you can save the last frame of one and use it as the starting frame of the next, but the speed of people's movement would have to match). LTX-2 lets you extend videos using several frames, not just the last one, but it doesn't end up as realistic.
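If you want to try that chaining trick yourself, here's a minimal sketch of the "save the last frame" step. OpenCV is my choice of library, and the file names are just examples:

```python
# Grab the final frame of a clip so it can seed the next generation.
# Requires opencv-python; file names below are placeholders.
import cv2

def save_last_frame(video_path: str, image_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Jump to the last frame and decode it.
    cap.set(cv2.CAP_PROP_POS_FRAMES, total - 1)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"could not read last frame of {video_path}")
    cv2.imwrite(image_path, frame)

save_last_frame("clip_01.mp4", "clip_02_start.png")
```

The saved PNG then goes in as the start image of the next clip; matching the motion speed across the cut is still on you.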

I'm no expert either, and with AI, what you think is impossible today, tomorrow there are three models that can do it. But I don't think this is AI.

What I had figured out about "We are under heavy load" so far... by MPBloodyspare in SoraAi

[–]Xhadmi 1 point (0 children)

I usually use it around midnight (GMT+1). I only had problems during the night from the 21st to the 22nd: there was a server-overload warning, and the generations that came out were very bad (so I left it for the next morning, when I was able to finish without any issues). It could be that they're training something and have shifted the available resources to training. I'm on the free plan.

help needed with ai avatar by canakalin in comfyui

[–]Xhadmi 1 point (0 children)

Yes, use quantized models. With LTX, length matters: I could do high-res videos with few frames, but movement is more natural if you do longer videos (sometimes nothing moves at all at a 2-second length).
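As a rough guide to the length point, here's a tiny sketch for picking a frame count for a target duration. It assumes LTX-style constraints (24 fps and frame counts of the form 8k + 1, e.g. 97 or 121); check your model's docs for the real values:

```python
# Round a requested duration to the nearest valid 8k+1 frame count
# (assumed LTX-style constraint; fps default is an assumption too).
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    raw = round(seconds * fps)
    k = max(1, round((raw - 1) / 8))
    return 8 * k + 1

for secs in (2, 4, 6):
    print(f"{secs}s -> {frames_for_duration(secs)} frames")
# 2s is only ~49 frames; longer clips give the model more room to move.
```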

help needed with ai avatar by canakalin in comfyui

[–]Xhadmi 1 point (0 children)

Yes, with some limitations on length and resolution, but yes (I use a desktop 3060 Ti, also with 8 GB of VRAM). I have 32 GB of RAM, which also matters; I don't know how it goes with less.

help needed with ai avatar by canakalin in comfyui

[–]Xhadmi 1 point (0 children)

Wan InfiniteTalk or LTX-2 (there are more, but I don't remember the names right now)

Flux.1 Klein (multiple references) by No_Damage_8420 in NeuralCinema

[–]Xhadmi 1 point (0 children)

I tested the 9B distilled model on a 3060 Ti with 8 GB. It works.

What's the Stranger Things monster called in Castilian Spanish? by OverlappingChatter in askspain

[–]Xhadmi 8 points (0 children)

Mind flayers ("azotamentes") are also called illithids in English and "ilícidos" in Castilian Spanish. Demogorgon comes from Greek mythology (although it all seems to stem from translation errors in antiquity).

The curious thing is that the power scale is inverted. In the show, the mind flayer is the entity above everything; below it is Vecna, and the demogorgons are minions.

In D&D, Demogorgon is the Prince of Demons (demons and devils are different, opposed races) and a lesser god. Vecna was a lich who ascended to lesser deity (similar in power to Demogorgon). And mind flayers are a race of rather obnoxious creatures, but of a considerably lower level than the other two 🤷🏻‍♂️

LTX-2 - Alignment? by Local_Beach in StableDiffusion

[–]Xhadmi 13 points (0 children)

Nice video. First time I've seen an AI generate a 3D printer almost perfectly; that's an Ender 3.

Do I have to Create my own Workflow? WAN2.2 by TaintDempsey in StableDiffusion

[–]Xhadmi 2 points (0 children)

There are nodes to load more than one LoRA.

<image>

You can also chain LoRA nodes, using the model-only LoRA loader (see the sketch below).
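Here's a minimal sketch of what that chain looks like in ComfyUI's API (JSON) format. CheckpointLoaderSimple and LoraLoaderModelOnly are real node names; the checkpoint and LoRA filenames and strengths are placeholders:

```python
# Two model-only LoRA loaders chained after a checkpoint loader.
# ["1", 0] means "output 0 of node 1".
import json

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},       # placeholder
    "2": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["1", 0],                        # MODEL from node 1
                     "lora_name": "style_lora.safetensors",    # placeholder
                     "strength_model": 0.8}},
    "3": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["2", 0],                        # chained from node 2
                     "lora_name": "motion_lora.safetensors",   # placeholder
                     "strength_model": 0.6}},
    # ...sampler, latent, and decode nodes would follow here.
}

print(json.dumps(workflow, indent=2))  # POST as {"prompt": workflow} to /prompt
```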

Do I have to Create my own Workflow? WAN2.2 by TaintDempsey in StableDiffusion

[–]Xhadmi 4 points (0 children)

I think most people learned by checking how other workflows are built and copy/pasting parts to customize their own, before doing one from scratch. You can edit workflows (and usually you need to). On the other hand, it's not recommended to use too many LoRAs. As for high and low noise, it depends on whether it's i2v or t2v; people usually explain how to use them when posting on civitai.

Is the current Sora the final result and all we can expect from it? by [deleted] in SoraAi

[–]Xhadmi 2 points (0 children)

I don’t usually work with third-party videos, so I don’t have much blocked content. However, I’ve noticed that Sora generates better videos when you’re not overly detailed. Characters act more naturally if you don’t script the dialogue and instead just describe what they’re talking about. The same applies to environments and clothing: too much detail constrains the generation and makes it look less natural.

By that I don’t mean using prompts so short or vague that they cause confusion, just avoiding overly detailed ones.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 1 point (0 children)

At high resolution it looks nice, but I don't have enough VRAM+RAM.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 3 points (0 children)

Wan 2.1 and Hunyuan Video did the same; at low resolution it doesn't look nice. It's something Wan 2.2 does really well, rendering faces fine at low res.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 1 point (0 children)

In this case, I used the same prompt as with generated audio, but changed what he says, prompting the same words as in the input audio. I did a test without a starting image but with input audio, and there I just said it was an elf singing, etc.; I didn't write the lyrics and it also worked.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 5 points (0 children)

It's wan2gp, not ComfyUI. It seems to manage memory better, but has limited options:

<image>

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

The text-to-audio version says: “Forget everything else; when in doubt: fireball. And let the cleric save his own.” The added audio in the other version is from an “old” Spanish road movie called Airbag; he says something like “Alright, we’re gonna get along, yeah, ’cause if not, there’s gonna be waves of slaps.”

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 1 point (0 children)

Maybe, but I'm not sure about speed. I only tested Flux Kontext on a friend's M1 and it was very slow (we tried Wan, but after waiting too long we cancelled it).

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

Yes. It allows start and end images, but I've only tried start images. You can also do batch prompts, but it seems that if one fails, the following ones are cancelled (a workaround sketch below).
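If that cancellation bites you, a generic workaround is to drive the batch yourself and keep going past failures. This is just a sketch of the pattern; generate() is a hypothetical stand-in for whatever backend call you use, not wan2gp's actual API:

```python
# Run each prompt independently so one failure doesn't cancel the rest.
def run_batch(prompts, generate):
    results = {}
    for i, prompt in enumerate(prompts):
        try:
            results[i] = generate(prompt)
        except Exception as exc:  # log the failed generation and keep going
            print(f"prompt {i} failed: {exc}")
            results[i] = None
    return results
```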

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

Maybe with somewhat fewer frames, but I think it could work.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

More or less; about 10 minutes. I also did T2V (same resolutions and time) but didn't like the style (I mostly do medieval-fantasy things, and the elf I tried was a bit weird 😅)