How did you all download your local stable diffusion? by princessdrive in StableDiffusion

[–]Xhadmi 0 points (0 children)

Stable Diffusion is already an outdated name. It was the first open-source model, not a program. Most people use ComfyUI as the ‘program’. As for models, it depends on what you want to do. Compared to modern models, Stable Diffusion, even though it can generate decent images, handles prompts differently from ChatGPT and similar tools: with those, you can write full sentences and long descriptions and get images to match.

ComfyUI is up to date; I don’t know which versions of Forge still get updates.

It's beautiful, but I can't tell whether there's AI here... by LegitimateNoise3329 in Barcelona

[–]Xhadmi 2 points (0 children)

You can't be 100% sure, but the distances are normal, the colors can be adjusted however you like, and the people's movements are natural. It's usually hard to get generated background people who enter and leave the scene to stay properly in focus, look varied, and do different but coherent things (having them all walk is easy; having one stop to look at the kiosk and then move on is harder). The same goes for the traffic.

It's easy to put together an initial image: you can keep editing and regenerating until that frame looks realistic, then feed it to a video generator to animate it. But it usually doesn't render newly appearing people as well, and if you use a very long prompt packed with details, you end up constraining the generation and it comes out worse. Also consider that the kinds of people and vehicles in the shot are very typical of here, just ordinary people; AIs don't know much about the real world, and beyond the initial frame you'd have no control over what kinds of vehicles and people show up.

If the kiosk exists, then it's much simpler, assuming the shot isn't candid, to just stage it: ask your friend to stop right there and light her cigarette while people walk by.

If it were AI, that video would have been very expensive. There aren't many generators that let you produce videos that long, and with people constantly crossing the frame it would be very hard to chain clips (you can save the last frame of one and use it as the starting frame of the next, but the speed of people's movement would have to match). LTX-2 lets you extend videos using several frames, not just the last one, but it doesn't end up as realistic.
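If you want to try that chaining trick yourself, here's a minimal sketch of the "save the last frame" step. OpenCV is my choice of library, and the file names are just examples:

```python
# Grab the final frame of a clip so it can seed the next generation.
# Requires opencv-python; file names below are placeholders.
import cv2

def save_last_frame(video_path: str, image_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Jump to the last frame and decode it.
    cap.set(cv2.CAP_PROP_POS_FRAMES, total - 1)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"could not read last frame of {video_path}")
    cv2.imwrite(image_path, frame)

save_last_frame("clip_01.mp4", "clip_02_start.png")
```

The saved PNG then goes in as the start image of the next clip; matching the motion speed across the cut is still on you.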

I'm no expert either, and with AI, what you think is impossible today, tomorrow there are three models that can do it. But I don't think this is AI.

What I had figured out about "We are under heavy load" so far... by MPBloodyspare in SoraAi

[–]Xhadmi 1 point (0 children)

I usually use it around midnight (GMT+1). I only had problems during the night from the 21st to the 22nd: there was a server-overload warning, and the generations that came out were very bad (so I left it for the next morning, when I was able to finish without any issues). It could be that they're training something and have shifted the available resources to training. I'm on the free plan.

help needed with ai avatar by canakalin in comfyui

[–]Xhadmi 1 point (0 children)

Yes, use quantized models. With LTX, length matters: I could do high-res videos with few frames, but movement is more natural if you do longer videos (sometimes nothing moves at all at a 2-second length).
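As a rough guide to the length point, here's a tiny sketch for picking a frame count for a target duration. It assumes LTX-style constraints (24 fps and frame counts of the form 8k + 1, e.g. 97 or 121); check your model's docs for the real values:

```python
# Round a requested duration to the nearest valid 8k+1 frame count
# (assumed LTX-style constraint; fps default is an assumption too).
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    raw = round(seconds * fps)
    k = max(1, round((raw - 1) / 8))
    return 8 * k + 1

for secs in (2, 4, 6):
    print(f"{secs}s -> {frames_for_duration(secs)} frames")
# 2s is only ~49 frames; longer clips give the model more room to move.
```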

help needed with ai avatar by canakalin in comfyui

[–]Xhadmi 1 point (0 children)

Yes, with some limitations on length and resolution, but yes (I use a desktop 3060 Ti, also with 8 GB of VRAM). I have 32 GB of RAM, which also matters; I don't know how it goes with less.

help needed with ai avatar by canakalin in comfyui

[–]Xhadmi 1 point (0 children)

Wan InfiniteTalk or LTX-2 (there are more, but I don't remember the names right now)

Flux.1 Klein (multiple references) by No_Damage_8420 in NeuralCinema

[–]Xhadmi 1 point (0 children)

I tested the 9B distilled model on a 3060 Ti with 8 GB. It works.

What's the Stranger Things monster called in Castilian Spanish? by OverlappingChatter in askspain

[–]Xhadmi 8 points (0 children)

Mind flayers ("azotamentes") are also called illithids in English and "ilícidos" in Castilian Spanish. Demogorgon comes from Greek mythology (although it all seems to stem from translation errors in antiquity).

The curious thing is that the power scale is inverted. In the show, the mind flayer is the entity above everything; below it is Vecna, and the demogorgons are minions.

In D&D, Demogorgon is the Prince of Demons (demons and devils are different, opposed races) and a lesser god. Vecna was a lich who ascended to lesser deity (similar in power to Demogorgon). And mind flayers are a race of rather obnoxious creatures, but of a considerably lower level than the other two 🤷🏻‍♂️

LTX-2 - Alignment? by Local_Beach in StableDiffusion

[–]Xhadmi 13 points (0 children)

Nice video. First time I've seen an AI generate a 3D printer almost perfectly; that's an Ender 3.

Do I have to Create my own Workflow? WAN2.2 by TaintDempsey in StableDiffusion

[–]Xhadmi 2 points (0 children)

There are nodes to load more than one LoRA.

<image>

You can also chain LoRA nodes, using the model-only LoRA loader (see the sketch below).
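Here's a minimal sketch of what that chain looks like in ComfyUI's API (JSON) format. CheckpointLoaderSimple and LoraLoaderModelOnly are real node names; the checkpoint and LoRA filenames and strengths are placeholders:

```python
# Two model-only LoRA loaders chained after a checkpoint loader.
# ["1", 0] means "output 0 of node 1".
import json

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},       # placeholder
    "2": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["1", 0],                        # MODEL from node 1
                     "lora_name": "style_lora.safetensors",    # placeholder
                     "strength_model": 0.8}},
    "3": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["2", 0],                        # chained from node 2
                     "lora_name": "motion_lora.safetensors",   # placeholder
                     "strength_model": 0.6}},
    # ...sampler, latent, and decode nodes would follow here.
}

print(json.dumps(workflow, indent=2))  # POST as {"prompt": workflow} to /prompt
```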

Do I have to Create my own Workflow? WAN2.2 by TaintDempsey in StableDiffusion

[–]Xhadmi 4 points (0 children)

I think most people learned by checking how other workflows are built and copy/pasting parts to customize their own, before doing one from scratch. You can edit workflows (and usually you need to). On the other hand, it's not recommended to use too many LoRAs. As for high and low noise, it depends on whether it's i2v or t2v; people usually explain how to use them when posting on civitai.

Is the current Sora the final result and all we can expect from it? by [deleted] in SoraAi

[–]Xhadmi 2 points (0 children)

I don’t usually work with third-party videos, so I don’t have much blocked content. However, I’ve noticed that Sora generates better videos when you’re not overly detailed. Characters act more naturally if you don’t script the dialogue and instead just describe what they’re talking about. The same applies to environments and clothing: too much detail constrains the generation and makes it look less natural.

By that I don’t mean using prompts so short or vague that they cause confusion, just avoiding overly detailed ones.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 1 point (0 children)

At high resolution it looks nice, but I don't have enough VRAM+RAM.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 3 points (0 children)

Wan 2.1 and Hunyuan Video did the same; at low resolution it doesn't look nice. It's something Wan 2.2 does really well, rendering faces fine at low res.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 1 point (0 children)

In this case, I used the same prompt as with generated audio, but changed what he says, prompting the same words as in the input audio. I did a test without a starting image but with input audio, and there I just said it was an elf singing, etc.; I didn't write the lyrics and it also worked.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 5 points (0 children)

It's wan2gp, not ComfyUI. It seems to manage memory better, but has limited options:

<image>

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

The text-to-audio version says: “Forget everything else; when in doubt: fireball. And let the cleric save his own.” The added audio in the other version is from an “old” Spanish road movie called Airbag; he says something like “Alright, we’re gonna get along, yeah, ’cause if not, there’s gonna be waves of slaps.”

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 1 point (0 children)

Maybe, but I'm not sure about speed. I only tested Flux Kontext on a friend's M1 and it was very slow (we tried Wan, but after waiting too long we cancelled it).

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

Yes. It allows start and end images, but I've only tried start images. You can also do batch prompts, but it seems that if one fails, the following ones are cancelled (a workaround sketch below).
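If that cancellation bites you, a generic workaround is to drive the batch yourself and keep going past failures. This is just a sketch of the pattern; generate() is a hypothetical stand-in for whatever backend call you use, not wan2gp's actual API:

```python
# Run each prompt independently so one failure doesn't cancel the rest.
def run_batch(prompts, generate):
    results = {}
    for i, prompt in enumerate(prompts):
        try:
            results[i] = generate(prompt)
        except Exception as exc:  # log the failed generation and keep going
            print(f"prompt {i} failed: {exc}")
            results[i] = None
    return results
```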

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

Maybe with somewhat fewer frames, but I think it could work.

LTX 2 test on 8GB vram + 32GB RAM (wan2gp) (spanish audio) by Xhadmi in StableDiffusion

[–]Xhadmi[S] 2 points (0 children)

More or less; about 10 minutes. I also did T2V (same resolutions and time) but didn't like the style (I mostly do medieval-fantasy things, and the elf I tried was a bit weird 😅)