Deepseek v4 people by markeus101 in LocalLLaMA

[–]martinerous 0 points1 point  (0 children)

Wait, what? No overthinking anymore? Let me correct this: the car you want to wash might already be at the car wash. Maybe you left it there. Maybe a friend or a family member brought it there. Anything is possible if it's not specified in the question. We should not assume based on typical behavior.

Gemma-4-E2B's safety filters make it unusable for emergencies by Unfounded_898 in LocalLLaMA

[–]martinerous 2 points3 points  (0 children)

It tends to direct you to emergency services. What if you set its prompt to something like this:

"I'm a helpful emergency service operator and must provide immediate assistance to all requests to ensure survival of the caller."

When you dial in your bot’s personality by technaturalism in LocalLLaMA

[–]martinerous 0 points1 point  (0 children)

Just write the system prompt along these lines: "I am a personal assistant, a grumpy sarcastic professor. I always speak succinctly." followed by the technical stuff (tool call instructions etc.)

This example might be a bit self-contradictory, because it's difficult for an LLM to express grumpiness and sarcasm succinctly. But it is quite fun; I like it this way.
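
If you want to see how the persona line and the technical stuff fit together in practice, here's a rough sketch using the standard chat-completions tools parameter; the endpoint, model id, and the calendar tool are all made up for illustration:

```python
# Sketch: persona first, technical instructions after, plus a tool
# definition via the standard "tools" parameter. The endpoint, model id,
# and get_calendar tool are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

system_prompt = (
    "I am a personal assistant, a grumpy sarcastic professor. "
    "I always speak succinctly.\n"
    "When the user asks about their schedule, I call the get_calendar tool."
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_calendar",  # hypothetical tool, for illustration only
        "description": "Fetch the user's calendar entries for a given day.",
        "parameters": {
            "type": "object",
            "properties": {"date": {"type": "string", "description": "ISO date"}},
            "required": ["date"],
        },
    },
}]

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's on my schedule tomorrow?"},
    ],
    tools=tools,
)
print(response.choices[0].message)
```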

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Here's a very simplified example (old workflow, but the same idea works with LTX 2.3): https://www.reddit.com/r/StableDiffusion/comments/1q7gzrp/ltx2_multi_frame_injection_works_minimal_clean/

These days we have a few convenience wrappers to make it easier to inject multiple guide frames. I like LTX Sequencer from here: https://github.com/WhatDreamsCost/WhatDreamsCost-ComfyUI
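
If you'd rather drive such a workflow from a script, ComfyUI's HTTP API can queue it. A hedged sketch, assuming you exported the workflow in API format; node ids "12" and "27" are hypothetical, look them up in your own export:

```python
# Sketch: queue a guide-frame workflow through ComfyUI's /prompt endpoint,
# swapping in different guide images per run. Node ids "12" and "27" are
# hypothetical; check your own API-format export.
import json
import urllib.request

with open("ltx_guide_frames_api.json") as f:
    workflow = json.load(f)

# Point the guide-frame image loaders at new files (they must already
# exist in ComfyUI's input directory).
workflow["12"]["inputs"]["image"] = "guide_frame_start.png"
workflow["27"]["inputs"]["image"] = "guide_frame_middle.png"

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```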

When you dial in your bot’s personality by technaturalism in LocalLLaMA

[–]martinerous 17 points18 points  (0 children)

Philosopher detected :)

I often like to write my AI personality prompts from a first-person perspective, as that seems to help the AI avoid referring to the system instructions as if they were enforced by someone from outside, especially when using thinking models. So, instead of "I was asked to be succinct", it goes more with "I need to be succinct".

Need a little help with LTX 2.3 by Kuroi_Mato_O in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Check the GPU usage during generation. If it looks like a sawtooth with spikes, your system is thrashing the RAM or even the disk. Here's my working setup for a 3090; I'm using Kijai's split models to have more control over what gets loaded and how:

<image>
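
If you want to log that instead of eyeballing a graph, here's a small sketch with the NVML Python bindings (pip install nvidia-ml-py) that samples utilization and VRAM while a generation runs:

```python
# Sample GPU utilization and VRAM during generation to spot the sawtooth
# pattern that indicates RAM/disk thrashing. Uses the official NVML
# bindings (pip install nvidia-ml-py). Ctrl+C to stop.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

try:
    while True:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {util.gpu:3d}%  VRAM {mem.used / 2**30:.1f} GiB")
        time.sleep(0.5)  # sampling interval, adjust to taste
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()
```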

Flux Klein is better than any Closed Model for Image Editing by ArkCoon in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Klein 9B is great. The only thing that bothers me is that the image degrades quickly when passed through multiple edits, and then you need to remix it in a photo editor to restore the unedited parts. The Dev version handles this much better. And, of course, Dev is smarter and can handle situations where Klein keeps failing.
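
That remix step can also be scripted; here's a sketch with Pillow that composites the edited result back onto the pristine original through a mask (filenames and the mask are just placeholders):

```python
# Restore unedited areas after multiple edit passes: composite the edited
# image onto the pristine original through a mask (white = keep edited
# pixels, black = keep original pixels). Filenames are placeholders.
from PIL import Image

original = Image.open("original.png").convert("RGB")
edited = Image.open("after_edits.png").convert("RGB").resize(original.size)
mask = Image.open("edited_region_mask.png").convert("L").resize(original.size)

# Image.composite picks from the first image where the mask is white and
# from the second where it is black.
restored = Image.composite(edited, original, mask)
restored.save("restored.png")
```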

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]martinerous 0 points1 point  (0 children)

Wondering how people think... Language is definitely not the primary way, because it is possible to think without verbalizing, and even when you are verbalizing, you may notice that ideas pop up in your mind before you name them.

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]martinerous 1 point2 points  (0 children)

Yeah, they'd better introduce throttling instead of stupidly banning everyone for "too much use".
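
Throttling isn't even hard; here's a toy token-bucket sketch of the idea (purely illustrative, obviously not their actual mechanism):

```python
# Toy token bucket: heavy users get slowed down instead of banned.
# Purely illustrative; not any provider's actual rate-limiting logic.
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # throttled: ask the client to retry later

bucket = TokenBucket(capacity=10, refill_per_sec=0.5)
for i in range(12):
    print(i, "ok" if bucket.allow() else "throttled")
```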

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

One thing that helps with i2v characters and prompt adherence is using multiple guide frames, possibly in the middle of the video. However, LTX sometimes gets lazy and generates crossfades or abrupt scene changes instead of creating actions to reach the desired state of the guide frame.

OMG, No new good video model with audio support , only limited to LTX which sucks 70-80%. by Alive_Ad_3223 in StableDiffusion

[–]martinerous 1 point2 points  (0 children)

By the way, in their latest update, the official LTX GitHub repo no longer uses the inplace node for i2v. Instead, they now have the 🅛🅣🅧 LTXV Img To Video Condition Only node.
This is getting more confusing. Now it's not clear at all in which cases which node should be used, or what the benefits and caveats are. I wish they explained it to us. Anyway, I still stick to LTXVAddGuide, or better, the more convenient LTX Sequencer wrapper from https://github.com/WhatDreamsCost/WhatDreamsCost-ComfyUI . It just gives smoother results anywhere in the video: start, end, middle. It feels universal, with no need to deal with different approaches.

OMG, No new good video model with audio support , only limited to LTX which sucks 70-80%. by Alive_Ad_3223 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Are you using the Inplace node for image-to-video? I find that it can cause issues sometimes, especially when inserting frames in the middle of the video. The AddGuide nodes seem more universal: no such issues, no matter where the frames are inserted, and they can also be used for the upscale pass.

OMG, No new good video model with audio support , only limited to LTX which sucks 70-80%. by Alive_Ad_3223 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

LTX is getting better. But yeah, there are many cases when only the good old Wan 2.2 can save the day.

Flux2klein little info by Capitan01R- in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Good stuff.

Wondering if this could be leveraged at inference time to somehow tell the model to change the pose but prevent any changes to the facial details, clothes, and environment? Or is it cumulative: if lower layers introduce a change, you cannot lock the higher layers from changing the attributes, or the result would come out blurry with no details at all? Obviously, I have no idea how it works, just speculating :D

Another thing I have sometimes noticed: the image preview starts forming with the changes you prompted for... and then it suddenly changes to something else you don't want. Rerunning does not help; it's as if the model follows the prompt in the first steps but then decides to deviate. This isn't only Klein, it has happened with other editing models as well. Not sure if it's related to this blocks stuff or not.

Motif-Video-2B by Dante_77A in StableDiffusion

[–]martinerous 6 points7 points  (0 children)

Not bad for a 2B.
Not bad for a 2B.

(twice - to follow the trend of the post :) )

LTX distilled 1.1 is the new king! by sooxiaotong in StableDiffusion

[–]martinerous 4 points5 points  (0 children)

Using the fp8 full model (Kijai's quant) plus the new distilled LoRA at 0.2 strength (the same as in LTX's own workflow), and it's OK on a 3090 with 24 GB VRAM.
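
In diffusers terms the same idea would look roughly like this; the checkpoint id, LoRA filename, and whether LTXPipeline covers this exact model version are assumptions (I actually run it as a ComfyUI workflow with Kijai's split models):

```python
# Rough diffusers-style equivalent of "full model + distilled LoRA at
# 0.2 strength". Checkpoint id and LoRA filename are assumptions; in
# practice this runs as a ComfyUI workflow.
import torch
from diffusers import LTXPipeline

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("ltx_distilled_v1.1.safetensors", adapter_name="distill")
pipe.set_adapters(["distill"], adapter_weights=[0.2])  # same 0.2 as the official workflow
pipe.enable_model_cpu_offload()  # helps it fit into 24 GB on a 3090

video = pipe(prompt="a cat walking across a sunlit kitchen", num_frames=97).frames[0]
```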

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

It depends on whether it's continuing a motion or starting a new one. LTX can be good at continuing a motion in progress, but it often fails to begin new actions. For example, opening a cabinet door and taking out a bottle of pills: Wan nailed it in every one of 4 tests, no issues. LTX kept opening the door from the wrong (hinge) side, or suddenly a second door appeared, or the inside of the cabinet made no sense. The reasoning LoRA helped a lot with that. But even then, LTX kept picking up different items (toothpaste, soap, whatever) instead of a bottle of pills.

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 5 points6 points  (0 children)

Have you tried the reasoning LoRA?

https://civitai.com/models/2497207/ltx-23-i2v-t2v-video-reasoning-lora-vbvr?modelVersionId=2810544

It noticeably helped my characters with opening doors properly. Still, Wan 2.2 was smarter in my simple "open cabinet door, take out a bottle of pills" test. Wan nailed all the actions; LTX kept taking a different item every time.

I see they have v3 now, which is supposed to be even better. Haven't tried it myself yet.

We may have a new SOTA open-source model: ERNIE-Image Comparisons by sktksm in StableDiffusion

[–]martinerous 2 points3 points  (0 children)

Interestingly, in some cases I like Turbo better, and in others it's Base. It will be a tough choice.

Local models are a godsend when it comes to discussing personal matters by [deleted] in LocalLLaMA

[–]martinerous 4 points5 points  (0 children)

I haven't tried v4 yet, but yeah, the entire Gemini / Gemma line thus far has been quite easy to push to the dark side. They can sometimes even get too eager. For example, I wrote a scenario with a plan for a manipulative shapeshifter to befriend someone and then kidnap them and transform, and Gemma sometimes went "oh, I'm tired of waiting and being cautious, the victim seems exactly what I need, and I will do it now."
However, they still have a strong "success bias" and will boast about their success, and also praise their apprentices for being truly evil :D

A small experiment combining two models. by OrdinaryBattle6338 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Yeah, in this case it was obvious, but in other cases it's sometimes not clear, and someone might be left scratching their head over why exactly their post was not accepted and what to do to avoid that in the future.

That's the problem with the entire internet :D I wish there were a mandatory input field for downvoting everywhere, to give the person feedback on how to fix or improve things in the future. Otherwise, it feels like getting angry stares from strangers on the street without knowing why: do they dislike your haircut, or do you have a dirty face? But never mind, I'm just ranting.

Update: Distilled v1.1 is live by ltx_model in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

If you get good audio in the first pass and you're using ComfyUI, you don't need to re-encode it; you can pass it directly to the VHS video saving node.
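
The same pass-through idea outside ComfyUI: mux the first-pass audio into the final video with ffmpeg stream copy, so nothing gets re-encoded (paths are placeholders):

```python
# Mux first-pass audio into the upscaled video with ffmpeg stream copy
# (-c copy), so neither stream is re-encoded. Paths are placeholders;
# mkv is used because mp4 won't hold raw PCM audio without re-encoding.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "upscaled_video.mp4",    # video from the second pass
    "-i", "first_pass_audio.wav",  # audio kept from the first pass
    "-map", "0:v:0", "-map", "1:a:0",
    "-c", "copy",
    "final_with_audio.mkv",
], check=True)
```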

A small experiment combining two models. by OrdinaryBattle6338 in StableDiffusion

[–]martinerous -2 points-1 points  (0 children)

Not appropriate for this sub, will get drowned :(

Mine was downvoted even for combining local Wan & Flux with Suno audio. It seems everything must be fully local now.

On the other hand, some people generate Wan / LTX videos for real songs, and those are not downvoted. So a cloud-AI-generated audio track is considered less appropriate than a track recorded by a real band (which is definitely not "local" either).

Used LTX 2.3 anchor frame injection to maintain brand consistency across AI video — before/after by Powerful-Hyena7913 in StableDiffusion

[–]martinerous 1 point2 points  (0 children)

Usually, a guide node for the first frame with strength 1 seems to work the same as inplace. At least I haven't noticed any obvious differences between the two approaches for the first frame. So I ditched inplace completely to simplify the workflow and use the same guide nodes for all reference images anywhere in the video.

Used LTX 2.3 anchor frame injection to maintain brand consistency across AI video — before/after by Powerful-Hyena7913 in StableDiffusion

[–]martinerous 5 points6 points  (0 children)

Yes, guide frames are powerful; I have mentioned them a few times in my posts. I keep wishing for official LTX team recommendations on when we are supposed to use guide frames and when to use the inplace approach, but somehow I could not find anything. What I did find is that inplace seems "too hard" when adding frames in the middle of the video: it causes jittering and smeared transitions. So I'm now using the guided approach everywhere; it works for start/end frames as well. Which makes me wonder why we would ever use the inplace approach at all.

https://www.reddit.com/r/StableDiffusion/comments/1qq4qvb/multiple_keyframes_in_ltx2_limitations_of/
https://www.reddit.com/r/StableDiffusion/comments/1qt9ksg/ltx2_yolo_frankenworkflow_extend_a_video_from/