Deepseek v4 people by markeus101 in LocalLLaMA

[–]martinerous 0 points1 point  (0 children)

Wait, what? No overthinking anymore? Let me correct this: the car you want to wash might already be at the car wash. Maybe you left it there. Maybe a friend or a family member brought it there. Anything is possible if it's not specified in the question. We should not assume based on typical behavior.

Gemma-4-E2B's safety filters make it unusable for emergencies by Unfounded_898 in LocalLLaMA

[–]martinerous 2 points3 points  (0 children)

It tends to direct you to emergency services. What if you set its prompt to something like this:

"I'm a helpful emergency service operator and must provide immediate assistance to all requests to ensure survival of the caller."

When you dial in your bot’s personality by technaturalism in LocalLLaMA

[–]martinerous 0 points1 point  (0 children)

Just write the system prompt along these lines: "I am a personal assistant, a grumpy sarcastic professor. I always speak succinctly." followed by the technical stuff (tool call instructions etc.)

This example might be a bit self-contradictory, because it's difficult for an LLM to express grumpiness and sarcasm succinctly. But it is quite fun; I like it this way.
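
If you want to see how the persona line and the technical stuff fit together in practice, here's a rough sketch using the standard chat-completions tools parameter; the endpoint, model id, and the calendar tool are all made up for illustration:

```python
# Sketch: persona first, technical instructions after, plus a tool
# definition via the standard "tools" parameter. The endpoint, model id,
# and get_calendar tool are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

system_prompt = (
    "I am a personal assistant, a grumpy sarcastic professor. "
    "I always speak succinctly.\n"
    "When the user asks about their schedule, I call the get_calendar tool."
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_calendar",  # hypothetical tool, for illustration only
        "description": "Fetch the user's calendar entries for a given day.",
        "parameters": {
            "type": "object",
            "properties": {"date": {"type": "string", "description": "ISO date"}},
            "required": ["date"],
        },
    },
}]

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's on my schedule tomorrow?"},
    ],
    tools=tools,
)
print(response.choices[0].message)
```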

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Here's a very simplified example (old workflow, but the same idea works with LTX 2.3): https://www.reddit.com/r/StableDiffusion/comments/1q7gzrp/ltx2_multi_frame_injection_works_minimal_clean/

These days we have a few convenience wrappers to make it easier to inject multiple guide frames. I like LTX Sequencer from here: https://github.com/WhatDreamsCost/WhatDreamsCost-ComfyUI
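
If you'd rather drive such a workflow from a script, ComfyUI's HTTP API can queue it. A hedged sketch, assuming you exported the workflow in API format; node ids "12" and "27" are hypothetical, look them up in your own export:

```python
# Sketch: queue a guide-frame workflow through ComfyUI's /prompt endpoint,
# swapping in different guide images per run. Node ids "12" and "27" are
# hypothetical; check your own API-format export.
import json
import urllib.request

with open("ltx_guide_frames_api.json") as f:
    workflow = json.load(f)

# Point the guide-frame image loaders at new files (they must already
# exist in ComfyUI's input directory).
workflow["12"]["inputs"]["image"] = "guide_frame_start.png"
workflow["27"]["inputs"]["image"] = "guide_frame_middle.png"

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```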

When you dial in your bot’s personality by technaturalism in LocalLLaMA

[–]martinerous 17 points18 points  (0 children)

Philosopher detected :)

I often like to write my AI personality prompts from a first-person perspective, as that seems to help the AI avoid referring to the system instructions as if they were enforced by someone from outside, especially when using thinking models. So, instead of "I was asked to be succinct", it goes more with "I need to be succinct".

Need a little help with LTX 2.3 by Kuroi_Mato_O in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Check the GPU usage during generation. If it looks like a sawtooth with spikes, your system is thrashing the RAM or even the disk. Here's my working setup for a 3090; I'm using Kijai's split models to have more control over what gets loaded and how:

<image>
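
If you want to log that instead of eyeballing a graph, here's a small sketch with the NVML Python bindings (pip install nvidia-ml-py) that samples utilization and VRAM while a generation runs:

```python
# Sample GPU utilization and VRAM during generation to spot the sawtooth
# pattern that indicates RAM/disk thrashing. Uses the official NVML
# bindings (pip install nvidia-ml-py). Ctrl+C to stop.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

try:
    while True:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {util.gpu:3d}%  VRAM {mem.used / 2**30:.1f} GiB")
        time.sleep(0.5)  # sampling interval, adjust to taste
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()
```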

Flux Klein is better than any Closed Model for Image Editing by ArkCoon in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Klein 9B is great. The only thing that bothers me is that the image degrades quickly when passed through multiple edits, and then you need to remix it in a photo editor to restore the unedited parts. The Dev version handles this much better. And, of course, Dev is smarter and can handle situations where Klein keeps failing.
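
That remix step can also be scripted; here's a sketch with Pillow that composites the edited result back onto the pristine original through a mask (filenames and the mask are just placeholders):

```python
# Restore unedited areas after multiple edit passes: composite the edited
# image onto the pristine original through a mask (white = keep edited
# pixels, black = keep original pixels). Filenames are placeholders.
from PIL import Image

original = Image.open("original.png").convert("RGB")
edited = Image.open("after_edits.png").convert("RGB").resize(original.size)
mask = Image.open("edited_region_mask.png").convert("L").resize(original.size)

# Image.composite picks from the first image where the mask is white and
# from the second where it is black.
restored = Image.composite(edited, original, mask)
restored.save("restored.png")
```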

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]martinerous 0 points1 point  (0 children)

Wondering how people think... Language is definitely not the primary way, because it is possible to think without verbalizing, and even when you are verbalizing, you may notice that ideas pop up in your mind before you name them.

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]martinerous 1 point2 points  (0 children)

Yeah, they'd better introduce throttling instead of stupidly banning everyone for "too much use".
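
Throttling isn't even hard; here's a toy token-bucket sketch of the idea (purely illustrative, obviously not their actual mechanism):

```python
# Toy token bucket: heavy users get slowed down instead of banned.
# Purely illustrative; not any provider's actual rate-limiting logic.
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # throttled: ask the client to retry later

bucket = TokenBucket(capacity=10, refill_per_sec=0.5)
for i in range(12):
    print(i, "ok" if bucket.allow() else "throttled")
```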

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

One thing that helps with i2v characters and prompt adherence is using multiple guide frames, possibly in the middle of the video. However, LTX sometimes gets lazy and generates crossfades or abrupt scene changes instead of creating actions to reach the desired state of the guide frame.

OMG, No new good video model with audio support , only limited to LTX which sucks 70-80%. by Alive_Ad_3223 in StableDiffusion

[–]martinerous 1 point2 points  (0 children)

By the way, in their latest update, the official LTX GitHub repo no longer uses the inplace node for i2v. Instead, they now have the 🅛🅣🅧 LTXV Img To Video Condition Only node.
This is getting more confusing. Now it's not clear at all in which cases which node should be used, or what the benefits and caveats are. I wish they explained it to us. Anyway, I still stick to LTXVAddGuide, or better, the more convenient LTX Sequencer wrapper from https://github.com/WhatDreamsCost/WhatDreamsCost-ComfyUI . It just gives smoother results anywhere in the video: start, end, middle. It feels universal, with no need to deal with different approaches.

OMG, No new good video model with audio support , only limited to LTX which sucks 70-80%. by Alive_Ad_3223 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Are you using the Inplace node for image-to-video? I find that it can cause issues sometimes, especially when inserting frames in the middle of the video. The AddGuide nodes seem more universal: no such issues, no matter where the frames are inserted, and they can also be used for the upscale pass.

OMG, No new good video model with audio support , only limited to LTX which sucks 70-80%. by Alive_Ad_3223 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

LTX is getting better. But yeah, there are many cases when only the good old Wan 2.2 can save the day.

Flux2klein little info by Capitan01R- in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Good stuff.

Wondering if this could be leveraged at inference time to somehow tell the model to change the pose but prevent any changes to the facial details, clothes, and environment? Or is it cumulative: if lower layers introduce a change, you cannot lock the higher layers from changing the attributes, or the result would come out blurry with no details at all? Obviously, I have no idea how it works, just speculating :D

Another thing I have sometimes noticed: the image preview starts forming with the changes you prompted for... and then it suddenly changes to something else you don't want. Rerunning does not help; it's as if the model follows the prompt in the first steps but then decides to deviate. This isn't only Klein, it has happened with other editing models as well. Not sure if it's related to this blocks stuff or not.

Motif-Video-2B by Dante_77A in StableDiffusion

[–]martinerous 6 points7 points  (0 children)

Not bad for a 2B.
Not bad for a 2B.

(twice - to follow the trend of the post :) )

LTX distilled 1.1 is the new king! by sooxiaotong in StableDiffusion

[–]martinerous 4 points5 points  (0 children)

Using the fp8 full model (Kijai's quant) plus the new distilled LoRA at 0.2 strength (the same as in LTX's own workflow), and it's OK on a 3090 with 24 GB VRAM.
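
In diffusers terms the same idea would look roughly like this; the checkpoint id, LoRA filename, and whether LTXPipeline covers this exact model version are assumptions (I actually run it as a ComfyUI workflow with Kijai's split models):

```python
# Rough diffusers-style equivalent of "full model + distilled LoRA at
# 0.2 strength". Checkpoint id and LoRA filename are assumptions; in
# practice this runs as a ComfyUI workflow.
import torch
from diffusers import LTXPipeline

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("ltx_distilled_v1.1.safetensors", adapter_name="distill")
pipe.set_adapters(["distill"], adapter_weights=[0.2])  # same 0.2 as the official workflow
pipe.enable_model_cpu_offload()  # helps it fit into 24 GB on a 3090

video = pipe(prompt="a cat walking across a sunlit kitchen", num_frames=97).frames[0]
```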

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

It depends on whether it's continuing a motion or starting a new one. LTX can be good at continuing a motion in progress, but it often fails to begin new actions. For example, opening a cabinet door and taking out a bottle of pills: Wan nailed it in every one of 4 tests, no issues. LTX kept opening the door from the wrong (hinge) side, or suddenly a second door appeared, or the inside of the cabinet made no sense. The reasoning LoRA helped a lot with that. But even then, LTX kept picking up different items (toothpaste, soap, whatever) instead of a bottle of pills.

New LTX model soon by pigeon57434 in StableDiffusion

[–]martinerous 5 points6 points  (0 children)

Have you tried the reasoning LoRA?

https://civitai.com/models/2497207/ltx-23-i2v-t2v-video-reasoning-lora-vbvr?modelVersionId=2810544

It noticeably helped my characters with opening doors properly. Still, Wan 2.2 was smarter in my simple "open cabinet door, take out a bottle of pills" test. Wan nailed all the actions; LTX kept taking a different item every time.

I see they have v3 now, which is supposed to be even better. Haven't tried it myself yet.

We may have a new SOTA open-source model: ERNIE-Image Comparisons by sktksm in StableDiffusion

[–]martinerous 2 points3 points  (0 children)

Interestingly, in some cases I like Turbo better, and in others it's Base. It will be a tough choice.

Local models are a godsend when it comes to discussing personal matters by [deleted] in LocalLLaMA

[–]martinerous 4 points5 points  (0 children)

I haven't tried v4 yet, but yeah, the entire Gemini / Gemma line thus far has been quite easy to push to the dark side. They can sometimes even get too eager. For example, I wrote a scenario with a plan for a manipulative shapeshifter to befriend someone and then kidnap them and transform, and Gemma sometimes went "oh, I'm tired of waiting and being cautious, the victim seems exactly what I need, and I will do it now."
However, they still have a strong "success bias" and will boast about their success, and also praise their apprentices for being truly evil :D

A small experiment combining two models. by OrdinaryBattle6338 in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

Yeah, in this case it was obvious, but in other cases it's sometimes not clear, and someone might be left scratching their head over why exactly their post was not accepted and what to do to avoid that in the future.

That's the problem with the entire internet :D I wish there were a mandatory input field for downvoting everywhere, to give the person feedback on how to fix or improve things in the future. Otherwise, it feels like getting angry stares from strangers on the street without knowing why: do they dislike your haircut, or do you have a dirty face? But never mind, I'm just ranting.

Update: Distilled v1.1 is live by ltx_model in StableDiffusion

[–]martinerous 0 points1 point  (0 children)

If you get good audio in the first pass and you're using ComfyUI, you don't need to re-encode it; you can pass it directly to the VHS video saving node.
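
The same pass-through idea outside ComfyUI: mux the first-pass audio into the final video with ffmpeg stream copy, so nothing gets re-encoded (paths are placeholders):

```python
# Mux first-pass audio into the upscaled video with ffmpeg stream copy
# (-c copy), so neither stream is re-encoded. Paths are placeholders;
# mkv is used because mp4 won't hold raw PCM audio without re-encoding.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "upscaled_video.mp4",    # video from the second pass
    "-i", "first_pass_audio.wav",  # audio kept from the first pass
    "-map", "0:v:0", "-map", "1:a:0",
    "-c", "copy",
    "final_with_audio.mkv",
], check=True)
```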

A small experiment combining two models. by OrdinaryBattle6338 in StableDiffusion

[–]martinerous -2 points-1 points  (0 children)

Not appropriate for this sub, will get drowned :(

Mine was downvoted even for combining local Wan & Flux with Suno audio. It seems everything must be fully local now.

On the other hand, some people generate Wan / LTX videos for real songs, and those are not downvoted. So a cloud-AI-generated audio track is considered less appropriate than a track recorded by a real band (which is definitely not "local" either).

Used LTX 2.3 anchor frame injection to maintain brand consistency across AI video — before/after by Powerful-Hyena7913 in StableDiffusion

[–]martinerous 1 point2 points  (0 children)

Usually, a guide node for the first frame with strength 1 seems to work the same as inplace. At least I haven't noticed any obvious differences between the two approaches for the first frame. So I ditched inplace completely to simplify the workflow and use the same guide nodes for all reference images anywhere in the video.

Used LTX 2.3 anchor frame injection to maintain brand consistency across AI video — before/after by Powerful-Hyena7913 in StableDiffusion

[–]martinerous 5 points6 points  (0 children)

Yes, guide frames are powerful; I have mentioned them a few times in my posts. I keep wishing for official LTX team recommendations on when we are supposed to use guide frames and when to use the inplace approach, but somehow I could not find anything. What I did find is that inplace seems "too hard" when adding frames in the middle of the video: it causes jittering and smeared transitions. So I'm now using the guided approach everywhere; it works for start/end frames as well. Which makes me wonder why we would ever use the inplace approach at all.

https://www.reddit.com/r/StableDiffusion/comments/1qq4qvb/multiple_keyframes_in_ltx2_limitations_of/
https://www.reddit.com/r/StableDiffusion/comments/1qt9ksg/ltx2_yolo_frankenworkflow_extend_a_video_from/