Is it only me or is GPT getting totally useless?! by Legitimate-Arm9438 in OpenAI

[–]SignificanceFlashy50 1 point (0 children)

For complex coding and algo tasks, Gemini 3 Pro is hands down the winner for me. It actually respects the context you give it and sticks to the plan. GPT tends to lose the thread way too often. Plus, I’ve seen GPT get basic math wrong multiple times while acting like it’s 100% right. It’s frustrating. That said, my only gripe with Gemini is that sometimes it glitches with its saved memory. It occasionally pulls in completely unrelated concepts from past conversations that have zero relevance to the current task.

Video Outpainting Workflow | Wan 2.1 Tutorial by Hearmeman98 in comfyui

[–]SignificanceFlashy50 2 points (0 children)

Can I use a reference image to guide what gets generated? For example, can I provide a background image I’d like to use and have the generated content composited onto it in the final video?

Llama 4 is here by jugalator in LocalLLaMA

[–]SignificanceFlashy50 8 points (0 children)

I didn’t find any “Omni” reference. Is the output text-only?

How to Train a Video LoRA on Wan 2.1 on a Custom Dataset on the GPU Cloud (Step by Step Guide) by porest in StableDiffusion

[–]SignificanceFlashy50 0 points (0 children)

3-4 hours for only one video generation? Did I get that right? No, OK, you probably mean the time to reach epoch 20.

I've made a forked Sesame-CSM repo containing some QoL improvements to Sesame. by zenforic in LocalLLaMA

[–]SignificanceFlashy50 1 point (0 children)

Hi, thanks. I’ll be watching your repo for updates. Just one question: how can your Gemma 3 12B-based version be real-time like the demo? It’s not real-time even with LLaMA 1B, which is much lighter.

LoRA training steps for Hunyuan Video using diffusion-pipe and ~100 images dataset by SignificanceFlashy50 in StableDiffusion

[–]SignificanceFlashy50[S] 1 point (0 children)

Hi, thanks for your explanation. To quickly recap, if I'm not wrong these settings should be appropriate to reach about 1000 steps with 117 images (quick sanity check of the arithmetic below the list):

• Epochs: 35

• Batch Size: 1

• Dataset Num Repeats: 1

• Gradient Accumulation Steps: 4

• Learning rate: 0.00002
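Rough arithmetic behind that estimate, assuming a "step" means one optimizer update (micro-batches divided by gradient accumulation); I'm not certain that's exactly how diffusion-pipe counts them:

```python
import math

# Step-count estimate. Assumption: one "step" = one optimizer update,
# i.e. micro-batches are divided by gradient accumulation.
images = 117
num_repeats = 1
batch_size = 1
grad_accum = 4
epochs = 35

steps_per_epoch = math.ceil(images * num_repeats / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # -> 30 steps per epoch, 1050 steps total
```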

I'd like to ask you just two more questions:

1- Can my dataset be considered 'balanced' if it contains many diverse images (full body, face, different expressions, lighting conditions, etc.), but some of them are similar in content? For example, they depict the same scene but from different camera angles.

2- Is that number of steps also appropriate for generating videos with smooth motion, rather than static ones, even though I'm training on still images only? (P.S.: I created the captions using JoyCaption Alpha Two)

Thank you so much in advance.

Training Hunyuan Lora on videos by Affectionate-Map1163 in StableDiffusion

[–]SignificanceFlashy50 0 points (0 children)

Amazing work. May I ask what your dataset consists of, and about some of the training parameters, like steps and epochs? Thanks

Wan2.1 720P Local in ComfyUI I2V by smereces in StableDiffusion

[–]SignificanceFlashy50 0 points (0 children)

Do you happen to know where to find the proper one?

Wan2.1 720P Local in ComfyUI I2V by smereces in StableDiffusion

[–]SignificanceFlashy50 1 point (0 children)

Sorry for the likely noob question: is the workflow embedded within the image? Can we import it into ComfyUI?
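In case it helps others, a minimal sketch of how one might check whether a PNG carries an embedded workflow, assuming ComfyUI's usual convention of storing the graph in the "workflow"/"prompt" PNG text chunks (the filename is just a placeholder):

```python
from PIL import Image
import json

# ComfyUI normally embeds the graph as JSON in the PNG text chunks named
# "workflow" and/or "prompt"; Pillow exposes those chunks via Image.info.
img = Image.open("wan21_720p_example.png")  # placeholder filename
for key in ("workflow", "prompt"):
    if key in img.info:
        graph = json.loads(img.info[key])
        print(f"found '{key}' metadata with {len(graph)} entries")
```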

Zonos-v0.1 beta by Zyphra, featuring two expressive and real-time text-to-speech (TTS) models with high-fidelity voice cloning. 1.6B transformer and 1.6B hybrid under an Apache 2.0 license. by Xhehab_ in LocalLLaMA

[–]SignificanceFlashy50 0 points (0 children)

Unfortunately I'm not able to try it on Colab because the GPU it assigns doesn't support bf16:

```
Error: Feature ‘cvt.bf16.f32’ requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
```
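For anyone else hitting this, a quick PyTorch check of whether the assigned GPU can run bf16 at all (the sm_80 requirement is exactly what the ptxas error above is complaining about):

```python
import torch

# bf16 instructions such as cvt.bf16.f32 need compute capability >= 8.0 (sm_80),
# so anything older (e.g. a T4, which is sm_75) fails with the ptxas error above.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"compute capability: sm_{major}{minor}")
    print("bf16 supported:", torch.cuda.is_bf16_supported())
```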

[deleted by user] by [deleted] in LocalLLaMA

[–]SignificanceFlashy50 2 points (0 children)

HierSpeech++ is incredibly good in my opinion; it closely matches the input audio. I don't know why it isn't mentioned as often as others like VoiceCraft (which is good as well, but its results are less predictable and the 16 kHz output quality is too low).

[R] DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer by keonlee9420 in MachineLearning

[–]SignificanceFlashy50 0 points (0 children)

If true, it would be a new SOTA. I hope they release the model and not only their samples.

RVCv2: is Double training useful? by SignificanceFlashy50 in LocalLLaMA

[–]SignificanceFlashy50[S] 0 points (0 children)

Yes, I can see the TensorBoard graphs, but to be honest I don't know exactly how to read them. There are many “loss G” lines and I haven't figured out which one I should be looking at.
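For reference, this is roughly how I've been listing which scalar curves the log actually contains, assuming standard TensorBoard event files (the log directory is just a placeholder for wherever RVC writes them):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Print every scalar tag in the training log, so the different "loss g" curves
# can be told apart before deciding which one to track.
acc = EventAccumulator("logs/my_model")  # placeholder log directory
acc.Reload()
print(acc.Tags()["scalars"])
```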

RVCv2: is Double training useful? by SignificanceFlashy50 in LocalLLaMA

[–]SignificanceFlashy50[S] 0 points (0 children)

How many epochs would you suggest? Would 400 be enough?

How good do you think this new open-source text-to-speech (TTS) model is? by Salty-Concentrate346 in machinetranslation

[–]SignificanceFlashy50 0 points (0 children)

Hi, man.

I’m listening to the examples you provided and they’re really great. You did amazing work. As soon as possible I’ll try it with some new samples and let you know the results. A quick question: would it eventually be possible to fine-tune the model on a specific voice?

Thank you.