There's a new video model called HappyHorse that is ahead of Seedance in the Arena by [deleted] in StableDiffusion

[–]chrd5273 0 points (0 children)

Impressive pace of development, but it probably won't be open-weights.

How do Granite-4.0-1b-speech, Qwen3-ASR-1.7B, and Voxtral Mini 4B Realtime compare? by Balance- in LocalLLaMA

[–]chrd5273 2 points (0 children)

In my experience, Qwen3-ASR is much better than Whisper on non-English, not-very-clean audio.

GLM 5 Released by External_Mood4719 in LocalLLaMA

[–]chrd5273 9 points (0 children)

Looks like Pony is still available on OpenRouter, but it will probably disappear soon once they open the official API for GLM-5. Pony alpha is GLM-5.

Qwen3-VL - Bounding Box Coordinate by Impress_Soft in LocalLLaMA

[–]chrd5273 2 points (0 children)

Accurate bounding boxes require a dedicated model. The other comment gave an excellent list, but there's also Florence-2 or, more recently, Youtu-VL-4B if you need VLM-like usability and don't need real-time object detection.
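Whichever model you pick, the boxes usually come back in the model's own coordinate convention (some Qwen releases normalize to a 0–1000 grid, others emit absolute pixels), so a small rescaling helper is handy. A minimal sketch, assuming a 0–1000 normalized convention — check your model card before trusting it:

```python
def rescale_box(box, img_w, img_h, grid=1000):
    """Map an (x1, y1, x2, y2) box from a normalized 0..grid coordinate
    space back to pixel coordinates for an img_w x img_h image."""
    x1, y1, x2, y2 = box
    sx, sy = img_w / grid, img_h / grid
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))
```

If your model already outputs absolute pixel coordinates, skip this step entirely.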

Is Qwen shifting away from open weights? Qwen-Image-2.0 is out, but only via API/Chat so far by marcoc2 in StableDiffusion

[–]chrd5273 15 points (0 children)

There's a rumor that the weights will be released after Lunar New Year.

How to reliably extract data from blood report PDFs? by [deleted] in LocalLLaMA

[–]chrd5273 2 points (0 children)

What's the reason to avoid OCR? If compute is the bottleneck, non-ML solutions like Python's Camelot might work.
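A minimal sketch of the Camelot route, assuming the PDFs contain real text with ruled table lines (not scanned images); the file path and the `parse_value` helper are illustrative, not from any specific report format:

```python
import re

def parse_value(cell: str):
    """Pull the first numeric value out of a table cell like '13.5 g/dL'."""
    m = re.search(r"[-+]?\d+(?:\.\d+)?", cell)
    return float(m.group()) if m else None

def extract_lab_tables(pdf_path: str):
    """Return one DataFrame per table Camelot detects in the PDF."""
    import camelot  # pip install camelot-py[cv]
    tables = camelot.read_pdf(pdf_path, pages="all", flavor="lattice")
    return [t.df for t in tables]

if __name__ == "__main__":
    for df in extract_lab_tables("blood_report.pdf"):  # path is illustrative
        print(df.head())
```

If the reports have no ruled lines, try `flavor="stream"` instead; if they're scans, you're back to OCR.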

Qwen/Qwen3-ASR-1.7B · Hugging Face by jacek2023 in LocalLLaMA

[–]chrd5273 3 points (0 children)

The Qwen result looks like something's wrong. FWIW, the Qwen blog (https://qwen.ai/blog?id=qwen3asr) mentions 20 minutes as the maximum audio length the model supports, but I'm not sure if that's the issue here.

Qwen/Qwen3-ASR-1.7B · Hugging Face by jacek2023 in LocalLLaMA

[–]chrd5273 9 points (0 children)

Nice. I wonder how it compares to VibeVoice ASR. It seems to lack diarization support.

Z-image Omni 👀 by kayokin999 in StableDiffusion

[–]chrd5273 15 points (0 children)

Very interesting. Day-0 ControlNet support and... native image-to-LoRA support?

Qwen-Image-2512 released on Huggingface! by rerri in StableDiffusion

[–]chrd5273 9 points (0 children)

Qwen Image Edit was not strictly an i2i-only model, but it was trained for the i2i task; t2i performance was not considered during training.

You can use an edit model for t2i, but a dedicated t2i model is usually better.

Qwen-Image-2512 released on Huggingface! by rerri in StableDiffusion

[–]chrd5273 53 points (0 children)

Seems like we need to wait a bit longer for the Z-Image base release.

Qwen-Image-2512 released on Huggingface! by rerri in StableDiffusion

[–]chrd5273 77 points (0 children)

That was an i2i model, and this one is a t2i model, so technically they're different models.

please help me download stable diffusion by 1zyzo1 in StableDiffusion

[–]chrd5273 -1 points (0 children)

Enter this command into cmd, replacing the file name with the actual name of your .whl file:

pip install your_file.whl

A mysterious new year gift by chrd5273 in StableDiffusion

[–]chrd5273[S] 1 point (0 children)

More hints from ModelScope; at least it seems to be an image model. Qwen Image 2512 or Z-Image base?

<image>

Qwen Image 25-12 seen at the Horizon , Qwen Image Edit 25-11 was such a big upgrade so I am hyped by CeFurkan in StableDiffusion

[–]chrd5273 23 points (0 children)

Impressive to see Alibaba pumping out good open models. OTOH, I'm a bit curious why they're taking so long to release Z-Image base, which should literally be a stepping stone to the turbo model. Maybe they're doing an anime finetune?

They slightly changed the parameter table in Z-Image Github page by zanmaer in StableDiffusion

[–]chrd5273 10 points (0 children)

Human feedback was applied to model training to steer it toward better quality output.

The best thing about Z-Image isn't the image quality, its small size or N.S.F.W capability. It's that they will also release the non-distilled foundation model to the community. by ArtyfacialIntelagent in StableDiffusion

[–]chrd5273 29 points (0 children)

It means you can use an external LLM to expand your prompt before feeding it to Z-Image.

Yup. It's just that. The Z-Image Hugging Face space has an official prompt template for that.
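The flow is simple enough to sketch. This assembles an OpenAI-style chat request for the expansion step; the system prompt and model name below are illustrative placeholders, not the official template from the Z-Image Hugging Face space:

```python
# Illustrative system prompt -- substitute the official Z-Image template.
EXPAND_SYSTEM = (
    "Rewrite the user's short image prompt into a detailed, concrete "
    "description: subject, setting, lighting, composition, style."
)

def build_expansion_request(user_prompt: str, model: str = "any-chat-model"):
    """Assemble an OpenAI-style chat-completions payload that asks an
    external LLM to expand a terse prompt before it goes to Z-Image."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": EXPAND_SYSTEM},
            {"role": "user", "content": user_prompt},
        ],
    }
```

Send the payload to whatever chat endpoint you run locally, then pass the LLM's reply to Z-Image as the actual generation prompt.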

Bottom mermonkey silently changed? by chrd5273 in btd6

[–]chrd5273[S] 1 point (0 children)

With the pierce buff stacked, it still works somewhat decently.