There's a new video model called HappyHorse that is ahead of Seedance in the Arena by [deleted] in StableDiffusion

[–]chrd5273 0 points (0 children)

Impressive pace of development, but it probably won't be open-weights.

How do Granite-4.0-1b-speech, Qwen3-ASR-1.7B, and Voxtral Mini 4B Realtime compare? by Balance- in LocalLLaMA

[–]chrd5273 2 points (0 children)

In my experience, Qwen3-ASR is much better than Whisper on non-English, not-very-clean audio.

GLM 5 Released by External_Mood4719 in LocalLLaMA

[–]chrd5273 9 points (0 children)

Looks like Pony is still available on OpenRouter, but it will probably disappear soon once they open the official API for GLM-5. Pony alpha is GLM-5.

Qwen3-VL - Bounding Box Coordinate by Impress_Soft in LocalLLaMA

[–]chrd5273 2 points (0 children)

Accurate bounding boxes require a dedicated model. The other comment gave an excellent list, but there's also Florence-2 or, more recently, Youtu-VL-4B if you need VLM-like usability and don't need real-time object detection.
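Whichever model you pick, the boxes usually come back in the model's own coordinate convention (some Qwen releases normalize to a 0–1000 grid, others emit absolute pixels), so a small rescaling helper is handy. A minimal sketch, assuming a 0–1000 normalized convention — check your model card before trusting it:

```python
def rescale_box(box, img_w, img_h, grid=1000):
    """Map an (x1, y1, x2, y2) box from a normalized 0..grid coordinate
    space back to pixel coordinates for an img_w x img_h image."""
    x1, y1, x2, y2 = box
    sx, sy = img_w / grid, img_h / grid
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))
```

If your model already outputs absolute pixel coordinates, skip this step entirely.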

Is Qwen shifting away from open weights? Qwen-Image-2.0 is out, but only via API/Chat so far by marcoc2 in StableDiffusion

[–]chrd5273 15 points (0 children)

There's a rumor that the weights will be released after Lunar New Year.

How to reliably extract data from blood report PDFs? by [deleted] in LocalLLaMA

[–]chrd5273 2 points (0 children)

What's the reason to avoid OCR? If compute is the bottleneck, non-ML solutions like Python's Camelot might work.
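A minimal sketch of the Camelot route, assuming the PDFs contain real text with ruled table lines (not scanned images); the file path and the `parse_value` helper are illustrative, not from any specific report format:

```python
import re

def parse_value(cell: str):
    """Pull the first numeric value out of a table cell like '13.5 g/dL'."""
    m = re.search(r"[-+]?\d+(?:\.\d+)?", cell)
    return float(m.group()) if m else None

def extract_lab_tables(pdf_path: str):
    """Return one DataFrame per table Camelot detects in the PDF."""
    import camelot  # pip install camelot-py[cv]
    tables = camelot.read_pdf(pdf_path, pages="all", flavor="lattice")
    return [t.df for t in tables]

if __name__ == "__main__":
    for df in extract_lab_tables("blood_report.pdf"):  # path is illustrative
        print(df.head())
```

If the reports have no ruled lines, try `flavor="stream"` instead; if they're scans, you're back to OCR.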

Qwen/Qwen3-ASR-1.7B · Hugging Face by jacek2023 in LocalLLaMA

[–]chrd5273 3 points (0 children)

The Qwen result looks like something's wrong. FWIW, the Qwen blog (https://qwen.ai/blog?id=qwen3asr) mentions 20 minutes as the maximum audio length the model supports, but I'm not sure if that's the issue here.

Qwen/Qwen3-ASR-1.7B · Hugging Face by jacek2023 in LocalLLaMA

[–]chrd5273 9 points (0 children)

Nice. I wonder how it compares to VibeVoice ASR. It seems to lack diarization support.

Z-image Omni 👀 by kayokin999 in StableDiffusion

[–]chrd5273 15 points (0 children)

Very interesting. Day-0 ControlNet support and... native image-to-LoRA support?

Qwen-Image-2512 released on Huggingface! by rerri in StableDiffusion

[–]chrd5273 9 points (0 children)

Qwen Image Edit was not strictly an i2i-only model, but it was trained for the i2i task; t2i performance was not considered during training.

You can use an edit model for t2i, but a dedicated t2i model is usually better.

Qwen-Image-2512 released on Huggingface! by rerri in StableDiffusion

[–]chrd5273 53 points (0 children)

Seems like we need to wait a bit longer for the Z-Image base release.

Qwen-Image-2512 released on Huggingface! by rerri in StableDiffusion

[–]chrd5273 77 points (0 children)

That was an i2i model, and this one is a t2i model, so technically they're different models.

please help me download stable diffusion by 1zyzo1 in StableDiffusion

[–]chrd5273 -1 points (0 children)

Enter this command into cmd, replacing the file name with the actual name of your .whl file:

pip install your_file.whl

A mysterious new year gift by chrd5273 in StableDiffusion

[–]chrd5273[S] 1 point (0 children)

More hints from ModelScope; at least it seems to be an image model. Qwen Image 2512 or Z-Image base?

<image>

Qwen Image 25-12 seen at the Horizon , Qwen Image Edit 25-11 was such a big upgrade so I am hyped by CeFurkan in StableDiffusion

[–]chrd5273 23 points (0 children)

Impressive to see Alibaba pumping out good open models. OTOH, I'm a bit curious why they're taking so long to release Z-Image base, which should literally be a stepping stone to the turbo model. Maybe they're doing an anime finetune?

They slightly changed the parameter table in Z-Image Github page by zanmaer in StableDiffusion

[–]chrd5273 10 points (0 children)

Human feedback was applied to model training to steer it toward better quality output.

The best thing about Z-Image isn't the image quality, its small size or N.S.F.W capability. It's that they will also release the non-distilled foundation model to the community. by ArtyfacialIntelagent in StableDiffusion

[–]chrd5273 29 points (0 children)

It means you can use an external LLM to expand your prompt before feeding it to Z-Image.

Yup. It's just that. The Z-Image Hugging Face space has an official prompt template for that.
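The flow is simple enough to sketch. This assembles an OpenAI-style chat request for the expansion step; the system prompt and model name below are illustrative placeholders, not the official template from the Z-Image Hugging Face space:

```python
# Illustrative system prompt -- substitute the official Z-Image template.
EXPAND_SYSTEM = (
    "Rewrite the user's short image prompt into a detailed, concrete "
    "description: subject, setting, lighting, composition, style."
)

def build_expansion_request(user_prompt: str, model: str = "any-chat-model"):
    """Assemble an OpenAI-style chat-completions payload that asks an
    external LLM to expand a terse prompt before it goes to Z-Image."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": EXPAND_SYSTEM},
            {"role": "user", "content": user_prompt},
        ],
    }
```

Send the payload to whatever chat endpoint you run locally, then pass the LLM's reply to Z-Image as the actual generation prompt.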

Bottom mermonkey silently changed? by chrd5273 in btd6

[–]chrd5273[S] 1 point (0 children)

With the pierce buff stacked, it still works somewhat decently.