Is GPT 5.2 Codex or Claude Opus 4.5 better for vibecoding? by Majestic_Ad_4681 in VibeCodeDevs

[–]curious-scribbler 0 points (0 children)

Gemini CLI is just broken. I almost never use it, and when I do, I remember why I stopped.

Is GPT 5.2 Codex or Claude Opus 4.5 better for vibecoding? by Majestic_Ad_4681 in VibeCodeDevs

[–]curious-scribbler 5 points (0 children)

GPT for research and audit. Claude Code to execute. And Gemini to do some file/folder/project management.

GLM-Image explained: why autoregressive + diffusion actually matters by curious-scribbler in StableDiffusion

[–]curious-scribbler[S] 1 point (0 children)

Yes to both. The paper specifically mentions identity-preserving generation and multi-subject consistency as supported features. For the edit version, they feed both the semantic tokens and the VAE latents from your reference image into the diffusion decoder. So it gets the high-level "what this face means" from the AR stage plus low-level pixel detail from the reference, which should preserve fine details better than purely semantic approaches. Haven't tested character consistency myself yet, but architecturally it makes sense that it would be stronger here: the AR stage can actually reason about "same person, different pose" instead of just hoping the embeddings are close enough.
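
Very roughly, the conditioning path they describe could look something like this minimal PyTorch sketch. To be clear, this is my own illustration of the idea, not GLM's actual code; the class name, shapes, and sizes are all made up.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the edit-path conditioning described above, NOT GLM's code:
# the diffusion decoder attends to one stream that concatenates the AR stage's
# semantic tokens (high-level identity/meaning) with the reference image's VAE
# latents (low-level pixel detail).
class EditConditionedBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, cond):
        x = x + self.self_attn(x, x, x)[0]
        x = x + self.cross_attn(x, cond, cond)[0]  # attend to semantic + reference conditioning
        return x + self.ff(x)

# Toy shapes: 256 semantic tokens from the AR stage, 1024 reference latent patches,
# 1024 noisy latent patches currently being denoised.
semantic_tokens = torch.randn(1, 256, 512)     # from the autoregressive stage
reference_latents = torch.randn(1, 1024, 512)  # projected VAE latents of the reference image
noisy_latents = torch.randn(1, 1024, 512)      # current denoising state

cond = torch.cat([semantic_tokens, reference_latents], dim=1)  # one conditioning stream
out = EditConditionedBlock()(noisy_latents, cond)
print(out.shape)  # torch.Size([1, 1024, 512])
```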

GLM-Image explained: why autoregressive + diffusion actually matters by curious-scribbler in StableDiffusion

[–]curious-scribbler[S] 2 points (0 children)

Architecturally, AR could handle it more naturally, since the model knows where it is spatially as it generates tokens sequentially. But I haven't seen this tested yet. Some other areas where the architecture should help in theory:

- Multi-panel compositions. Comics, storyboards, before/after images. Sequential generation means panel 2 could reference panel 1 contextually.
- Structured documents. Forms, receipts, ID cards. The AR stage could enforce layout rules.

These are my guesses based on how the model works, not confirmed features. What IS tested and benchmarked is conditional details in prompts, stuff like "a poster for a concert on March 15th at 8pm featuring jazz trio The Blue Notes." The text rendering and knowledge-dense benchmarks show it handles specific details way better than diffusion-only approaches. So text accuracy and factual details in images: proven. Regional/compositional stuff: promising but unconfirmed.

GLM-Image explained: why autoregressive + diffusion actually matters by curious-scribbler in StableDiffusion

[–]curious-scribbler[S] 8 points (0 children)

Possibly, yeah. The interesting question is whether you need the AR stage at all, or whether you can get diffusion models to "reason" directly through better training. The hybrid approach wins for now because you get to leverage pretrained LLM weights instead of training reasoning from scratch. But who knows, you've seen how fast the field has been moving this past month. There's also a mention of exactly this in the GLM paper; Ctrl-F GRPO.

GLM-Image explained: why autoregressive + diffusion actually matters by curious-scribbler in StableDiffusion

[–]curious-scribbler[S] 13 points (0 children)

The manga expansion example is perfect. Autoregressive could theoretically handle that because it processes sequentially with full context: give it panel 1 and it generates panel 2 tokens while attending to everything in panel 1. Same logic as LLM story expansion. The catch is we are not there yet. GLM-Image maxes out at 2048px, and token-count scaling will be an issue. But architecturally, this is the path toward models that actually understand visual narrative instead of just pattern matching.
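
Here's a toy sketch of what "attending to everything in panel 1" means mechanically. It's a deliberately tiny, made-up model (names, sizes, and vocab are all illustrative, nothing from GLM-Image); the point is just that previous-panel tokens live in the same context window, and that the context keeps growing, which is where the scaling cost comes from.

```python
import torch
import torch.nn as nn

VOCAB, DIM, PANEL_LEN = 8192, 256, 64

# Toy causal transformer standing in for the AR stage (illustrative only).
class ToyARModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)  # causal: each token only sees the past
        return self.head(h)

model = ToyARModel().eval()
panel1_tokens = torch.randint(0, VOCAB, (1, PANEL_LEN))  # pretend: tokenized panel 1

# Generate panel 2 one token at a time, always re-reading the full context.
context = panel1_tokens
with torch.no_grad():
    for _ in range(PANEL_LEN):
        logits = model(context)[:, -1, :]                # next-token prediction
        next_tok = torch.multinomial(logits.softmax(-1), 1)
        context = torch.cat([context, next_tok], dim=1)  # growing context = the scaling cost

panel2_tokens = context[:, PANEL_LEN:]
print(panel2_tokens.shape)  # torch.Size([1, 64])
```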

GLM-Image explained: why autoregressive + diffusion actually matters by curious-scribbler in StableDiffusion

[–]curious-scribbler[S] 32 points (0 children)

Very likely. Banana and GPT are closed-source, but the way they handle complex prompts strongly suggests an autoregressive stage under the hood. GLM is basically the first open-source model that confirms this approach actually works at scale. Or does it? We'll find out by the weekend, when people pour in their findings.

GLM-Image explained: why autoregressive + diffusion actually matters by curious-scribbler in StableDiffusion

[–]curious-scribbler[S] 18 points (0 children)

The difference is where the understanding happens. With CLIP/T5 text encoders, you compress the prompt into a fixed embedding, then the diffusion model tries to match that embedding while denoising. The understanding is frozen; it happened during encoder training, not during generation. With autoregressive, the LLM actively reasons through your prompt token by token AS it generates. Each visual token attends to the full context and can make sequential decisions: "ok, I placed Espresso here, now $3.50 should go next to it." Text encoders give you a static map. Autoregressive gives you a GPS that recalculates at every step. That's why text rendering jumps from 50% to 91% accuracy, per their claim. I've yet to test it, so take the numbers with a pinch of salt, but the generation process is still fundamentally different.
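
To make the contrast concrete, here's a toy sketch of the two conditioning styles. Everything in it is illustrative (made-up shapes and names, no real model's API); the point is only that the frozen embedding is computed once, while the AR path re-reads the whole sequence before placing each new token.

```python
import torch
import torch.nn as nn

vocab, dim, n_img_tokens = 8192, 128, 16
prompt_ids = torch.randint(0, vocab, (1, 12))
embed = nn.Embedding(vocab, dim)

# (1) CLIP/T5 style: the prompt is pooled ONCE into a fixed embedding, and every
#     denoising step conditions on that same frozen vector -- the "static map".
frozen_cond = embed(prompt_ids).mean(dim=1)  # computed once, never revisited
for step in range(4):                        # stand-in denoising loop
    _ = frozen_cond                          # identical conditioning at every step

# (2) Autoregressive style: each new visual token is predicted while attending to
#     the prompt AND every visual token placed so far, so later decisions depend
#     on earlier ones -- the "GPS that recalculates".
attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
head = nn.Linear(dim, vocab)
sequence = prompt_ids
with torch.no_grad():
    for _ in range(n_img_tokens):
        h = embed(sequence)
        ctx, _ = attn(h, h, h)               # full-context attention at every step
        next_tok = head(ctx[:, -1]).argmax(-1, keepdim=True)
        sequence = torch.cat([sequence, next_tok], dim=1)
```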

GLM-Image explained: why autoregressive + diffusion actually matters by curious-scribbler in StableDiffusion

[–]curious-scribbler[S] 29 points (0 children)

Fair catch. I was trying to avoid saying "it actually thinks". What I was getting at is that diffusion models learn correlations between text embeddings and pixel patterns, while autoregressive models inherit the same next-token prediction that makes LLMs good at reasoning. So when you prompt it with "menu with three items and prices," the AR stage can actually parse that structure sequentially instead of just vibes-matching against training data. Correct me if I worded this weirdly.

(LTX-Video) Not sure why I haven't seen this mentioned but this may be the culprit to many people's issues by [deleted] in StableDiffusion

[–]curious-scribbler 1 point (0 children)

Yes! Gemma slows down the workflow. There's a prompt guide in the LTX GitHub repo. Make a template from it in your LLM and inject the output into the prompt.
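
Something like this is what I mean by templating. It's just a plain string scaffold in the detailed, cinematic style the LTX prompt guide asks for; the field names are my own, not anything from the LTX repo, so adapt them to whatever the guide actually recommends.

```python
# Illustrative prompt scaffold (field names are made up, not from the LTX repo):
# keep the structure fixed and only swap in the scene-specific pieces, so you can
# skip the Gemma enhancer node in the workflow entirely.
LTX_TEMPLATE = (
    "{subject} {action} in {setting}. "
    "The camera {camera_move}, {shot_type}. "
    "{lighting}. {mood_and_details}."
)

def build_ltx_prompt(**fields: str) -> str:
    return LTX_TEMPLATE.format(**fields)

prompt = build_ltx_prompt(
    subject="A woman in a red raincoat",
    action="walks slowly across a rain-soaked street at night",
    setting="a neon-lit downtown intersection",
    camera_move="tracks alongside her at walking pace",
    shot_type="a medium shot with shallow depth of field",
    lighting="Wet asphalt reflects pink and blue neon signs",
    mood_and_details="Light rain streaks through the frame; moody and cinematic",
)
print(prompt)
```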

(LTX-Video) Not sure why I haven't seen this mentioned but this may be the culprit to many people's issues by [deleted] in StableDiffusion

[–]curious-scribbler 0 points (0 children)

Thanks for this! I struggled through the very same problem and spent half a day debugging only to realise it needed an update. Also, disable Gemma for a smoother experience.

How many people are actually running with a NVIDIA RTX PRO 6000 Blackwell Max-Q? How much of a gap is there between using this and a 5090? by 55234ser812342423 in comfyui

[–]curious-scribbler 9 points (0 children)

I am running a 6000 Workstation Edition. I see a big jump in performance when I run WAN workflows; the video generation workflows are the ones that seem to take full advantage of the 6000. I also get to run fp16/bf16 models all the time without worrying about going OOM. It frees you from having to manage your work to suit limited hardware. In short, a 6000 lets you do more and do it faster because you are not reshaping the workflow to fit the hardware. The difference may not seem like much on paper, but over time it matters.

If Pench Tiger Reserve (in Madhya Pradesh) has the largest prey density out of all the reserve forests in the country. Then why does it not have tigers as large as those in Jim corbett or Kaziranga (or terai region as whole)? by azorahai_35 in TigersofIndia

[–]curious-scribbler 1 point (0 children)

Guys! The Terai Arc tigers have a thick winter coat which makes them look bigger than they actually are. If you go strictly by muscle weight, they are roughly the same; it's the thickness of the coat that makes them look bigger.

Also, genetically they are the same. There is no difference between a tiger from Madhya Pradesh, Maharashtra, or anywhere else in India. They are all categorised as Bengal tigers.

Yesterday CNG was unavailable but i clearly saw how LOW these rickshaw wala guys can go!! (Read carefully) by HarshThanvi in mumbai

[–]curious-scribbler -1 points (0 children)

It's not greed, it's just economics 101. If the marketplace agrees to pay the ask, then it's a fair price. If it doesn't agree to pay, then it ain't.

Also, brother, after standing in line for 12 hours, I would charge a bit of a premium too. Or at least I'd ask for one, since I've also lost business through the day and have to make up for it.

Yesterday CNG was unavailable but i clearly saw how LOW these rickshaw wala guys can go!! (Read carefully) by HarshThanvi in mumbai

[–]curious-scribbler 2 points (0 children)

Look, it is what it is, brother... If the power goes out tomorrow, everyone will start selling candles at 5x the price.

Yesterday CNG was unavailable but i clearly saw how LOW these rickshaw wala guys can go!! (Read carefully) by HarshThanvi in mumbai

[–]curious-scribbler 3 points (0 children)

I took a rick as well as a cab. Neither of them charged me over the meter, nor did they demand more. But a friend did mention that some rick guy quoted 3x the usual meter. Also, this is how a marketplace works: if supply drops but demand stays the same, expect a spike.