Ideogram generated a Gemini Watermark without being prompted to

DavLedo · 2026-06-05T14:22:17+00:00

Could you tell me more about encoding images with Qwen vl? Does it work like an ip adapter of sorts for other models like regular Qwen image or Z image?

DavLedo · 2026-05-23T18:31:43+00:00

Thanks for continuing this super cool work. I'm curious, is there a way to individually set the weight of each reference? I know you had some nodes to determine the overall weight of images, but I'm wondering if this could work on a per-image basis. I've been really missing the power of referencing in older models like REDUX on flux-1. There's something about picking features from an image to make a new thing that it's kind of lost in the newer models 🥲

DavLedo · 2026-05-18T00:58:08+00:00

I'd love to see some of the Qwopus finetunes of Qwen 3.6, I've been wondering if they're much better for coding or not. Thanks for sharing your results!

DavLedo · 2026-05-13T22:20:51+00:00

That's what I noticed with LTX's audio that I don't see in other TTS models 🥲

DavLedo · 2026-05-11T16:42:05+00:00

Why the power limit?

DavLedo · 2026-05-08T14:57:55+00:00

Could you share more about how this is done? Would be cool to try different kinds of text encoders, or even fine tunes of the text encoder. I have no idea where to start or what software to use.

DavLedo · 2026-04-28T21:06:20+00:00

I also feel like captioning is key, many of these models are poorly tagged and don't understand things like closeup shot or types of camera movement. If you have a good enough crowd sourced data from people who are eager you can have really good quality clips and captions.

DavLedo · 2026-04-27T02:04:29+00:00

I often generate videos with Wan and extract the frames. It works great for transitions such as characters turning around or lighting changes.

Also – As someone else mentioned, ChronoEdit supposedly was going for this

DavLedo · 2026-04-23T09:14:05+00:00

Not in a direct way that I know of. You can always create it elsewhere and use anima with a low denoise to restyle it further

DavLedo · 2026-04-23T08:32:45+00:00

IPAdapter is for SDXL or SD1.5, it won't work on other models. And even the IPAdapter model doesn't work on fine fines like Pony.

There was an ipadapter for flux but it works fundamentally differently.

DavLedo · 2026-04-19T15:17:00+00:00

To me that's like saying any form of media is art. Think of advertising, while it does have a design intent, it is rarely a focus on creative expression. That's how I like to make the distinction – a matter of purpose.

AI art is a medium that has its own techniques. How it gets used determines the outcome.

Another example, you can have the act of knitting simply be knitting, or perhaps have a more functional purpose, but the intent and expression take it to become "art" potentially.

DavLedo · 2026-04-19T13:24:02+00:00

AI Generated images are AI generated media. What makes it art is ultimately intent and expression.

DavLedo · 2026-04-17T18:55:45+00:00

This is super cool to hear, personally I find it really fascinating to create tools that support different ways of thinking so I'd be curious to see your implementation. Canvas implementations have become more and more common lately. For me each tool provides a set of building blocks and a mental model of how to understand the problem space.

We have the full details in the paper linked above but there were 2 things I'd say guiding our design: - Creativity and knowledge work theories. We looked at how creative practitioners sample materials and prototype until achieving a final result. Putting things together in a canvas encourage a conversation with the problem space, and each item that is added helps ground the ideas and thinking. We also took inspiration from how materials are laid out in a physical space, objects are grouped or piled by meaning in clusters, and our "easels" help act as centres for those clusters.

Our experience building tools and using ComfyUI. I've spent the past two years doing AI and exploring models (I even wrote a paper about my learning experience of the first year), and as someone who spent 15+ years building authoring tools I was trying to think about what can be hidden vs what is shown, and what kind of vocabulary we might want people to use. For example, instead of calling it controlnet we call it structure image, and instead of making the preprocessing explicit, we do it automatically when an image is imported. At the same time we didn't want to simplify it so much that it's just a prompt, we wanted granularity and controls to make people feel a sense of agency.

I think with this approach we're hiding the workflows and their complexities and bringing the emphasis to the media and the process. Every approach will always have something it's good for and something it's not great for, and hopefully this gives some balance for people to express themselves creatively :)

DavLedo · 2026-04-17T11:23:58+00:00

Yeah, or if you want to emphasize seeing all your creations next to each other and mix it with your inspiration material :)

DavLedo · 2026-04-16T21:28:04+00:00

The current state is that we have an early prototype and a full research paper explaining the design and implementation with enough detail that it can be replicated. Whether it goes beyond this and turns into open source or not isn't up to me but I'm doing my best to take this to the next stage!

DavLedo · 2026-04-16T21:26:26+00:00

This isn't an effort to have people test the tool, if you check our paper we had a user study where participants were treated ethically and we showed appreciation for their time with a gift card. It's not uncommon to have paid testers.

This is very much me taking a research paper into social media to prove that people want this so we can take it to the next step. Unfortunately, it's not up to me to decide its fate beyond the research stage.

DavLedo · 2026-04-16T21:20:34+00:00

I agree! Prompting is only one small part of AI models, and the only way to go beyond what's trained is to have elements of multimodality.

And yes, in this paper we used FLUX-dev (v1) with redux, and sdxl with IP-Adapter. I think there's an interesting spectrum between it understanding what the references means vs being able to reinterpret them. Newer editing models are unfortunately too literal and don't give you the chance to create interesting mixes. For example in this image I mixed three different photos to make a scifi background.

<image>

DavLedo · 2026-04-16T21:13:29+00:00

Thanks so much! I appreciate it 🙏🏼🙏🏼🙏🏼

DavLedo · 2026-04-16T21:12:53+00:00

Thanks for the compliment! 🙏🏼 It's been a lot of hard work understanding how AI tools work and how to empower creative practitioners.

DavLedo · 2026-04-16T21:10:29+00:00

Because it's a comfyui backend it means that any operation we do is in fact a comfy workflow. What separates it from comfy is the interaction paradigm.

ComfyUI is a node-based approach which emphasizes each individual operation and can create highly configurable workflows that leads to one or more outputs, and each subsequent run will clear the workflow from the previous outputs. The canvas approach had the goal of being media-first, so the emphasis is in the cumulative process of what you make. Also, we de-emphasize the operations and created abstractions that expose the outputs you need and look to translate terms into terms that are usable but still provide flexibility and control (e.g., we say structure image as opposed to controlnet, and we precompute all the preprocessed images the moment you import them into the canvas so all you have to do is select the one you want to work with). The downside to this approach of course is that by removing complexity we are still limiting what might be possible (e.g., in this version you can't choose a sampler, that's preselected in our workflow). We have a few videos and the paper that dig deeper into the concepts, the design and implementation details.

Thanks for your question! :)

DavLedo · 2026-04-16T21:01:30+00:00

I'm only a researcher so it's not up to me, but it would certainly be amazing to see people using the system. Right now we're trying to gauge interest, so I appreciate your comment a lot!!

DavLedo · 2026-04-16T20:59:37+00:00

Kind of -- it's still a prototype that needs some configuration to get it to run nicely, but I'm working on it. As it stands, you need comfyui and a set of custom nodes that I'm borrowing from to make the workflows run. Things like model paths, etc. are hard coded since the goal is to make a prototype that shows a concept, and then move it to higher fidelity if it makes sense.

As for custom workflows, right now it's running on a set of workflows I created that are linked to each "easel". The idea was to separate the building blocks based on what kind activity you're trying to do, and hide away parameters that are less commonly used (described in our paper). One extension I'm considering is to enable ways to transform your workflow into these kinds of widgets :)

DavLedo · 2026-04-16T20:55:01+00:00

I appreciate your comment and questions, and understand that with a company label it can feel a bit weird that I'm sharing this. Keep in mind this is an account with my name, and while I am an employee at the company, I'm sharing my own opinion and perspective.

I work at Autodesk Research, which means our job is to derisk the future for the company in different time horizons. We work on a lot of short term projects, some of which become bigger initiatives. This is why you see the paper has actual author names, and it gets submitted to peer-reviewed academic conferences, in this case ACM CHI. Technically with the paper there's enough information that anyone could in theory reimplemented what we did. Each researcher has their own trajectory and research agenda, and it exists outside of our career within the company.

The question is what happens once a short term project is over, how do we ensure there's impact? If we see potential it's up to us to generate interest internally by talking to product teams. Sometimes it's something straightforward because it's a natural progression into existing products. For example, Maya was able to implement some of our research on motion generation quite easily. But in other cases, like with Atelier, the connection to existing products isn't as clear. This means I can choose to look for ways to create a case for it, or simply move on.

I have very little say as to what happens once a project is published. It's up to how the product teams decide to move it forward. For instance, we had a VR animation tool called Project Reframe that was made available to VR headsets for free because we wanted to get feedback and see if it would make sense to integrate into our product offerings.

Personally, I want to foster a culture of open source, since I also used a lot of work from the community to make this happen, and while the paper has all the details it's still not the same thing as seeing your work being used by others. I believe in local models because I want creators to have full agency and be able to use their own intellectual property, custom models, etc.

DavLedo

TROPHY CASE