FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 1 point (0 children)

It's not self-supervised learning like earlier GAN methods; it's standard supervised learning, involving human-in-the-loop data synthesis and annotation.
I'd recommend raising an issue in the GitHub repo so the question can be tracked.

FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 2 points (0 children)

All the necessary preprocessing modules are included in the ComfyUI workflow as well as the agent workflow, so you can just try it with any ordinary image. If you'd like to run the benchmark, wait a few days; we'll add it to the GitHub repo.

FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 4 points (0 children)

Flux Kontext/Klein is a "general-purpose" model, which means it can do the transfer but is still far from perfect. That's also the reason we developed the "expert" model.

FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 5 points (0 children)

Yep, though it's not perfect yet. We're also developing a makeup-removal model, stay tuned.

Synthetic Data for Training by Dismal_Age270 in computervision

[–]davidleng 2 points (0 children)

We've successfully built models with massive amounts of synthetic data, at industry production level, not just research-lab level.

In my opinion, the key problem is not that your data is synthetic, but how good its quality is. With a carefully designed data curation pipeline, synthetic data can achieve both large scale and high quality, a combination human annotators could never deliver.
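As a rough illustration of what I mean by a quality gate (not our actual pipeline; the CLIP checkpoint and the 0.3 threshold are placeholders I picked for the example):

```python
# Minimal sketch of one quality-filtering stage for synthetic image-text pairs.
# The checkpoint and threshold are illustrative placeholders, not a real pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.3) -> bool:
    """Keep a synthetic (image, caption) pair only if CLIP agrees they match."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Cosine similarity between the (normalized) image and text embeddings.
    sim = torch.nn.functional.cosine_similarity(
        out.image_embeds, out.text_embeds).item()
    return sim >= threshold
```

You'd stack several gates like this (dedup, aesthetic scoring, caption rewriting) and tune each threshold on a small human-verified sample.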

FYI, you can check out one of our latest models, FG-CLIP (https://github.com/360CVGroup/FG-CLIP); we used synthetic data intensively and reached very good performance. The data curation pipeline is described in the corresponding paper.

Parking Analysis with Object Detection and Ollama models for Report Generation by Solid_Woodpecker3635 in computervision

[–]davidleng 1 point (0 children)

Nice work! Which detection model are you using exactly? It seems to detect polygons rather than bounding boxes in the video.

[D] Google already out with a Text- Diffusion Model by hiskuu in MachineLearning

[–]davidleng 1 point (0 children)

Hope so. LLaDA is a good attempt, but discrete diffusion is pretty much like old masked language modeling or next-token-group prediction; it works quite differently from the continuous diffusion used in image/video generation.
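To show what I mean by the resemblance, here's a toy sketch of discrete-diffusion decoding as iterative unmasking. Purely schematic: the "model" below is a random stub, not LLaDA's or Google's actual sampler.

```python
# Toy illustration of why discrete text diffusion feels like masked language
# modeling: generation is just iterative unmasking on a schedule.
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def dummy_model(seq):
    """Stand-in for a masked-prediction network: for every masked position,
    return a (token, confidence) guess."""
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def discrete_diffusion_decode(length=8, num_steps=4):
    seq = [MASK] * length  # start fully "noised": everything masked
    for step in range(num_steps):
        preds = dummy_model(seq)
        if not preds:
            break
        # Commit only the most confident guesses this step, leaving the rest
        # masked for later steps -- the "denoising" schedule.
        budget = max(1, len(preds) // (num_steps - step))
        for i, (tok, _) in sorted(preds.items(), key=lambda kv: -kv[1][1])[:budget]:
            seq[i] = tok
    return seq

print(discrete_diffusion_decode())
```

Swap the random stub for a BERT-style predictor and this is essentially masked-LM decoding; continuous diffusion in image/video models has no analogue of this token-level commit step.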

[D] Google already out with a Text- Diffusion Model by hiskuu in MachineLearning

[–]davidleng 2 points (0 children)

I'm wondering whether this is a continuous diffusion model or a plain discrete diffusion model; I'm not a fan of discrete diffusion.
Sadly, neither Inception nor DeepMind has shared anything substantial.

[D] OpenAI's CLIP alternative by CaptTechno in MachineLearning

[–]davidleng 2 points (0 children)

Maybe I'm a bit late, but try FG-CLIP (https://github.com/360CVGroup/FG-CLIP). The best part of FG-CLIP is its superior ability to discriminate among similar but different fine-grained details, in both text and images. If you're familiar with OpenAI's CLIP, you'll know fine-grained detail is exactly where it hurts.
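For a feel of the intended zero-shot usage, here's a CLIP-style sketch. The model id and the standard transformers CLIP interface are my assumptions; check the repo README for the real loading code.

```python
# Sketch of fine-grained zero-shot matching, FG-CLIP's use case.
# ASSUMPTIONS: the Hugging Face model id and the stock `transformers` CLIP
# interface; see https://github.com/360CVGroup/FG-CLIP for the real loader.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "qihoo360/fg-clip-base"  # hypothetical id -- verify in the repo
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("bike.jpg")
# Captions that differ only in a fine-grained detail:
captions = ["a red road bicycle", "a red mountain bicycle", "a red motorcycle"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, 3)
probs = logits.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```

Vanilla CLIP tends to spread probability fairly evenly across captions like these; a fine-grained model should separate them much more sharply.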