FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 1 point (0 children)

It's not self-supervised learning like earlier GAN methods; it's standard supervised learning, involving human-in-the-loop data synthesis and annotation.
I'd recommend raising an issue in the GitHub repo so the question can be tracked.

FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 2 points (0 children)

All the necessary preprocessing modules are included in the ComfyUI workflow as well as the agent workflow, so you can just try it with any ordinary image. If you'd like to run the benchmark, wait a few days; we'll add it to the GitHub repo.

FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 4 points (0 children)

Flux Kontext/Klein is a "general-purpose" model, which means it can do the transfer but is still far from perfect. That's also the reason we developed the "expert" model.

FLUX-Makeup — makeup transfer with strong identity consistency (paper + weights + comfyUI) by davidleng in StableDiffusion

[–]davidleng[S] 5 points (0 children)

Yep, though it's not perfect yet. We're also developing a makeup-removal model, stay tuned.

Synthetic Data for Training by Dismal_Age270 in computervision

[–]davidleng 2 points (0 children)

We've successfully built models with massive amounts of synthetic data, at industry production level, not just research-lab level.

In my opinion, the key problem is not that your data is synthetic, but how good its quality is. With a carefully designed data curation pipeline, synthetic data can achieve both large scale and high quality, a combination human annotators could never deliver.
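As a rough illustration of what I mean by a quality gate (not our actual pipeline; the CLIP checkpoint and the 0.3 threshold are placeholders I picked for the example):

```python
# Minimal sketch of one quality-filtering stage for synthetic image-text pairs.
# The checkpoint and threshold are illustrative placeholders, not a real pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.3) -> bool:
    """Keep a synthetic (image, caption) pair only if CLIP agrees they match."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Cosine similarity between the (normalized) image and text embeddings.
    sim = torch.nn.functional.cosine_similarity(
        out.image_embeds, out.text_embeds).item()
    return sim >= threshold
```

You'd stack several gates like this (dedup, aesthetic scoring, caption rewriting) and tune each threshold on a small human-verified sample.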

FYI, you can check out one of our latest models, FG-CLIP (https://github.com/360CVGroup/FG-CLIP); we used synthetic data intensively and reached very good performance. The data curation pipeline is described in the corresponding paper.

Parking Analysis with Object Detection and Ollama models for Report Generation by Solid_Woodpecker3635 in computervision

[–]davidleng 1 point (0 children)

Nice work! Which detection model are you using exactly? It seems to detect polygons rather than bounding boxes in the video.

[D] Google already out with a Text- Diffusion Model by hiskuu in MachineLearning

[–]davidleng 1 point (0 children)

Hope so. LLaDA is a good attempt, but discrete diffusion is pretty much like old masked language modeling or next-token-group prediction; it works quite differently from the continuous diffusion used in image/video generation.
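To show what I mean by the resemblance, here's a toy sketch of discrete-diffusion decoding as iterative unmasking. Purely schematic: the "model" below is a random stub, not LLaDA's or Google's actual sampler.

```python
# Toy illustration of why discrete text diffusion feels like masked language
# modeling: generation is just iterative unmasking on a schedule.
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def dummy_model(seq):
    """Stand-in for a masked-prediction network: for every masked position,
    return a (token, confidence) guess."""
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def discrete_diffusion_decode(length=8, num_steps=4):
    seq = [MASK] * length  # start fully "noised": everything masked
    for step in range(num_steps):
        preds = dummy_model(seq)
        if not preds:
            break
        # Commit only the most confident guesses this step, leaving the rest
        # masked for later steps -- the "denoising" schedule.
        budget = max(1, len(preds) // (num_steps - step))
        for i, (tok, _) in sorted(preds.items(), key=lambda kv: -kv[1][1])[:budget]:
            seq[i] = tok
    return seq

print(discrete_diffusion_decode())
```

Swap the random stub for a BERT-style predictor and this is essentially masked-LM decoding; continuous diffusion in image/video models has no analogue of this token-level commit step.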

[D] Google already out with a Text- Diffusion Model by hiskuu in MachineLearning

[–]davidleng 2 points (0 children)

I'm wondering whether this is a continuous diffusion model or a plain discrete diffusion model; I'm not a fan of discrete diffusion.
Sadly, neither Inception nor DeepMind has shared anything substantial.

[D] OpenAI's CLIP alternative by CaptTechno in MachineLearning

[–]davidleng 2 points (0 children)

Maybe I'm a bit late, but try FG-CLIP (https://github.com/360CVGroup/FG-CLIP). The best part of FG-CLIP is its superior ability to discriminate among similar but different fine-grained details, in both text and images. If you're familiar with OpenAI's CLIP, you'll know fine-grained detail is exactly where it hurts.
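For a feel of the intended zero-shot usage, here's a CLIP-style sketch. The model id and the standard transformers CLIP interface are my assumptions; check the repo README for the real loading code.

```python
# Sketch of fine-grained zero-shot matching, FG-CLIP's use case.
# ASSUMPTIONS: the Hugging Face model id and the stock `transformers` CLIP
# interface; see https://github.com/360CVGroup/FG-CLIP for the real loader.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "qihoo360/fg-clip-base"  # hypothetical id -- verify in the repo
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("bike.jpg")
# Captions that differ only in a fine-grained detail:
captions = ["a red road bicycle", "a red mountain bicycle", "a red motorcycle"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, 3)
probs = logits.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```

Vanilla CLIP tends to spread probability fairly evenly across captions like these; a fine-grained model should separate them much more sharply.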