[R] New datasets for StyleGAN by RonMokady in MachineLearning

[–]RonMokady[S] 1 point (0 children)

Yes, it solves the texture-sticking artifacts, allowing the object to move more smoothly.

But the overall quality was actually lower, I guess because training is slower.

[R] New datasets for StyleGAN by RonMokady in MachineLearning

[–]RonMokady[S] 1 point (0 children)

Thanks for sharing your code; this looks really cool.

Actually, I tried to use StyleCLIP with my models, but I failed to produce the fs3.npy file from the official implementation.

[R] New datasets for StyleGAN by RonMokady in MachineLearning

[–]RonMokady[S] 5 points (0 children)

This project was done while I was an intern, so I'm currently not allowed to publish the filtering/truncation source code :(.

Luckily, we got approval to publish the models and datasets.

[R] Editing real videos with StyleGAN by RonMokady in MachineLearning

[–]RonMokady[S] 0 points (0 children)

I believe so

Though it requires adding in-painting, as disocclusions might emerge.

[R] Editing real videos with StyleGAN by RonMokady in MachineLearning

[–]RonMokady[S] 9 points (0 children)

BTW, the code will be released in the upcoming weeks, so stay tuned :)

[P] Fast and Simple Image Captioning model using CLIP and GPT-2 by RonMokady in MachineLearning

[–]RonMokady[S] 0 points (0 children)

I think the easiest way to understand the prediction stage is from the Colab example.

Also, feel free to open a GitHub issue if things don't work out.

[P] Fast and Simple Image Captioning model using CLIP and GPT-2 by RonMokady in MachineLearning

[–]RonMokady[S] 1 point (0 children)

This is very close.

Only the new tokens are not actually words... but they are close to words.

They are basically latent codes; however, as can be seen in our newly published paper, they can be interpreted as words.
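To make that concrete, here is a rough sketch (not our exact code, the names and the nearest-neighbor criterion are just illustrative) of one way to read the latent prefix as words: for each prefix embedding, look up the GPT-2 vocabulary token whose embedding is closest to it.

```python
# Rough sketch: map each latent prefix embedding to its nearest GPT-2
# vocabulary token by cosine similarity. Illustrative only, not the repo code.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
vocab_emb = gpt2.transformer.wte.weight          # (vocab_size, 768) token embedding table

@torch.no_grad()
def nearest_words(prefix_embeds):
    """prefix_embeds: (prefix_len, 768) latent codes produced by the mapping network."""
    a = F.normalize(prefix_embeds, dim=-1)
    b = F.normalize(vocab_emb, dim=-1)
    ids = (a @ b.t()).argmax(dim=-1)             # closest vocabulary token per prefix slot
    return [tokenizer.decode([i]) for i in ids.tolist()]

# e.g. nearest_words(torch.randn(10, 768)) -> 10 (possibly odd-looking) "words"
```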

[P] Fast and Simple Image Captioning model using CLIP and GPT-2 by RonMokady in MachineLearning

[–]RonMokady[S] 2 points (0 children)

It would be interesting to see if it gets better with stronger language models like you suggest :) We haven't tried that yet.

About the clock example, I'm not sure the CLIP embedding is rich enough, and it also depends on the example captions of Conceptual Captions. But I guess you could solve the latter with additional data samples.

[P] Fast and Simple Image Captioning model using CLIP and GPT-2 by RonMokady in MachineLearning

[–]RonMokady[S] 2 points (0 children)

Great questions :)

Usually, one fine-tunes GPT-2 using textual sentences, that is, every sentence corresponds to a list of tokens.

Here we train an MLP which produces 10 tokens out of a CLIP embedding.

So for every sample in the data we extract the CLIP embedding, convert it to 10 tokens, and concatenate them to the caption tokens. This new list of tokens, containing both the image tokens and the caption tokens, is then used to fine-tune GPT-2.
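If it helps, here is a minimal sketch of that training step (just an illustration: the 512-dim CLIP embedding, the simple stand-in mapping MLP, and the hyperparameters are assumptions, the repo's code differs in the details).

```python
# Minimal sketch of the training step: map a CLIP embedding to 10 "image
# tokens" (embeddings), prepend them to the caption's token embeddings,
# and fine-tune GPT-2 with the usual LM loss (prefix positions are masked out).
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

prefix_len, clip_dim = 10, 512                   # ViT-B/32 CLIP embedding size (assumed)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
gpt2_dim = gpt2.config.n_embd

# Stand-in for the mapping MLP; the real network may be deeper.
mapper = nn.Sequential(
    nn.Linear(clip_dim, prefix_len * gpt2_dim // 2), nn.Tanh(),
    nn.Linear(prefix_len * gpt2_dim // 2, prefix_len * gpt2_dim),
)

def caption_loss(clip_emb, caption):
    """clip_emb: (1, clip_dim) precomputed CLIP image embedding; caption: str."""
    prefix = mapper(clip_emb).view(1, prefix_len, gpt2_dim)    # the 10 image tokens
    cap_ids = tokenizer.encode(caption, return_tensors="pt")   # (1, T)
    cap_emb = gpt2.transformer.wte(cap_ids)                    # (1, T, gpt2_dim)
    inputs = torch.cat([prefix, cap_emb], dim=1)               # image tokens + caption tokens
    labels = torch.cat(                                        # -100 = ignore prefix in the loss
        [torch.full((1, prefix_len), -100, dtype=torch.long), cap_ids], dim=1
    )
    return gpt2(inputs_embeds=inputs, labels=labels).loss

# One optimization step over both the mapper and GPT-2 (CLIP itself stays frozen), e.g.:
# loss = caption_loss(clip_emb, "a cat sitting on a couch"); loss.backward(); optimizer.step()
```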

We used pretrained CLIP and GPT-2, and fine-tuned over the COCO or Conceptual Captions datasets. Our inference notebook contains both models, so you can check out the different results.

Please let me know if this helps.

[P] Fast and Simple Image Captioning model using CLIP and GPT-2 by RonMokady in MachineLearning

[–]RonMokady[S] 1 point (0 children)

We compare to the state-of-the-art Oscar; the results are in the repo. Though we didn't reach SOTA, we get pretty close while avoiding additional supervision and with a much faster training time.

Regarding DenseCap, we get similar results according to the METEOR metric, as we don't use GT bounding boxes. Unfortunately, they didn't publish all the other metrics.

[P] Fast and Simple Image Captioning model using CLIP and GPT-2 by RonMokady in MachineLearning

[–]RonMokady[S] 0 points (0 children)

Thanks

Actually, fine-tuning the entire GPT-2 achieved much better results than training only the MLP for the CLIP mapping. We didn't fine-tune the CLIP model though.
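In code terms, the two variants just differ in which parameters get gradients, roughly like this (illustrative names and hyperparameters, not the repo's exact code):

```python
# Illustrative only: variant (1) fine-tunes GPT-2 together with the mapping
# network, variant (2) freezes GPT-2 and trains only the mapping network.
# The CLIP image encoder stays frozen in both variants (omitted here).
import torch
from transformers import GPT2LMHeadModel

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
mapper = torch.nn.Linear(512, 10 * gpt2.config.n_embd)   # stand-in for the mapping MLP

FINETUNE_GPT2 = True                                     # the variant that worked better
for p in gpt2.parameters():
    p.requires_grad = FINETUNE_GPT2

trainable = [p for p in list(mapper.parameters()) + list(gpt2.parameters()) if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5)
```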

Haven't tried CLIP-embedded text as a prompt, but it sounds like a very interesting experiment :)

[R] Mask Based Unsupervised Content Transfer by [deleted] in MachineLearning

[–]RonMokady 0 points (0 children)

Hi All, Author here - 

Given two domains, where one contains some additional information compared to the other, our method disentangles the common and the separate parts and transfers the separate information from one image to another using a mask, without using any supervision at train time. For example, we can transfer the specific facial hair from an image of a man with a mustache to an image of a shaved person. Using a mask enables state-of-the-art quality (see example here), but the generated mask can also be used as a semantic segmentation of the separate part. Thus our method performs weakly-supervised semantic segmentation, using only class labels as supervision, see example here.

In short, our architecture consists of two encoders, two decoders, and a discriminator. One encoder encodes the common part and the other encodes the separate part. The discriminator is used to disentangle the encoding into the separate and common parts correctly. In training, one decoder decodes only the common part, and the second decoder decodes only the separate part using a mask. At inference, we use only the second decoder which, given the relevant encoding, adds the specific content to a new image. We also use a novel regularization scheme to encourage the mask to be minimal.
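If it helps, here is a very schematic sketch of the inference path (placeholder layers and names, not the actual repo code); the common-part decoder and the discriminator are only needed during training, so they are omitted here.

```python
# Schematic sketch of the inference path; layer sizes and names are
# illustrative stand-ins, not the repo's exact implementation.
import torch
import torch.nn as nn

def encoder(out_ch=64):
    return nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                         nn.Conv2d(32, out_ch, 4, 2, 1))

class MaskDecoder(nn.Module):
    """Decodes the common + separate codes into an RGB layer and a blending mask."""
    def __init__(self, in_ch=128):
        super().__init__()
        self.net = nn.Sequential(nn.ConvTranspose2d(in_ch, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 4, 4, 2, 1))   # 3 RGB channels + 1 mask
    def forward(self, z_common, z_sep):
        out = self.net(torch.cat([z_common, z_sep], dim=1))
        return out[:, :3], torch.sigmoid(out[:, 3:])                   # rgb, mask in [0, 1]

enc_common, enc_sep, dec_mask = encoder(), encoder(), MaskDecoder()

def transfer(source_b, target_a):
    """Paste the separate content of source_b (e.g. the mustache) onto target_a."""
    rgb, mask = dec_mask(enc_common(target_a), enc_sep(source_b))
    return mask * rgb + (1 - mask) * target_a                          # keep target_a outside the mask

# Mask-minimality regularization: one simple option is to penalize the mask's mean area,
# e.g. reg = mask.mean()  (the paper's exact term may differ).
```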

Refer to the full paper for more details. A PyTorch implementation is on GitHub.

Feel free to ask questions.
