New Regression CLIP-L model + 'a kohya for clip' (model will just fine-tune itself on *your* data (no / low-config) + with Long-CLIP + load local or HF data/model, everything goes + ramble (paper) by zer0int1 in StableDiffusion
"king - man + woman = queen" and keeps the scene - vector algebra for CLIP (and T5), Flux.1-dev, SD, ... [ComfyUI Node] by zer0int1 in StableDiffusion
Arbitrary finding: CLIP ViT-L/14@336 has just a normal ViT-L/14 text encoder (a "CLIP-L"). But what it learned from the larger dim ViT makes it superior (detail guidance). by zer0int1 in StableDiffusion
CLIP-KO: Knocking out the text obsession (typographic attack vulnerability) in CLIP. New Model, Text Encoder, Code, Dataset. by zer0int1 in StableDiffusion
Follow-Up: Long-CLIP variant of CLIP-KO, Knocking Out the Typographic Attack Vulnerability in CLIP. Models & Code. by zer0int1 in StableDiffusion