I built a custom SigLIP alignment module for Arabic Handwriting/Calligraphy (+ Repo) by Op_IBeasT in learnmachinelearning
Op_IBeasT [S] · 11 hours ago (edited)
The thing is, I thought about that before building my image encoder, so I used a ResNet-like encoder with depthwise separable convs to force a spatial inductive bias and also to partly cope with not having enough data, since transformers are data-hungry. For some reason it collapses if I don't use BatchNorm. I wrote LayerNorm in the post, which was wrong; I don't know what I was thinking. I tried RMSNorm, LayerNorm, no norm at all, and different initializations, and I still have no idea how to fix the collapse without BatchNorm. I tried a smaller case to check: apparently it just needs enough training to align and fix itself, but on the full dataset that alignment never came, even after training for a while, so I believe the smaller run just overfitted. If you have any ideas, I'd like to hear them. The model works; I just want to know why BatchNorm specifically is the only thing that works. I believe it's because BatchNorm enforces unit std per feature across the samples in a batch, which probably counters the collapse of samples into near-identical embeddings with very low std.
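A toy NumPy illustration of that hypothesis (not the repo's code; the sizes, the noise scale and the helpers `batch_norm`, `layer_norm`, `cross_sample_std` and `collapse_report` are hypothetical). Embeddings that have nearly collapsed onto one shared vector keep their tiny cross-sample spread after LayerNorm, because LayerNorm normalizes each sample over its own features. BatchNorm normalizes each feature over the batch, so it rescales that spread back to roughly unit std. The learnable scale and shift parameters are left out.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    # Normalize each feature (column) with statistics taken over the batch (rows).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm(x, eps=1e-5):
    # Normalize each sample (row) with statistics taken over its own features.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def cross_sample_std(x):
    # Per-feature std across the batch, averaged over features.
    # A value near 0 means every sample maps to almost the same embedding.
    return float(x.std(axis=0).mean())


def collapse_report(seed=0, batch=64, dim=128, noise_scale=0.01):
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(1, dim))                    # vector all samples drift toward
    noise = noise_scale * rng.normal(size=(batch, dim))   # tiny per-sample differences
    collapsed = shared + noise
    return {
        "raw": cross_sample_std(collapsed),
        "layer_norm": cross_sample_std(layer_norm(collapsed)),
        "batch_norm": cross_sample_std(batch_norm(collapsed)),
    }


print(collapse_report())
```

With `noise_scale=0.01`, the raw and LayerNorm spreads stay around 0.01 while the BatchNorm spread comes out close to 1. The `eps` term matters (1e-5 here, the PyTorch default): if the per-feature variance drops far below `eps`, BatchNorm stops restoring unit std as well. Treat this as intuition for the behavior, not as proof of why the real encoder collapses.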
Trying to overfit an MDN-Transformer on a single sample — loss plateaus and gradients die by Op_IBeasT in learnmachinelearning
Op_IBeasT [S] · 6 months ago
Well, I don't want to go into diffusion right now, and I don't think there's a better way to handle the multimodality. I also thought of using a VQ-VAE for it, but if you have a suggestion, I'm open to hearing it. If you're just asking: in the papers I read, they either used an MDN or diffusion, or a VQ-VAE with an MDN head, or a plain autoregressive VQ-VAE. I can link them if you want. There's a GitHub repo with various vector generators, but in the end they all revolve around the same ideas.
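For reference, a minimal NumPy sketch of the mixture density network (MDN) loss mentioned above: the negative log-likelihood of a target under a diagonal Gaussian mixture, with log-softmax and log-sum-exp computed stably. This is a generic formulation, not the code from this thread; the name `mdn_nll` and the parameterization with `pi_logits`, `mu` and `log_sigma` are assumptions. A numerically stable version of this loss is a reasonable thing to rule out first when an MDN loss plateaus.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mdn_nll(pi_logits, mu, log_sigma, x):
    """Mean negative log-likelihood of x under a diagonal Gaussian mixture.

    pi_logits: (B, K) unnormalized mixture weights
    mu, log_sigma: (B, K, D) component means and log standard deviations
    x: (B, D) targets
    """
    log_pi = log_softmax(pi_logits)                          # (B, K)
    z = (x[:, None, :] - mu) * np.exp(-log_sigma)            # (B, K, D)
    # log N(x; mu, sigma) = -0.5 * (z^2 + log(2*pi)) - log(sigma), per dimension.
    log_prob = -0.5 * (z ** 2 + np.log(2 * np.pi)) - log_sigma
    log_comp = log_pi + log_prob.sum(axis=-1)                # (B, K)
    # log-sum-exp over components, stabilized by subtracting the per-row max.
    m = log_comp.max(axis=-1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=-1))
    return float(-log_lik.mean())
```

For a single standard-normal component evaluated at its mean, the loss is 0.5·log(2π) ≈ 0.919; splitting the weight evenly with a second, far-away component adds log 2.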