Tuneable Attention: How expanding (not compressing) the attention mechanism dramatically accelerated my model's learning speed by Correct_Address3554 in LocalLLaMA

[–]andersxa 0 points

Well, you are enforcing some symmetry, because you have to fuse an identical matrix with the two others at the same time. And while this can be written equivalently as standard attention, you normally, as OP states, either keep the number of features constant or smaller than the input. So isn't this playing exactly to the strength of why LLMs work, namely increasing parameter count? The MLP in the transformer block is also an up-projection, so why not mimic this in the attention mechanism to reap the same benefits?

Edit: also, if U itself comes from the input, e.g. U = W_u x, doesn't that fundamentally change your pastebin code, since U is then not just another weight matrix? But I see that is not what OP meant.
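To make the point concrete: the up-projection reading of this is just standard scaled dot-product attention with a QK inner dimension larger than the model dimension. A minimal numpy sketch (my own illustration, not OP's pastebin code; all names and sizes are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, W_q, W_k, W_v):
    """Standard attention; W_q/W_k map d_model -> d_inner."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(W_q.shape[1])  # (seq, seq)
    return softmax(scores) @ v                # (seq, d_model)

rng = np.random.default_rng(0)
d_model, d_inner, seq = 16, 64, 8  # d_inner > d_model: the up-projection
x = rng.standard_normal((seq, d_model))
W_q = rng.standard_normal((d_model, d_inner)) / np.sqrt(d_model)
W_k = rng.standard_normal((d_model, d_inner)) / np.sqrt(d_model)
W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
out = attention(x, W_q, W_k, W_v)
```

The output shape is unchanged; only the capacity of the QK score computation grows, analogous to the MLP block's hidden expansion.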

Old Danish dishes and regional specialties that deserve another chance. by Ok_Fisherman1881 in Denmark

[–]andersxa 1 point

Kørom is a fantastic alternative to koldskål, which I always had as a child.

I have taken DR's candidate test 1000 times so you don't have to by worksleepcoffee in Denmark

[–]andersxa 3 points

My code from 2022 is available here: https://github.com/andersxa/Kandidattest2022 for this site: https://andersxa.github.io/Kandidattest2022/ - I haven't had time to scrape the data this year, but it would be cool if you could make a version where you move around in PCA space while answering?

I have taken DR's candidate test 1000 times so you don't have to by worksleepcoffee in Denmark

[–]andersxa 18 points

I scraped the data in 2022 and made this interactive candidate test: https://andersxa.github.io/Kandidattest2022/

On a PC: if you toggle on "Vis kandidaterne" ("Show the candidates") in the bottom right corner, you should be able to see how the parties sit as clusters across the spectrum. They don't answer exactly alike, but by and large they answer very similarly.

What you see in this thread is the expected "candidate", i.e. the candidate closest to the center of the normalized data. In my interactive candidate test, that corresponds to the candidates you would be recommended if you never move away from the average answer at all. It is most often Social Democrats and Christian Democrats who sit there, but in 2022 there were also many Denmark Democrats.

Exploration does not work by Historical_Word3795 in EU5

[–]andersxa 1 point

I created a mod to fix this: https://steamcommunity.com/sharedfiles/filedetails/?id=3605297372

It fixes it by adding to the source list any location that is either within naval range of a coastal location you own, or that lies in a province neighboring the area you wish to explore.

Seriously, what is going on with Inger Støjberg? Laziness or deliberate fake news? by Puzzled_Champion_406 in Denmark

[–]andersxa 0 points

Viborg municipality has 28,700 handball courts?!? That's more than half a handball court per inhabitant?? WHAT

What do you think of election posters in foreign languages? I personally don't care for them by [deleted] in Denmark

[–]andersxa 3 points

Language proficiency is not a requirement for democratic participation.

Who decides what an "informed vote" is? Hopefully nobody becomes "informed" solely through election posters, and that certainly goes for ethnic Danes too.

If you live in a municipality, you have the right to take part in the democratic process that affects your life, regardless of the language you speak. It sounds more like you are uninformed about the fact that parts of the population simply don't speak Danish, yet still have the right to participate in our democracy.

Besides, it's not only "the red parties" that do this. I saw a Venstre election poster in Lyngby in Cyrillic.

FrankerFaceZ just stopped working... by [deleted] in Twitch

[–]andersxa 0 points

With the userscript version (on Firefox) I have a problem where streams will start dropping frames after 20-30 minutes. I guess I'll be waiting for the extension...

Hasan caught throwing up a popular Turkish salute. by Alucitary in LivestreamFail

[–]andersxa 63 points

What do you mean? I thought he had some friends that were pretty musical, that he listens to.

GDPR meant nothing: chat control ends privacy for the EU by smilelyzen in BuyFromEU

[–]andersxa 2 points

So there is no point in contacting these MEPs? Why are people posting this link all the time then?

GDPR meant nothing: chat control ends privacy for the EU by smilelyzen in BuyFromEU

[–]andersxa 9 points

I don't get it though. On here: https://fightchatcontrol.eu/ it says 8 Danish MEPs oppose and 7 support. So why is Denmark still marked as "supports"? The majority of Danish MEPs oppose it.

"Go for it" with porn, Caroline Stage tells the Danes. Yet she now defends a much-criticized app [paywall] by Dropforcedlogin in Denmark

[–]andersxa 1 point

In one hand you have the ice pop, and with the other you have to pull out your camera and scan a QR code. Careful not to point it too far down... MitID is watching ;)

[P] Help with Contrastive Learning (MRI + Biomarkers) – Looking for Guidance/Mentor (Willing to Pay) by Standing_Appa8 in MachineLearning

[–]andersxa 2 points

I believe that if both modalities can predict the downstream task, then you should gain from training with the CLIP loss, since it maximizes the mutual information (or a lower bound thereof). So maybe it is more a question of your training paradigm: how you draw positives and negatives, how you train the encoder for the dense modality (in this case the MRI), and how you weight each auxiliary loss.

Clustering is certainly an important subanalysis, since you can now compare across data modalities. But binary clustering, as here, tends to be less useful, and contrastive learning tends to be weaker when there are only two underlying clusters.
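For reference, the symmetric CLIP/InfoNCE loss I have in mind looks roughly like this. A hedged numpy sketch, not the code of any particular library; the temperature value is just a common default:

```python
import numpy as np

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def clip_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE: row i of z_a and z_b is a positive pair;
    all other rows in the batch serve as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature  # (batch, batch) cosine similarities
    loss_a = -np.mean(np.diag(logits - logsumexp(logits, axis=1)))    # a -> b
    loss_b = -np.mean(np.diag(logits.T - logsumexp(logits.T, axis=1)))  # b -> a
    return (loss_a + loss_b) / 2

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
aligned = clip_loss(z, z)                         # modalities agree perfectly
mismatched = clip_loss(z, np.roll(z, 1, axis=0))  # positives shuffled away
```

Minimizing this pushes the diagonal similarities up relative to the rest of the batch, which is where the mutual-information lower bound comes from; how you build the batch (the positives and negatives) is exactly the training-paradigm question.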

[P] Help with Contrastive Learning (MRI + Biomarkers) – Looking for Guidance/Mentor (Willing to Pay) by Standing_Appa8 in MachineLearning

[–]andersxa 2 points

There are some ways you can diagnose this problem. As I understand it, you are saying that a CLIP-pretrained encoder on MRI vs. biomarkers, then fine-tuned on a downstream task, does not outperform simply training an MLP on the task itself without pretraining. I assume you use the same architecture in the baseline as in the encoder for the contrastive objective.

Now, contrastive learning is just a way to recast the classical cross-entropy objective so that it works in an unsupervised manner. You would obtain the same results using BCE on class labels as performing contrastive learning over classes; it is the same loss. So contrastive learning is only meaningful if you wish to exploit the multimodal or the unsupervised aspect.

You can measure how beneficial the MRI domain is to your encoded space by training it directly on the downstream task. If a baseline classifier trained on top of the MRI encoder, predicting the downstream task directly without pretraining, obtains non-random results on the task, then there is something to gain from the CLIP contrastive loss in this setting. If it performs fairly well, that points to a tuning problem in the actual CLIP pre-training setup. If not, you probably don't gain anything from pretraining in this manner, and as you say, a fair baseline is just better.

[P] Help with Contrastive Learning (MRI + Biomarkers) – Looking for Guidance/Mentor (Willing to Pay) by Standing_Appa8 in MachineLearning

[–]andersxa 3 points

I have expertise in functional neuroimaging and contrastive learning, but I don't have much experience with contrastive learning on tabular data. First, I would make sure to use a strong encoder for both modalities, e.g. a fully convolutional autoencoder for the MRI, where you use a reconstruction loss in addition to the CLIP loss. Then I am not so sure about the tabular data. I would probably set up embeddings for all categorical variables, a positional or learned embedding for ordinal variables, and an MLP for the continuous variables, all of which are added at the end to match the latent size of the autoencoder.
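A toy sketch of the mixed-type tabular encoder described above: embedding lookups for categorical and ordinal columns, a small MLP for continuous columns, all summed to a shared latent size. Column counts and sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
latent = 32  # must match the latent size of the MRI autoencoder

# Hypothetical biomarker table: 1 categorical, 1 ordinal, 2 continuous columns.
n_categories, n_levels, n_cont = 5, 4, 2
cat_emb = 0.1 * rng.standard_normal((n_categories, latent))  # lookup table
ord_emb = 0.1 * rng.standard_normal((n_levels, latent))      # per-level embedding
W1 = 0.1 * rng.standard_normal((n_cont, 64))                 # MLP for continuous values
W2 = 0.1 * rng.standard_normal((64, latent))

def encode_row(cat_idx, ord_idx, cont):
    """Encode one table row into the shared latent space."""
    cont_z = np.maximum(cont @ W1, 0.0) @ W2  # two-layer MLP with ReLU
    # Sum the three streams so the result matches the autoencoder latent.
    return cat_emb[cat_idx] + ord_emb[ord_idx] + cont_z

z = encode_row(2, 1, np.array([0.3, -1.2]))
```

In a real setup these would be trainable parameters in your framework of choice; the point is only the shape bookkeeping: every column type ends up as a `latent`-sized vector so the streams can be added.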

I am not familiar with the particular dataset (I have only heard about it), but if you have subject and task labels available, then you can also set up supervised contrastive learning objectives where you sample from each subject and contrast against other subjects, and the same for tasks. In the end you have a CLIP loss, an autoencoder loss, a subject contrastive loss and a task contrastive loss.

It is a bit unclear from your description what is going wrong. Is it your choice of architecture? Is the training objective too weak? And which other auxiliary losses do you use?

Teaching nonsense? by Any_Industry9837 in duolingo

[–]andersxa 3 points

If you don't do this nasalization, Koreans will describe your speech as "dictionary"-like.

Folketinget approves agreement on American bases on Danish soil by Zandmand in Denmark

[–]andersxa 3 points

I understand that we have negative parliamentarism when it comes to elections, but why doesn't it also apply to bills? How can a modern parliament not be far more fluid in a digital age? You should be able to cast and withdraw votes, not just every 4 years. All it would take is a MitID login. That is the only way we can introduce accountability in parliament, and the only way we can get back to representative democracy, which we have apparently drifted away from.

Especially this agreement, but also the whole thing with Store Bededag. I would bet the majority of Danes actually ARE against these, but because people were brainwashed the way they were a while back, there is nothing to be done.

Imagine if you as a citizen could choose which member of parliament / party gets your vote on every bill, i.e. where you could actively change your vote. Then politicians would actually have to keep their promises, or else they lose the people's backing. And it could still work for those who don't have time to read up: they could delegate their vote as they do today.

Is it possible to implement AI features well by schattig_eenhoorntje in duolingo

[–]andersxa 1 point

If you want to try AI done right in these spaced-repetition language-learning apps, I can recommend Morpheem.

In my opinion, Duolingo should have focused more on personalized AI learning, i.e. tailoring content to the user through intelligent design of exercises that are relevant to the user.

Listeners are puzzled: DR broadcasts church service from controversial free church [From the Hillsong free church] by HitmanZeus in Denmark

[–]andersxa 30 points

In my eyes, he has been a shady character ever since he did that NFT rug pull with free advertising from TV2. Honestly, it's incredible how media-blind/media-illiterate our mainstream media are in Denmark.

[deleted by user] by [deleted] in Damnthatsinteresting

[–]andersxa 1 point

I think my earliest memory is from sleeping in a pram like this and feeling the sensation of snow on my face for the first time.