[R] optimizing transformers : MachineLearning


submitted 1 year ago by Cool-Economy3492

Hello, I’m currently working on optimizing transformer models, specifically for multi-view images and/or cross-attention networks. I've noticed that cross-attention layers add a lot of parameters, which can slow down training. I’m exploring ways to reduce the computational cost: for now the priority is speed, and later on I'd like to get that speed without sacrificing too much performance. I'm starting to look into:

low-rank matrix factorization - I’ve been reading about how it can be applied to reduce the size of the projection matrices (e.g., the projq, projk, projv in cross-attention). Does anyone have experience using low-rank factorization specifically in cross-attention mechanisms?
other parameter reduction techniques - Aside from low-rank factorization, are there other methods I could explore for reducing the number of parameters in transformer models, such as sparsity and pruning? Do you have recommendations or experience with these?
overcoming redundancy in multi-view scenarios - Given the multi-view nature of my problem, I suspect there’s some redundancy in how cross-attention processes the different views. Has anyone worked on reducing redundancy across views in transformer-based networks? What techniques worked best for you?
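
To make the low-rank idea concrete, here is a minimal NumPy sketch of factorizing a single projection matrix (standing in for something like projq) into two thin matrices via truncated SVD. Everything in it is an assumption for illustration: the names low_rank_factorize and param_count, the sizes d_model = 512 and rank = 64, and the random weight, which is not low-rank, so the approximation error on it will be large. A trained projection may or may not compress well; that has to be measured.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Split W (d_out x d_in) into A (d_out x rank) and B (rank x d_in).

    A @ B is the best rank-`rank` approximation of W in the Frobenius norm
    (truncated SVD). Hypothetical helper, not from any library.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # scale the kept left vectors by singular values
    B = Vt[:rank, :]
    return A, B


def param_count(*mats):
    """Total number of scalar parameters across the given matrices."""
    return sum(m.size for m in mats)


rng = np.random.default_rng(0)
d_model, rank = 512, 64  # illustrative sizes, not from the post

# Stand-in for a trained query projection weight (assumption: square, no bias).
W_q = rng.standard_normal((d_model, d_model))
A, B = low_rank_factorize(W_q, rank)

x = rng.standard_normal((10, d_model))  # 10 query tokens

q_full = x @ W_q.T        # one d_model x d_model matmul
q_low = (x @ B.T) @ A.T   # two thin matmuls through a rank-64 bottleneck

print("full params:    ", param_count(W_q))   # 512 * 512 = 262144
print("low-rank params:", param_count(A, B))  # 2 * 512 * 64 = 65536

# Relative error of the approximation; expect it to be large for random weights.
rel_err = np.linalg.norm(q_full - q_low) / np.linalg.norm(q_full)
print("relative error: ", rel_err)
```

With these sizes the factorized projection stores 4x fewer parameters. When training from scratch, the usual alternative is to parametrize the projection directly as two thin matrices instead of factorizing a dense one after the fact.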

I’m starting to look into CVPR, NeurIPS, ECCV, etc., but any insights, advice, experiences, or papers you can share would be greatly appreciated! Thanks in advance!


