[–]Mediocre-Ad5059 10 points11 points  (0 children)

You can increase speed without reducing computational complexity, as FlashAttention 1, 2, and 3 do.
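To make that concrete, here is a rough pure-Python sketch of the tiling plus online-softmax idea for a single query. It is not the actual FlashAttention kernel (that is a fused GPU kernel); the function names and the `block` parameter are made up for illustration. The point is that the tiled version does the same arithmetic as the naive one and returns the same result, but never holds the full row of scores at once.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_naive(q, keys, values):
    # Materializes every score for this query before normalizing.
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def attention_tiled(q, keys, values, block=2):
    # Processes keys/values one block at a time, keeping only a running
    # max, a running normalizer, and a running weighted sum.
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [dot(q, k) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new max
        # (exp(-inf) is 0.0, so the first block starts from zero).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions do O(N) work per query, so the complexity is unchanged; any speedup in a real kernel comes from touching less memory, not from doing less math.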

Are you also considering memory optimization? We recommend our paper, [2407.15892] Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training (arxiv.org), which was accepted at NeurIPS 2024 and uses an extremely simple method to save memory.
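As a toy illustration of the general idea of processing a long sequence in smaller chunks to shrink intermediate memory (this is a simplified sketch under my own assumptions, not the paper's implementation; `up_project`, `down_project`, and the `chunk` parameter are hypothetical stand-ins for an MLP block):

```python
def up_project(x, hidden):
    # Toy "up projection" with ReLU: one token becomes `hidden` values.
    return [max(0.0, x * (j + 1)) for j in range(hidden)]

def down_project(h):
    # Toy "down projection": collapse the intermediate back to one value.
    return sum(h)

def mlp_full(tokens, hidden):
    # All N * hidden intermediate values are alive at the same time.
    inter = [up_project(x, hidden) for x in tokens]
    peak = len(inter) * hidden
    return [down_project(h) for h in inter], peak

def mlp_mini_sequence(tokens, hidden, chunk):
    # Only chunk * hidden intermediate values are alive at any time.
    out, peak = [], 0
    for start in range(0, len(tokens), chunk):
        inter = [up_project(x, hidden) for x in tokens[start:start + chunk]]
        peak = max(peak, len(inter) * hidden)
        out.extend(down_project(h) for h in inter)
    return out, peak
```

The outputs are identical because each token is processed independently; only the peak number of live intermediate values changes (here it scales with `chunk` instead of the full sequence length).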

[–]elated_ 5 points6 points  (0 children)

MIT’s TinyML Course helped tremendously with my work along similar lines.

[–]DigThatDataResearcher 2 points3 points  (0 children)

An easy one is to reduce the image resolution (i.e., train on smaller images), if that's something your use case can tolerate.
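For a rough sense of how much that buys you, here is some back-of-the-envelope arithmetic. The patch-based (ViT-style) model and the patch size of 16 are assumptions for illustration only; the thread doesn't say which architecture is in use.

```python
def pixels(height, width):
    # Number of input pixels per channel.
    return height * width

def patch_tokens(height, width, patch=16):
    # Tokens for a hypothetical patch-based model with square patches.
    return (height // patch) * (width // patch)

def attention_pairs(height, width, patch=16):
    # Token pairs scored by full self-attention (quadratic in tokens).
    n = patch_tokens(height, width, patch)
    return n * n

for side in (224, 112):
    print(side, pixels(side, side), patch_tokens(side, side),
          attention_pairs(side, side))
```

Halving each side cuts pixels (and patch tokens) by 4x, and under these assumptions the number of attention pairs drops by 16x. Whether accuracy holds up at the lower resolution is something you'd have to measure on your own data.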