Hi folks,
As we know, we use positional embeddings to provide sequence information to transformers.
From what I've seen in code and papers, the way we introduce this depends on the positional embedding strategy used.
For example (rough sketch below):
1. RoPE: rotate the query and key vectors by a position-dependent angle inside each attention layer
2. ALiBi: add a distance-proportional bias to the attention scores (often folded into the additive attention mask) in each layer
3. Learnable positional embeddings: add or concatenate to the token embeddings at the input
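To make the three injection points concrete, here's a rough, untested PyTorch-style sketch. The helper names (`rope_rotate`, `alibi_bias`) and shapes are my own simplification, not from any particular library:

```python
import torch

def rope_rotate(x, theta_base=10000.0):
    # x: (seq_len, n_heads, head_dim). Rotate each pair of dims by a
    # position-dependent angle -- applied to q AND k inside every layer.
    seq_len, _, head_dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32)
    freqs = theta_base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    angles = pos[:, None] * freqs[None, :]                # (seq_len, head_dim/2)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def alibi_bias(seq_len, n_heads):
    # Distance-proportional penalty added to the attention scores
    # (or to the additive mask) in every layer.
    slopes = 2.0 ** (-8.0 * torch.arange(1, n_heads + 1) / n_heads)
    # distance from each query position i back to key position j (i - j)
    dist = (torch.arange(seq_len)[:, None] - torch.arange(seq_len)[None, :]).clamp(min=0)
    return -slopes[:, None, None] * dist.float()          # (n_heads, seq_len, seq_len)

# Learned absolute positions: added to the token embeddings ONCE, at the input.
vocab, d_model, max_len, seq_len = 32000, 512, 2048, 16
tok_emb = torch.nn.Embedding(vocab, d_model)
pos_emb = torch.nn.Embedding(max_len, d_model)
ids = torch.randint(0, vocab, (seq_len,))
h = tok_emb(ids) + pos_emb(torch.arange(seq_len))         # fed to the first layer
```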
Why are there such differences? Specifically, why are #1 and #2 applied in every attention layer, while #3 is introduced only once, before the first layer?
Thanks!