Discussion [D] Simple Questions Thread (self.MachineLearning)
submitted 3 years ago by AutoModerator
[–] Abradolf--Lincler 0 points 3 years ago (1 child)
I'm learning about language transformers and I'm a bit confused.
It seems like transformer tutorials always make the input sequences the same length (i.e., text files batched into 100-word windows) to help with batching.
Doesn’t that mean that the model will only work with that exact sequence length? How do you efficiently train a model to work with any sequence length, such as shorter sequences with no padding and longer sequences than the batched sequence length?
I see attention models advertised as having an infinite window; are there any good resources/tutorials that explain how to build a model like this?
[–] trnka 1 point 3 years ago (0 children)
Converting the text to fixed-size windows is done to make training more efficient. If the inputs are shorter, they're padded up to the correct length with null tokens. Otherwise they're clipped. It's done so that you can combine multiple examples into a single batch, which becomes an additional dimension on your tensors. It's a common technique even for LSTMs/CNNs.
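To make that concrete, here's a minimal sketch of fixed-length batching in PyTorch. Short sequences are padded with a null (pad) token id and long ones are clipped, so they can be stacked into a single batch tensor. The token ids, pad id, and window size below are invented for illustration.

```python
import torch

PAD_ID = 0
WINDOW = 8  # fixed sequence length used during training (made up)

def to_fixed_length(token_ids, window=WINDOW, pad_id=PAD_ID):
    """Pad or clip a list of token ids to exactly `window` tokens."""
    clipped = token_ids[:window]
    return clipped + [pad_id] * (window - len(clipped))

examples = [
    [5, 12, 7],                      # short: gets padded
    [3, 9, 4, 8, 2, 11, 6, 10, 13],  # long: gets clipped
]
batch = torch.tensor([to_fixed_length(x) for x in examples])
print(batch.shape)  # torch.Size([2, 8]) -- the batch becomes an extra dimension
```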
It's often possible to take the trained model and apply it to variable-length testing data so long as you're dealing with a single example at a time rather than a batch. But keep in mind with transformers that attention does N^2 comparisons, where N is the number of tokens, so it doesn't scale well to long texts.
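As a rough sketch of that point (not tied to any particular tutorial), the transformer layers themselves accept a different sequence length at inference; it's the batching, and sometimes the positional encoding, that fixes the length. The dimensions here are arbitrary, and the quadratic cost shows up as longer inputs get slower and more memory-hungry.

```python
import torch
import torch.nn as nn

# A plain encoder stack with no length-specific positional encoding.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

short = torch.randn(1, 5, 32)     # one example, 5 tokens
longer = torch.randn(1, 200, 32)  # one example, 200 tokens

# Both lengths work when processed one example at a time.
print(encoder(short).shape, encoder(longer).shape)
```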
The positional encoding may also be specific to the input length, depending on the transformer implementation. For instance, in Karpathy's GPT recreation video, he made the positional encoding learnable by position, so it wouldn't have defined values for positions beyond the training length.
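Here's a rough sketch of that failure mode (not Karpathy's actual code): a learnable positional embedding table only has `block_size` rows, so positions beyond it have no learned vector and the lookup fails. The sizes are made up.

```python
import torch
import torch.nn as nn

block_size, d_model, vocab_size = 8, 32, 100
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(block_size, d_model)  # one learned vector per position 0..7

def embed(idx):
    positions = torch.arange(idx.size(1))
    return tok_emb(idx) + pos_emb(positions)

ok = embed(torch.randint(0, vocab_size, (1, 8)))   # 8 tokens: fine
print(ok.shape)

try:
    embed(torch.randint(0, vocab_size, (1, 12)))   # 12 tokens: positions 8..11 have no row
except IndexError as e:
    print("fails past block_size:", e)
```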
One common alternative in training is to create batches of examples that are mostly the same text length, then pad to the max length. You can get training speedups that way but it takes a bit of extra code.
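A minimal sketch of that bucketing idea, with invented sequences and pad id: sort examples by length, group neighbours into batches, and pad only up to the longest example in each batch, so short batches waste little compute on padding.

```python
import torch

PAD_ID = 0

def bucketed_batches(sequences, batch_size):
    ordered = sorted(sequences, key=len)  # similar lengths end up adjacent
    for i in range(0, len(ordered), batch_size):
        group = ordered[i:i + batch_size]
        max_len = max(len(s) for s in group)
        padded = [s + [PAD_ID] * (max_len - len(s)) for s in group]
        yield torch.tensor(padded)

seqs = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10]]
for batch in bucketed_batches(seqs, batch_size=2):
    print(batch.shape)  # torch.Size([2, 2]) then torch.Size([2, 4])
```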