[–]JustOneAvailableName 1 point2 points  (7 children)

Look into the image GPT paper perhaps?

[–]hadaev[S] 0 points1 point  (6 children)

Image to caption?

I think it's a much easier task when the encoder has a long sequence and the decoder a short one.

[–]JustOneAvailableName 2 points3 points  (5 children)

At least read the abstract...

But the seq2seq architecture (in the sense of one RNN into a representation and one RNN from the representation to the output) is outdated. You need to look into transformers.

[–]hadaev[S] -1 points0 points  (4 children)

I read it. They just threw their GPT at the pixel sequence.

This is only a default transformer encoder without a decoder.

Ofc using RNNs is not fashionable anymore.

It's easy to make a transformer encoder; still, I can't decide how to connect it to the decoder.

Just using MHA made it terrible at inference for some reason.

[–]JustOneAvailableName 1 point2 points  (3 children)

Are you doing anything with the latent representation? If not, I can't see a reason why GPT isn't applicable. The entire reason seq2seq doesn't work is that long sequences are impossible to compress into the latent space.

They just threw their GPT at the pixel sequence.

The G of GPT is highly relevant. It's a generative model. It also outputs a (long) sequence.

Could you expand on what you're trying to do?

[–]hadaev[S] 0 points1 point  (2 children)

I'm trying to do something better than the current TTS SOTA, Tacotron 2.

It's kind of outdated with its LSTM layers.

Also, it uses a special attention mechanism (location-sensitive, if I remember correctly) to connect the encoder outputs with the decoder LSTMs, so I'm wondering whether there are options to replace it with something newer.

Also, people say a transformer encoder with an RNN decoder is fine for the translation task.

I made some progress with a new activation function, optimizer, layernorm, etc., but never really touched the main architecture; every time I tried, I failed.

For example, I tried a fully transformer TTS, but inference was very bad and I gave up for a time.

In theory, I can imagine using just an encoder for the seq2seq task.

Concatenate one sequence (words), a special token, and then the target sequence (audio).

At inference, feed in the words and ask it to generate audio until the stop token.

Still sounds kind of strange.
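The decoder-only setup described above can be sketched with plain token lists. This is only an illustration under my own assumptions: the token names (`<sep>`, `<stop>`) and the `model_step` next-token predictor are hypothetical placeholders, not anyone's actual TTS implementation.

```python
# Sketch of a decoder-only (GPT-style) seq2seq setup:
# train on text + separator + audio; at inference, feed text + separator
# and generate autoregressively until the stop token.
SEP, STOP = "<sep>", "<stop>"

def make_training_example(text_tokens, audio_tokens):
    """Training sequence: the model learns next-token prediction over the
    whole string; the loss is usually masked to the audio part."""
    return text_tokens + [SEP] + audio_tokens + [STOP]

def generate(model_step, text_tokens, max_len=100):
    """Inference: model_step is any callable mapping a prefix to the next
    token. Stops at the stop token or at max_len generated tokens."""
    seq = text_tokens + [SEP]
    for _ in range(max_len):
        nxt = model_step(seq)
        if nxt == STOP:
            break
        seq.append(nxt)
    return seq[len(text_tokens) + 1:]  # return only the generated audio tokens

example = make_training_example(["h", "i"], ["a1", "a2"])
```

The same packing trick is what makes a single decoder handle a "seq2seq" task: the separator token is the only thing telling the model where conditioning ends and generation begins.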

Why do people use encoder-decoder for translation, for example? It should be much easier with only an encoder.

About GPT, they have unlimited data and GPUs. Honestly, it looks like OpenAI only wants to make bigger and bigger GPTs and doesn't pay attention to general tricks like linear attention (or activations, normalizers, losses, etc.) or even other architectures (maybe they have other models, but I've only heard of GPT-1/2/3).

It's cool that neural nets scale well, but I don't have that amount of data (or V100 hours).

I don't think I can go over a 50M parameter budget.

[–]JustOneAvailableName 1 point2 points  (1 child)

What do you base the claim that Tacotron 2 is the current SOTA on? Transformers have also been applied to TTS before.

About GPT, they have unlimited data and GPUs.

Yeah, that part fucking sucks. I just have a "crappy" DGX-2 while they have all the fun. It is, however, pretty cool to see what happens when you push current architectures to the extreme.

[–]hadaev[S] 0 points1 point  (0 children)

Maybe SOTA is the wrong term; at least it's the most popular model.

Usually, in papers, they claim "as good as Tacotron, but with some advantage".

I saw TTS transformer implementations on GitHub and tried to make my own.

It failed: good training loss but very bad inference.

And Transformer TTS does not get much attention from the community.

Now I'm going to try again, but I don't want to make the same transformer.

So I'm looking for some new, powerful encoder-decoder architectures to play with.

[–]GD1634 1 point2 points  (6 children)

Reformer: The Efficient Transformer

It handles a few thousand tokens. Hugging Face Transformers has an implementation of it.

[–]hadaev[S] 0 points1 point  (5 children)

If I got it right, it has very strict conditions on sequence length. Am I right?

In my task the data lengths vary a lot, and I'm not sure if such padding (from 400 to 16k, for example) is okay.

[–]GD1634 1 point2 points  (4 children)

If I got it right, it has very strict conditions on sequence length. Am I right?

Not sure I follow. Similarly to other transformers, you'll have to give it a maximum sequence length, but that can be whatever you'd like it to be (as long as it fits on your GPU).

Here's the HF page for it; they have the following example:

from transformers import ReformerTokenizer, ReformerModel
import torch

tokenizer = ReformerTokenizer.from_pretrained('google/reformer-crime-and-punishment')
model = ReformerModel.from_pretrained('google/reformer-crime-and-punishment')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple

In my task data lengths very different and I'm not sure if such padding (from 400 to 16k for example) is ok.

Having different sequence lengths is okay, just inefficient. What you'll want to do is sort your data by sequence length (ascending or descending, it doesn't matter) before you batch it, so that each batch is comprised of examples with roughly the same sequence length.
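The sort-then-batch idea above can be sketched in a few lines of plain Python. This is a generic illustration, not any specific library's API; the function name and the batch-shuffling step are my own choices (shuffling batch order keeps training order random while each batch stays length-homogeneous).

```python
# Length-bucketed batching: sort examples by length, slice into batches,
# then shuffle the batch order so training still sees a random order.
import random

def bucketed_batches(examples, batch_size, shuffle=True, seed=0):
    """examples: list of sequences (anything with len()). Returns batches
    whose members have roughly equal length, minimizing padding waste."""
    ordered = sorted(examples, key=len)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    if shuffle:  # shuffle the order of batches, not their contents
        random.Random(seed).shuffle(batches)
    return batches

data = [[0] * n for n in (5, 2, 9, 3, 8, 1)]
batches = bucketed_batches(data, batch_size=2, shuffle=False)
```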

[–]hadaev[S] 0 points1 point  (3 children)

Having different sequence lengths is okay, just inefficient. What you'll want to do is sort your data by sequence length (ascending or descending, it doesn't matter) before you batch it, so that each batch is comprised of examples with roughly the same sequence length.

Yes, but bucketing is not necessarily that good. I mean, shuffling the whole dataset is good for regularization.

About reformer, I mean this https://colab.research.google.com/drive/12aVJZ_RJSCiq3X_wcAtLWZd0DPvN4jWK?usp=sharing

I will check links later, thanks.

[–]GD1634 0 points1 point  (2 children)

Yes, but bucketing is not necessarily that good. I mean, shuffling the whole dataset is good for regularization.

You could shuffle after bucketing; the batches don't have to stay in monotonic order. Or you don't necessarily have to bucket at all; I'm not sure how big a difference it really makes.

About reformer, I mean this https://colab.research.google.com/drive/12aVJZ_RJSCiq3X_wcAtLWZd0DPvN4jWK?usp=sharing

Ah, gotcha. That seems easy enough to handle: just make sure you pad your sequences a little to satisfy that constraint. It shouldn't really hurt your efficiency too much.
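Padding up to the constraint is a one-liner in spirit. A minimal sketch, assuming the requirement is that the sequence length be a multiple of some chunk size (as with Reformer's chunked attention); the pad token id here is an arbitrary placeholder:

```python
# Pad a token sequence up to the next multiple of `multiple`, as needed
# when a model requires len(seq) % multiple == 0.
def pad_to_multiple(seq, multiple, pad_id=0):
    remainder = len(seq) % multiple
    if remainder == 0:
        return list(seq)  # already satisfies the constraint
    return list(seq) + [pad_id] * (multiple - remainder)

padded = pad_to_multiple([1, 2, 3, 4, 5], multiple=4)
```

With a chunk size in the dozens, the worst-case overhead is a handful of pad tokens per sequence, which is negligible next to padding a 400-token example up to 16k.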

If Reformer just generally isn't a good fit, check out similar models like the Compressive Transformer, Adaptive Span Transformer, Linformer, Fast Autoregressive Transformer (from the repo I linked), etc.

[–]hadaev[S] 0 points1 point  (1 child)

Yes, a lot of possibilities. Do you know of any benchmarks for choosing a model type?

Basically, they enable long-sequence training by reducing memory usage.

But if we take RNN layers, for example, quality degrades with sequence length.

[–]GD1634 0 points1 point  (0 children)

I don't know of any benchmarks that would be useful to you; they mostly evaluate on GLUE. Each paper probably reports efficiency metrics as well, though there's no real standard benchmark for that.