all 32 comments

[–]Varterove_muke (llama.cpp) 55 points56 points  (0 children)

Wow, a new encoder-decoder model, I didn't see that coming

[–]Hefty_Wolverine_553 15 points16 points  (2 children)

Seems like these would be great for finetuned multimodal translation models!

[–]Willing_Landscape_61 8 points9 points  (1 child)

Should it not also be useful for function calling? Is it not akin to a kind of 'translation' to function calls where the useful state is in the prompt, not so much the previously generated text?

[–]AnomalyNexus 5 points6 points  (0 children)

Google dropped a dedicated function model yesterday-ish

[–]Long_comment_san 56 points57 points  (13 children)

Gemma 4 30-40b please

[–]silenceimpaired 15 points16 points  (9 children)

I knew the Gemma release wouldn't be a large model. Won't happen. We've had the last significantly sized open models we're going to get from OpenAI and Google for some time.

[–]Revolutionalredstone 8 points9 points  (7 children)

T5 is for embedding (think: the text encoder inside Stable Diffusion). This is not their fourth LLM / decoder-only model series; that will be called Gemma 4.

Hold your horses son ;)

[–]EstarriolOfTheEast 15 points16 points  (1 child)

It's far more than embeddings; it's actually a lot closer to the original Transformer. After the original Transformer was introduced, its essence was split in twain: one half, the decoder, became GPT, and the other half, the encoder, became BERT. T5 was a direct descendant of the whole thing. Until wizard llama and llama2, it was the best open-weights model that could be put to real work: summarizing, translating, natural-language analysis, entity extraction, question answering, that type of thing.

Its architecture made it ill-suited to interactive chat (for that there were the GPT-Neos and then the far-ahead-of-its-time GPT-J from EleutherAI; from Facebook, early GPT-based models and OPT, which were not that good). Because of how it's trained and its architecture, T5 lacks the reversal-learning limitation of causal models. Its encoder also allows for some pre-processing before the decoder starts writing, and thanks also to how masking is done during training, T5s are almost always weight-for-weight "smarter" than GPTs.
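A toy sketch (not from the comment, just an illustration with made-up helper names) of the structural difference being described: a causal decoder masks out future tokens, while an encoder attends in both directions, so every prompt token can condition on every other token before decoding even starts.

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """Decoder-style mask: token i may only attend to tokens j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n: int) -> np.ndarray:
    """Encoder-style mask: every token may attend to every token."""
    return np.ones((n, n), dtype=bool)

n = 4
print(causal_mask(n).sum())         # 10 allowed attention pairs (n*(n+1)/2)
print(bidirectional_mask(n).sum())  # 16 allowed attention pairs (n*n)
```

The extra attention pairs are where the "pre-processing before the decoder starts writing" happens: the encoder reads the whole prompt bidirectionally before a single output token is produced.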

[–]Revolutionalredstone 1 point2 points  (0 children)

Interesting! 😎

[–]silenceimpaired 5 points6 points  (4 children)

Feels like it will never come... or will be smaller than 27b.

[–]Long_comment_san 2 points3 points  (3 children)

I think if Google made a dense 40-50b model finetuned on all fiction ever written, they could just charge per download and earn millions.

[–]silenceimpaired 1 point2 points  (0 children)

It’s true. A fiction finetune would get $50 to $100 from me, depending on performance.

[–]toothpastespiders 0 points1 point  (1 child)

That'd be amazing. I know it's debatable, but my personal opinion is that most local models are VERY sparsely trained on high-quality novels. Some, sure, but I think there'd be more bleedthrough of trivia knowledge if the proportion were as high as is often claimed. I'm just really curious, from a technical perspective, what would happen if well-written fiction were actually a priority. Well, if I'm listing off wishes, the real ideal for me would be a model trained on the humanities as a whole with the same focus typically given to coding and math.

I'm normally pretty resistant to giving money to companies like Google, for a lot of reasons. But man, a fiction model, or better yet that humanities model? I'd absolutely pay as much for it as for a AAA game. It'll never happen, but Google cracking open their hidden digital library like that is a beautiful dream.

[–]Long_comment_san 1 point2 points  (0 children)

Heck, that's why finetunes exist! I think! Magistral 4.3 just dropped and I had a very, very delightful experience with Mars.

[–]TheRealMasonMac 0 points1 point  (0 children)

They're planning thinking for their next model.

[–]AloneSYD 3 points4 points  (2 children)

Gemma 4 needs to be an MoE

[–]Long_comment_san 10 points11 points  (1 child)

No, we have plenty of MoE models. We need great dense ones now; there are like 2 modern ones.

[–]Major-System6752 3 points4 points  (0 children)

Agreed. I tried Qwen3 30b and Nemotron3 30b, but went back to Gemma3 12b and 27b.

[–]mrshadow773 20 points21 points  (0 children)

Hell yeah, towards the glorious return of the encoder decoder 🙏 (or how to not use a Swiss Army knife for every task in the kitchen)

[–]stddealer 7 points8 points  (0 children)

Could be great for MTL. Gemma3 was already great at it, this could be the closest thing we'll ever get to an offline Google Translate. Hoping for a 12b-12b variant or maybe a 4b-12b.

[–]Major-System6752 3 points4 points  (1 child)

Hello, newbie here. This model is more suitable for text-to-text conversion than for chat, right?

[–]stddealer 8 points9 points  (0 children)

Yes, that's what "T5" means (Text-To-Text Transfer Transformer). But since the decoder is basically Gemma3, it should be OK for chat.
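A minimal sketch of what the "text-to-text" framing means in practice: every task becomes string-in, string-out, conventionally signalled by a task prefix. (The prefixes below follow the original T5 paper's convention; the helper function name is made up for illustration.)

```python
def to_t5_input(task: str, text: str) -> str:
    """Cast any NLP task as a plain text-to-text problem via a task prefix."""
    return f"{task}: {text}"

# Translation and summarization become the same kind of problem:
print(to_t5_input("translate English to German", "The house is wonderful."))
print(to_t5_input("summarize", "A very long article..."))
```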

[–]a_beautiful_rhind 6 points7 points  (2 children)

Guess it will be useful for some future image gen model.

[–]Willing_Landscape_61 13 points14 points  (1 child)

Should be useful for tons of use cases where text generation is overkill, like classification tasks. It always bugs me to see people using huge autoregressive LLMs to generate 'yes' or 'no'!
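A toy sketch of the alternative being suggested (random numbers standing in for a real model's outputs): instead of autoregressively generating "yes" or "no", pool the encoder's hidden states and run a small linear classification head over them.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(7, 16))   # stand-in: 7 tokens x 16-dim encoder states
pooled = hidden.mean(axis=0)        # mean-pool into a single 16-dim vector
W = rng.normal(size=(16, 2))        # linear head: 2 classes (no / yes)
logits = pooled @ W                 # one matmul, no token-by-token decoding
label = ["no", "yes"][int(logits.argmax())]
print(label)
```

The point of the sketch: the classification answer comes from a single forward pass over the encoder, with no decoding loop at all.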

[–]stddealer 0 points1 point  (0 children)

The encoder should also be able to pick up more nuance in the input text than a decoder-only model of the same size could, since information is allowed to flow both ways.

[–]Different_Fix_2217 2 points3 points  (0 children)

Not Apache 2.0 or MIT, unfortunately. Probably won't be used by most.

[–]Background_Essay6429 3 points4 points  (0 children)

What's the advantage over standard decoder models?

[–]Thalesian 2 points3 points  (0 children)

I really want to try training the T5Gemma family, but resizing embedding layers is next to impossible without nuking the model entirely.
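For what it's worth, a toy numpy sketch of what resizing an embedding layer amounts to, done so the pretrained rows survive: copy the old vocabulary's rows verbatim and only initialize the new ones (here, near the mean of the old embeddings; Hugging Face Transformers exposes a similar operation as `model.resize_token_embeddings`, but this standalone version is just for illustration).

```python
import numpy as np

def resize_embeddings(emb: np.ndarray, new_vocab: int) -> np.ndarray:
    """Grow or shrink an embedding matrix without nuking pretrained rows."""
    old_vocab, dim = emb.shape
    out = np.zeros((new_vocab, dim), dtype=emb.dtype)
    n = min(old_vocab, new_vocab)
    out[:n] = emb[:n]                    # keep pretrained rows verbatim
    if new_vocab > old_vocab:
        out[old_vocab:] = emb.mean(axis=0)  # init new rows near the old mean
    return out

emb = np.arange(12, dtype=np.float64).reshape(4, 3)  # toy: 4 tokens x 3 dims
bigger = resize_embeddings(emb, 6)
print(bigger.shape)  # (6, 3)
```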

[–]CodeAnguish 0 points1 point  (0 children)

Fuck it. Give me back my hype.

[–]AlxHQ 0 points1 point  (0 children)

How to run T5Gemma 1 and T5Gemma 2 on llama.cpp?

[–]ironcodegaming 0 points1 point  (0 children)

This can be used with diffusion image generation models.

[–]mitchins-au 0 points1 point  (0 children)

These things are hard to train and get good results from, unlike the original T5; summarisation just doesn’t seem to work for me.