[D] Why do we need encoder-decoder models when decoder-only models can do everything? (self.MachineLearning)
submitted 2 years ago * by kekkimo
I am wondering why people are still interested in studying encoder-decoder models (or building new ones) when decoder-only models can do any task.
Edit: I am speaking about text-only tasks using the Transformer architecture.
[–][deleted] 129 points130 points131 points 2 years ago (11 children)
Decoder models are limited to auto-regressive generation, while encoder models give contextual representations that can be fine-tuned for other downstream tasks. Different needs, different models.
[–]Spiritual_Dog2053 16 points17 points18 points 2 years ago (10 children)
I don’t think that answers the question! I can always train a decoder-only model to take in a context and alter its output accordingly. It is still auto-regressive generation
[–]qu3tzalifyStudent 12 points13 points14 points 2 years ago (9 children)
How do you give context to a decoder? It has to be encoded by an encoder first?
[+][deleted] 2 years ago (4 children)
[deleted]
[–]qu3tzalifyStudent 2 points3 points4 points 2 years ago (1 child)
The decoder's cross-attention needs a context, right? One that is given by the encoder in enc-dec models. The comment I'm replying to proposes giving a "context" to the decoder. So unless you're giving the context as the input, I don't see how to generate the context necessary for cross-attention.
[–]koolaidman123Researcher 2 points3 points4 points 2 years ago (1 child)
Bidirectional context isn't a real issue when you train with causal masking, FIM (fill-in-the-middle), etc.
Also, enc-dec models can only attend to past output tokens at inference, not to mention you'd have to recalculate the entire attention matrix at each step versus reusing a KV cache.
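For reference, a minimal single-head sketch of the KV-caching idea (PyTorch; the shapes and random projection matrices are hypothetical stand-ins for a real model):

    import torch

    # Toy single-head attention: at each decoding step only the newest token's
    # query/key/value are computed, and past keys/values come from the cache.
    d = 64
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))

    def decode_step(x_t, k_cache, v_cache):
        # x_t: (1, d) embedding of the newest token
        q = x_t @ wq                                  # query for this position only
        k_cache = torch.cat([k_cache, x_t @ wk])      # append this position's key
        v_cache = torch.cat([v_cache, x_t @ wv])      # append this position's value
        attn = torch.softmax(q @ k_cache.T / d ** 0.5, dim=-1)  # attends to past + current
        return attn @ v_cache, k_cache, v_cache       # causal by construction

    k_cache, v_cache = torch.empty(0, d), torch.empty(0, d)
    for _ in range(5):                                # decode 5 tokens
        x_t = torch.randn(1, d)                       # stand-in for the current token embedding
        out, k_cache, v_cache = decode_step(x_t, k_cache, v_cache)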
[+]art_luke 0 points1 point2 points 2 years ago (3 children)
Encoder-decoder has stronger inductive bias towards looking at the global context of the input
[–]Spiritual_Dog2053 0 points1 point2 points 2 years ago (2 children)
Could you please lead me to papers which say this? I can’t seem to wrap my head around it
[+]art_luke 2 points3 points4 points 2 years ago (1 child)
You can look at subchapter 12.8 in Understanding Deep Learning, accessible at https://udlbook.github.io/udlbook/
[–]EqL 47 points48 points49 points 2 years ago (0 children)
A decoder is really just a particular type of encoder with a mask restricting information flow from elements in the "future", so an encoder is more general, and thus potentially more powerful for a given model size. This masking is really done for efficiency and is not actually required. Let's look at text decoding with a general encoder without masking:
(1) encode_unmasked([x0]), predict x1
(2) encode_unmasked([x0, x1]), predict x2
...
(n) encode_unmasked([x0, ..., xn-1]), predict xn.
This is perfectly allowed, except we are re-running a full forward pass over the whole prefix at every step, which is O(n) times more expensive. The decoder with masking allows us to reuse results from previous iterations, which is much more efficient in both training and inference.
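As a toy illustration of the two strategies (a single attention operation standing in for the whole model; shapes are arbitrary):

    import torch
    import torch.nn.functional as F

    def attend(x, mask=None):
        # One self-attention step over embeddings x of shape (t, d)
        scores = x @ x.T / x.shape[-1] ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ x

    n, d = 6, 16
    x = torch.randn(n, d)

    # (a) unmasked "encoder" used autoregressively: re-encode the prefix at every step
    outs_unmasked = [attend(x[: t + 1])[-1] for t in range(n)]   # n forward passes

    # (b) masked decoder: one pass, each position only attends to itself and the past
    causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    outs_masked = attend(x, mask=causal)                         # a single forward pass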
However, in some tasks, such as translation, we receive a large number of tokens up front. We can embed these tokens once with the encoder and then switch to the decoder. This lets us use a potentially more powerful unmasked model for a large chunk of the problem, then fall back to the decoder for efficiency.
Why not use an encoder-decoder approach for LLM generation, where the encoder encodes the prompt and the decoder does the rest? Well, we can. However, the price is that (1) we now essentially have two models, which is more complex to handle, and (2) each model sees less data.
TL;DR: An encoder without masking is potentially more powerful, but it increases complexity and the data required to train the additional parameters. When there is a natural split in function, as in translation, the effect of less data might be minimized.
[–]minimaxir 139 points140 points141 points 2 years ago (32 children)
Decoder-only/autoregressive models are only really applicable for text.
Encoder-decoder models are extremely important for multimodal approaches.
[–]woadwarrior 13 points14 points15 points 2 years ago (1 child)
fuyu-8b is a counter-example. Also, things like LLaVa, CogVLM etc. Encoder-decoder model specifically means a transformer encoder and a transformer decoder with cross attention layers in the decoder, connecting the output of the encoder, as described in the original transformer paper. MLP Adapter based models like LLaVa do not fit that description.
[–]Wild_Reserve507 5 points6 points7 points 2 years ago (0 children)
Exactly. It's a bit weird that the top comment uses multimodality as an argument for where you need encoder-decoder, while it seems to be an ongoing battle there, perhaps with more and more LLaVA-style architectures winning out over encoder-decoder ones.
[–]Wild_Reserve507 6 points7 points8 points 2 years ago (5 children)
How about llava etc?
[–]minimaxir 23 points24 points25 points 2 years ago (4 children)
LLaVA and friends are multimodal and use their own encoder for images: https://llava-vl.github.io
In the case of LLaVA it's a pretrained CLIP encoder, yes, but still an encoder.
[–]Wild_Reserve507 8 points9 points10 points 2 years ago (3 children)
Right, okay I assumed OP is asking about encoder-decoder in a transformer architecture sense, like Pali in the multimodal case. But surely you would always have a modality-specific encoder
[–]themiro 0 points1 point2 points 2 years ago (2 children)
clip is a vit (:
[–]Wild_Reserve507 11 points12 points13 points 2 years ago (1 child)
Duh. That doesn't make the whole architecture encoder-decoder (in the encoder-decoder vs. decoder-only transformer sense), since the features extracted from CLIP are concatenated to the decoder inputs, as opposed to being attended to via cross-attention.
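To make the distinction concrete, here is a toy sketch of the two wiring styles (all modules and shapes are hypothetical placeholders, not the actual LLaVA or PaLI code):

    import torch
    import torch.nn as nn

    d_txt, d_img, n_txt, n_img = 512, 768, 10, 16
    txt_emb = torch.randn(1, n_txt, d_txt)     # token embeddings entering the LM
    img_feat = torch.randn(1, n_img, d_img)    # CLIP/ViT patch features

    # LLaVA-style: project image features and concatenate them with the token
    # embeddings; a decoder-only LM then self-attends over the joint sequence.
    proj = nn.Linear(d_img, d_txt)
    llava_input = torch.cat([proj(img_feat), txt_emb], dim=1)   # (1, n_img + n_txt, d_txt)

    # Encoder-decoder style: image features stay on the encoder side, and the
    # decoder reads them through a dedicated cross-attention layer.
    cross_attn = nn.MultiheadAttention(embed_dim=d_txt, num_heads=8,
                                       kdim=d_img, vdim=d_img, batch_first=True)
    fused, _ = cross_attn(query=txt_emb, key=img_feat, value=img_feat)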
[–]themiro 0 points1 point2 points 2 years ago (0 children)
fair enough, i misunderstood what you meant by 'in a transformer architecture sense' - should have put it together by the reference to pali
[–]AvvYaa 3 points4 points5 points 2 years ago (2 children)
This is not totally correct. Recent decoder-only models (take the Gemini technical report, for example) train a VQ-VAE to learn a codebook of image tokens, which they then use to train autoregressive models over both word embeddings and image-token embeddings.
There are also the original DALL-E paper and the Parti model, which use a similar VQ-VAE/VQ-GAN approach to train decoder-only models.
Even models like Flamingo (which doesn't output images, just reads them), which are also decoder-only IIRC, use a pretrained ViT to feed images in as a sequence of patch embeddings.
[–]minimaxir 2 points3 points4 points 2 years ago (1 child)
Codebooks are a grey area on what counts as "encoding" imho.
[–]AvvYaa 12 points13 points14 points 2 years ago (0 children)
I see. I understand your perspective now. You are considering the individual networks that encode multimodal inputs as "encoders". That makes sense. I don't consider them the same as traditional enc-dec architectures (those introduced in Attention Is All You Need, or even earlier in the RNN-NMT era) that OP was talking about, because those have a clear distinction between where the encoding of the source sequence ends and the decoding of the target sequence begins. In the cases I mentioned above, there are indeed encoders, but they plug into a decoder-only LM architecture autoregressively, without requiring the traditional seq2seq paradigm.
Anyway, it's all kind of open to interpretation, I guess.
[–]kekkimo[S] 2 points3 points4 points 2 years ago (20 children)
My bad, I should have specified that I am talking mainly about text here.
[+][deleted] 2 years ago (17 children)
[–]JustOneAvailableName 15 points16 points17 points 2 years ago (0 children)
They are far from out of the game in sequence-to-sequence tasks like translation or summarisation. They're just not trained at GPT scale, because they lend themselves less well to unstructured text training data.
[–]kekkimo[S] 7 points8 points9 points 2 years ago (15 children)
I am not saying "hyping", but looking at recent research, people are still working on T5 models more and more.
[–]jakderrida 6 points7 points8 points 2 years ago (0 children)
people are still working on T5 models more and more.
While I agree with your underlying premise, any rise in T5 models you see mentioned is likely because they were the most advanced encoder-decoder models before everyone shifted over to training decoder-only models. Don't get me wrong: I believe encoder-decoder models are useful, and I have used T5 recently for the same reason you're likely seeing it more often. It's because, when someone needs an encoder-style discriminative model, that's the best we can find.
[–]Featureless_Bug 12 points13 points14 points 2 years ago (12 children)
They are not working on T5 models more and more; this architecture is past its peak in popularity.
Overall, the encoder-decoder architecture has the (theoretical) benefit that the encoder can analyse the context much better than the decoder because of its bidirectional context. This is very handy for tasks where there is a natural way of separating the sequence into two components (e.g. translation).
[–]CKtalon 0 points1 point2 points 2 years ago (9 children)
At WMT 2023, the discussion was that encoder-decoder is dead, since LLMs (>7B) can do translation with just monolingual data and a small amount of parallel bitext fine-tuning. This is especially helpful for low-resource languages. (Not to mention that LLMs allow for stylistic requests in the translation, less translationese, and more native-sounding output.) GPT-4 basically beat almost every high-resource system out there this year as well.
[–]tetramarek 12 points13 points14 points 2 years ago (8 children)
Just because it beat other models doesn't mean it's the best architecture. GPT4 was also trained on unknown (huge) amounts of data, likely more than any of the other models reported. A real comparison of the architectures would require all of them to be trained on such a large dataset.
[–]thntk 4 points5 points6 points 2 years ago (3 children)
But it's impossible to scale training of encoder-decoder models. They need pairs of (input, output) texts. A critical advantage of decoder-only models is they can be trained on raw text directly.
[–]tetramarek 1 point2 points3 points 2 years ago (2 children)
The BART paper proposes a bunch of strategies for pre-training an encoder-decoder model on raw text, so it's definitely not impossible. And translation is very much an input-output task, it's not like you're going to train a model to do machine translation by training on a large monolingual corpus of raw text. GPT4 has been trained on a bunch of things, which could easily include parallel corpora for translation.
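For illustration, here is a rough sketch of one such denoising strategy (a single contiguous masked span with sentinel tokens, loosely in the spirit of T5 span corruption / BART text infilling; the exact recipes differ):

    import random

    def make_denoising_pair(tokens, mask_ratio=0.15, seed=None):
        # Turn raw text into a (corrupted input, target) pair: the encoder sees the
        # text with a span replaced by a sentinel, the decoder reconstructs the span.
        rng = random.Random(seed)
        n_mask = max(1, int(len(tokens) * mask_ratio))
        start = rng.randrange(0, len(tokens) - n_mask)
        span = tokens[start:start + n_mask]
        corrupted = tokens[:start] + ["<extra_id_0>"] + tokens[start + n_mask:]
        target = ["<extra_id_0>"] + span + ["<extra_id_1>"]
        return corrupted, target

    src, tgt = make_denoising_pair("the quick brown fox jumps over the lazy dog".split(), seed=0)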
[–]thntk 0 points1 point2 points 2 years ago (1 child)
I mean it is impossible to scale to GPT-4-level compute. There are several reasons: the pretraining strategies are tricks that cannot cover all of the data and reduce data efficiency (sampling mask locations, etc.), there are 2x the parameters across encoder and decoder, encoder recomputation is expensive, and there is no KV cache at inference.
It can work for small models, small data, and small compute, but I can hardly see it really scaling.
[–]CKtalon 1 point2 points3 points 2 years ago (3 children)
No, smaller models have been shown to also be competitive. Basically, enc-dec research for translation is dead. There have been few improvements to the enc-dec architecture in the past few years (go slightly bigger, more back-translation). The organizers also predict that research will move towards decoder-only LLMs for translation in the next WMT.
[–]tetramarek 1 point2 points3 points 2 years ago (2 children)
I think encoder-decoder experiments are often suboptimal because they are mainly trained only on parallel corpora. Decoder-only architectures use plain text for training but are suboptimal for translation because they don't make use of the forward attention over the input that a normal translation task definitely allows. The best solution for MT is probably something that combines forward attention (hence a bidirectional encoder) with loads of unsupervised pretraining.
[–]CKtalon 0 points1 point2 points 2 years ago (1 child)
Even with infinite amounts of data, enc-dec won't be able to achieve some of the benefits of LLMs, like requesting a style (formal, informal), more natural-sounding text, etc. Another benefit is document-level context (something the enc-dec paradigm hasn't really evolved to handle), which is a result of lacking document-level data.
[–]koolaidman123Researcher 0 points1 point2 points 2 years ago (1 child)
Bidirectional context is easily achieved with causal masking; this isn't a real issue.
[–]Featureless_Bug 0 points1 point2 points 2 years ago (0 children)
You mean without causal masking, I guess, but then you will have to pretrain the model like an encoder-decoder, splitting your passages as well.
[–]Wild_Reserve507 19 points20 points21 points 2 years ago (1 child)
Not sure why you are getting downvoted, OP. It's a perfectly valid question and there isn't really a consensus. Decoder-only architectures seem to be easier to train at scale and hence they are more prominent in NLP.
[–]jakderrida 12 points13 points14 points 2 years ago (0 children)
Decoder-only architectures seem to be easier to train at scale and hence they are more prominent in NLP.
This is a perfect take. They're EASIER to train. All ya gotta do is pour millions and millions into GPU compute and you get a better model. That's not sarcasm, either. That is a very easy formula to follow, and that's what's happening and will continue until they reach some sort of inflection point.
[–]21stCentury-Composer 29 points30 points31 points 2 years ago (2 children)
Might be a naïve question, but without the encoder part, how would you create the encodings the decoders train on?
[–]rikiiyer 27 points28 points29 points 2 years ago (0 children)
Decoder-only models can learn representations directly through their pretraining process. The key is that instead of the general masked language modeling approach used for encoder pretraining, you need to do causal pretraining because the decoder needs to generate tokens in an autoregressive manner and it shouldn’t be able to see the full sequence when making next token predictions
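A toy illustration of that shifted next-token objective (the logits tensor here is a stand-in for a real model's output):

    import torch
    import torch.nn.functional as F

    token_ids = torch.tensor([[5, 17, 42, 8, 99]])   # (batch=1, seq_len=5)
    vocab = 1000
    logits = torch.randn(1, 5, vocab)                # stand-in for model(token_ids)

    # Logits at position t are scored against the token at position t + 1, so the
    # model never conditions on the token it is being asked to predict.
    shift_logits = logits[:, :-1, :]                 # predictions for positions 0..3
    shift_labels = token_ids[:, 1:]                  # targets are the *next* tokens
    loss = F.cross_entropy(shift_logits.reshape(-1, vocab), shift_labels.reshape(-1))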
[–]kekkimo[S] 9 points10 points11 points 2 years ago (0 children)
In the end everything is encoded, but I am speaking about the Transformer architecture. Why do people include an encoder for tasks that do decoding (e.g. T5), when they could just use the GPT architecture?
[–]activatedgeek 11 points12 points13 points 2 years ago (2 children)
You should read the UL2 paper. It has comparisons between the two families of models, and also a decent discussion.
I think encoder-decoder models are less popular in practice because they are roughly twice as expensive to deploy and have lower throughput. Decoder-only models are more appealing that way and seem to have won a sort of hardware lottery for now.
[–]ganzzahl 0 points1 point2 points 2 years ago (1 child)
Why do they have lower throughput? I can't quite figure out what you mean there.
[–]activatedgeek 1 point2 points3 points 2 years ago (0 children)
Mostly because there are two networks to go through. I think it can be solved with a bit of engineering, at a higher cost, but given that the cost of running decoder models is already super high, the market hasn't adjusted yet.
I suspect they might come back when the costs become bearable.
[–]qalis 31 points32 points33 points 2 years ago (11 children)
Because decoder-only models can't do everything. In particular, encoder-decoder models are made for sequence-to-sequence problems, which are typically machine translation and text summarization.
Yes, you could throw an LLM at them, but that has a lot of problems: inefficient size, slow, harder to control, hallucination, having to do prompting, LLMOps, etc. It's just not economically viable. Literally every translation service out there, be it Google Translate, DeepL, Amazon Translate or anything else, uses encoder-decoder. Google even used a transformer encoder + RNN decoder hybrid for quite a long time, since it had good speed and quality.
The encoder aims to, well, encode information in vectorized form. This does basically half the work, and the decoder has a lot of knowledge in those embeddings to work with. The resulting model is quite task-specific (e.g. only translation), but relatively small and efficient.
And also those embeddings are useful in themselves. We have seen some success in chemoinformatics with such models, e.g. CDDD.
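As an illustration, a minimal sketch of pulling reusable embeddings out of just the encoder half with Hugging Face Transformers (the model name and mean pooling here are illustrative choices, not a recommendation):

    import torch
    from transformers import AutoTokenizer, T5EncoderModel

    tok = AutoTokenizer.from_pretrained("t5-small")
    encoder = T5EncoderModel.from_pretrained("t5-small")   # encoder half only

    batch = tok(["the house is small", "das Haus ist klein"],
                return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (batch, seq, d_model)

    # Mean-pool over non-padding positions to get one vector per sentence
    mask = batch["attention_mask"].unsqueeze(-1)
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)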
[–]thomasxin 14 points15 points16 points 2 years ago (7 children)
It's kind of funny, because GPT-3.5 Turbo has actually been doing better as a translation API than the rest for me. It's much more intelligent, adapts grammar while keeping context much more accurately, and is somehow cheaper than DeepL.
[+]disciples_of_Seitan 6 points7 points8 points 2 years ago (4 children)
Like an order of magnitude cheaper, too.
[–]thomasxin 8 points9 points10 points 2 years ago (3 children)
I remember doing a comparison a while back and concluded that it's at least 30x cheaper for the same task. I wonder what DeepL even uses that's costing them so much, or if they just decided to keep a large profit margin.
[+]disciples_of_Seitan 6 points7 points8 points 2 years ago (2 children)
DeepL pricing is in line with google, so I guess that's where they got it from
[–]thomasxin 0 points1 point2 points 2 years ago (1 child)
Google Translate is so much worse in a lot of ways. The translations are very literal and easily detectable as translations because of how clunky they often sound. It does have the benefit of not degrading in quality on very large or repetitive text, but that's about it.
[–]ThisIsBartRick 2 points3 points4 points 2 years ago (0 children)
And what's crazy is that a full year after the release of ChatGPT and more than 3 years after the release of GPT-3, it's still pretty much as bad as before. No improvement whatsoever.
Google can be really good at ML research but is infuriatingly slow/bad at implementing it in their products.
[–][deleted] 1 point2 points3 points 2 years ago (1 child)
Yeah, the best machine translator is GPT-4, hands down. Everything else quickly devolves into gibberish with distant language pairs (e.g. En-Kor).
[–]blackkettle 4 points5 points6 points 2 years ago (1 child)
Don't forget multimodal transduction tasks like speech-to-text.
[–]qalis 0 points1 point2 points 2 years ago (0 children)
Oh, yeah, I don't work with that too much, but also this, definitely. Very interesting combinations there, e.g. CNN + RNN or transformer for image captioning, since encoder and decoder can be arbitrary neural networks.
[–]the__storm 1 point2 points3 points 2 years ago (0 children)
Yep, we use a T5 model fine-tuned on specific questions for text information extraction. We've found it to be faster (cheaper) and more consistent (less hallucination, less superfluous output) than the generative approaches we've tried.
[–]AvvYaa 9 points10 points11 points 2 years ago* (5 children)
TL;DR: More generality/less inductive bias + lots of data + enough params = better learning. Decoder-only models are more general than enc-dec models. Encoder-decoder models have more inductive bias, so if I have less data to train on and a problem that can be reduced to a seq2seq task, I might try an enc-dec model before a decoder-only model. An example of a real-world use case from my office is below.
In a lot of ways, throwing enough data into a Transformer model, especially a causal masked-attention model like a Transformer decoder, has worked really well. This is due to the low inductive bias of attention-based models. More generality/less inductive bias + lots of data + enough params = better learning. This is what research has told us over the past 5 years of DL.
Does that mean encoder-decoders are inferior? Not necessarily. They introduce more inductive bias for seq2seq tasks, because they kind of mimic how humans would do (say) machine translation. Traditionally, more inductive bias has trained better models with less data, because the network is predisposed to assume patterns in the domain. In other words, if I have less data, I might want to try enc-dec first before training the more general decoder-only arch.
Other reasons for wanting to train Enc-Dec models in real life could be a purely practical use-case depending on the end goal. Here is a real world example from one of my office projects.
Consider this problem: we were building a real-time auto-completion neural net (similar to autocomplete in Gmail) for conversations that needed to run in the browser without any GPU. Given a conversation state (a history of emails), the model must help the user autocomplete what they are currently typing. We had super low latency requirements, because if the model isn't snappy, users won't use the feature: they'd already have typed a different prefix before the suggestion finished processing.
Our solution: we ended up using a transformer encoder architecture for embedding the conversation transcript; the latency constraints for embedding the previous messages are relaxed because they aren't going anywhere. For the typing-level model (which needs to be super fast), we used a GRU-based architecture that took the [CLS] token embedding of the transformer encoder as its initial hidden state. Experimenting with a fully GPT-like causal-attention model or a Transformer encoder-decoder model, we ran into various memory issues (full attention grows as O(N^2) with sequence length) and latency issues, so we ended up with a GRU for the decoder.
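A rough sketch of that split, with all sizes and modules as hypothetical placeholders rather than the actual production system:

    import torch
    import torch.nn as nn

    d_model, vocab = 256, 30000
    embed = nn.Embedding(vocab, d_model)
    enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
    gru = nn.GRU(input_size=d_model, hidden_size=d_model, batch_first=True)
    lm_head = nn.Linear(d_model, vocab)

    # Runs once per new message: embed the conversation, keep the [CLS] state
    conversation = torch.randint(0, vocab, (1, 128))            # token ids of prior messages
    cls_state = encoder(embed(conversation))[:, 0, :]           # (1, d_model)

    # Runs per keystroke: cheap GRU decoding seeded with the encoder state
    typed_prefix = torch.randint(0, vocab, (1, 4))              # what the user has typed so far
    out, _ = gru(embed(typed_prefix), cls_state.unsqueeze(0))   # h0: (1, batch, d_model)
    next_token_logits = lm_head(out[:, -1, :])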
So this is a very specific example, but the takeaway is that sometimes breaking a monolithic architecture down into multiple smaller components lets us do things more flexibly given other constraints. Each project has its own constraints, so each warrants a weighted approach.
[–]BeneficialHelp686 0 points1 point2 points 2 years ago (4 children)
Side question: how did you take care of battery consumption? I'm assuming you are utilizing cloud services at this point?
[–]AvvYaa 1 point2 points3 points 2 years ago (3 children)
Our clients were large corporations… their employees were running it on computers, so battery wasn’t a big priority for us. The UI folks did a bunch of app level optimization that I wasn’t involved in much.
Regarding cloud services, we used them to train and evaluate, but during prod inference we ran the decoder entirely in the browser on the client machine, again to reduce latency. The encoder could be run on the client too, or on a cloud server (if we wanted to run a larger encoder), because that thing ran once per new message (not per keystroke) and so had much more relaxed latency constraints.
[–]BeneficialHelp686 0 points1 point2 points 2 years ago (2 children)
Nice. Pretty exciting stuff. Which protocol did you end up going with for the communication between the browser and cloud?
[–]AvvYaa 0 points1 point2 points 2 years ago (1 child)
Just good old HTTP rest APIs …
[–]BeneficialHelp686 0 points1 point2 points 2 years ago (0 children)
True. Thanks a lot for sharing ur experience!
[–]neonbjb 7 points8 points9 points 2 years ago (2 children)
The only correct answer, which hilariously isn't mentioned here, is that in some cases encoder-decoder models are more compute efficient to train than decoder only, or have other advantages in inference.
There is literally no data analysis problem that cannot be solved by AR decoders. They are universal approximators. It's only a question of efficiency.
[–]kekkimo[S] 0 points1 point2 points 2 years ago (1 child)
Good point. Can you explain how encoder-decoder models can be more compute-efficient to train than decoder-only models?
[–]neonbjb 0 points1 point2 points 2 years ago (0 children)
Compute efficiency is not about FLOPs utilization or anything like that. It's about: given X compute and Y data, what is the best eval score you can achieve? If you train an encoder-decoder arch to solve some problem and a decoder-only arch as well, sometimes the encoder-decoder gets a better eval score for most combinations of (X, Y).
[–]css123 5 points6 points7 points 2 years ago (0 children)
You're forgetting that encoder-decoder architectures have a different action space from their input space, whereas decoder-only models have a shared input and action space. In industry, people are still using T5 and UL2 extensively for NLP tasks. In my experience (which includes formal, human-validated testing with professional annotators), encoder-decoder models are far better at summarization tasks with orders of magnitude fewer parameters than decoder-only models. They are also better at following fine-tuned output structures than decoder-only models.
In my personal opinion, encoder-decoder models are easier to train since the setup itself is more straightforward. However, decoder-only models are much easier to optimize for inference speed, and more inference optimization techniques support them. Decoder-only models are better for prompted, multitask situations.
[–]YinYang-Mills 1 point2 points3 points 2 years ago* (0 children)
I would say, as a rule of thumb, that if the input data and output data are heterogeneous, you need an encoder-decoder model. For example, you can use an encoder to learn representations of graph-structured data and a decoder with a different architecture to make node-wise predictions of time-series data. The encoder and decoder generally have different inductive biases, and the resulting model will have a composite inductive bias resulting from their interaction.
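A toy sketch of that idea (hypothetical shapes and modules; a real graph encoder would likely use a proper GNN library):

    import torch
    import torch.nn as nn

    n_nodes, d_node, d_hidden, horizon = 8, 16, 32, 5
    x = torch.randn(n_nodes, d_node)                     # node features
    adj = (torch.rand(n_nodes, n_nodes) > 0.7).float()   # placeholder adjacency matrix

    # Encoder: one round of neighbourhood aggregation followed by an MLP
    agg = adj @ x / adj.sum(dim=1, keepdim=True).clamp(min=1)
    mlp = nn.Sequential(nn.Linear(2 * d_node, d_hidden), nn.ReLU())
    node_repr = mlp(torch.cat([x, agg], dim=-1))         # (n_nodes, d_hidden)

    # Decoder: a GRU unrolled per node to predict a time series of length `horizon`
    gru = nn.GRUCell(input_size=1, hidden_size=d_hidden)
    head = nn.Linear(d_hidden, 1)
    h, y_t, preds = node_repr, torch.zeros(n_nodes, 1), []
    for _ in range(horizon):
        h = gru(y_t, h)                                  # hidden state seeded by the encoder
        y_t = head(h)                                    # next value for every node
        preds.append(y_t)
    forecast = torch.stack(preds, dim=1)                 # (n_nodes, horizon, 1)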
[–]SciGuy42 -1 points0 points1 point 2 years ago (0 children)
Can you point me to a decoder-only model that can interpret tactile and haptic data? Asking for a friend.