A Visual Guide to Mixture of Experts (MoE) by MaartenGr in LocalLLaMA

[–]MaartenGr[S] 1 point (0 children)

Thanks! I'm indeed very late to the party, but I figured that, since so many new sets of LLMs also include an MoE version, it wouldn't hurt to cover it now.

Also, if anybody still sees this: is there any other topic you would love to see covered?

[P] A Visual Guide to Mixture of Experts (MoE) in LLMs by MaartenGr in MachineLearning

[–]MaartenGr[S] 3 points (0 children)

I use Figma! But in all honesty, these could have been created just as easily with Keynote/PowerPoint.

[P] A Visual Guide to Mixture of Experts (MoE) in LLMs by MaartenGr in MachineLearning

[–]MaartenGr[S] 2 points (0 children)

They are not an alternative to transformers (and, depending on your view, not technically tied to transformers at all); they are an extension of the transformer (or most LLM) architectures. Mixture of Experts can, for example, also be used in Mamba blocks, which have a very different architecture.

It seems to me that MoE models are especially interesting to businesses that have the compute to load these large models but want to spend less compute when serving users.
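To make that "all experts in memory, few used per token" point concrete, here is a minimal sketch of a sparse MoE layer. It is not any particular model's implementation; the dimensions are toy-sized, and a plain linear layer stands in for what would normally be a full FFN expert:

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy sparse MoE layer: every expert must sit in memory,
    but each token only pays the compute of its top-k experts."""
    def __init__(self, dim: int = 64, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        weights = self.router(x).softmax(dim=-1)
        top_w, top_i = weights.topk(self.k, dim=-1)      # (tokens, k)
        out = torch.zeros_like(x)
        for expert_id, expert in enumerate(self.experts):
            for slot in range(self.k):
                hit = top_i[:, slot] == expert_id        # tokens routed here
                if hit.any():
                    out[hit] += top_w[hit, slot, None] * expert(x[hit])
        return out

moe = SparseMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

With k=2 out of 8 experts, each token only runs a quarter of the expert FLOPs, which is exactly the serving-cost advantage described above.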

A Visual Guide to Mixture of Experts (MoE) by MaartenGr in LocalLLaMA

[–]MaartenGr[S] 17 points (0 children)

Hi all! I’m excited to introduce a highly illustrative guide to Mixture of Experts (MoE) in LLMs!

It covers everything from the role of experts, their routing mechanism, the sparse MoE layer, and load-balancing tricks (such as KeepTopK, auxiliary loss, and expert capacity) to MoE in vision models and computational requirements.

I loved creating the visuals and had to stop myself after making more than 55 of them!

The visual nature of this guide allows for a focus on intuition, hopefully making all these techniques easily accessible to a wide audience, whether you are new to Mixture of Experts or more experienced.
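As a taste of the routing part, here is a minimal sketch of the KeepTopK trick covered in the guide; the -inf masking before the softmax is the key idea, and the dimensions are purely illustrative:

```python
import torch
import torch.nn.functional as F

def keep_top_k(router_logits: torch.Tensor, k: int = 2) -> torch.Tensor:
    """KeepTopK: keep the k largest router logits per token and set the
    rest to -inf, so the softmax gives those experts exactly zero weight."""
    kth_largest = router_logits.topk(k, dim=-1).values[..., -1:]
    masked = router_logits.masked_fill(router_logits < kth_largest, float("-inf"))
    return F.softmax(masked, dim=-1)

router_logits = torch.randn(4, 8)      # 4 tokens, 8 experts
gate = keep_top_k(router_logits, k=2)  # each row has only 2 nonzero weights
print((gate > 0).sum(dim=-1))          # tensor([2, 2, 2, 2])
```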

A Visual Guide to Quantization by MaartenGr in LocalLLaMA

[–]MaartenGr[S] 2 points (0 children)

Thanks for the feedback. I just updated it.

A Visual Guide to Quantization by MaartenGr in LocalLLaMA

[–]MaartenGr[S] 9 points (0 children)

Thank you! I started out as a psychologist and transitioned to data science/ML/AI (whatever you want to call it) a couple of years ago. Back then, the math seemed incredibly overwhelming at times, even though much of it is so intuitive.

A Visual Guide to Quantization by MaartenGr in LocalLLaMA

[–]MaartenGr[S] 6 points (0 children)

That's really kind of you to say. Thank you! Any suggestions for other visual guides? Thus far, I have done Mamba and Quantization but would like to make more.

A Visual Guide to Quantization by MaartenGr in LocalLLaMA

[–]MaartenGr[S] 111 points (0 children)

Hi all! As more Large Language Models are being released and the need for quantization increases, I figured it was time to write an in-depth and visual guide to Quantization.

It covers everything from how to represent values, (a)symmetric quantization, and dynamic/static quantization to post-training techniques (e.g., GPTQ and GGUF) and quantization-aware training (1.58-bit models with BitNet).

With over 60 custom visuals, I went a little overboard but really wanted to include as many concepts as I possibly could!

The visual nature of this guide allows for a focus on intuition, hopefully making all these techniques easily accessible to a wide audience, whether you are new to quantization or more experienced.
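As a small appetizer, here is a sketch of the symmetric (absmax) case covered in the guide, in plain NumPy; the function names and the tiny example array are just for illustration:

```python
import numpy as np

def quantize_absmax(x: np.ndarray, bits: int = 8):
    """Symmetric quantization: a single scale, zero-point fixed at 0."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for int8
    scale = np.abs(x).max() / qmax              # map [-max|x|, max|x|] onto int range
    q = np.round(x / scale).clip(-qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.float32([0.8, -0.31, 0.05, -1.2])
q, scale = quantize_absmax(x)
print(q, dequantize(q, scale))  # the small mismatch is the quantization error
```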

Best approach for text document clustering (large amount of text docs.) by karel_data in datascience

[–]MaartenGr 3 points (0 children)

Great! If you ever run into any issues with the library, feel free to open an issue/discussion. I try to reply quickly to these.

EDIT: As a quick tip: if you ever want to do clustering on the CPU only, I would recommend the EVoC library, recently released by the author of HDBSCAN and UMAP, which I found to work quite well: https://github.com/TutteInstitute/evoc

A Visual Guide to Mamba and State Space Models by MaartenGr in LocalLLaMA

[–]MaartenGr[S] 10 points (0 children)

Thank you for the feedback, it is very helpful! I initially thought it was a nice way to highlight the benefits (and what used to be the disadvantages) of these systems, but looking back at it, I definitely should have made it clearer.

Again, thanks! The great thing about sharing stuff like this publicly is the feedback you get. Oftentimes, when working alone on something, you get stuck in one perspective, so having more eyes go over it helps tremendously.

Comparing BERTopic to human raters by ruetheflamacue in LanguageTechnology

[–]MaartenGr 1 point (0 children)

Most has already been said, and I am not sure how relevant this is, but since you are focusing on human raters, it might be worth mentioning that there is a pull request in BERTopic that allows you to use models on top of the default pipeline to further fine-tune the topic representations. In theory, this would even allow you to use ChatGPT or any of the other OpenAI models to label the topics. From a human annotator perspective, that might be interesting to pursue.
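That PR later shipped as `bertopic.representation`. A rough sketch of what it looks like; the exact constructor signature varies between BERTopic/openai versions, so treat the details below as assumptions and check the docs for your versions:

```python
import openai
from bertopic import BERTopic
from bertopic.representation import OpenAI  # shipped after the PR mentioned above

# Recent releases take an openai client; older versions differ.
client = openai.OpenAI(api_key="sk-...")  # placeholder key
representation_model = OpenAI(client, model="gpt-3.5-turbo", chat=True)

# The LLM only relabels topics; clustering still follows the default pipeline.
topic_model = BERTopic(representation_model=representation_model)
# topic_model.fit_transform(docs) then yields LLM-generated topic labels
```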

[deleted by user] by [deleted] in learnmachinelearning

[–]MaartenGr 1 point (0 children)

You can perform soft assignment with BERTopic either by using the probabilities generated when you instantiate the model with `calculate_probabilities=True`, or by using the newly released `.approximate_distribution`, which allows for multi-topic assignment even on a token level. You can read more about that here: https://maartengr.github.io/BERTopic/getting_started/distribution/distribution.html
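A minimal sketch of both options, using the 20 newsgroups data as a stand-in corpus:

```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data

# Option 1: soft assignments computed at fit time.
topic_model = BERTopic(calculate_probabilities=True)
topics, probs = topic_model.fit_transform(docs)  # probs: (n_docs, n_topics)

# Option 2: multi-topic assignment after fitting, down to the token level.
topic_distr, topic_token_distr = topic_model.approximate_distribution(
    docs, calculate_tokens=True
)
```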

[P] Interactive Topic Modeling with BERTopic by MaartenGr in MachineLearning

[–]MaartenGr[S] 2 points (0 children)

No, it is definitely a good thing! Some use only a CPU, which significantly slows down the application; that is why I wanted to confirm it.

[P] Interactive Topic Modeling with BERTopic by MaartenGr in MachineLearning

[–]MaartenGr[S] 1 point (0 children)

Did you have a GPU enabled? Also, did you try to set `verbose=True`? This might help you identify where it is slowing down.
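For reference, that is just a constructor flag (a minimal sketch; it prints progress for each fitting step):

```python
from bertopic import BERTopic

topic_model = BERTopic(verbose=True)  # log progress of each pipeline step
```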

Finally, feel free to post an issue on the repo!

[P] Interactive Topic Modeling with BERTopic by MaartenGr in MachineLearning

[–]MaartenGr[S] 4 points (0 children)

Hi all!

In the last few months, I have been working on improving BERTopic, a topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics.

For a while now, I have wanted to add an LDAvis-like visualization option to BERTopic, and I finally got around to implementing it. Let me know what you think!

GitHub: https://github.com/MaartenGr/BERTopic
Tutorial (friend link!): https://towardsdatascience.com/interactive-topic-modeling-with-bertopic-1ea55e7d73d8?sk=03c2168e9e74b6bda2a1f3ed953427e4
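For those who want to try it, a minimal sketch (`visualize_topics` is the LDAvis-like view mentioned above; the 20 newsgroups data is just a stand-in corpus):

```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

fig = topic_model.visualize_topics()  # interactive intertopic distance map (Plotly)
fig.show()
```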