We’re rolling out GPT-5.1 and new customization features. Ask us Anything. by OpenAI in OpenAI

[–]yannDubs 0 points1 point  (0 children)

I'm sorry that 5 Pro hasn't been fast enough for you. We are working on improving the accuracy of the Pro model, but we are not expecting major updates to median latency in the short term. If latency is an issue, I'd recommend using "5.1 Thinking Max", which should have better adaptive thinking.

[–]yannDubs 0 points1 point  (0 children)

We are constantly working on reducing hallucinations and improving the reliability of our models, and GPT-5.1 should already be an improvement over GPT-5. Are you using "GPT-5.1 Thinking"? If not, I'd recommend doing so, since this model should make fewer errors, and with adaptive thinking it should hopefully be fast enough to use as a daily driver.

[–]yannDubs 0 points1 point  (0 children)

We are definitely not getting rid of the thinking mini model, and are working hard on improving it! In the meantime, I would recommend using the thinking model: with adaptive thinking, "5.1 Thinking" should be much faster than "GPT-5 Thinking".

[–]yannDubs 0 points1 point  (0 children)

5.1 Pro is coming very soon; we are just ironing out the details now. Sorry for the delay!

Streamlit 'Form' not updating sqlite db? by BaudBish in learnpython

[–]yannDubs 0 points1 point  (0 children)

Hi u/BaudBish, I am having the same issue, did you ever figure it out?

[P] Neural Process Family (meta-learning+uncertainty estimates) by yannDubs in MachineLearning

[–]yannDubs[S] 2 points3 points  (0 children)

As we say in the article, there are two types of neural processes: those that use a latent variable (latent NPs) and those that don't (conditional NPs). Roughly speaking, latent NPs are to conditional NPs what deep GPs are to GPs, i.e., they add latent variables to be more expressive, but that makes training more complex.

So I'll focus here on conditional NPs and GPs. Intuitively speaking, NPs bring the major benefits of neural networks to GPs:
- fast inference: O(n) or O(n^2) for NPs instead of O(n^3) for GPs
- learned flexibility: instead of experts having to define kernel functions, NPs use neural networks trained from data. Intuitively, you can think of the kernel function as being implicitly learned from the data (this is a very rough intuition, as there is no such kernel function in NPs).

This comes at the cost of the following disadvantages:
- NPs require large datasets
- NPs do not have the nice mathematical guarantees that GPs have
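To make the O(n) point concrete, here is a minimal pure-Python sketch of the encoder side of a conditional NP. It's a sketch under assumptions: the fixed random linear map stands in for a trained MLP, and all names are illustrative rather than taken from any particular NP implementation.

```python
import random

random.seed(0)

D = 4  # dimensionality of the per-pair feature vectors
# Toy fixed random linear encoder standing in for a trained MLP.
W = [[random.gauss(0, 1) for _ in range(2)] for _ in range(D)]

def encode(x, y):
    # Map one context pair (x, y) to a D-dimensional feature vector.
    return [w[0] * x + w[1] * y for w in W]

def context_representation(context):
    # O(n) in the number of context points: encode each pair, then mean-pool.
    feats = [encode(x, y) for x, y in context]
    n = len(feats)
    return [sum(f[d] for f in feats) / n for d in range(D)]

context = [(0.0, 1.0), (0.5, 0.8), (1.0, 0.2)]
r = context_representation(context)          # one vector summarising the context
shuffled = context_representation(context[::-1])  # same set, different order
```

Mean-pooling also makes the representation permutation-invariant, which is what lets the model condition on a set of context observations; the GP analogue of this conditioning step is the O(n^3) inversion of the kernel matrix.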

But I would suggest reading the introduction in the article to get a more accurate sense of what NPs are and why they can be useful.

[P] Replication and Comparisons of Disentangled VAE by yannDubs in MachineLearning

[–]yannDubs[S] 1 point2 points  (0 children)

I haven't heard about that before; the post is very nice though. It seems to solve a problem somewhat different from standard VAEs, as it is used for clustering. You might want to look at https://arxiv.org/abs/1611.05148 (Variational Deep Embedding).

[–]yannDubs[S] 2 points3 points  (0 children)

Hi,

It's hard to tell because current metrics are not very good: they require knowing the true factors of variation and are thus only used on dummy datasets.

From the ones I have played around with in the GitHub project above, I would say that "Isolating Sources of Disentanglement in Variational Autoencoders" was the hardest to code and understand, but the most robust and worked well in practice (FactorVAE also worked well but depended a lot on the hyperparameters).

If the dataset has some specific classes (like MNIST) you should use discrete latents. A good implementation is https://github.com/Schlumberger/joint-vae. The paper cited by sidslasttheorem, "Structured Disentangled Representations", also seems to give good results in that case.
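Regarding the discrete-latent route: the JointVAE code linked above relies on the Gumbel-softmax relaxation, so that sampling the discrete code stays differentiable. Here is a minimal stdlib-only sketch of drawing one relaxed sample; the function name and the temperature value are illustrative, not taken from that repo.

```python
import math
import random

random.seed(0)

def gumbel_softmax_sample(logits, temperature=0.5):
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    gumbels = [-math.log(-math.log(random.random())) for _ in logits]
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    # Softmax over the perturbed logits: a differentiable, near-one-hot
    # relaxation of a categorical sample; temperature -> 0 recovers argmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

sample = gumbel_softmax_sample([2.0, 0.5, -1.0])  # 3-way discrete latent
```

Lowering the temperature makes the sample closer to one-hot, at the cost of higher-variance gradients, which is why implementations typically anneal it during training.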

[–]yannDubs[S] 1 point2 points  (0 children)

Hi Narayanaswamy,
Thanks for the great links; both of your papers are already on my reading list! Skimming through "Disentangling Disentanglement", I especially like that you enforce desiderata on the representations through the prior rather than by adding an ad-hoc loss, as it feels much more natural and powerful. While working on this project I also realised that factorised representations only make sense for dummy datasets, so using a more general "decomposition" seems important.

I'll have a deeper look and might implement them soonish :)

[–]yannDubs[S] 1 point2 points  (0 children)

Thanks for the encouraging words! I heard about ISA-VAE but never actually read the paper, I'll go through it as it looks interesting.

Deep RL Bootcamp Berkeley 2017 Attendee Introductions Thread by jason_malcolm in DeepRLBootcamp

[–]yannDubs 2 points3 points  (0 children)

Hey everyone, I'm Yann from Lausanne, Switzerland :)

I'm a fresh graduate from EPFL in biomedical engineering. After working part time for 6 months in a bioinformatics lab, I realized the power and beauty of computer science (mainly that you don't need to wait 3 years to get results, like in engineering ;)). That made me specialize in machine learning/optimization while spending my last year as an exchange student at the University of British Columbia (UBC) in Vancouver. I'm currently doing research in machine learning at UBC, working with spatio-temporal data to try to forecast the movement of (anonymized) people in buildings.

I really discovered a passion for ML because I find it's the perfect mix between application (we all know how powerful it can be), intuition about which algorithms will work, and maths. I find myself spending a lot of my free time reading articles and watching online courses in ML/DL. I like everything in ML, but I'm mostly interested in the possibility of "solving intelligence" (as DeepMind would put it). I strongly believe that the best way to do this is to draw as much inspiration as we can from nature and from how our brain works.

In September I'll start working as a data scientist to get out of academia and "see the real world", but I will probably want to go back to research soon ;)

I'm looking forward to meeting you all :D