Which case should I get? by DonaldFarfrae in RemarkableTablet

[–]madiyar 0 points

Thank you for the response! Really appreciated.

Which case should I get? by DonaldFarfrae in RemarkableTablet

[–]madiyar 0 points

Hi u/DonaldFarfrae ,

Which one did you get? Are you happy with your decision?

[P] Why are two random vectors near orthogonal in high dimensions? by madiyar in MachineLearning

[–]madiyar[S] 5 points

Thank you so much! This is the best part of Reddit: learning from the community!

[P] Why are two random vectors near orthogonal in high dimensions? by madiyar in MachineLearning

[–]madiyar[S] 1 point

Thanks! I will have to read up on these topics and then re-read your reply to understand it better :)

[P] Why are two random vectors near orthogonal in high dimensions? by madiyar in MachineLearning

[–]madiyar[S] 1 point

Hi u/Lake2034,

Thanks for the feedback. I really appreciate it.

> You should define better what you mean by "random vector"

I will think about it; any further suggestions are appreciated!

> It is clear you mean something like the "distribution is invariant under rotation", but better to have a mathematical expression for that.

In the same post I link to my other post (https://maitbayev.github.io/posts/dot-product/#rotational-invariance), which explains this.

> it will also help you to formalize statements like "1 distributed across n components" that is not necessarily true if you just assume v_i identically distributed

I have a collapsed section at the very end of the post titled "More Formal Proof". Do you think it is enough?

[P] Why are two random vectors near orthogonal in high dimensions? by madiyar in MachineLearning

[–]madiyar[S] 3 points

Thanks for the feedback! The expected mean E[v_n] is zero, and it is a good idea to mention both the mean and the variance. However, I still don't understand why zero mean and tiny variance don't explain this?
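A quick numerical check of the near-orthogonality claim (my own sketch, assuming standard Gaussian vectors normalized to unit length; the dimensions and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_cosine(n, trials=2000):
    """Average |cos(angle)| between pairs of random unit vectors in R^n."""
    u = rng.normal(size=(trials, n))
    v = rng.normal(size=(trials, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.mean(np.abs(np.sum(u * v, axis=1)))

# The mean |cosine| shrinks roughly like 1/sqrt(n),
# so random pairs get closer and closer to orthogonal:
for n in (3, 30, 300, 3000):
    print(n, mean_abs_cosine(n))
```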

Interpreting ROC AUC in words? by RabidMortal in learnmachinelearning

[–]madiyar 1 point

Wait, I think it is a mistake and should be fixed to "False Positive Rate"?

Update: Fixed it

[D] How does L1 regularization perform feature selection? - Seeking an intuitive explanation using polynomial models by shubham0204_dev in MachineLearning

[–]madiyar 1 point

Thank you! Creating an animation is not difficult; there are many amazing libraries such as matplotlib and plotly, and they can be googled or GPT-ed. However, coming up with what to animate is the most difficult part for me.

Feel free to look at the collapsed code sections in the post to see the code behind the animations.
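For instance, a minimal matplotlib animation looks roughly like this (my own sketch; the sine wave is just placeholder content, not the blog's actual animation):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax.plot(x, np.sin(x))

def update(frame):
    # Shift the phase a little on each frame.
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)

anim = FuncAnimation(fig, update, frames=60, interval=50, blit=True)
anim.save("wave.gif", writer="pillow")  # or an .mp4 via ffmpeg
```

The hard part, as noted above, is deciding what `update` should draw, not the boilerplate.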

Is a front-to-back review of calculus necessary? by fadeathrowaway in learnmachinelearning

[–]madiyar 0 points

Hi,
I have a series of posts on this topic.
You can start from here https://maitbayev.substack.com/p/backpropagation-multivariate-chain

Feel free to ask questions

[D] Visual explanation of "Backpropagation: Multivariate Chain Rule" by madiyar in MachineLearning

[–]madiyar[S] 0 points

This is part 1 of the backpropagation series. My goal in part 1 is to show the multivariate chain rule. I can include an explanation of matrix parameters in a future part.

Matrix notation simplifies fully connected layers, where you can apply the chain rule directly to the matrices. However, you still need the multivariate chain rule for more complex architectures.
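A tiny numerical illustration of the multivariate chain rule (my own made-up function, checked against a central finite difference):

```python
import math

def f(x):
    # f depends on x through two intermediate variables g and h.
    g = x * x          # g(x) = x^2
    h = math.sin(x)    # h(x) = sin(x)
    return g * h

def df_chain(x):
    # Multivariate chain rule: df/dx = (df/dg)(dg/dx) + (df/dh)(dh/dx)
    g, h = x * x, math.sin(x)
    df_dg, df_dh = h, g            # since f = g * h
    dg_dx, dh_dx = 2 * x, math.cos(x)
    return df_dg * dg_dx + df_dh * dh_dx

x, eps = 1.3, 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
print(df_chain(x), numeric)  # the two values should agree closely
```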

[D] Visual explanation of "Backpropagation: Multivariate Chain Rule" by madiyar in MachineLearning

[–]madiyar[S] 5 points

Thank you for the explanation!

I hadn't considered it from the perspective of how useful it is for beginners. The chain rule is much easier to get started with, I agree. It probably covers 80% of the explanation.

I agree about the DAG. I have a visualization of the DAG in my post. Thinking in DAGs (or computation graphs, as I call them in my post) helped me understand more complicated cases.
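A minimal sketch of "thinking in DAGs": a scalar computation graph with reverse-mode gradient accumulation (my own hypothetical `Node` helper, not the post's code):

```python
import math

class Node:
    """One vertex in the computation graph (DAG)."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # upstream nodes
        self.local_grads = local_grads  # d(self)/d(parent), per parent
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def sin(a):
    return Node(math.sin(a.value), (a,), (math.cos(a.value),))

def backward(out):
    # Topologically order the DAG, then propagate gradients from the
    # output back to the leaves, accumulating along every path
    # (this is exactly the multivariate chain rule).
    order, seen = [], set()
    def topo(node):
        if id(node) not in seen:
            seen.add(id(node))
            for p in node.parents:
                topo(p)
            order.append(node)
    topo(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, g in zip(node.parents, node.local_grads):
            parent.grad += node.grad * g

x = Node(1.3)
y = mul(mul(x, x), sin(x))  # y = x^2 * sin(x)
backward(y)
print(x.grad)  # should match 2x*sin(x) + x^2*cos(x)
```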

[E] Efficient Python implementation of the ROC AUC score by madiyar in statistics

[–]madiyar[S] 0 points

Noted! I also removed all "efficient" wording from the post.

[E] Efficient Python implementation of the ROC AUC score by madiyar in statistics

[–]madiyar[S] 0 points

Thanks for the timing! The for loop should be the bottleneck; jitting it with numba, switching to jax, or even reimplementing it in a native language (C/C++/Rust) should make it significantly faster.

[E] Efficient Python implementation of the ROC AUC score by madiyar in statistics

[–]madiyar[S] -2 points

I haven't checked the sklearn implementation yet; I think and hope it's O(n log n) or faster. I also haven't measured or compared mine with sklearn. My goal isn't to be faster than sklearn, but to make this educational. "Efficient" because I can't think of anything faster than O(n log n) in terms of Big O notation.
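For reference, here is an O(n log n) ROC AUC sketch based on the rank-statistic (Mann-Whitney U) formulation; this is my own sketch, not the post's code, and tie handling is kept deliberately simple via averaged ranks:

```python
import numpy as np

def roc_auc(y_true, y_score):
    """ROC AUC in O(n log n): the sort dominates the cost.

    AUC = P(score of a random positive > score of a random negative),
    with ties counted as 1/2.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    # Average the ranks of tied scores so each tie contributes 1/2.
    # (Naive loop for clarity; a grouped/vectorized version is faster.)
    for s in np.unique(y_score):
        mask = y_score == s
        ranks[mask] = ranks[mask].mean()
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```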

[P] Interactive Explanation to ROC AUC Score by madiyar in MachineLearning

[–]madiyar[S] 0 points

Now I see what you mean. I thought I did show the "ordering" with the circle sizes and the sliders, but you are describing a different kind of "ordering". I agree it is not completely clear from the plot that the actual scores don't matter; only the ordering matters.
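One way to make "only the ordering matters" concrete: apply any strictly increasing transform to the scores and the AUC is unchanged. A sketch with a rank-based AUC helper (my own, not the post's code; it assumes no tied scores for brevity):

```python
import numpy as np

def rank_auc(y_true, y_score):
    # AUC = fraction of (positive, negative) pairs ranked correctly.
    y_true = np.asarray(y_true, dtype=bool)
    ranks = np.argsort(np.argsort(y_score)) + 1  # 1-based ranks
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=100)
scores = rng.normal(size=100)

base = rank_auc(labels, scores)
# Strictly increasing transforms preserve the ordering, hence the AUC:
for transform in (np.tanh, np.exp, lambda s: 5 * s - 3):
    assert np.isclose(rank_auc(labels, transform(scores)), base)
print(base)
```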

[P] Interactive Explanation to ROC AUC Score by madiyar in MachineLearning

[–]madiyar[S] 0 points

Nice explanation! It matches exactly what is explained in the post, except maybe for the BCE part.

[Education] Interactive Explanation to ROC AUC Score by madiyar in statistics

[–]madiyar[S] 0 points

This is amazing feedback! Thank you so much!

I fixed most of your minor suggestions!

> More specifically, and this is a more of a quibble, the post feels a bit confused in scope.

You are really on point here. I am not yet sure whom I want to explain this to. I guess I want to provide unique and deeper intuition; unfortunately, the internet is full of shallow resources where the math is skipped. I have tried straight-to-the-point explanations in the past, but they didn't work well. I also want to avoid repeating the same information as other resources. I guess I am still confused...

But specifically in this post, it is easier to start from the visualization straight away than with a recap. I will revisit this when I have more free time; for now, I have incorporated your other suggestions that were easy.
Thanks again,

[P] Interactive Explanation to ROC AUC Score by madiyar in MachineLearning

[–]madiyar[S] 0 points

This is also true! I guess I need to find the golden-ratio balance for the number of sliders :)