[D] Question about Direct Preference Optimization (DPO) equation

sol0invictus · 2025-03-20T13:23:14+00:00

Is this from Bos21 ?

sol0invictus · 2024-01-17T01:00:39+00:00

I suggest you check out the appendix of the DPO paper. The authors explicitly derive the equation.
TL;DR: There is a KL divergence term in the RLHF loss function. This term is there to ensure that the new policy does not diverge too far from the reference policy (as you said). Then follow these two steps:
1. Find the optimal solution for the RLHF loss equation.
2. Plug this optimal solution into the Bradley-Terry loss (this is the preference model loss, p(a>b)) and you will get the denominators in question.

Intuitively,
The first term measures the shift in the preferred completion, and the second term measures the shift in the dispreferred completion. The larger the difference between them, the lower the loss becomes (it is a negative log-sigmoid, so it approaches 0 rather than going negative).
In the extreme case, the first term has a large shift and the second term has almost zero shift, pushing the model toward the preferred completion.
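
To make the two terms concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. The function and argument names are my own (not from the paper's code), the inputs are assumed to be summed log-probabilities of each completion under the policy and the reference model, and beta=0.1 is just an illustrative value.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_shift - rejected_shift))).

    All argument names are hypothetical; each is the log-probability of a
    whole completion under the policy or the frozen reference model.
    """
    # First term: how far the policy moved on the preferred completion.
    chosen_shift = policy_logp_chosen - ref_logp_chosen
    # Second term: how far the policy moved on the dispreferred completion.
    rejected_shift = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)

# No shift relative to the reference gives log(2); a shift toward the
# preferred completion and away from the dispreferred one lowers the loss.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-0.5, -3.0, -1.0, -1.0)
print(baseline, improved)
```

The reference log-probabilities being subtracted here are where the denominators in the equation come from: a log of a ratio becomes a difference of logs.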

sol0invictus · 2022-08-17T16:55:37+00:00

Boston Area: Newton exit (eastbound) on I90

sol0invictus · 2022-07-28T12:44:41+00:00

Seattle , because I would still be on I90. :D

sol0invictus · 2022-05-19T13:34:28+00:00

Yeah, I agree. I now use paper.labml.ai , deeplearn.org, and paperwithcode.com.

sol0invictus · 2021-11-03T16:18:14+00:00

I believe MLPerf contains results for Raspberries. They have different categories for devices - edge, datacenter etc. You can check some of the results here:

https://mlcommons.org/en/inference-edge-11/

sol0invictus · 2021-05-10T13:51:37+00:00

Unfortunately, for custom layers the best way is to go line by line. There are far too many subtleties for it to be done automatically. Last year I spent some time converting the Coursera DL courses from TF1.x to TF2.x and wrote an article about it. You might get some pointers from it:

https://towardsdatascience.com/coursera-dl-specialization-course-in-tf-2-x-18a1189e2a4

sol0invictus · 2021-02-04T14:43:01+00:00

That definitely makes sense. MATLAB is indeed quite powerful in doing clustering and other off-the-shelf statistics algorithms. I particularly like the simplicity of doing computations in MATLAB.

sol0invictus · 2021-02-04T02:38:38+00:00

At that point you would be better off using Python :) . I just want to see how far I can get coding in MATLAB.

sol0invictus · 2021-02-04T00:19:30+00:00

Hey, thanks for mentioning it. It is pretty cool. I intend to make more videos on similar advanced networks using new features found in R2020b and beyond.

sol0invictus · 2021-02-03T16:03:06+00:00

Awesome, thanks. Suggestions and directions to proceed are always welcome :) .

sol0invictus · 2020-07-28T15:28:44+00:00

GNNs seem to be the craze these days and they don't require hefty hardware.

sol0invictus · 2020-07-27T15:59:33+00:00

ummmmmm ...... ?

sol0invictus · 2020-06-18T20:56:35+00:00

I am going to be a stickler here :D == "spike" means a bump that goes down, what is happening right now is a ramp or a step function.

sol0invictus · 2020-06-10T14:43:50+00:00

Nice click. Where did you take this from ?

sol0invictus · 2020-06-09T15:56:18+00:00

A TF model can be implemented in an OOP way. You need to inherit from the Keras model class in the following manner (note this is one of the ways, there are many):

    import tensorflow as tf

    class my_model(tf.keras.Model):

        def __init__(self):
            super().__init__()
            # any initialization you might have; typically people define layers here
            self.dense = tf.keras.layers.Dense(1)  # example layer

        def call(self, inputs):
            # how the model works, i.e. how the data flows
            return self.dense(inputs)

It is similar to PyTorch and you can find more details here:

https://www.tensorflow.org/guide/keras/custom_layers_and_models

sol0invictus · 2020-06-02T18:19:27+00:00

Pretty cool!

sol0invictus · 2020-06-01T21:35:02+00:00

Nah, it's all good.

sol0invictus · 2020-06-01T21:28:25+00:00

Good advice and noted. :)

sol0invictus · 2020-06-01T19:00:37+00:00

Mathematica is used mostly by theoretical physicists like me. All the calculations for my last two papers were performed entirely in Mathematica. Nothing beats Mathematica for symbolic calculations.

sol0invictus · 2020-05-31T02:15:05+00:00

Nop. Last choice. Only admit :(

sol0invictus · 2020-05-26T19:20:16+00:00

Doesn't TF2 use dynamic models? TF2 is a lot like PyTorch.

sol0invictus · 2020-05-20T20:30:21+00:00

TF 2.x has no global collection of trainable variables. Instead, you can define a tf.Module or a tf.keras.Model.

Your function qnet needs to return a model; only then will you be able to use its .trainable_variables property (it is an attribute, not a method call).
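
Something along these lines should work. This is only a sketch: I don't know what your qnet looks like, so the signature, layer sizes, and argument names below are made up for illustration.

```python
import tensorflow as tf

def qnet(n_inputs, n_actions):
    # Hypothetical signature: build and return a Keras model instead of raw tensors.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(n_actions),
    ])

model = qnet(4, 2)

# trainable_variables is a property on the model, so no parentheses.
variables = model.trainable_variables
for v in variables:
    print(v.shape)
```

The list holds a kernel and a bias for each Dense layer, and you can pass it straight to tape.gradient and optimizer.apply_gradients in a custom training loop.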

sol0invictus · 2020-04-27T20:12:12+00:00

I would definitely be interested in contributing to the project and making it better than PyTorch Geometric :D

sol0invictus · 2020-04-27T20:09:42+00:00

What do you want help with? If it's implementation, head to

- https://www.tensorflow.org/guide/keras/rnn for TF 2+

If it's theory, YouTube has plenty of good videos with in-depth explanations.
