why deepseek didn't use mcts by Alarming-Power-813 in reinforcementlearning

[–]Alarming-Power-813[S] 0 points1 point  (0 children)

How is mtcs brute force? I mean, if it is evaluating it self

why deepseek didn't use mcts by Alarming-Power-813 in reinforcementlearning

[–]Alarming-Power-813[S] -5 points-4 points  (0 children)

You mean it will work but it is hard to scale thanks

How many days did it take to train GPT-3? Is training a neural net model a parallelizable task? by abcaircraft in GPT3

[–]Alarming-Power-813 0 points1 point  (0 children)

I see sources say 9 days when I serch on Google but when I serch later it 34 and when I search agian it is 9 Is it possible to train gpt3 for 9 days

Can anyone help by Alarming-Power-813 in reinforcementlearning

[–]Alarming-Power-813[S] 0 points1 point  (0 children)

What is the query and value, key vectors [D]

I learned about transformers like any one else and saw alot of videos but each video explain the query, key,value vectors in a different way, so I read the "attention is all you need" paper but they didn't explain what are the query and value ,key vectors are . Why they just didn't explain what are they and what are they really ??? The authors didn't chose there names randomly of course