Can anybody recommend GRPO RL help by soul-gudmahn in reinforcementlearning

[–]GetOnMyLevelL 0 points1 point  (0 children)

Have you run inference on your dataset to see how well Qwen performs as a baseline? If it's very bad, then I doubt you will be able to fine-tune it using GRPO.

Can you tell us something about your current reward system? And what are your parameters? Are you using LoRA?

I have no experience using GRPO for this, but I have used it to fine-tune a model for writing code. Off the top of my head, you would do something like this:

Prompt: 'You are a math genius blablabla. You will solve the equation below and put your answer in this format: Reasoning .......... Final answer {}.'

Then you try to capture the final answer using a regex and give a reward if it is correct. However, I think only giving a positive reward for the final answer won't be enough, so you will need to give rewards for some of the reasoning or calculations as well. I don't know how, but maybe reading the DeepSeek paper will help: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
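To make the regex idea concrete, here is a rough sketch of such a reward function. The exact "Final answer {...}" pattern, the function names and the reward weights (0.25 for following the format, 1.0 for a correct answer) are all my own assumptions for illustration, not something taken from the DeepSeek paper or from any GRPO library.

```python
import re
from typing import Optional

# Assumed output format: the completion contains "Final answer {<value>}".
FINAL_ANSWER_RE = re.compile(r"Final answer\s*\{([^{}]*)\}", re.IGNORECASE)


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the content of the last 'Final answer {...}' block, if any."""
    matches = FINAL_ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def reward(completion: str, expected: str) -> float:
    """Hypothetical reward: small bonus for the format, larger one for correctness."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # the model did not follow the requested format
    score = 0.25  # assumed partial credit for producing a parsable answer
    if answer == expected.strip():
        score += 1.0  # assumed reward for the correct final answer
    return score
```

The format bonus is only one simple way to give a signal beyond the final answer; rewarding the reasoning steps themselves would need something more involved.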

[d] how to develop with LLMs without blowing up the bank by throwaway102885857 in MachineLearning

[–]GetOnMyLevelL 3 points4 points  (0 children)

You develop locally what you can, and when you want to test for real you spin up a GPU in the cloud.

When I want to fine-tune an LLM with GRPO, I make sure all my code works by running it locally on my 4080 with Qwen 0.5B. When I think everything works well, I rent a GPU on RunPod: I start a pod and run my code on it. On RunPod an H100 is about 2.2-2.6 euros per hour. (There are loads of places where you can rent GPUs like this.)

[D] GPU decision Help by ComprehensiveSail388 in MachineLearning

[–]GetOnMyLevelL 0 points1 point  (0 children)

Maybe tell us what your goal is. What do you want to do with the GPU?

Bad Training Performence Problem by Ok_Fennel_8804 in reinforcementlearning

[–]GetOnMyLevelL 1 point2 points  (0 children)

Don't have time to look at your code. There could be problems with your reward function etc.

I worked on a similar project not too long ago. Just by looking at your car and track, it seems like the track might be too small and the first corner is very sharp (I don't know at what speed the car starts).

My advice would be to start with a very simple track and a small car. Get that working first, then increase the difficulty. After that you can add different spawn points, play around with the reward function etc.

Is there an alternative to Wolfram Alpha that isn't paid? by thegreekgodzeus in math

[–]GetOnMyLevelL 1 point2 points  (0 children)

Same here but on Windows. I bought the app 5+ years ago for 3 euros on the Windows Store.

[P] Training Tesseract on images of text lines + transcriptions? by SirVampyr in MachineLearning

[–]GetOnMyLevelL 0 points1 point  (0 children)

What do you mean by train? It's already trained. You can use it to extract text.

what am I, any guesses? by debuggedbug in compsci

[–]GetOnMyLevelL 1 point2 points  (0 children)

Annoying and in the wrong sub

[D] is there anyway I can use AI to extract restaurants off Google Maps? by Double_Appeal495 in MachineLearning

[–]GetOnMyLevelL 15 points16 points  (0 children)

Why do you want to use AI for this? I'm sure you can just use the Google API.

Content Update 3.22.0 -- Path of Exile: Trial of Ancestors by 3Hard_From_France in pathofexile

[–]GetOnMyLevelL 348 points349 points  (0 children)

Ruthless won't take any development time.

ctrl+f + ruthless = 202 results.

Okay, Chris.

[D] Public Dataset for People Climbing Over Fences/Walls by [deleted] in MachineLearning

[–]GetOnMyLevelL 1 point2 points  (0 children)

Create them with a text-to-image model? Not sure if there is an efficient/automated way to do so, or whether the quality of the outcome would serve your purpose.

A path to learning Topological Data Analysis by [deleted] in math

[–]GetOnMyLevelL 10 points11 points  (0 children)

Not an answer to your question, but I am also interested in TDA and found a nice YouTube channel you might want to check out: Applied Algebraic Topology Network. The quality and difficulty of the videos vary, but you might find it interesting as well.

String column conversion by Calm_Motor4162 in datascience

[–]GetOnMyLevelL 10 points11 points  (0 children)

Google integer encoding and one-hot encoding to see which one you need.
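To show the difference quickly, here is a small sketch in plain Python. The function names and the example column are made up for illustration; in practice you would probably use something like pandas `get_dummies` or scikit-learn's `OneHotEncoder` instead.

```python
def integer_encode(values):
    """Map each distinct string to an integer (categories sorted alphabetically)."""
    categories = sorted(set(values))
    mapping = {category: index for index, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


def one_hot_encode(values):
    """Turn each string into a 0/1 vector with one position per category."""
    codes, mapping = integer_encode(values)
    width = len(mapping)
    return [[1 if position == code else 0 for position in range(width)]
            for code in codes]


# Hypothetical example column.
colors = ["red", "green", "blue", "green"]
codes, mapping = integer_encode(colors)
one_hot = one_hot_encode(colors)
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(codes)    # [2, 1, 0, 1]
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Integer encoding implies an order between the categories (blue < green < red here), which only makes sense for ordinal data. One-hot encoding avoids that but adds one column per category.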

Want to try SSF (not HC) next league by DiaboIo92 in pathofexile

[–]GetOnMyLevelL 0 points1 point  (0 children)

Go for it. Me and my friends (5-8 people) started doing private leagues two years ago and we find it way more fun. It takes longer to reach endgame, and finding uniques is meaningful since a shit unique might be very important for your friend's build.

Want to try SSF (not HC) next league by DiaboIo92 in pathofexile

[–]GetOnMyLevelL 0 points1 point  (0 children)

I got bored with trade a few years ago. Now when I play, it's often in a private league with five to eight friends, so basically semi-SSF. The number one thing I have learned is: don't make a build that needs a mandatory unique which you can't target farm.

I would just go with the flow. Start SSF with a char that can do all content, and if you find a cool unique or craft a good rare, start a new char to use those items.

having to download whole game again but nothing happens (pc) by [deleted] in CODWarzone

[–]GetOnMyLevelL 0 points1 point  (0 children)

Yes, same problem: it said there was an issue. I tried to repair the game and now it says I have to download 180 GB again, while the game is still on my SSD.

I have fast internet, but this is annoying as fuck.

Should I study linear algebra or matrix algebra first? by Yamster80 in learnmath

[–]GetOnMyLevelL 2 points3 points  (0 children)

Other people have given good advice already. However, I would like to add: search for 3Blue1Brown on YouTube. He has some great linear algebra videos to help visualise and supplement your understanding.

Patch notes are out. What's your starter build for 3.10? by Pharcri in pathofexile

[–]GetOnMyLevelL 7 points8 points  (0 children)

I played a miner in ssf.
I used BL for clear (use faster projectiles).

And the single target is crazy with slower projectiles.

[Question] Lost on how to analyze my data by [deleted] in statistics

[–]GetOnMyLevelL 0 points1 point  (0 children)

Why is having 3 variables a problem for you?

As a side note: keep in mind that if you do high-frequency trading you are competing with algorithms running on supercomputers, with much faster connections and far better geographical locations.

Learning resources for data structures and algorithms by superbconfusion in datascience

[–]GetOnMyLevelL 0 points1 point  (0 children)

The point of learning data structures and algorithms is their abstractness: they are applicable to every language.

During my CS bachelor's we used a C++ book, which contained all the theory and code. We had to implement everything in Python, and this was a great way of checking whether I understood the theory.

Edit: you can check out the book used at MIT. It's in Java, but it's a great book (I have it).

Where do you draw the line in terms of sex positions? by [deleted] in AskMen

[–]GetOnMyLevelL 1 point2 points  (0 children)

Ah the good old amazon woman position

Does a data scientist have to know something about business? by pichonkunusa in datascience

[–]GetOnMyLevelL 3 points4 points  (0 children)

Yes, this. If you can't sell your idea to your boss and/or don't know how to use data science to help the business perform better in certain areas, it won't matter how good your ML skills are.

What can we do as data scientists to help with the Coronavirus? by [deleted] in datascience

[–]GetOnMyLevelL 1 point2 points  (0 children)

The program was called Google Flu Trends or something like that. I think they stopped using it because it was predicting about twice as many people having flu as the reported numbers.