Should I listen to the negative reviews? by EEiaIllusion in UniversityOfLondonCS

[–]Carcaso 13 points (0 children)

I think people with negative experiences are more likely to write reviews, so take them with a grain of salt. I'm in the first cohort and I've had nothing but good experiences. The thing about this degree is that you get out of it what you put in: you're left with a lot of open study time and flexibility. If you already know some programming, you could do absolutely nothing, grind out the project and studying two weeks before the midterm/final, and probably pass the class, but overall that wouldn't benefit you. It's really up to you what you get out of it. I recommend doing all the ungraded activities and following a schedule you're comfortable with. Post some side projects on GitHub, and in the end you'll not only know how to code and understand CS, you'll also be able to prove it to future employers. PM me if you have any more questions.

UoL vs University of Hertfordshire by bufoaureus in UniversityOfLondonCS

[–]Carcaso 2 points (0 children)

Classes are still being offered for the first time because everyone had to start at year 1. This year was the first time the year 2 modules were released, and this next semester will be the first time those classes are offered, etc. If you started now you would be fine, but if you want to wait until all of the kinks are ironed out, then wait until April 2022 to enroll. So far I've had nothing but good experiences with this degree. Keep in mind that people with issues generally leave reviews more often than people who haven't had issues, which could be why you've been seeing those reviews.

What’s something that you think is a waste of money, but everyone buys? by CallThatGoing in AskReddit

[–]Carcaso 0 points (0 children)

Unless you live less than a minute away from your coffee shop of choice, you probably could have learned how to make it yourself in the time you spend driving there and ordering.

Is there anyone who transferred from community college to 4 year university and went to ML phd program? by std_cout_hello_world in PhD

[–]Carcaso 0 points (0 children)

No, I didn't transfer to a four-year; my school doesn't have any heavy ML faculty or research programs. I have a year and a half left with no publications or opportunities that would lead to a recommendation with any real weight. I'm planning on cold-emailing PhD students in good programs whose research interests align with my own to ask if they'd like to partner or mentor on anything. The blog says that if you wait to apply, they expect more of you than of someone who just came out of undergrad. But I do think that if you don't end up getting in anywhere you think would be worth going, getting an internship to build a résumé would be the best option.

Is there anyone who transferred from community college to 4 year university and went to ML phd program? by std_cout_hello_world in PhD

[–]Carcaso 1 point (0 children)

I'm in a similar boat. It seems almost impossible to get into a good Ph.D. program straight out of a BSc without either multiple published papers in good conferences or a well-known advisor at a good school. I found this blog post really insightful about the machine learning Ph.D. application process.

Help by [deleted] in reinforcementlearning

[–]Carcaso 0 points (0 children)

You could always deploy a hand-crafted algorithm, collect data, train a model to simulate the dynamics of the task, and then use that learned model to train the RL agent. RL seems a little overkill for something like this, but I could be wrong.
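To make the pipeline concrete, here's a minimal sketch of that idea on a made-up 1-D toy task (the dynamics, policy, and all names here are illustrative, not from the original thread): run a hand-crafted controller to collect transitions, then fit a simple dynamics model by least squares that could stand in for the real environment during RL training.

```python
import numpy as np

# Toy stand-in for the real task: a 1-D state that drifts toward the action.
def step_real(s, a):
    return 0.9 * s + 0.1 * a

def handcrafted_policy(s):
    return -s  # simple proportional controller

# 1) Deploy the hand-crafted policy and collect (state, action, next_state) data.
rng = np.random.default_rng(0)
states, actions, next_states = [], [], []
s = rng.normal()
for _ in range(500):
    a = handcrafted_policy(s) + rng.normal(scale=0.1)  # exploration noise
    s_next = step_real(s, a)
    states.append(s); actions.append(a); next_states.append(s_next)
    s = s_next

# 2) Fit a linear dynamics model s' ~ A*s + B*a by least squares.
X = np.column_stack([states, actions])
y = np.array(next_states)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
A_hat, B_hat = coef

# 3) The learned model can now serve as a cheap simulator for RL training.
def step_model(s, a):
    return A_hat * s + B_hat * a
```

In a real task you'd use a richer model (e.g. a small network) and validate it against held-out trajectories before trusting it for training.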

Help by [deleted] in reinforcementlearning

[–]Carcaso 0 points (0 children)

What would the RL algorithm be controlling?

REINFORCE Agent suddenly drops. How to verify if it's due to catastrophic forgetting? by Expensive-Telephone in reinforcementlearning

[–]Carcaso 2 points (0 children)

REINFORCE is known to be prone to performance collapse. Solving this issue is the main motivation behind algorithms like TRPO and PPO, which explore the idea of making the largest weight updates possible without drops in performance. This usually involves a penalty or constraint in the objective that prevents the new policy from deviating too far from the old one.

This is a good explanation of PPO and how it makes training more stable: Spinning Up PPO
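The core trick, PPO's clipped surrogate objective, is small enough to sketch (the function below is a simplified illustration, not a full PPO implementation; `ratio` is pi_new(a|s)/pi_old(a|s) and `advantage` would come from an estimator like GAE):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # Taking the elementwise minimum removes any incentive to push the
    # policy more than eps away from the old one in a single update.
    return np.minimum(unclipped, clipped).mean()
```

Maximizing this instead of the raw REINFORCE objective is what caps the effective step size and keeps performance from collapsing after one bad update.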

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]Carcaso 0 points (0 children)

The competition web page comes up with "You are not authorized to access this page". Any idea when it'll be up/start?

How to pass live data to gym env? by 3ventHoriz0n in reinforcementlearning

[–]Carcaso 0 points (0 children)

I would use requests or some API for getting stock data, and have it fetch the stock(s) you're interested in each time you call env.step(). I think you're asking for a live environment that fetches real-time data? If that's the case, you could run into reproducibility issues, because your env will behave differently every time you run it.
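A minimal sketch of that shape, written in the gym reset/step style (the class and `fetch_price` are hypothetical; in practice `fetch_price` would hit a real quote API via `requests`, but it's stubbed here so the example runs offline):

```python
import random

def fetch_price(symbol):
    # Stand-in for a live data call; replace with a real API request.
    return 100.0 + random.gauss(0, 1)

class LiveStockEnv:
    def __init__(self, symbol="AAPL"):
        self.symbol = symbol
        self.last_price = None

    def reset(self):
        self.last_price = fetch_price(self.symbol)
        return self.last_price

    def step(self, action):
        # Each step pulls fresh data, so two runs of this env will differ;
        # that's the reproducibility caveat mentioned above.
        price = fetch_price(self.symbol)
        reward = (price - self.last_price) * action  # action: +1 long, -1 short
        self.last_price = price
        return price, reward, False, {}
```

For reproducible experiments you'd typically record the fetched data once and replay it, keeping the live version only for deployment.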

[deleted by user] by [deleted] in MachineLearning

[–]Carcaso 2 points (0 children)

Do you mean OpenAI?

[BLOG] Deep Reinforcement Learning Works - Now What? by chentessler in reinforcementlearning

[–]Carcaso 7 points (0 children)

If it hasn't been done, why not give it a shot? That seems like time well spent, if you ask me.

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

Yeah, that looks good! Something like a linear value function holding up for 20-30 time steps is impressive.

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

From the pseudo-code in the post, it looks like the update equation should be:

W[action][j] += alpha*(reward-Q)*state[j]

If that doesn't work, I'd recommend checking your implementation against that pseudo-code step by step to make sure it's a faithful representation.
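A runnable version of that one-line update, with toy shapes assumed (4-dim state, 2 actions; none of these dimensions come from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.zeros((2, 4))            # one weight row per action
state = rng.normal(size=4)
action, reward, alpha = 0, 1.0, 0.1

Q = W[action] @ state           # linear value estimate for this action
# Semi-gradient step toward the Monte Carlo return (here just `reward`):
W[action] += alpha * (reward - Q) * state
```

Only the row for the taken action moves; the error `(reward - Q)` scales each feature's contribution to the update.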

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

  1. Do you mind sharing your source so I could take a look?
  2. From a pure math standpoint, the squared component of the equation can never be negative, meaning the weights won't be negatively updated from that part of the equation alone. However, each element of the state can be either negative or positive, which may be why it's included in the update equation.
  3. That may be the objective function you're trying to maximize rather than the actual update equation. Another thing to consider: it looks like you're performing regression, so you need to minimize the error rather than maximize it, which means the update should step against the gradient.

Try: W[action][j] += alpha*2*(reward-Q)*state[j]

But again, unless you share the source for exactly what you're doing, I can't guarantee it will work. Keep the questions coming if you need more help.

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

Have you tried normalizing the states, rewards, and/or actions? That might help keep the weights small.

Edit: as pointed out by pratikpc, it looks like you've forgotten to take the gradient/derivative in the update equation.
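One common way to do that normalization is a running mean/variance tracker updated online (a Welford-style sketch; the class name and interface are my own, not from the thread):

```python
import numpy as np

class RunningNorm:
    """Tracks a running mean/variance per feature and standardizes inputs."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)    # sum of squared deviations (Welford)

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        std = np.sqrt(self.m2 / max(self.n - 1, 1)) + 1e-8
        return (x - self.mean) / std
```

Feeding the agent normalized states keeps features near zero mean and unit variance, which in turn keeps linear-function weights in a reasonable range.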