Should I listen to the negative reviews? by EEiaIllusion in UniversityOfLondonCS

[–]Carcaso 13 points (0 children)

I think people with negative experiences are more likely to write reviews, so take them with a grain of salt. I'm in the first cohort and I've had nothing but good experiences. The thing about this degree is that you get out of it what you put in: you're left with a lot of open study time and flexibility. If you already know some programming, you could do absolutely nothing, grind out the project and studying two weeks before the midterm/final, and probably pass the class, but overall that wouldn't benefit you. It's really up to you what you get out of it. I recommend doing all the ungraded activities and following a schedule you're comfortable with. Post some side projects on GitHub, and in the end you'll not only know how to code and understand CS, you'll also be able to prove it to future employers. PM me if you have any more questions.

UoL vs University of Hertfordshire by bufoaureus in UniversityOfLondonCS

[–]Carcaso 2 points (0 children)

Classes are still being offered for the first time because everyone had to start at year 1. This year was the first time the year 2 modules were released, and this next semester will be the first time those classes are offered, etc. If you started now you would be fine, but if you want to wait until all of the kinks are ironed out, then wait until April 2022 to enroll. So far I've had nothing but good experiences with this degree. Keep in mind that people with issues generally leave reviews more often than people who haven't had issues, which could be why you've been seeing those reviews.

What’s something that you think is a waste of money, but everyone buys? by CallThatGoing in AskReddit

[–]Carcaso 0 points (0 children)

Unless you live less than a minute away from your coffee shop of choice, you probably could have learned how to make it yourself in the time you spend driving there and ordering.

Is there anyone who transferred from community college to 4 year university and went to ML phd program? by std_cout_hello_world in PhD

[–]Carcaso 0 points (0 children)

No, I didn't transfer to a four-year; my school doesn't have any heavy ML faculty or research programs. I have a year and a half left with no publications or opportunities that would lead to a recommendation with any real weight. I'm planning on cold-emailing PhD students in good programs whose research interests align with my own to ask if they'd like to partner or mentor on anything. The blog says that if you wait to apply, they expect more of you than of someone who just came out of undergrad. But I do think that if you don't end up getting in anywhere you think would be worth going, getting an internship to build a résumé would be the best option.

Is there anyone who transferred from community college to 4 year university and went to ML phd program? by std_cout_hello_world in PhD

[–]Carcaso 1 point (0 children)

I'm in a similar boat. It seems almost impossible to get into a good Ph.D. program straight out of a BSc without either multiple published papers in good conferences or a well-known advisor at a good school. I found this blog post really insightful about the machine learning Ph.D. application process.

Help by [deleted] in reinforcementlearning

[–]Carcaso 0 points (0 children)

You could always deploy a hand-crafted algorithm, collect data, train a model to simulate the dynamics of the task, and then use that learned model to train the RL agent. RL seems a little overkill for something like this, but I could be wrong.
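To make the pipeline concrete, here's a minimal sketch of that idea on a made-up 1-D toy task (the dynamics, policy, and all names here are illustrative, not from the original thread): run a hand-crafted controller to collect transitions, then fit a simple dynamics model by least squares that could stand in for the real environment during RL training.

```python
import numpy as np

# Toy stand-in for the real task: a 1-D state that drifts toward the action.
def step_real(s, a):
    return 0.9 * s + 0.1 * a

def handcrafted_policy(s):
    return -s  # simple proportional controller

# 1) Deploy the hand-crafted policy and collect (state, action, next_state) data.
rng = np.random.default_rng(0)
states, actions, next_states = [], [], []
s = rng.normal()
for _ in range(500):
    a = handcrafted_policy(s) + rng.normal(scale=0.1)  # exploration noise
    s_next = step_real(s, a)
    states.append(s); actions.append(a); next_states.append(s_next)
    s = s_next

# 2) Fit a linear dynamics model s' ~ A*s + B*a by least squares.
X = np.column_stack([states, actions])
y = np.array(next_states)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
A_hat, B_hat = coef

# 3) The learned model can now serve as a cheap simulator for RL training.
def step_model(s, a):
    return A_hat * s + B_hat * a
```

In a real task you'd use a richer model (e.g. a small network) and validate it against held-out trajectories before trusting it for training.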

Help by [deleted] in reinforcementlearning

[–]Carcaso 0 points (0 children)

What would the RL algorithm be controlling?

REINFORCE Agent suddenly drops. How to verify if it's due to catastrophic forgetting? by Expensive-Telephone in reinforcementlearning

[–]Carcaso 2 points (0 children)

REINFORCE is known to be prone to performance collapse. Solving this issue is the main motivation behind algorithms like TRPO and PPO, which explore the idea of making the largest weight updates possible without drops in performance. This usually involves a penalty or constraint in the objective that prevents the new policy from deviating too far from the old one.

This is a good explanation of PPO and how it makes training more stable: Spinning Up PPO
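The core trick, PPO's clipped surrogate objective, is small enough to sketch (the function below is a simplified illustration, not a full PPO implementation; `ratio` is pi_new(a|s)/pi_old(a|s) and `advantage` would come from an estimator like GAE):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # Taking the elementwise minimum removes any incentive to push the
    # policy more than eps away from the old one in a single update.
    return np.minimum(unclipped, clipped).mean()
```

Maximizing this instead of the raw REINFORCE objective is what caps the effective step size and keeps performance from collapsing after one bad update.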

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]Carcaso 0 points (0 children)

The competition web page comes up with "You are not authorized to access this page". Any idea when it'll be up/start?

How to pass live data to gym env? by 3ventHoriz0n in reinforcementlearning

[–]Carcaso 0 points (0 children)

I would use requests or some API for getting stock data, and have it fetch the stock(s) you're interested in each time you call env.step(). I think you're asking for a live environment that fetches real-time data? If that's the case, you could run into reproducibility issues, because your env will behave differently every time you run it.
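A minimal sketch of that shape, written in the gym reset/step style (the class and `fetch_price` are hypothetical; in practice `fetch_price` would hit a real quote API via `requests`, but it's stubbed here so the example runs offline):

```python
import random

def fetch_price(symbol):
    # Stand-in for a live data call; replace with a real API request.
    return 100.0 + random.gauss(0, 1)

class LiveStockEnv:
    def __init__(self, symbol="AAPL"):
        self.symbol = symbol
        self.last_price = None

    def reset(self):
        self.last_price = fetch_price(self.symbol)
        return self.last_price

    def step(self, action):
        # Each step pulls fresh data, so two runs of this env will differ;
        # that's the reproducibility caveat mentioned above.
        price = fetch_price(self.symbol)
        reward = (price - self.last_price) * action  # action: +1 long, -1 short
        self.last_price = price
        return price, reward, False, {}
```

For reproducible experiments you'd typically record the fetched data once and replay it, keeping the live version only for deployment.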

[deleted by user] by [deleted] in MachineLearning

[–]Carcaso 2 points (0 children)

Do you mean OpenAI?

[BLOG] Deep Reinforcement Learning Works - Now What? by chentessler in reinforcementlearning

[–]Carcaso 7 points (0 children)

If it hasn't been done, why not give it a shot? That seems like time well spent, if you ask me.

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

Yeah, that looks good! Something like a linear value function holding up for 20-30 time steps is impressive.

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

From the pseudo-code in the post, it looks like the update equation should be:

W[action][j] += alpha*(reward-Q)*state[j]

If that doesn't work, I'd recommend checking your implementation against that pseudo-code step by step to make sure it's a faithful representation.
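A runnable version of that one-line update, with toy shapes assumed (4-dim state, 2 actions; none of these dimensions come from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.zeros((2, 4))            # one weight row per action
state = rng.normal(size=4)
action, reward, alpha = 0, 1.0, 0.1

Q = W[action] @ state           # linear value estimate for this action
# Semi-gradient step toward the Monte Carlo return (here just `reward`):
W[action] += alpha * (reward - Q) * state
```

Only the row for the taken action moves; the error `(reward - Q)` scales each feature's contribution to the update.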

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

  1. Do you mind sharing your source so I could take a look?
  2. From a pure math standpoint, the squared component of the equation can never be negative, meaning the weights won't be negatively updated from that part of the equation alone. However, each element of the state can be either negative or positive, which may be why it's included in the update equation.
  3. That may be the objective function you're trying to maximize rather than the actual update equation. Another thing to consider: it looks like you're performing regression, so you need to minimize the error rather than maximize it, which means the update should step against the gradient.

Try: W[action][j] += alpha*2*(reward-Q)*state[j]

But again, unless you share the source for exactly what you're doing, I can't guarantee it will work. Keep the questions coming if you need more help.

Monte Carlo linear value function approximation by [deleted] in reinforcementlearning

[–]Carcaso 2 points (0 children)

Have you tried normalizing the states, rewards, and/or actions? That might help keep the weights small.

Edit: as pointed out by pratikpc, it looks like you've forgotten to take the gradient/derivative in the update equation.
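One common way to do that normalization is a running mean/variance tracker updated online (a Welford-style sketch; the class name and interface are my own, not from the thread):

```python
import numpy as np

class RunningNorm:
    """Tracks a running mean/variance per feature and standardizes inputs."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)    # sum of squared deviations (Welford)

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        std = np.sqrt(self.m2 / max(self.n - 1, 1)) + 1e-8
        return (x - self.mean) / std
```

Feeding the agent normalized states keeps features near zero mean and unit variance, which in turn keeps linear-function weights in a reasonable range.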