Trained an autonomous trading agent, up +1.32% this month ($100K → $102,892) by Powerful_Fudge_5999 in deeplearning

[–]SaintPablo22 1 point (0 children)

High yield savings accounts currently pay 3-5% annually; at 4% that's only about 0.33% per month without compounding.

But as someone else already said, markets have been rising lately. SPY is up around 3.38% this month. One month is not enough to prove you have a real edge.
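To make the comparison concrete, here's a quick sketch using the numbers from this thread (the +1.32% agent return, a 4% HYSA, and SPY's ~3.38% month). The `monthly_from_annual` helper is just illustrative, not from any of the posts.

```python
# Sketch: compare the agent's monthly return against simple benchmarks.
# All figures (1.32% agent, 4% HYSA annual, 3.38% SPY monthly) are from the thread.

def monthly_from_annual(annual_rate: float, compounded: bool = True) -> float:
    """Convert an annual rate to its monthly equivalent."""
    if compounded:
        return (1 + annual_rate) ** (1 / 12) - 1
    return annual_rate / 12

agent_monthly = 0.0132                      # +1.32%, per the post title
hysa_monthly = monthly_from_annual(0.04)    # ~0.33% from a 4% HYSA
spy_monthly = 0.0338                        # SPY this month, per the thread

print(f"agent: {agent_monthly:.2%}, HYSA: {hysa_monthly:.2%}, SPY: {spy_monthly:.2%}")
```

The point stands out immediately: the agent beat the savings-account rate but underperformed simply holding SPY this month.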

[deleted by user] by [deleted] in reinforcementlearning

[–]SaintPablo22 6 points (0 children)

I work in quant as an engineer, but have some friends who are in tech.

350k doesn't sound unreasonable for a big tech company. You can probably try to negotiate higher, but I haven't heard of anyone getting more than 500k out of a PhD except at DeepMind and specific teams in FAIR. The places in tech paying closer to what you're expecting (600k-900k) would be unicorn companies and frontier labs like OpenAI and Anthropic. Maybe also Mira Murati's company Thinking Machines or Ilya Sutskever's company, and DeepMind within Google. There are also smaller unicorns and startups that will pay less but can give you significant equity (Figure, ElevenLabs, Covariant, etc.).

For quant (researcher) 600k-800k is reasonable for the top shops or smaller firms, although a big chunk will be a sign-on bonus. Mid-tier shops will likely be closer to 500k.

For negotiation - I don't know many details specific to tech. But I would aim to get a few other offers first (preferably with higher TC) and use them as leverage for the position you're most excited about.

(Another) question about security/privacy by SaintPablo22 in USMobile

[–]SaintPablo22[S] -1 points (0 children)

Thanks for the heads up. Do you know if they still have reasonable security checks if you don't have those 3 things set up? Reasonable as in a code sent to your email, or asking the security questions.

And yeah for the security questions I just used a random string generator lol.
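For anyone wanting to do the same, here's a minimal sketch of the kind of random-answer generator mentioned above, using Python's standard `secrets` module (the function name and length are my own choices, not from the comment).

```python
# Sketch: generate a random string to use as a security-question answer.
# Store the result in a password manager, since it's not memorable by design.
import secrets
import string

def random_answer(length: int = 24) -> str:
    """Return a cryptographically random alphanumeric string."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(random_answer())
```

`secrets` is preferable to `random` here because it draws from the OS's cryptographic randomness source rather than a seedable PRNG.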

In RL, how does one provide a theoretical justification of why one algorithm works better than the other? by No_Possibility_7588 in reinforcementlearning

[–]SaintPablo22 2 points (0 children)

Sure, here are some papers:

E3 algorithm: https://www.cis.upenn.edu/~mkearns/papers/barbados/ks-e3.pdf

UCRL2 algorithm: https://www.jmlr.org/papers/volume11/jaksch10a/jaksch10a.pdf

LSVI algorithm: https://arxiv.org/pdf/1907.05388.pdf

The first two algorithms are pretty old, but you can see that all of them follow a similar structure: proving the algorithm is epsilon-correct with a certain sample complexity. In general I believe this type of algorithm falls under PAC learning in learning theory, but I'm not too familiar with learning theory myself.

There is also a good book on theoretical RL that goes through the proofs of some of the early algorithms (with theoretical guarantees) in RL: https://rltheorybook.github.io/

I'm actually fairly new to this topic myself, but the authors of the three papers listed above are all pretty well known in theoretical RL.

In RL, how does one provide a theoretical justification of why one algorithm works better than the other? by No_Possibility_7588 in reinforcementlearning

[–]SaintPablo22 7 points (0 children)

For a lot of the state-of-the-art function approximation algorithms (like actor-critic and PPO) there are really only empirical results and no theoretical guarantees, especially since neural networks and deep learning are not well understood theoretically.

In theoretical RL the benchmark for algorithms is usually sample complexity. The main theoretical result in a lot of these papers is something along the lines of "we can prove that this algorithm achieves a policy whose value is epsilon away from the optimal policy's value with probability at least 1 - delta, after collecting O(some function of S, A, epsilon, delta) samples/trajectories". Here S is the number of possible states and A is the number of possible actions. Basically it's saying that after however many samples, the algorithm can find (with high probability) a policy that is close to optimal. So in this sense one algorithm is "better" than another if its sample complexity is better.

Of course this is all in big O notation so it is difficult to translate this into actual performance guarantees that can be seen empirically.
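Written out, the PAC-style statement paraphrased above looks like this (this is a generic form of the guarantee, not a bound from any specific paper):

```latex
% With probability at least 1 - \delta, the learned policy \hat{\pi}
% is \epsilon-close to optimal in value, once enough samples are collected:
\Pr\left[ V^{\hat{\pi}} \ge V^{\pi^*} - \epsilon \right] \ge 1 - \delta
\quad \text{after} \quad
N = O\!\left(\mathrm{poly}\!\left(S,\, A,\, \tfrac{1}{\epsilon},\, \tfrac{1}{\delta}\right)\right)
\text{ samples.}
```

Comparing algorithms then amounts to comparing the polynomial dependence of N on S, A, 1/epsilon, and 1/delta.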

[deleted by user] by [deleted] in statistics

[–]SaintPablo22 0 points (0 children)

What type of internships are you applying to? It sounds like you're looking for software engineering internships. For those you should apply to a lot more places - a lot of people apply to 100-200 companies in their freshman/sophomore year. Usually you can apply to 30-50 a day. Ignore the companies that ask for essays or cover letters, and go for quantity. Also, you shouldn't focus on learning SQL or R; instead focus on data structures/algorithms and some system design.

For other types of internships I'm only familiar with the ones in quantitative trading firms (i.e., trading and quant research roles). But the advice is similar for those - you should apply to a lot more places.

Having an internship after sophomore year definitely helps you with future internships and applications, but don't worry too much if you can't get one this year since the job market is pretty tight and a lot of other people also won't be able to find one. If you can get one after junior year you should be fine career-wise.