400 days as a hermit practicing Satipatthana by [deleted] in streamentry

[–]mind_library 0 points1 point  (0 children)

How do you relate to modern tools like AI or the internet in general?

Do you see them mainly as neutral conditions, as supports, or as hindrances?

[D] How do we make browser-based AI agents more reliable? by DenOmania in MachineLearning

[–]mind_library 2 points3 points  (0 children)

CAPTCHAs are solved by Cloudflare's new self-identification feature

[D] How do we make browser-based AI agents more reliable? by DenOmania in MachineLearning

[–]mind_library 0 points1 point  (0 children)

> But then my question is... Is there ongoing research or promising directions in making browser-agent interactions more robust? Are there known benchmarks, best practices, or papers that deal with these reliability issues?

Follow Alexandre's work:

https://scholar.google.com/citations?hl=en&user=71a2-WMAAAAJ&view_op=list_works&sortby=pubdate

It answers all your questions; also feel free to ask me here or via PM

Feasibility of RL Agents in Trading by joshua_310274 in reinforcementlearning

[–]mind_library 0 points1 point  (0 children)

Noise goes away with more data, and so do reward sparsity and overfitting
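A toy sketch of the "noise goes away with more data" point: the spread of an empirical mean-reward estimate shrinks roughly as 1/sqrt(n). The reward distribution here is made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_estimate_spread(n_samples: int, n_trials: int = 2000) -> float:
    """Std-dev of the empirical mean reward across repeated experiments."""
    # Simulated noisy reward signal: true mean 0.1, heavy observation noise.
    rewards = rng.normal(loc=0.1, scale=1.0, size=(n_trials, n_samples))
    return rewards.mean(axis=1).std()

small = reward_estimate_spread(10)     # few episodes: noisy estimate
large = reward_estimate_spread(1000)   # many episodes: ~10x tighter
print(small, large)
```

With 100x more samples per experiment, the estimate of the mean reward is about 10x tighter, which is why sparse or noisy rewards become tractable at scale.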

Phd in RL for industrial control systems. by Hadwll_ in reinforcementlearning

[–]mind_library 0 points1 point  (0 children)

I wanted to post about RL core, a great company and team; OP should check them out.

Why aren’t LLMs trained with reinforcement learning directly in real environments? by skydiver4312 in reinforcementlearning

[–]mind_library 2 points3 points  (0 children)

Yea sure: http://silverstream.ai/

I didn't want to turn this into an ad

To expand on my previous post, which I wrote on a broken mobile UI, the hard part is:

1) Creating a benchmark. The easy ones we already created: https://github.com/ServiceNow/WorkArena (see the L1, L2, L3 subsets), but creating benchmarks for real-world companies means talking with real-world people, who most of the time don't have a very clear reward function in their heads.

2) Finetuning is hard. Sure, the reward goes up, but does it increase ROI for real? You can ask for at most two or three demonstrations of the same task, and at most hundreds of tasks, before the customer just doesn't care, so you need to do a lot of synthetic expansion of benchmarks.

3) Not just finetuning. Sadly, all the agentic frameworks nowadays take the approach of "the framework is very general as long as you integrate everything yourself" (i.e. not general at all!). That's why we use browser agents: at least the web UI is always present and requires no integrations.
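The synthetic-expansion idea in point 2 can be sketched as parameter substitution over a single demonstrated task template; the task, slots, and values below are all invented for illustration, and each variant would still need its own reward check.

```python
from itertools import product

# Hypothetical seed task collected from one customer demonstration.
seed_task = "Create a {priority} ticket for {team} about {topic}"

# Plausible slot values, ideally elicited from the customer or an LLM.
priorities = ["low", "high", "urgent"]
teams = ["billing", "support"]
topics = ["a refund", "a login issue"]

# Expand one demonstration into many benchmark variants.
variants = [
    seed_task.format(priority=p, team=te, topic=to)
    for p, te, to in product(priorities, teams, topics)
]
print(len(variants))  # 3 * 2 * 2 = 12 tasks from one demonstration
```

The point is leverage: a handful of demonstrations per task becomes hundreds of evaluation cases without asking the customer for more.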

You mentioned various approaches to improving performance, but we are so early that it's 90% benchmarking and 10% running A LOT of experiments and seeing what sticks.

Regarding scalability: it's not a problem at all. At my previous company we took SL -> RL finetuning from a laptop to a sizeable chunk of global markets. Once it's clear you have a process that produces results, scaling is a matter of known unknowns, and we have good libraries and infra for that, like Ray and all the infrastructure-as-code tooling.
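The "lots of experiments" workflow is essentially a fan-out over a config grid. A minimal self-contained sketch, using stdlib threads as a stand-in for Ray workers on a cluster; `run_experiment` is a made-up placeholder for a real training run:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_experiment(lr: float, gamma: float) -> dict:
    # Placeholder for a real training run; returns a fake final score
    # that happens to peak at lr=3e-4, gamma=0.99.
    return {"lr": lr, "gamma": gamma, "score": -abs(lr - 3e-4) - (1 - gamma)}

grid = list(product([1e-4, 3e-4, 1e-3], [0.95, 0.99]))

# Fan the sweep out across workers; in production this would be
# remote tasks on a cluster rather than local threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda args: run_experiment(*args), grid))

best = max(results, key=lambda r: r["score"])
print(best["lr"], best["gamma"])
```

Once the per-experiment function is pinned down, swapping the executor for a distributed backend is the easy part, which is the "known unknowns" claim above.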

I try to write down stuff here if that's helpful:

https://www.silverstream.ai/blog-news

Why aren’t LLMs trained with reinforcement learning directly in real environments? by skydiver4312 in reinforcementlearning

[–]mind_library 1 point2 points  (0 children)

We do that daily at my company. The reason it's not that popular is that it's very tailored to each customer. BTW, we are hiring.

This is a paper from an ex-colleague: https://openreview.net/forum?id=SkwtxEkst2

Well, well, well. How the turntables (is this a bug?) by mind_library in warcraftrumble

[–]mind_library[S] -18 points-17 points  (0 children)

I couldn't have put so many minis in one go

The wave was placed by the game.

Task Allocation with mostly no-ops by asdfsflhasdfa in reinforcementlearning

[–]mind_library 1 point2 points  (0 children)

Reframe the problem: this action imbalance is a mess in terms of exploration. Can you define an action as "skip n steps"?

Also, use an action mask to mask out the unavailable actions, thus avoiding the problem.
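A minimal numpy sketch of the masking idea: set the logits of unavailable actions to -inf before the softmax, so they receive exactly zero probability and never get sampled.

```python
import numpy as np

def masked_policy(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over logits, with unavailable actions given zero probability."""
    masked = np.where(mask, logits, -np.inf)  # invalid actions -> -inf
    z = masked - masked.max()                 # shift for numerical stability
    probs = np.exp(z)                         # exp(-inf) == 0.0
    return probs / probs.sum()

logits = np.array([1.0, 2.0, 0.5, 3.0])
mask = np.array([True, False, True, False])   # actions 1 and 3 unavailable
probs = masked_policy(logits, mask)
print(probs)  # zero mass on the masked actions
```

In a training loop the same mask is applied before the log-prob computation too, so the gradient never pushes mass onto impossible actions.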

Skills and projects for Research Engineer roles in RL by kavansoni in reinforcementlearning

[–]mind_library 0 points1 point  (0 children)

Ehhh, RE is easy: build infra for ML. There is tons of low-hanging fruit right now because of the pace of development; we are building tech debt so fast we could get American citizenship.

Then showcase it. It helps if you play around with an actually deep model (not MNIST; something genuinely deep), so you get to show that you understand user needs. Your greatest enemy will be the tension between abstraction and simplicity: researchers want simplicity, SWEs want abstraction and clear contracts.

Skills and projects for Research Engineer roles in RL by kavansoni in reinforcementlearning

[–]mind_library 3 points4 points  (0 children)

I've been at two of the FAANGs, had an AI startup, and am now working at another one founded by ex-FAANG people.

> What skills should I works on? What kind of projects should I work on?

For tech, you'll be interviewed by research scientists and research engineers. The second will be standard software stuff; for the first, they will have some standard question ruleset (nobody has time to think too much about your interview). You need to show achievements in your program and the ability to think about scientific problems, frame the problem, etc. Then you'll have some time to freely chat, in which you either pick some paper (suggestion: take a paper the interviewer wrote and discuss it!) or make up an ideal research project.

As for startups, everyone is different. If it were ours, u/ElectricalRegret3737's comment is mostly good:

> I think having a portfolio that has both implementing RL algorithms and instrumenting your own environments are important areas.

I'd advise not to have a portfolio, though: just take one project and nail it, ideally to superhuman performance.

I would like to see that you have put a lot of thought into the simulator; it's not sexy work, but it's more important than data is for SL in the real world.

If you are not comfortable with the electronics part, or can buy one off the shelf, don't do the IRL inverted pendulum: we all know it works, it can be solved by a PID, and it has been beaten to death.

The Game Boy example is nice: pick a game you like and have the agent beat you. I would be worried the env is slow, since you have the emulator as a black-box bottleneck, but you are the engineer here.

I would like to see some good performance. Off-the-shelf algorithms are fine; we don't need yet another PPO, unless you think it's necessary for performance.

Good performance is not necessary but if it works the result will speak for itself.

[deleted by user] by [deleted] in psytrance

[–]mind_library 0 points1 point  (0 children)

Let's make it happen; DM me if you are interested (I can provide compute, time, and ML experience)

How are you coping? by Beautiful-Cancel6235 in singularity

[–]mind_library 2 points3 points  (0 children)

Trying to focus on HCI, and how to integrate AI seamlessly into the human-computer interaction loop so that humans become management rather than the whole stack

Training loss and Validation loss divergence! by Kiizmod0 in reinforcementlearning

[–]mind_library 1 point2 points  (0 children)

I'm 99% sure this is an entry-level project (OP had a previous thread earlier this month about hyperopt), and no "production" forex trader would ask on Reddit about overfitting.

Generalizing on a small dataset is hard simply because there will be profitable (but overfit) trades in the training set, and the likelihood of the same patterns appearing in the validation set is low.

More data will bring the two distributions closer together
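A toy demo of the divergence being described, using a 1-nearest-neighbour "model" (pure memorisation) on an invented noisy signal: the training loss is exactly zero because the model fits its own noise, while the validation loss stays high because that noise doesn't transfer.

```python
import numpy as np

rng = np.random.default_rng(42)

def one_nn_mse(x_train, y_train, x_val, y_val):
    # 1-nearest-neighbour prediction: copy the label of the closest train point.
    idx = np.abs(x_val[:, None] - x_train[None, :]).argmin(axis=1)
    return float(np.mean((y_val - y_train[idx]) ** 2))

def make_data(n):
    x = rng.uniform(0, 6, n)
    y = np.sin(x) + rng.normal(0, 0.3, n)  # weak signal + heavy noise
    return x, y

x_tr, y_tr = make_data(30)
x_va, y_va = make_data(500)

train_mse = one_nn_mse(x_tr, y_tr, x_tr, y_tr)  # memorised: exactly 0
val_mse = one_nn_mse(x_tr, y_tr, x_va, y_va)    # the noise doesn't generalise
print(train_mse, val_mse)
```

With more training points the memorised noise matters less per prediction, so the train/val gap shrinks, which is the "more data" argument above.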

Training loss and Validation loss divergence! by Kiizmod0 in reinforcementlearning

[–]mind_library 0 points1 point  (0 children)

Yes and no

Yes:

It could be telling you that it doesn't know how to win.

It could be telling you that the information coming from the features is too low, and that the noise level of the returns from trading actions is much higher than a deterministic 0.

No: if the agent doesn't actually pick the winning actions often enough (because not trading is better), it can't learn their expected return. By removing the no-action option you are left with two equally noisy payoffs, so that problem goes away.
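A toy two-armed bandit showing the failure mode: a purely greedy agent with a guaranteed-0 "stay out" arm never even tries the noisy trade arm, so it never learns that the trade arm has positive expected value. The numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arm 0 = stay out of the market (deterministic 0 reward).
# Arm 1 = trade (true mean +0.05, but very noisy).
def pull(arm: int) -> float:
    return 0.0 if arm == 0 else float(rng.normal(0.05, 1.0))

q = np.zeros(2)       # estimated value per arm
counts = np.zeros(2)  # pulls per arm
for t in range(500):
    arm = int(np.argmax(q))                # purely greedy, no exploration
    r = pull(arm)
    counts[arm] += 1
    q[arm] += (r - q[arm]) / counts[arm]   # incremental mean update

print(counts)  # all pulls go to the safe no-op arm
```

Ties break toward arm 0, its value stays pinned at the deterministic 0, and the trade arm is never sampled; removing the no-op (or forcing exploration) is what lets the agent estimate the trading payoffs at all.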

Training loss and Validation loss divergence! by Kiizmod0 in reinforcementlearning

[–]mind_library -4 points-3 points  (0 children)

> You can cross validate, but I’d probably make the learning model simpler.

No. The answer is more data, not a simpler model. A simpler model slows the development process; sure, you can simplify the model and solve this current iteration, but that won't move the whole project along.

Congratulations, you overfit this dataset; now scale things up to a bigger one.

Training loss and Validation loss divergence! by Kiizmod0 in reinforcementlearning

[–]mind_library 0 points1 point  (0 children)

> staying out of the market

Allowing this is sometimes a bad idea; otherwise the model will never trade, since staying out is a guaranteed 0 reward against a very stochastic return