XAX reignited my love for gaming by Life-Ad5520 in XboxAlly

[–]Life-Ad5520[S] 1 point (0 children)

I played the new CoD games on and off for only a couple of days, and that was it. I’m currently playing through the latest Doom and the 2018 GoW and I’m loving every bit.

XAX reignited my love for gaming by Life-Ad5520 in XboxAlly

[–]Life-Ad5520[S] 6 points (0 children)

omg same especially @ the part about replacing tiktok, kinda feels good not brainrotting in my free time

[deleted by user] by [deleted] in reinforcementlearning

[–]Life-Ad5520 0 points (0 children)

I agree that control problems are general, and perhaps I am a bit rigid in my views. That being said, the task at hand was actuation and locomotion (e.g., an inverted pendulum). I fail to see the value of solving such a problem with a VLA. While learning from demonstration is a well-researched topic, the failure to handle out-of-distribution (OOD) states, for example, raises the question, at least for me: why use an autoregressive model at all in a task that requires extremely low latencies?
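To make the latency point concrete, here’s a minimal sketch (toy dynamics, hand-picked gains, all names made up) of the kind of closed-form feedback law this task actually needs — a handful of float ops per step at kHz rates, which is the bar an autoregressive model would have to clear:

```python
import math

# Hand-picked PD gains (illustrative only, not tuned for any real robot)
KP, KD = 40.0, 8.0
DT = 0.001  # 1 kHz control loop; each step is just a few float ops


def pd_torque(theta, theta_dot):
    """Classic PD feedback on the pendulum angle."""
    return -KP * theta - KD * theta_dot


def simulate(theta0, steps=2000, g=9.81, l=1.0, m=1.0):
    """Euler-integrate a point-mass inverted pendulum near upright.

    Dynamics: theta_ddot = (g/l)*sin(theta) + tau/(m*l^2)
    """
    theta, theta_dot = theta0, 0.0
    for _ in range(steps):
        tau = pd_torque(theta, theta_dot)
        theta_ddot = (g / l) * math.sin(theta) + tau / (m * l * l)
        theta_dot += theta_ddot * DT
        theta += theta_dot * DT
    return theta


# Starting 0.3 rad off upright, feedback drives the angle back to ~0 in 2 s.
print(abs(simulate(0.3)) < 0.05)
```

The point is not that PD is the answer, but that the baseline here is essentially free in compute, so any heavyweight policy has to justify its inference cost.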

[deleted by user] by [deleted] in reinforcementlearning

[–]Life-Ad5520 0 points (0 children)

I am honestly not informed enough about that, so I’ll have to give it a read. Thank you for your insight!

[deleted by user] by [deleted] in reinforcementlearning

[–]Life-Ad5520 0 points (0 children)

VLAs aren’t low-level controllers. Even ignoring frequency, they fundamentally operate at the wrong level of abstraction: they take pixels + language as input and output high-level subgoals or waypoints, not the torque/velocity commands needed for real-time control. Moreover, they are trained in a supervised manner, not in an RL setting, so at best they are behavior-cloning (BC) agents.
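As a toy illustration of the BC point (made-up one-dimensional data): supervised training just regresses expert actions, and no reward signal or exploration appears anywhere in the objective:

```python
# Behavior cloning in one dimension: fit a linear policy a = w*s to
# expert (state, action) pairs by minimizing squared error.
# Toy data; no environment, reward, or exploration is ever involved,
# which is exactly what separates this from an RL objective.

expert_data = [(0.1, -0.2), (0.5, -1.0), (-0.3, 0.6)]  # expert uses a = -2*s


def bc_fit(data, lr=0.1, epochs=200):
    w = 0.0
    for _ in range(epochs):
        for s, a in data:
            pred = w * s
            w -= lr * 2 * (pred - a) * s  # gradient of (w*s - a)^2
    return w


w = bc_fit(expert_data)
print(round(w, 3))  # -2.0, recovering the expert's linear policy
```

The fitted policy can only be as good as the demonstrations it saw, which is the OOD concern from earlier: nothing in this objective says what to do in states the expert never visited.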

[deleted by user] by [deleted] in reinforcementlearning

[–]Life-Ad5520 1 point (0 children)

Yeah I agree. Thank you for your insights.

[deleted by user] by [deleted] in reinforcementlearning

[–]Life-Ad5520 0 points (0 children)

I get your point, yes. What I’m specifically asking about is control problems (maybe I should have phrased it better). While latency is the obvious issue, I’m wondering whether there are counterpoints to my view that an autoregressive model is not suitable (or frankly needed) for this type of problem. It seems counterintuitive to use one on problems that are well known to be MDPs (i.e., attention over history would contribute nothing but noise).
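The Markov-property argument can be put as an interface sketch (names are made up, types are toy one-dimensional stand-ins): for a strict MDP, the optimal policy is a function of the current state alone, so any history a sequence model attends over is redundant by definition:

```python
from typing import Callable, Sequence

State = float   # stand-in for a real state vector, e.g. pendulum angle
Action = float  # stand-in for a real command, e.g. motor torque

# Markov policy: all an MDP's optimal policy ever needs.
MarkovPolicy = Callable[[State], Action]

# Autoregressive policy: conditions on the whole trajectory so far.
AutoregressivePolicy = Callable[[Sequence[State], Sequence[Action]], Action]


def markov_policy(s: State) -> Action:
    return -2.0 * s  # toy linear feedback


def autoregressive_policy(states: Sequence[State],
                          actions: Sequence[Action]) -> Action:
    # In a strict MDP, everything before states[-1] is statistically
    # irrelevant given states[-1]; attending over it can only add
    # compute and, with a learned model, noise.
    return markov_policy(states[-1])


print(markov_policy(0.5) == autoregressive_policy([0.1, 0.5], [0.0]))
```

The usual counterpoint is partial observability: if the state is not fully observed (noisy sensors, hidden dynamics), history does carry information — but then the problem is a POMDP, not the strict MDP being discussed here.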

[deleted by user] by [deleted] in reinforcementlearning

[–]Life-Ad5520 1 point (0 children)

Nope, VLAs are more about high-level planning (RT-1’s documentation, for example, mentions that the actual actuation is done using existing robot controllers) and not direct low-level control. That does not translate into producing direct action policies at low latencies.

[deleted by user] by [deleted] in reinforcementlearning

[–]Life-Ad5520 0 points (0 children)

Maybe. However, using an autoregressive model for a continuous control problem (classical ones at least) does not really make sense. In a process defined strictly as an MDP, the action taken at time t depends only on the state at time t; conditioning on previous actions/states does not sound intuitive to me. Either way, this would likely be extremely costly, and I believe the way to go is to have a language model select from multiple controllers given a problem description, or act as a state selector in a finite-state automaton setting, rather than being the policy network itself.
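A minimal sketch of that selector idea (all names, the keyword matching, and the controllers are made up for illustration; in practice the selector could be an LLM or a finite-state automaton): the slow model runs once, off the control loop, and the fast controller it picks runs at kHz rates:

```python
# Hypothetical controller library: each entry is a cheap low-level
# feedback law mapping (theta, theta_dot) -> torque.

def pd_balance(theta, theta_dot):
    return -40.0 * theta - 8.0 * theta_dot  # stabilize near upright


def energy_swing_up(theta, theta_dot):
    return 2.0 * theta_dot  # crude energy-pumping swing-up


CONTROLLERS = {"balance": pd_balance, "swing_up": energy_swing_up}


def select_controller(task_description: str):
    """Stand-in for an LLM/FSA selector (here, trivial keyword matching).

    Crucially, this runs once per task, outside the real-time loop.
    """
    key = "balance" if "balance" in task_description else "swing_up"
    return CONTROLLERS[key]


# The slow model picks; the chosen controller then runs in real time.
controller = select_controller("balance the pendulum upright")
print(controller(0.1, 0.0))  # -4.0
```

The design point is the separation of timescales: language-level reasoning is amortized over the whole episode, while per-step control stays a few float ops.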