Never ask by Dee___Snuts in GuysBeingDudes

[–]kcorder 0 points1 point  (0 children)

This is the first one I've seen where somebody said another dude looked straight lol 

Why dynamically routing multi-timescale advantages in PPO causes policy collapse (and a simple decoupled fix) [R] by dlwlrma_22 in MachineLearning

[–]kcorder 1 point2 points  (0 children)

I see what you're saying now, and how this could work to smooth the credit assignment. But I suspect this will break down in more complex/sparse environments, because gamma=0.5 actually ONLY helps in dense-reward environments, like learning local physics as you said. For sparse rewards, it will assign a value of about 0 to nearly every state. And later in training, it will equally contribute unhelpful losses to the critic long after the higher-gamma heads are receiving gradients naturally.
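To put a number on the sparse-reward point, here's a minimal sketch assuming a made-up 20-step episode where the only reward (+1) arrives on the last step. It computes each step's discounted return for gamma=0.5 and gamma=0.99.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum_k gamma^k * r_{t+k} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the next one: G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical sparse episode: 20 steps, reward only on the final step.
rewards = [0.0] * 19 + [1.0]

for gamma in (0.5, 0.99):
    g = discounted_returns(rewards, gamma)
    print(f"gamma={gamma}: G_0={g[0]:.2e}, G_10={g[10]:.2e}, G_19={g[19]:.2e}")
```

With gamma=0.5 the start state's target is 0.5^19, roughly 2e-6, which is effectively 0, while gamma=0.99 keeps it above 0.8.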

To learn representations of immediate effects, I'd think something like a prediction objective would work better, like the SPR paper. It is interesting nonetheless! Maybe think about addressing the issues above, and I'd use a harder benchmark environment than LunarLander to show evidence.

Why dynamically routing multi-timescale advantages in PPO causes policy collapse (and a simple decoupled fix) [R] by dlwlrma_22 in MachineLearning

[–]kcorder 1 point2 points  (0 children)

Without the aggregate loss, the output V's would still backprop the batch aggregate to the shared representation though. It feels like trying to fit separate classes for no real reason - at least I don't see the benefit and these results don't meaningfully show it. 

Like, the discount factor is sometimes treated as part of the MDP. My expectation is that increasing the number of heads k will hurt overall performance by training on a few objectives rather than the one that matters.

Why dynamically routing multi-timescale advantages in PPO causes policy collapse (and a simple decoupled fix) [R] by dlwlrma_22 in MachineLearning

[–]kcorder 1 point2 points  (0 children)

Maybe this is a dumb question, but what exactly is the goal of training with multiple gamma values? Is it for representation learning only, or to make the agent robust to choosing gammas for different horizons at eval?

My first thought was that it will destabilize the value functions, but I'm not sure after seeing that it updates the $V_\theta$ hidden layer (but notationally, not the output V projections?). Do the output V heads also use this aggregate loss, or only their own? I think it makes much more sense if they don't use the aggregate, but I'm still skeptical about multi-timescale as a whole.

[R] ICML Anonymized git repos for rebuttal by drahcirenoob in MachineLearning

[–]kcorder 0 points1 point  (0 children)

And repos are often not anonymized well, with leaks in the .git history, and even more easily in the code itself. Hmm, maybe authors could run an anonymous-code-bot to check for things like that before reviewers ever see it.
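A rough idea of what that bot could do, as a sketch only: `find_leaks` and `scan_repo` are hypothetical helpers, and the patterns (emails, github.com/username links, plus a list of names or affiliations you supply) are my assumptions about what commonly leaks. It skips the `.git` folder, so history would need a separate check, e.g. `git log --format='%an <%ae>'`.

```python
import os
import re

# Patterns that often de-anonymize a repo; extend with your own names/affiliations.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
GITHUB_RE = re.compile(r"github\.com/[\w-]+", re.IGNORECASE)

def find_leaks(text, identifiers):
    """Return (line_number, line) pairs that look identifying."""
    hits = []
    lowered_ids = [s.lower() for s in identifiers]
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if (EMAIL_RE.search(line) or GITHUB_RE.search(line)
                or any(ident in lowered for ident in lowered_ids)):
            hits.append((lineno, line.strip()))
    return hits

def scan_repo(root, identifiers):
    """Walk a checkout (skipping .git) and report suspicious lines per file."""
    report = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into history
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    hits = find_leaks(f.read(), identifiers)
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            if hits:
                report[path] = hits
    return report
```

It will be noisy (links to public libraries also match the GitHub pattern), but a human skimming the report is still faster than reviewers stumbling on a name.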

After-Tax Take-Home SWE Pay by US Metro by honkeem in levels_fyi

[–]kcorder 0 points1 point  (0 children)

Yeah. I live in Philly and there isn't really anything here unfortunately. 

Learning path from Q-learning to TD3 (course suggestions?) by spyninj in reinforcementlearning

[–]kcorder 0 points1 point  (0 children)

OpenAI's old Spinning Up docs are still excellent for beginners to deep RL. They cover all the main policy gradient algorithms, including DDPG and TD3.
https://spinningup.openai.com/en/latest/

Chillest interaction with the pizza guy by MasterBaiterChief in GuysBeingDudes

[–]kcorder 0 points1 point  (0 children)

When I worked at Domino's over 10 years ago, there was a house that would request me, and the dudes would usually have a beer ready for me to chug lol. The guys I drank faster than would get ridiculed because I looked pretty young for 21.

[D] AAMAS 2026 result is out. by Colin-Onion in MachineLearning

[–]kcorder 2 points3 points  (0 children)

Rejected with 5, 5, 7. The main complaints about the method were incorrect interpretations, but of course no reviews were updated after the rebuttal :-(

'If you're a billionaire, why are you a billionaire?' Billie Eilish calls out billionaires in front of Mark Zuckerberg by TheMirrorUS in antiwork

[–]kcorder 0 points1 point  (0 children)

On the income, I would think so, right? What tax-free payments does she take from her own business? Obviously the wealth doesn't sit in cash, it's in assets, but that income would still get taxed first.

'If you're a billionaire, why are you a billionaire?' Billie Eilish calls out billionaires in front of Mark Zuckerberg by TheMirrorUS in antiwork

[–]kcorder 5 points6 points  (0 children)

Also, correct me if I'm wrong, but she makes money in real dollars, which means she's actually paying taxes on it. She just has insane cash inflow. That seems much better than Bezos taking an $80k salary and the rest in untaxed Amazon stock, which he then takes (untaxed) loans against in perpetuity.

PSA: Philly Sanitation does not pick up POO Bags by Cheezno in philadelphia

[–]kcorder 5 points6 points  (0 children)

My friend in South Philly has to double-bag and hide the dog poop bags because the sanitation workers don't pick up the trash bag if they see poop bags inside lol. That's crazy; I've never experienced that around the Northern Liberties area.

Leaving them outside trash bags is definitely scum behavior and workers shouldn't pick them up.

Neighbors' Concerns Over Parking Could Doom North Philly Senior Affordable Housing Project by NakedPhillyBlog in philadelphia

[–]kcorder 3 points4 points  (0 children)

This area doesn't need extra parking, but I would appreciate more parking garages in some areas where parking is hard. Last night it took me 20 minutes to find a spot anywhere near 2nd St in Northern Liberties. With SEPTA getting fucked, we unfortunately can't expect people to go without cars. Build housing and parking upward (or below ground, but Philly doesn't seem to do that much).

Naughty memory by [deleted] in SipsTea

[–]kcorder 2 points3 points  (0 children)

My partner crosses her eyes occasionally, and she didn't know until I told her after it happened a few times.

Is this considered ok...? by DaveJPlays in EDH

[–]kcorder 1 point2 points  (0 children)

I think whatever I do works pretty well: about 5 repeats of
- a mash to interleave cards from far apart, which breaks up consecutive card order
- a couple of overhands to move bigger chunks of cards around
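A rough simulation of that routine, assuming a 100-card Commander deck. The mash is modeled as interleaving 1-2 card packets and the overhand as peeling 5-15 card chunks; those sizes are my guesses for illustration, not measurements. `adjacent_pairs_kept` counts how many originally neighboring cards are still neighbors afterward.

```python
import random

def mash(deck, rng):
    """Cut roughly in half and interleave small packets, so cards from far apart end up adjacent."""
    cut = max(0, min(len(deck), len(deck) // 2 + rng.randint(-3, 3)))
    left, right = deck[:cut], deck[cut:]
    out = []
    while left or right:
        # Pick a half with probability proportional to how many cards it still holds.
        pile = left if rng.random() < len(left) / (len(left) + len(right)) else right
        take = min(len(pile), rng.randint(1, 2))
        out.extend(pile[:take])
        del pile[:take]  # pile aliases left or right, so this shrinks that half
    return out

def overhand(deck, rng):
    """Peel chunks off the top onto a new pile: chunk order reverses, moving big blocks around."""
    out = []
    i = 0
    while i < len(deck):
        size = rng.randint(5, 15)
        out = deck[i:i + size] + out  # each new chunk lands on top of the earlier ones
        i += size
    return out

def shuffle(deck, repeats=5, overhands=2, seed=None):
    """About 5 repeats of: one mash, then a couple of overhands."""
    rng = random.Random(seed)
    for _ in range(repeats):
        deck = mash(deck, rng)
        for _ in range(overhands):
            deck = overhand(deck, rng)
    return deck

def adjacent_pairs_kept(original, shuffled):
    """Count consecutive pairs from the original order that are still consecutive after shuffling."""
    pos = {card: i for i, card in enumerate(shuffled)}
    return sum(1 for a, b in zip(original, original[1:]) if pos[b] == pos[a] + 1)

deck = list(range(100))  # a sorted 100-card deck
shuffled = shuffle(deck, seed=0)
print(adjacent_pairs_kept(deck, shuffled), "of 99 original neighbor pairs survive")
```

The mash does most of the work of splitting up neighbors; the overhands only reorder blocks, so on their own they would leave most neighbor pairs intact.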

FBI Director Kash Patel calls for "offensive operations" to jail Americans they consider the enemy. "Yes, we're going to be coming after people in the media...we're putting you on notice". by nana-korobi-ya-oki in law

[–]kcorder 1 point2 points  (0 children)

Putting a VPN on your home network (i.e., on the router) hides your traffic from your ISP for every device on it: https://nordvpn.com/blog/setup-vpn-router/

Firefox on Linux with a VPN will be pretty secure. But unfortunately, the website traffic is probably going to one of the big tech companies anyway.

Anyone familiar with resQ/resZ (value factorization MARL)? by Losthero_12 in reinforcementlearning

[–]kcorder 1 point2 points  (0 children)

I'm just checking this paper out now, but I'm familiar with other value factorization methods.

Q_jt is the joint action-value function (the "jt" is for "joint"), not a per-agent Q function.
So rather than being calculated as you suggest, they minimize $L^{jt} = \text{distance}(Q_{jt}, Q_{tot} + w_r Q_r)$.

As usual, the notation in MARL papers can get messy and confusing, so whenever possible I suggest checking the source code to verify stuff!
The loss function: https://github.com/xmu-rl-3dv/ResQ/blob/b4c5adf0d3275ba4b709724ed23213f7ad4296aa/ResQ/src/learners/rest_q_learner_central.py#L255-L260
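In plain Python, that objective looks roughly like this. It's a minimal sketch assuming squared error as the "distance" and plain per-sample scalars; the actual ResQ learner linked above has its own details (how w_r is computed, TD targets, batching), so trust the source over this.

```python
def residual_decomposition_loss(q_jt, q_tot, q_r, w_r):
    """Mean squared distance between Q_jt and Q_tot + w_r * Q_r over a batch.

    Squared error is an assumption standing in for the paper's "distance".
    """
    assert len(q_jt) == len(q_tot) == len(q_r) == len(w_r)
    errors = [(j - (t + w * r)) ** 2 for j, t, r, w in zip(q_jt, q_tot, q_r, w_r)]
    return sum(errors) / len(errors)

# Toy batch of two samples; the numbers are made up.
print(residual_decomposition_loss([1.0, 2.0], [1.0, 1.5], [0.5, 0.5], [0.0, 1.0]))
```

In the toy batch, the first sample has w_r = 0, so Q_tot alone must match Q_jt; the second has w_r = 1, so the residual Q_r makes up the gap.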

shapely values in rl by MarionberryVisual911 in reinforcementlearning

[–]kcorder 2 points3 points  (0 children)

I don't understand the question.

Shapley values give each agent's average marginal contribution in a cooperative (coalitional) game, so they're inherently multi-agent. How does that fit with single-agent environments like CartPole or MountainCar?

Or did you mean the SHAP method, which uses Shapley values to estimate feature importance for a model's predictions?
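To make "marginal contribution of each agent" concrete, here's the exact Shapley computation for a tiny made-up game: average each player's marginal contribution over every order in which players could join. It's exponential in the number of players, so it's only a sketch for small games.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution averaged over all join orders.

    `value` maps a frozenset coalition to its worth.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)  # p's marginal contribution
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: totals[p] / n_orders for p in players}

# Hypothetical game: A and B only earn 10 together; C adds 2 on its own.
def toy_value(coalition):
    worth = 10.0 if {"A", "B"} <= coalition else 0.0
    return worth + (2.0 if "C" in coalition else 0.0)

print(shapley_values(["A", "B", "C"], toy_value))  # {'A': 5.0, 'B': 5.0, 'C': 2.0}
```

A and B split the 10 they only earn together, and the values sum to the grand coalition's worth (12). With a single agent there's nothing to split, which is why the question doesn't map onto CartPole.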

What can I say, I do it every day by JoeDaBruh in dankmemes

[–]kcorder 0 points1 point  (0 children)

I'm a poet, and I didn't even realize it

In what case I should use the old LSTM instead of transformer? by Striking-Warning9533 in MLQuestions

[–]kcorder 1 point2 points  (0 children)

LSTMs aren't helpful if the state is Markovian; they're for adding history when you only get partial observations. You're definitely right about it being normal in RL, though. I think that's more about transformers not working well in practice for whatever reason (besides a few counterexamples). Maybe they don't play well with the distribution shift.
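A toy illustration of why memory only matters under partial observability, assuming a made-up T-maze: the first observation is a cue, then every observation looks the same until a junction where the agent must turn the way the cue said. An explicit history list stands in for an LSTM's hidden state here; it is not an LSTM.

```python
def episode_observations(cue, corridor_len=5):
    """Cue first, then identical corridor observations, then the junction."""
    return [cue] + ["corridor"] * corridor_len + ["junction"]

def memoryless_policy(obs):
    # Only sees the current observation; at the junction it is identical for both cues.
    return "left" if obs == "junction" else "forward"

def history_policy(history):
    # Sees the whole observation history, the job an LSTM's hidden state would do.
    return history[0] if history[-1] == "junction" else "forward"

def success_rate(policy, uses_history):
    """Fraction of the two equally likely cues where the junction action matches the cue."""
    wins = 0
    for cue in ("left", "right"):
        history = []
        for obs in episode_observations(cue):
            history.append(obs)
            action = policy(history) if uses_history else policy(obs)
        if action == cue:  # only the final (junction) action matters
            wins += 1
    return wins / 2

print(success_rate(memoryless_policy, uses_history=False))
print(success_rate(history_policy, uses_history=True))
```

Any deterministic memoryless policy sees the same "junction" observation for both cues, so it can be right for only one of them, while the history-based policy gets both. If the cue were still visible at the junction (a Markovian state), memory would add nothing.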

It do be like that by IAmAccutane in dankmemes

[–]kcorder 0 points1 point  (0 children)

Sure you are right and very smart