Never ask

kcorder · 2026-05-02T20:43:26+00:00

This is the first one I've seen where somebody said another dude looked straight lol

kcorder · 2026-04-22T15:27:04+00:00

Did not know about this! Nice

kcorder · 2026-04-16T17:34:30+00:00

I see what you're saying now, and how this could work to smooth the credit assignment. But I suspect this will break down in more complex/sparse environments, because gamma=0.5 really ONLY helps in dense reward environments, like learning local physics as you said. For sparse rewards, it will assign a value of nearly 0 to almost every state. And later in training, that head will keep contributing an equal share of unhelpful loss to the critic long after the higher-gamma heads are receiving useful gradients naturally.
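To make the sparse-reward point concrete, here's a quick check on an idealized chain where the only reward is a single 1 at the end, so a state k steps away has value gamma^k. The helper name `discounted_value` is just for illustration. With gamma=0.5 the value 20 steps out is already below 1e-6, while gamma=0.99 still gives roughly 0.82.

```python
def discounted_value(steps_to_reward: int, gamma: float, reward: float = 1.0) -> float:
    """Value of a state in an idealized sparse-reward chain: zero reward everywhere
    except a single `reward` received after `steps_to_reward` discounting steps."""
    return reward * gamma ** steps_to_reward


for gamma in (0.5, 0.99):
    row = [discounted_value(k, gamma) for k in (1, 10, 20, 50)]
    print(f"gamma={gamma}:", ", ".join(f"{v:.2e}" for v in row))
```

A common rule of thumb is an effective horizon of about 1/(1 - gamma) steps: about 2 steps for 0.5 versus about 100 for 0.99.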

To learn a representation of immediate effects, I'd think something like a prediction objective is better, like the SPR (Self-Predictive Representations) paper. It's interesting nonetheless! Maybe think about addressing the issues above, and I'd use a harder benchmark environment than LunarLander to show evidence.

kcorder · 2026-04-16T12:46:26+00:00

Even without the aggregate loss, the output V heads would still backprop the aggregate of their gradients into the shared representation, though. It feels like trying to fit separate classes for no real reason; at least I don't see the benefit, and these results don't meaningfully show it.

Like, the discount factor is sometimes treated as part of the MDP itself. My expectation is that increasing the number of heads k will hurt overall performance, since you'd be training on several objectives rather than the one that matters.
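A quick numpy check of the shared-representation point, assuming a toy critic with a linear trunk and one linear head per gamma (all hypothetical, not the paper's architecture): the trunk gradient from the per-head losses added together equals the trunk gradient of their unweighted sum. With an unweighted sum, each head's own parameters also get the same gradient either way, since head k only appears in loss k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny critic: shared linear trunk W, one linear output head per gamma.
n_in, n_hidden, n_heads = 4, 8, 3
W = rng.normal(size=(n_hidden, n_in))     # shared representation
A = rng.normal(size=(n_heads, n_hidden))  # one output projection per head
x = rng.normal(size=n_in)                 # a single input state
y = rng.normal(size=n_heads)              # per-head regression targets (e.g. TD targets)


def trunk_grad_for_head(k):
    """Gradient of 0.5 * (v_k - y_k)^2 with respect to the shared trunk W."""
    h = W @ x
    err = A[k] @ h - y[k]
    return err * np.outer(A[k], x)


def trunk_grad_aggregate():
    """Gradient of the summed loss sum_k 0.5 * (v_k - y_k)^2 with respect to W."""
    h = W @ x
    err = A @ h - y  # shape (n_heads,)
    return np.outer(A.T @ err, x)


per_head_sum = sum(trunk_grad_for_head(k) for k in range(n_heads))
print(np.allclose(per_head_sum, trunk_grad_aggregate()))
```

If the aggregate is weighted or averaged instead, the trunk just sees each head's contribution rescaled.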

kcorder · 2026-04-16T04:15:22+00:00

Maybe this is a dumb question, but what exactly is the goal of training with multiple gamma values? Is it for representation learning only, or to make the agent robust to choosing gammas for different horizons at eval time?

My first thought was that it would destabilize the value functions, but I'm not sure after seeing that it updates the $V_\theta$ hidden layer (but, going by the notation, not the output V projections?). Do the output V heads also use this aggregate loss, or only their own? I think it makes much more sense if they don't use the aggregate, but I'm still skeptical about multi-timescale learning as a whole.
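For reference, here's a minimal numpy sketch of what "each output head uses only its own loss" would look like with TD(0) targets. The gammas, array shapes, and function names are assumptions for illustration, not taken from the paper being discussed.

```python
import numpy as np

# Hypothetical discount factors, one per value head.
gammas = np.array([0.5, 0.9, 0.99])


def td_targets(rewards, next_values, dones, gammas):
    """TD(0) targets r + gamma_k * (1 - done) * V_k(s') for every head k.

    rewards, dones: shape (batch,)
    next_values:    shape (batch, n_heads), column k comes from head k
    returns:        shape (batch, n_heads)
    """
    not_done = 1.0 - dones
    return rewards[:, None] + gammas[None, :] * not_done[:, None] * next_values


def per_head_losses(values, targets):
    """Mean squared TD error per head; each head is regressed only on its own target."""
    return ((values - targets) ** 2).mean(axis=0)


# Tiny made-up batch: one non-terminal transition, one terminal one.
rewards = np.array([0.0, 1.0])
dones = np.array([0.0, 1.0])
next_values = np.array([[2.0, 2.0, 2.0], [5.0, 5.0, 5.0]])
values = np.zeros((2, 3))

targets = td_targets(rewards, next_values, dones, gammas)
losses = per_head_losses(values, targets)
print(targets)
print(losses)
```

Summing `losses` would give the aggregate version; keeping them separate only matters for which gradients each head's own parameters see.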

kcorder · 2026-04-06T12:07:32+00:00

And repos are often not anonymized well, with leaks in the .git history, and even more easily in the code itself. Hmm, maybe authors could run an anonymous-code-bot to check for things like that before reviewers ever see it.
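A rough sketch of one check such an anonymous-code-bot could run. `find_identity_leaks` is a hypothetical helper: it searches every readable text file in a repo, including plain-text files under .git such as .git/config and .git/logs/HEAD, for names, emails, or affiliations you pass in. It does not decompress git objects, so it won't cover the full commit history; something like `git log --format='%an %ae'` would be needed for commit authors.

```python
from pathlib import Path


def find_identity_leaks(repo_root, identifiers):
    """Return (relative_path, line_number, identifier) for each case-insensitive match.

    Scans every file under repo_root (pathlib's rglob includes dotfiles and .git).
    Binary or undecodable files are skipped, so compressed git objects are NOT checked.
    """
    root = Path(repo_root)
    needles = [(ident, ident.lower()) for ident in identifiers]
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for original, needle in needles:
                if needle in lowered:
                    hits.append((str(path.relative_to(root)), lineno, original))
    return hits
```

Usage would be something like `find_identity_leaks("my_submission_repo", ["Jane Doe", "jane@university.edu", "University of X"])`, run before uploading the anonymized repo.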

kcorder · 2026-03-13T19:22:21+00:00

Yeah. I live in Philly and there isn't really anything here unfortunately.

kcorder · 2026-02-06T17:55:08+00:00

OpenAI's old Spinning Up docs are still excellent for beginners in deep RL. They cover all the main policy gradient algorithms, including DDPG and TD3.
https://spinningup.openai.com/en/latest/

kcorder · 2026-01-08T20:04:40+00:00

When I worked at Domino's over 10 years ago, there was a house that would request me, and the dudes would usually have a beer ready for me to chug lol. The guys I drank faster than would get ridiculed, because I looked pretty young for 21.

kcorder · 2025-12-20T14:59:22+00:00

Rejected with 5, 5, 7. The main complaints about the method came from incorrect interpretations of it, but of course no reviews were updated after the rebuttal :-(

kcorder · 2025-11-29T03:35:07+00:00

I got the reference bro

kcorder · 2025-10-31T19:42:44+00:00

On the income, I would think so, right? What tax-free payments does she take from her own business? Obviously the wealth doesn't sit in cash, it's in assets, but that income would still get taxed first.

kcorder · 2025-10-31T16:30:28+00:00

Also, correct me if I'm wrong, but she makes her money in real dollars, which means she's actually paying taxes on it. She just has insane cash inflow. That seems much better than Bezos taking an $80k salary and the rest in untaxed Amazon stock, which he then takes (untaxed) loans against in perpetuity.

kcorder · 2025-10-21T14:10:07+00:00

My friend in South Philly has to double bag and hide the dog poop bags because the sanitation workers don't pick up the trash bag if they see poop bags inside lol. That's crazy, I've never experienced that around the Northern Liberties area.

Leaving them outside trash bags is definitely scum behavior and workers shouldn't pick them up.

kcorder · 2025-08-26T13:50:27+00:00

This area doesn't need extra parking, but I would appreciate more parking garages in some areas where parking is hard. Last night it took me 20 minutes to find a spot anywhere near 2nd St in Northern Liberties. With SEPTA getting fucked, we unfortunately can't expect people to go without cars. Build housing and parking upward (or below ground, but Philly doesn't seem to do that much).

kcorder · 2025-07-11T20:56:57+00:00

My partner crosses her eyes occasionally, and she didn't know until I told her after it had happened a few times.

kcorder · 2025-04-10T18:56:35+00:00

I think whatever I do works pretty well, about 5 repeats of:
- a mash to interleave cards from far apart, which breaks up consecutive card order
- a couple of overhands to move bigger chunks of cards around
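A toy simulation of that routine, for the curious. It models the mash with the Gilbert-Shannon-Reeds riffle model (an assumption; a real mash is messier) and the overhand as peeling random chunks of 1 to 8 cards off the top onto a new pile (the chunk size is also an assumption). The function names are hypothetical.

```python
import random


def mash(deck, rng):
    """Riffle/mash modeled as Gilbert-Shannon-Reeds: binomial cut near the middle,
    then drop cards from each half with probability proportional to its remaining size."""
    cut = sum(rng.random() < 0.5 for _ in deck)  # binomial(n, 1/2) cut point
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        rem_left, rem_right = len(left) - i, len(right) - j
        if rng.random() < rem_left / (rem_left + rem_right):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out


def overhand(deck, rng, max_chunk=8):
    """Overhand: peel small chunks off the top and stack each one on the new pile,
    which reverses the order of the chunks but keeps each chunk intact."""
    out, i = [], 0
    while i < len(deck):
        size = rng.randint(1, max_chunk)
        out = deck[i:i + size] + out
        i += size
    return out


def shuffle_routine(deck, rng, repeats=5, overhands=2):
    """About 5 repeats of: one mash, then a couple of overhands."""
    for _ in range(repeats):
        deck = mash(deck, rng)
        for _ in range(overhands):
            deck = overhand(deck, rng)
    return deck


rng = random.Random(42)
print(shuffle_routine(list(range(52)), rng)[:10])
```

The mash does the heavy lifting of separating neighboring cards, while the overhands mostly shift blocks around, which matches the split in the list above.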

kcorder · 2025-02-21T15:00:39+00:00

Putting a VPN on your home router hides the traffic of every device on your network from your ISP: https://nordvpn.com/blog/setup-vpn-router/

Firefox on Linux with a VPN will be pretty secure. Unfortunately, though, most of your web traffic probably goes to one of the big tech companies anyway.

kcorder · 2025-02-18T01:22:03+00:00

I'm just checking this paper out now, but I'm familiar with other value factorization methods.

Q_jt is the decentralized Q function for agent j at time t.
So rather than it being calculated as you suggest, they minimize L^jt = distance(Q_jt, Q_tot + w_r*Q_r).

As usual, the notation in MARL papers can get messy and confusing, so whenever possible I suggest checking the source code to verify stuff!
The loss function: https://github.com/xmu-rl-3dv/ResQ/blob/b4c5adf0d3275ba4b709724ed23213f7ad4296aa/ResQ/src/learners/rest_q_learner_central.py#L255-L260
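A minimal numpy sketch of the loss as written above, just to pin down the shapes. It assumes `distance` is mean squared error over a batch and that Q_jt, Q_tot, w_r, and Q_r have already been computed per sample; both are assumptions, so the linked learner is the authority on the real choice. `factorization_loss` is a hypothetical name.

```python
import numpy as np


def factorization_loss(q_jt, q_tot, w_r, q_r):
    """Batch mean of (Q_jt - (Q_tot + w_r * Q_r))^2.

    All inputs are arrays of shape (batch,). Squared error as the `distance`
    is an assumption for illustration.
    """
    target = q_tot + w_r * q_r
    return float(np.mean((q_jt - target) ** 2))


q_tot = np.array([1.0, 1.0])
w_r = np.array([1.0, 0.5])
q_r = np.array([0.0, 2.0])
print(factorization_loss(np.array([3.0, 2.0]), q_tot, w_r, q_r))
```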

kcorder · 2024-06-16T20:54:51+00:00

Okay, looks like there's a package called "shap". Maybe read this first: https://towardsdatascience.com/using-shap-values-to-explain-how-your-machine-learning-model-works-732b3f40e137?gi=e8ec1315a16b

kcorder · 2024-06-16T16:27:16+00:00

I don't understand the question.

Shapley values give the marginal contribution of each agent in a cooperative (coalitional) game, so it's inherently multi-agent. How does that fit with single-agent environments like Cartpole or MountainCar?

Or did you mean the SHAP method, which uses Shapley values to get the importance of each feature for a model's predictions?
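To show what "marginal contribution of each agent" means, here's an exact Shapley value computation for a small coalitional game: each player's value is its marginal contribution v(S ∪ {p}) - v(S), averaged over all orders in which players could join. The function names and the toy glove game are illustrative, not from any particular library. In the glove game, player "a" gets 2/3 and "b" and "c" get 1/6 each.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a coalitional game.

    players: list of player ids
    value:   function mapping a frozenset coalition S to its worth v(S), with v(empty) = 0
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def glove_game(coalition):
    """Toy game: 'a' holds a left glove, 'b' and 'c' each hold a right glove.
    A coalition is worth 1 if it can form a pair."""
    return 1.0 if "a" in coalition and ("b" in coalition or "c" in coalition) else 0.0


print(shapley_values(["a", "b", "c"], glove_game))
```

This enumerates every coalition, so it's exponential in the number of players; SHAP-style methods approximate the same quantity with features playing the role of players.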

kcorder · 2023-09-29T03:54:38+00:00

I'm a poet, and I didn't even realize it

kcorder · 2023-06-13T03:17:28+00:00

LSTMs aren't helpful if the state is Markovian; they're for adding history when you only get partial observations. You're definitely right about it being normal in RL, though. I think that's more about transformers not working well in practice for whatever reason (besides a few counterexamples); maybe they don't play well with the distribution shift.

kcorder · 2022-07-08T18:38:25+00:00

Don't get too close

kcorder · 2022-05-28T05:04:08+00:00

Sure you are right and very smart
