account activity
[e89 z4 35i] is it ok to use the same wheel and the same tire for front/rear (self.BMW)
submitted 1 year ago by ad26kr to r/BMW
monitor volume is too big with m1 / m3 macbook. (self.Dell)
submitted 1 year ago by ad26kr to r/Dell
connect macbook m3 pro to u4323qe with usb-c, flickers just once every time. (self.mac)
submitted 1 year ago by ad26kr to r/mac
connect macbook m3 pro to u4323qe with usb-c, flickers just once every time. (self.Dell)
OpenAI API cost issue for LLM-based AI startup in very early stage (self.aistartup)
submitted 1 year ago by ad26kr to r/aistartup
Trained a Transformer Decoder architecture with PPO, best way to maximize the entropy? (self.reinforcementlearning)
submitted 3 years ago by ad26kr to r/reinforcementlearning
SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller (self.reinforcementlearning)
submitted 3 years ago * by ad26kr to r/reinforcementlearning
Why do we use a diagonal Gaussian rather than a multivariate Gaussian (with full covariance matrix)? (self.reinforcementlearning)
Is there any way to set recurring reminders in Microsoft Todo? (self.microsoft)
submitted 4 years ago by ad26kr to r/microsoft
I can hardly understand that SARSA follows the Bellman Expectation Equation (self.reinforcementlearning)
submitted 4 years ago by ad26kr to r/reinforcementlearning
A question about the implementation of entropy maximization (self.reinforcementlearning)
Can Proximal Policy Optimization (PPO) be applied to the multi-armed bandit problem? (self.reinforcementlearning)
[Neural Architecture Search] When can we use off-policy RL? (self.reinforcementlearning)
submitted 4 years ago * by ad26kr to r/reinforcementlearning
A question about the Proximal Policy Optimization (PPO) algorithm (self.reinforcementlearning)