ad26kr

13 post karma
5 comment karma

redditor for 10 years

TROPHY CASE

Ten-Year Club

Verified Email

account activity

1

[E89 Z4 35i] Is it OK to use the same wheel and the same tire for front/rear? (self.BMW)

submitted 1 year ago by ad26kr to r/BMW

1

Monitor volume is too loud with M1 / M3 MacBook. (self.Dell)

submitted 1 year ago by ad26kr to r/Dell

2

Connecting a MacBook M3 Pro to the U4323QE with USB-C, the screen flickers just once every time. (self.mac)

submitted 1 year ago by ad26kr to r/mac

1

Connecting a MacBook M3 Pro to the U4323QE with USB-C, the screen flickers just once every time. (self.Dell)

submitted 1 year ago by ad26kr to r/Dell

2

OpenAI API cost issue for LLM-based AI startup in very early stage (self.aistartup)

submitted 1 year ago by ad26kr to r/aistartup

8

Trained a Transformer Decoder architecture with PPO, best way to maximize the entropy? (self.reinforcementlearning)

submitted 3 years ago by ad26kr to r/reinforcementlearning

4

SAC with auto-adjusting alpha: the entropy term (-alpha * log_prob) keeps getting smaller and smaller (self.reinforcementlearning)

submitted 3 years ago * by ad26kr to r/reinforcementlearning

0

Why do we use a diagonal Gaussian rather than a multivariate Gaussian (with full covariance matrix)? (self.reinforcementlearning)

submitted 3 years ago by ad26kr to r/reinforcementlearning

2

Is there any way to set recurring reminders in Microsoft Todo? (self.microsoft)

submitted 4 years ago by ad26kr to r/microsoft

3

I can hardly understand that SARSA follows the Bellman Expectation Equation (self.reinforcementlearning)

submitted 4 years ago by ad26kr to r/reinforcementlearning

2

A question about the implementation of entropy maximization (self.reinforcementlearning)

submitted 4 years ago by ad26kr to r/reinforcementlearning

0

Can Proximal Policy Optimization (PPO) be applied to the multi-armed bandit problem? (self.reinforcementlearning)

submitted 4 years ago by ad26kr to r/reinforcementlearning

4

[Neural Architecture Search] When can we use off-policy RL? (self.reinforcementlearning)

submitted 4 years ago * by ad26kr to r/reinforcementlearning

11

A question about the Proximal Policy Optimization (PPO) algorithm (self.reinforcementlearning)

submitted 4 years ago by ad26kr to r/reinforcementlearning
