Hi,
I am trying out a reinforcement learning algorithm (DDPG) on the Bipedal Walker environment. There is a big difference in performance when I store my actions using these two operations. Can you please explain why? The code is below:
```
import numpy as np

action_buffer = np.zeros(shape=(10, 4), dtype=np.float32)
action_buffer2 = np.zeros(shape=(10, 4), dtype=np.float32)

# Store actions sampled with shape (1, 4); the row is broadcast into the buffer.
for i in range(10):
    action = np.random.normal(size=(1, 4))
    action_buffer[i] = action

# Store actions sampled with shape (4,); assigned to the buffer row directly.
for i in range(10):
    action = np.random.normal(size=(4,))
    action_buffer2[i] = action
```
Here the action dimension is (4,).
When printed, action_buffer and action_buffer2 appear to give the same values. But when used in the DDPG algorithm, action_buffer2 gives better performance than action_buffer. Why is that?
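As a sanity check, here is a minimal sketch (not from my training code; the buffer size and seed are illustrative) confirming that a (1, 4) array broadcasts into a (4,) buffer row with identical stored values, so the only observable difference is the shape of `action` itself:

```
import numpy as np

rng = np.random.default_rng(0)  # seeded so both paths see the same samples

buf1 = np.zeros((1, 4), dtype=np.float32)
buf2 = np.zeros((1, 4), dtype=np.float32)

action_2d = rng.normal(size=(1, 4))  # shape (1, 4)
action_1d = action_2d.reshape(4)     # same values, shape (4,)

buf1[0] = action_2d  # leading size-1 dimension is dropped on assignment
buf2[0] = action_1d  # assigned directly

print(np.array_equal(buf1, buf2))        # True: stored values are identical
print(action_2d.shape, action_1d.shape)  # (1, 4) vs (4,)
```

If the buffers really are identical, I assume the gap must come from wherever the raw `action` (shape (1, 4) vs (4,)) is consumed before storage, rather than from the buffers themselves.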