From 165c62257216fad44aa13fa924e2b224703d234d Mon Sep 17 00:00:00 2001
From: AurelianTactics <jamesdilorenzo@gmail.com>
Date: Tue, 30 Oct 2018 13:13:39 -0400
Subject: [PATCH] DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError
 (#680)

* DDPG has unused 'seed' argument

DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR all include this code:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
   set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
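
As a rough illustration of why the missing call matters, here is a minimal sketch of the seeding pattern. The names `set_seeds_sketch` and `learn_sketch` are hypothetical stand-ins, not baselines functions, and this sketch seeds only Python's built-in `random` module; the real `set_global_seeds` may seed additional libraries.

```python
import random


def set_seeds_sketch(seed):
    """Hypothetical stand-in for set_global_seeds: seeds only Python's RNG."""
    if seed is not None:
        random.seed(seed)


def learn_sketch(seed=None, nb_steps=3):
    """Hypothetical learn(): seed first, then draw some 'exploration' values."""
    set_seeds_sketch(seed)
    return [random.random() for _ in range(nb_steps)]


# With the seeding call in place, two runs with the same seed draw the same values.
run_a = learn_sketch(seed=0)
run_b = learn_sketch(seed=0)
```

Without the call to `set_seeds_sketch`, the `seed` argument would be accepted but ignored, which is the situation described above for DDPG.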

* DDPG: duplicate variable assignment

The variable nb_actions is assigned the same value twice within 10 lines:
nb_actions = env.action_space.shape[-1]

* DDPG: noise_type 'normal_x' and 'ou_x' cause assert

The default noise_type 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') trigger an assert and prevent DDPG from running. The issue is in the following block:
```
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
```

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
The fix is either to nest noise or to unnest action.
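
The two options can be sketched with NumPy as follows. The value of `nb_actions` and the array contents are made up for illustration, and the reshaping in option 2 assumes a single environment (one row in `action`).

```python
import numpy as np

nb_actions = 3                        # hypothetical action dimension
action = np.zeros((1, nb_actions))    # nested: shape (1, nb_actions)
noise = np.ones(nb_actions)           # not nested: shape (nb_actions,)

# Option 1: nest the noise so its shape matches the action's shape.
nested_noise = noise[np.newaxis, :]
assert nested_noise.shape == action.shape

# Option 2: unnest the action so its shape matches the noise's shape.
# This only makes sense when action holds a single row.
flat_action = action.reshape(-1)
assert flat_action.shape == noise.shape
```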

* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"

* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError

The default noise_type 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') raise an AssertionError and prevent DDPG from running. The issue is in the following block:
```
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
```

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes fail the assert even though the action += noise line is correct: NumPy broadcasts noise across the leading axis of action.
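
A minimal NumPy sketch of the mismatch and of the relaxed check used in this patch. The value of `nb_actions`, the noise values, and the clip range of (-1.0, 1.0) are assumptions chosen for illustration, not values taken from DDPG.

```python
import numpy as np

nb_actions = 3                         # hypothetical action dimension
action = np.zeros((1, nb_actions))     # nested: shape (1, nb_actions)
noise = np.full(nb_actions, 0.5)       # not nested: shape (nb_actions,)

# Old check: compares (3,) with (1, 3), so it fails.
old_check_passes = noise.shape == action.shape

# New check: compares (3,) with the shape of the first row, (3,).
new_check_passes = noise.shape == action[0].shape

# The addition itself was always valid: noise is broadcast onto each row.
action += noise
action = np.clip(action, -1.0, 1.0)    # assumed action range
```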
---
 baselines/ddpg/ddpg.py         | 1 -
 baselines/ddpg/ddpg_learner.py | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/baselines/ddpg/ddpg.py b/baselines/ddpg/ddpg.py
index 8b8659b..307205f 100755
--- a/baselines/ddpg/ddpg.py
+++ b/baselines/ddpg/ddpg.py
@@ -59,7 +59,6 @@ def learn(network, env,
 
     action_noise = None
     param_noise = None
-    nb_actions = env.action_space.shape[-1]
     if noise_type is not None:
         for current_noise_type in noise_type.split(','):
             current_noise_type = current_noise_type.strip()
diff --git a/baselines/ddpg/ddpg_learner.py b/baselines/ddpg/ddpg_learner.py
index 44f231f..5b3b5ea 100755
--- a/baselines/ddpg/ddpg_learner.py
+++ b/baselines/ddpg/ddpg_learner.py
@@ -268,7 +268,7 @@ class DDPG(object):
 
         if self.action_noise is not None and apply_noise:
             noise = self.action_noise()
-            assert noise.shape == action.shape
+            assert noise.shape == action[0].shape
             action += noise
         action = np.clip(action, self.action_range[0], self.action_range[1])