Consistent initial type (float) for episode_rewards

This commit is contained in:
Quan Vuong
2017-05-30 11:49:25 +08:00
committed by GitHub
parent fc2bbed4da
commit 86054f7a98


@@ -222,7 +222,7 @@ def learn(env,
         episode_rewards[-1] += rew
         if done:
             obs = env.reset()
-            episode_rewards.append(0)
+            episode_rewards.append(0.0)
         if t > learning_starts and t % train_freq == 0:
             # Minimize the error in Bellman's equation on a batch sampled from replay buffer.
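The effect of the change can be sketched in isolation (this snippet is illustrative, not part of the commit): starting each episode's return at `0.0` instead of `0` keeps every entry of `episode_rewards` a float, even when the per-step rewards arrive as ints.

```python
# Minimal sketch: with a float initial value, every accumulated
# episode return stays a float regardless of the reward's type.
episode_rewards = [0.0]          # current episode's return starts as a float
for rew in [1, 0, 2]:            # rewards may arrive as Python ints
    episode_rewards[-1] += rew   # float += int stays float
episode_rewards.append(0.0)      # new episode: same float initial type

assert all(isinstance(r, float) for r in episode_rewards)
```

Had the initial value been the int `0`, an all-int episode would leave an int in the list, giving `episode_rewards` mixed element types.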