Merge pull request #14 from quanvuong/master
Consistent initial type (float) for episode_rewards
@@ -222,7 +222,7 @@ def learn(env,
         episode_rewards[-1] += rew
         if done:
             obs = env.reset()
-            episode_rewards.append(0)
+            episode_rewards.append(0.0)
 
         if t > learning_starts and t % train_freq == 0:
             # Minimize the error in Bellman's equation on a batch sampled from replay buffer.
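The one-character change matters because Python lists are heterogeneous: seeding each new episode with the int 0 leaves an int entry whenever the environment returns integer rewards, while float-reward environments produce floats, so downstream consumers can see a mixed-type list. Below is a minimal sketch of the bookkeeping pattern the diff touches, assuming an integer-reward environment; the loop, reward values, and episode boundary are illustrative stand-ins, not the repository's code:

import numpy as np

episode_rewards = [0.0]              # seed with 0.0 so every entry is a float
for t in range(10):                  # stand-in for the training loop
    rew = 1                          # Atari-style envs often emit int rewards
    episode_rewards[-1] += rew       # 0.0 + 1 -> 1.0; a 0 seed would stay int
    done = (t % 4 == 3)              # illustrative episode boundary
    if done:
        episode_rewards.append(0.0)  # the fix: 0.0, not 0

# Consumers such as a rolling-mean summary now see a homogeneous float list.
mean_reward = float(np.mean(episode_rewards[:-1]))
print(round(mean_reward, 1))

With the old append(0), episode_rewards[-1] starts as an int and only becomes a float after the first float reward is added, so type-sensitive logging or serialization of a just-reset episode could observe an int where a float is expected.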
|