Effectively apply weights from the replay buffer

It seems that the weights retrieved from the replay buffer are not applied when training the model. Is there any reason for that, or am I missing something?

In any case, I have added a parameter so that they can be used, in case it turns out to be useful.
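For background, prioritized replay corrects its non-uniform sampling by returning per-transition importance-sampling weights, which are meant to scale each transition's error in the loss. Below is a small, self-contained NumPy sketch of that idea; weighted_huber_loss is an illustrative helper with made-up numbers, not the library's actual TensorFlow loss.

import numpy as np

def weighted_huber_loss(td_errors, weights, delta=1.0):
    # Huber loss per transition, scaled by importance-sampling weights so that
    # frequently sampled (high-priority) transitions are down-weighted.
    abs_err = np.abs(td_errors)
    quadratic = np.minimum(abs_err, delta)
    linear = abs_err - quadratic
    per_sample = 0.5 * quadratic ** 2 + delta * linear
    return float(np.mean(weights * per_sample))

td = np.array([0.5, -2.0, 1.2])                            # example TD errors
print(weighted_huber_loss(td, np.ones_like(td)))           # uniform weights: behaviour before this change
print(weighted_huber_loss(td, np.array([0.3, 0.7, 1.0])))  # weights as returned by a prioritized buffer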
Author:    Fernando Arbeiza
Date:      2017-07-11 11:09:51 +02:00
Committer: GitHub
Parent:    0778e9f10f
Commit:    d76cd1297a


@@ -89,6 +89,7 @@ def learn(env,
           gamma=1.0,
           target_network_update_freq=500,
           prioritized_replay=False,
+          prioritized_importance_sampling=False,
           prioritized_replay_alpha=0.6,
           prioritized_replay_beta0=0.4,
           prioritized_replay_beta_iters=None,
@@ -232,7 +233,10 @@ def learn(env,
             else:
                 obses_t, actions, rewards, obses_tp1, dones = replay_buffer.sample(batch_size)
                 weights, batch_idxes = np.ones_like(rewards), None
-            td_errors = train(obses_t, actions, rewards, obses_tp1, dones, np.ones_like(rewards))
+            if prioritized_importance_sampling:
+                td_errors = train(obses_t, actions, rewards, obses_tp1, dones, weights)
+            else:
+                td_errors = train(obses_t, actions, rewards, obses_tp1, dones, np.ones_like(rewards))
             if prioritized_replay:
                 new_priorities = np.abs(td_errors) + prioritized_replay_eps
                 replay_buffer.update_priorities(batch_idxes, new_priorities)
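
For reference, enabling the fix might look something like the following. This is a hypothetical call sketch: env and the remaining required arguments of learn are assumed to be provided elsewhere, and only the prioritized-replay options visible in the signature above are shown.

# Hypothetical usage sketch: enable prioritized replay together with the new
# flag so that the sampled importance weights are actually passed to train().
act = learn(env,
            prioritized_replay=True,
            prioritized_importance_sampling=True,
            prioritized_replay_alpha=0.6,
            prioritized_replay_beta0=0.4)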