---
id: 5e8f2f13c4cdbe86b5c72da4
title: 'Reinforcement Learning With Q-Learning: Part 2'
challengeType: 11
videoId: DX7hJuaUZ7o
bilibiliIds:
  aid: 420570359
  bvid: BV1G341127zr
  cid: 409139190
dashedName: reinforcement-learning-with-q-learning-part-2
---

# --question--

## --text--

What can happen if the agent does not have a good balance between taking random actions and using learned actions?

## --answers--

The agent will always try to minimize its reward for the current state/action, leading to local minima.

---

The agent will always try to maximize its reward for the current state/action, leading to local maxima.

## --video-solution--

2