---
id: 5e8f2f13c4cdbe86b5c72da5
title: 'Reinforcement Learning With Q-Learning: Example'
challengeType: 11
videoId: RBBSNta234s
bilibiliIds:
  aid: 848073871
  bvid: BV1uL4y187Eq
  cid: 409139471
dashedName: reinforcement-learning-with-q-learning-example
---
						
# --question--

## --text--
						
Fill in the blanks to complete the following Q-Learning equation:
						
```py
Q[__A__, __B__] = Q[__A__, __B__] + LEARNING_RATE * (reward + GAMMA * np.max(Q[__C__, :]) - Q[__A__, __B__])
```
						
## --answers--

A: `state`

B: `action`

C: `next_state`

---

A: `state`

B: `action`

C: `prev_state`

---

A: `state`

B: `reaction`

C: `next_state`

## --video-solution--

1