---
id: 5e8f2f13c4cdbe86b5c72da4
title: 'Reinforcement Learning With Q-Learning: Part 2'
challengeType: 11
videoId: DX7hJuaUZ7o
bilibiliIds:
  aid: 420570359
  bvid: BV1G341127zr
  cid: 409139190
dashedName: reinforcement-learning-with-q-learning-part-2
---
					
						
# --question--

## --text--

What can happen if the agent does not have a good balance of taking random actions and using learned actions?
					
						
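As a point of reference, one common way to balance random and learned actions is an epsilon-greedy rule. The sketch below is a minimal illustration under stated assumptions, not the code used in the video: `choose_action`, `q_row`, and `epsilon` are hypothetical names, and the Q-values are made up.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Pick an action index for one state with an epsilon-greedy rule.

    q_row   -- hypothetical list of learned Q-values, one per action
    epsilon -- probability of taking a random action instead of the learned one
    rng     -- source of randomness (the random module or a random.Random)
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest learned Q-value.
    return max(range(len(q_row)), key=lambda action: q_row[action])


# Made-up Q-values for a single state with three possible actions.
q_row = [0.1, 0.7, 0.2]

greedy_action = choose_action(q_row, epsilon=0.0)  # never explores
random_action = choose_action(q_row, epsilon=1.0)  # always explores
```

With `epsilon=0.0` the function only ever uses learned values, and with `epsilon=1.0` it only ever acts randomly; values in between mix the two behaviors.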
					
						
## --answers--

The agent will always try to minimize its reward for the current state/action, leading to local minima.

---

The agent will always try to maximize its reward for the current state/action, leading to local maxima.
					
						
## --video-solution--

2