---
id: 5e8f2f13c4cdbe86b5c72da4
title: 'Reinforcement Learning With Q-Learning: Part 2'
challengeType: 11
videoId: DX7hJuaUZ7o
dashedName: reinforcement-learning-with-q-learning-part-2
---
					
						
# --question--
					
						
## --text--
					
						
What can happen if the agent does not have a good balance of taking random actions and using learned actions?
					
						
## --answers--
					
						
The agent will always try to minimize its reward for the current state/action, leading to local minima.
					
						
---
					
						
The agent will always try to maximize its reward for the current state/action, leading to local maxima.
					
						
## --video-solution--
					
						
2
					
						