---
id: 5e8f2f13c4cdbe86b5c72da5
title: 'Reinforcement Learning With Q-Learning: Example'
challengeType: 11
videoId: RBBSNta234s
bilibiliIds:
  aid: 848073871
  bvid: BV1uL4y187Eq
  cid: 409139471
dashedName: reinforcement-learning-with-q-learning-example
---

# --question--

## --text--

Fill in the blanks to complete the following Q-Learning equation:

```py
Q[__A__, __B__] = Q[__A__, __B__] + LEARNING_RATE * (reward + GAMMA * np.max(Q[__C__, :]) - Q[__A__, __B__])
```

## --answers--

A: `state`

B: `action`

C: `next_state`

---

A: `state`

B: `action`

C: `prev_state`

---

A: `state`

B: `reaction`

C: `next_state`

## --video-solution--

1
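To make the update rule from the question concrete, here is a minimal NumPy sketch of a single tabular Q-learning step. The function name `q_update`, the toy 3×2 table, and the values used for `LEARNING_RATE`, `GAMMA` and the reward are illustrative assumptions, not part of the original challenge.

```python
import numpy as np

# Hypothetical hyperparameters; the names mirror LEARNING_RATE and GAMMA in the question.
LEARNING_RATE = 0.5
GAMMA = 0.9


def q_update(Q, state, action, reward, next_state):
    """Apply one tabular Q-learning update to Q in place and return the new value.

    Q is a 2-D array indexed as Q[state, action].
    """
    # Best estimated value reachable from the state the agent actually landed in.
    best_next = np.max(Q[next_state, :])
    # Temporal-difference error: (reward + discounted future value) minus current estimate.
    td_error = reward + GAMMA * best_next - Q[state, action]
    # Move the current estimate a fraction LEARNING_RATE of the way toward the target.
    Q[state, action] = Q[state, action] + LEARNING_RATE * td_error
    return Q[state, action]


# Hypothetical toy table: 3 states x 2 actions, initialised to zero.
Q = np.zeros((3, 2))
# Pretend state 2 already has a learned value for action 0.
Q[2, :] = [1.0, 0.0]

# Suppose taking action 1 in state 0 yields reward 0.5 and lands in state 2.
new_value = q_update(Q, state=0, action=1, reward=0.5, next_state=2)
print(new_value)
```

With these numbers the target is 0.5 + 0.9 × 1.0 = 1.4, and the entry moves halfway from 0 toward it, so `Q[0, 1]` becomes 0.5 × 1.4 = 0.7. The maximum is taken over the row of `next_state`, because the update bootstraps from the best value the agent expects to obtain after the transition.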