---
title: It's Generalization That Counts
---
## It's Generalization That Counts

The power of machine learning comes from not having to hard-code or explicitly
define the parameters that describe your training data and unseen data. This is
the essential goal of machine learning: to generalize a learner's findings
beyond the examples it was trained on.

To test a learner's generalizability, you'll want a separate test data set that
is not used in any way in training the learner. You can create one either by
splitting your entire data set into a training set and a test set, or by
collecting more data. If the learner sees any of the test data during training,
its test score will be biased: it will appear to perform better than it would
on truly unseen data.

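The held-out split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper name `train_test_split`, the 20% test fraction, the seed, and the toy `examples` list are all assumptions made for the example.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction as the test set.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled examples become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: (feature, label) pairs, made up for the example.
examples = [(x, x % 2) for x in range(100)]
train, test = train_test_split(examples, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 80 20
```

The learner would be fit on `train` only, and `test` would be touched once, at the very end, to estimate performance on unseen data.
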
One way to get a sense of how your learner will do on a test data set is to
perform what is called **cross-validation**. This randomly splits your training
data into a given number of subsets (for example, ten) and leaves one subset
out while the learner trains on the rest. Once the learner has been trained,
the left-out subset is used for testing. This cycle of leaving one subset out,
training, and testing is repeated as you rotate through the subsets, and the
results from each round are averaged.

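The rotation described above might look like the following sketch in plain Python. Everything named here is hypothetical and chosen for illustration: the helpers `k_fold_indices` and `cross_validate`, the toy majority-label learner, and the made-up data set. A real project would more likely rely on a library's cross-validation utilities.

```python
import random


def k_fold_indices(n_examples, k, seed=0):
    """Randomly partition the indices 0..n_examples-1 into k disjoint folds."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled indices gives k folds of near-equal size.
    return [indices[i::k] for i in range(k)]


def cross_validate(examples, k, train_fn, score_fn, seed=0):
    """Train on k-1 folds, test on the left-out fold, and rotate through all k."""
    scores = []
    for held_out in k_fold_indices(len(examples), k, seed):
        held_out_set = set(held_out)
        train = [ex for i, ex in enumerate(examples) if i not in held_out_set]
        test = [examples[i] for i in held_out]
        model = train_fn(train)               # the learner never sees the held-out fold
        scores.append(score_fn(model, test))
    return scores


def train_majority(train):
    """Toy learner (an assumption for the example): predict the most common label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def accuracy(model, test):
    """Fraction of held-out examples whose label matches the constant prediction."""
    return sum(1 for _, label in test if label == model) / len(test)


# Toy data: 70 examples labelled 1 and 30 labelled 0, made up for the example.
examples = [(x, 1 if x < 70 else 0) for x in range(100)]
scores = cross_validate(examples, k=10, train_fn=train_majority, score_fn=accuracy)
mean_score = sum(scores) / len(scores)
print(len(scores), round(mean_score, 2))
```

Because the toy learner always predicts the majority label, its averaged accuracy here simply reflects the 70/30 label balance. The point of the sketch is the rotation itself: each fold is held out exactly once, and the per-fold scores are averaged into a single estimate.
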
#### More Information:

- <a href='https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf' target='_blank' rel='nofollow'>A Few Useful Things to Know about Machine Learning</a>
- <a href='https://stats.stackexchange.com/a/153058/132399' target='_blank' rel='nofollow'>"How do you use test data set after Cross-validation"</a>