---
title: It's Generalization That Counts
---
## It's Generalization That Counts

The power of machine learning comes from not having to hard-code or explicitly
define the parameters that describe both your training data and data the learner
has not yet seen. This is the fundamental goal of machine learning: to
generalize beyond the examples in the training set.

To test how well a learner generalizes, keep a separate test data set that is
not used in any way while training the learner. You can create one either by
splitting your full data set into a training set and a test set, or by
collecting more data. If the learner sees any of the test data during training,
its test score becomes optimistically biased: the learner appears to do better
than it would on genuinely new data.
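
As a rough illustration of splitting one data set into a training set and a
test set, here is a minimal sketch in plain Python. The function name
`train_test_split`, the 20% test fraction, and the fixed seed are assumptions
made for this example, not something the article prescribes.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and split it into (train, test) lists.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)               # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled examples are set aside and never used to train.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train, test = train_test_split(data, test_fraction=0.2, seed=42)
print(len(train), "training examples,", len(test), "test examples")
```

Shuffling before slicing keeps the test set from inheriting any ordering in the
original data, and the two returned lists never share an example.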

One way to get a sense of how your learner will do on a test data set is to
perform **cross-validation**. This randomly splits your training data into a
given number of subsets (for example, ten) and holds one subset out while the
learner trains on the rest. Once the learner has been trained, the held-out
subset is used for testing. This cycle of holding out, training, and testing is
repeated as you rotate through the subsets, so that each subset is used for
testing exactly once, and the results are averaged.
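
The rotation described above can be sketched in a few lines of plain Python.
Everything named here (`k_fold_indices`, `cross_validate`, the toy
majority-label learner, and the synthetic data) is hypothetical and chosen only
to keep the example self-contained.

```python
import random
from statistics import mean


def k_fold_indices(n_examples, k, seed=0):
    """Randomly partition range(n_examples) into k folds of near-equal size."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index spreads any remainder across the first folds.
    return [indices[i::k] for i in range(k)]


def cross_validate(examples, k, train_fn, score_fn, seed=0):
    """Return one score per fold; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(examples), k, seed):
        held_out_set = set(held_out)
        train = [ex for i, ex in enumerate(examples) if i not in held_out_set]
        test = [examples[i] for i in held_out]
        model = train_fn(train)  # the learner never sees the held-out fold
        scores.append(score_fn(model, test))
    return scores


# Toy learner (assumption): always predict the most common training label.
def train_majority(train):
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def accuracy(model, test):
    return mean(1.0 if label == model else 0.0 for _, label in test)


# Synthetic data (assumption): 40 examples, one in four labelled "neg".
examples = [(x, "pos" if x % 4 else "neg") for x in range(40)]
scores = cross_validate(examples, k=10, train_fn=train_majority, score_fn=accuracy)
print(f"mean accuracy over {len(scores)} folds: {mean(scores):.2f}")
```

Passing the learner and the scoring rule in as functions keeps the rotation
logic separate from any particular model, so the same loop can be reused with a
real learner.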

#### More Information:

- <a href='https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf' target='_blank' rel='nofollow'>A Few Useful Things to Know about Machine Learning</a>
- <a href='https://stats.stackexchange.com/a/153058/132399' target='_blank' rel='nofollow'>"How do you use test data set after Cross-validation"</a>