fix(guide): simplify directory structure

This commit is contained in:
Mrugesh Mohapatra
2018-10-16 21:26:13 +05:30
parent f989c28c52
commit da0df12ab7
35752 changed files with 0 additions and 317652 deletions


@ -0,0 +1,23 @@
---
title: Correlation Does not Imply Causation
---
## Correlation Does not Imply Causation
Many fitness- and health-related websites miss this point about the research that happens in these fields. They report scientific findings as causation rather than what they really are: correlation. For example, researchers found that early risers have a lower BMI and are less likely to be obese. This correlation can be misrepresented as 'waking up early reduces the chances of obesity'. We do not know that waking up early 'caused' the outcome of lower obesity. What we have found here is a correlation.
An informal definition of correlation: when event A happens, event B also tends to happen, and vice versa. In our example, people who wake up early tend to be toward the lower end of the weight spectrum. The two events tend to happen together, but it is not necessary that one event caused the other.
Causality means that event A 'caused', or led to, event B. For example, if I stand in the sun, I will get tanned. Here the second event occurs because of the first.
In statistics, there is a lot of talk about **correlated variables**. A correlation is a relationship between two variables. **Causation** refers to a relationship where a change in one variable **is responsible for** the change of another variable. This is also known as a **causal relationship**.
When there is a causal relationship between two variables, there is also a correlation between them. But a correlation between two variables does not imply a causal relationship between them; assuming that it does is a <a href='https://en.wikipedia.org/wiki/Formal_fallacy' target='_blank' rel='nofollow'>logical fallacy</a>.
This is because a correlation between two variables can be explained by many reasons:
- One variable influences the other. This _would_ be a causal relationship. For example, there is a correlation between household salary and number of cars owned.
- Both variables influence each other. This _would_ be a two-way causal relationship. For example, a correlation between education level and the wealth of a person.
- There is another variable that is influencing both variables under examination. This would _not_ be a causal relationship. For example, number of cars owned and size of the house may be correlated, but these two variables are influenced by another variable: salary. An increase in the number of cars owned does not influence the size of the house.
- The correlation could be a random accident. This would _not_ be a causal relationship. A classic example is the close correlation between per-capita margarine consumption and the divorce rate in the US state of Maine.
In machine learning, correlation suffices for making a predictive model. However, just because two variables are correlated does not mean one variable influences the other. In other words, although machine learning may help find a relationship between two variables, it does not necessarily help find the reason for the relationship.
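To make the third explanation above concrete, here is a minimal numerical sketch, assuming only `numpy`; the variable names and coefficients are illustrative, not taken from any real data set. Salary drives both the number of cars owned and house size, and the two end up correlated without either influencing the other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confounder: household salary drives both variables.
salary = rng.normal(60_000, 15_000, size=1_000)

# Cars owned and house size each depend on salary plus independent noise;
# neither variable influences the other directly.
cars_owned = 0.00005 * salary + rng.normal(0, 0.5, size=1_000)
house_size = 0.002 * salary + rng.normal(0, 20, size=1_000)

# Yet the two are clearly correlated (roughly 0.7 here).
print(np.corrcoef(cars_owned, house_size)[0, 1])
```

A model trained on this data could use the number of cars to *predict* house size quite well, which is exactly the sense in which correlation suffices for prediction but not for causal claims.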


@ -0,0 +1,33 @@
---
title: Data Alone Is not Enough
---
## Data Alone Is not Enough
Without changing the machine learning algorithm or other aspects of how the
model is trained, data alone is not enough to help your learner do better.
> Every learner must embody some knowledge or assumptions beyond the data it's
> given in order to generalize beyond it (Domingos, 2012).
What this statement is essentially saying is that if you blindly choose a
learner just because you've heard it performs well, collecting more data won't
necessarily bring you closer to your machine learning goals.
For example, say you have data which depends on time (e.g. time series data)
and you want to use a binary classifier (e.g. logistic regression). Collecting
more time series data might not be the best way to help your learner, because
a plain binary classifier isn't designed to exploit the temporal structure in
the data.
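As a minimal sketch of this point, assuming `numpy` and scikit-learn (the random-walk data is invented purely for illustration): a logistic regression that sees only the current value of an autocorrelated series learns almost nothing, while encoding knowledge of the temporal structure as a lag feature helps immediately.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 2_000
signal = np.cumsum(rng.normal(size=n))          # a random walk (time series)
y = (signal[1:] > signal[:-1]).astype(int)      # label: did the series rise?

X_plain = signal[1:].reshape(-1, 1)                    # current value only
X_lagged = np.column_stack([signal[1:], signal[:-1]])  # current + previous

split = int(0.8 * (n - 1))
for name, X in [("plain", X_plain), ("lagged", X_lagged)]:
    model = LogisticRegression(max_iter=1_000).fit(X[:split], y[:split])
    accuracy = accuracy_score(y[split:], model.predict(X[split:]))
    print(f"{name} features: test accuracy = {accuracy:.2f}")
    # plain stays near 0.5 (chance); lagged approaches 1.0
```

More rows of the plain representation would not close this gap; the knowledge
encoded in the features does.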
This is not to say that adding more data does you no good once you've chosen
the machine learning algorithm best suited to your problem. In that case, it
usually will help.
> Machine learning is not magic; it can't get something from nothing. What it
> does is get more from less...Learners combine knowledge with data to grow
> programs (Domingos, 2012).
#### More Information:
- <a href='https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf' target='_blank' rel='nofollow'>A Few Useful Things to Know about Machine Learning</a>
- <a href='http://www.kdnuggets.com/2015/06/machine-learning-more-data-better-algorithms.html' target='_blank' rel='nofollow'>In Machine Learning, What is Better: More Data or better Algorithms?</a>
- <a href='https://www.quora.com/In-machine-learning-is-more-data-always-better-than-better-algorithms/answer/Xavier-Amatriain?srid=Tds3' target='_blank' rel='nofollow'>In machine learning, is more data always better than better algorithms?</a>


@ -0,0 +1,15 @@
---
title: Feature Engineering Is the Key
---
## Feature Engineering Is the Key
This is a stub. <a href='https://github.com/freecodecamp/guides/tree/master/src/pages/machine-learning/principles/feature-engineering-is-the-key/index.md' target='_blank' rel='nofollow'>Help our community expand it</a>.
<a href='https://github.com/freecodecamp/guides/blob/master/README.md' target='_blank' rel='nofollow'>This quick style guide will help ensure your pull request gets accepted</a>.
<!-- The article goes here, in GitHub-flavored Markdown. Feel free to add YouTube videos, images, and CodePen/JSBin embeds -->
#### More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->


@ -0,0 +1,15 @@
---
title: Principles
---
## Principles
This is a stub. <a href='https://github.com/freecodecamp/guides/tree/master/src/pages/machine-learning/principles/index.md' target='_blank' rel='nofollow'>Help our community expand it</a>.
<a href='https://github.com/freecodecamp/guides/blob/master/README.md' target='_blank' rel='nofollow'>This quick style guide will help ensure your pull request gets accepted</a>.
<!-- The article goes here, in GitHub-flavored Markdown. Feel free to add YouTube videos, images, and CodePen/JSBin embeds -->
#### More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->


@ -0,0 +1,30 @@
---
title: Intuition Fails in High Dimensions
---
## Intuition Fails in High Dimensions
#### Imagine
Imagine a 2D plane with `X` and `Y` axes. On it you mark the points `(1,0)` and `(0,1)`, and through them you draw a straight line. Even without looking at the image below, you can get an idea of how the graph would look.
![X-Y plane with your imaginary line](https://ka-perseus-graphie.s3.amazonaws.com/466568bad0126c402380ff2ea57aad004f36172b.svg)
Now let's imagine a 3D space with `X`, `Y` and `Z` axes. Through this 3D structure passes a plane that intersects the `X` axis at `(2, 0, 0)`, the `Y` axis at `(0, 3, 0)` and the `Z` axis at `(0, 0, 6)`. Such a plane is tough to picture in our heads, but if we try, we end up with something that looks like this.
![X-Y-Z with our plane](http://tutorial.math.lamar.edu/Classes/CalcIII/SurfaceArea_files/image001.gif)
With that, we are ready to step into the next higher dimension. Planes in dimensions higher than `3` are referred to as hyperplanes. But first, let us address where the fourth axis even points. Let's call it the `W` axis. As with each previous new axis, the `W` axis must be perpendicular to all the pre-existing axes (`X`, `Y` and `Z`), just as `Z` was perpendicular to the `X` and `Y` axes.
![X-Y-Z-W Axes](http://eusebeia.dyndns.org/4d/vis/4d-axes.png)
> It is important to understand that the W-axis as depicted here is perpendicular to all of the other coordinate axes. We may be tempted to try to point in the direction of W, but this is impossible because we are confined to 3-dimensional space.
Because we live in a world that is 3-dimensional, it's difficult for us to comprehend a world that has more than 3 dimensions. This is why our intuition and imagination are of limited help in higher dimensions.
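One concrete way to see this failure numerically, in a minimal sketch assuming only `numpy`: in 2D, the circle inscribed in a square covers most of it, so intuition says random points in a cube should usually fall inside the inscribed sphere. In higher dimensions that expectation collapses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fraction of random points in the cube [-1, 1]^d that fall inside the
# inscribed unit hypersphere.  Intuition from 2D and 3D suggests "most";
# as d grows, the fraction collapses toward zero.
for d in (2, 3, 5, 10, 20):
    points = rng.uniform(-1.0, 1.0, size=(100_000, d))
    inside = (np.linalg.norm(points, axis=1) <= 1.0).mean()
    print(f"d = {d:2d}: fraction inside sphere = {inside:.4f}")
```

In 2D the fraction is about 0.79, matching intuition; by 20 dimensions virtually no random point lands inside the sphere.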
#### More Information:
* <a href="http://eusebeia.dyndns.org/4d/vis/01-intro">4D Visualization and Why It Matters</a>


@ -0,0 +1,28 @@
---
title: It's Generalization That Counts
---
## It's Generalization That Counts
The power of machine learning comes from not having to hard-code or explicitly
define the parameters that describe your training data and unseen data. This is
the essential goal of machine learning: to generalize a learner's findings.
To test a learner's generalizability, you'll want a separate test data set
that is not used in any way to train the learner. You can create one either by
splitting your entire data set into a training set and a test set, or by
collecting more data. If the learner were allowed to use data from the test
set during training, the measured performance would be biased: the learner
would appear to do better than it actually would on unseen data.
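A minimal sketch of such a split, assuming scikit-learn (its bundled iris data stands in for your own):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # stand-in for your own data set

# Hold out 20% of the data; the learner never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```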
One method to get a sense of how your learner will do on a test data set is to
perform what is called **cross-validation**. This randomly splits your training
data into a given number of subsets (for example, ten) and leaves one subset
out while the learner trains on the rest. Once the learner has been trained,
the left-out subset is used for testing. This cycle of training, leaving one
subset out, and testing is repeated as you rotate through the subsets.
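Here is a minimal sketch of ten-fold cross-validation, again assuming scikit-learn and its bundled iris data purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Split into 10 folds; train on 9, test on the held-out fold, and rotate
# until every fold has served as the test set once.
scores = cross_val_score(LogisticRegression(max_iter=1_000), X, y, cv=10)
print(scores)          # one accuracy score per held-out fold
print(scores.mean())   # an estimate of how well the learner generalizes
```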
#### More Information:
- <a href='https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf' target='_blank' rel='nofollow'>A Few Useful Things to Know about Machine Learning</a>
- <a href='https://stats.stackexchange.com/a/153058/132399' target='_blank' rel='nofollow'>"How do you use test data set after Cross-validation"</a>


@ -0,0 +1,15 @@
---
title: Learn Many Models not Just One
---
## Learn Many Models not Just One
This is a stub. <a href='https://github.com/freecodecamp/guides/tree/master/src/pages/machine-learning/principles/learn-many-models-not-just-one/index.md' target='_blank' rel='nofollow'>Help our community expand it</a>.
<a href='https://github.com/freecodecamp/guides/blob/master/README.md' target='_blank' rel='nofollow'>This quick style guide will help ensure your pull request gets accepted</a>.
<!-- The article goes here, in GitHub-flavored Markdown. Feel free to add YouTube videos, images, and CodePen/JSBin embeds -->
#### More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->


@ -0,0 +1,31 @@
---
title: Learning Equals Representation Evaluation Optimization
---
## Learning Equals Representation Evaluation Optimization
The field of machine learning has exploded in recent years, and researchers
have developed an enormous number of algorithms to choose from. Despite this
great variety of models, they can all be distilled into three components.
The three components that make up a machine learning model are representation,
evaluation, and optimization. These three relate most directly to supervised
learning, but they apply to unsupervised learning as well.
**Representation** - this describes how you want to look at your data.
Sometimes you may want to think of your data in terms of individual instances
(as in k-nearest neighbors) or as a graph (as in Bayesian networks).
**Evaluation** - for supervised learning purposes, you'll need to evaluate, or
put a score on, how well your learner is doing so it can improve. This
evaluation is done using an evaluation function (also known as an *objective
function* or *scoring function*). Examples include accuracy and squared error.
**Optimization** - using the evaluation function from above, you need to find
the learner with the best score using some choice of optimization technique.
Examples are greedy search and gradient descent.
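To make the three components concrete, here is a minimal sketch assuming only `numpy`; the toy data and the learner are illustrative choices, not from the article. The representation is a line `y = w * x + b`, the evaluation function is mean squared error, and the optimizer is gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, size=100)  # toy data: y ≈ 3x + 1

w, b = 0.0, 0.0               # representation: parameters of a line
for _ in range(500):          # optimization: gradient descent steps
    error = (w * x + b) - y   # evaluation: residuals of the squared error
    w -= 0.1 * 2 * np.mean(error * x)  # d(MSE)/dw
    b -= 0.1 * 2 * np.mean(error)      # d(MSE)/db

print(w, b)  # should approach the true values 3.0 and 1.0
```

Swapping any one component (say, gradient descent for a greedy search, or squared error for accuracy) yields a different learner built from the same three-part recipe.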
#### More Information:
- <a href='https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf' target='_blank' rel='nofollow'>A Few Useful Things to Know about Machine Learning</a>


@ -0,0 +1,15 @@
---
title: More Data Beats a Cleverer Algorithm
---
## More Data Beats a Cleverer Algorithm
This is a stub. <a href='https://github.com/freecodecamp/guides/tree/master/src/pages/machine-learning/principles/more-data-beats-a-cleverer-algorithm/index.md' target='_blank' rel='nofollow'>Help our community expand it</a>.
<a href='https://github.com/freecodecamp/guides/blob/master/README.md' target='_blank' rel='nofollow'>This quick style guide will help ensure your pull request gets accepted</a>.
<!-- The article goes here, in GitHub-flavored Markdown. Feel free to add YouTube videos, images, and CodePen/JSBin embeds -->
#### More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->


@ -0,0 +1,39 @@
---
title: Overfitting Has Many Faces
---
## Overfitting Has Many Faces
If a learning algorithm fits a given training set well, this does not by itself indicate a good hypothesis. Overfitting occurs when the hypothesis function `h_Θ(x)` fits your training set too closely: it has high variance, with low error on the training set but high error on any other data.
In other words, overfitting occurs when the error of the hypothesis, as measured on the data set used to train the parameters, happens to be lower than the error on any other data set.
### Choosing an Optimal Polynomial Degree
Choosing the right degree of polynomial for the hypothesis function is important in avoiding overfitting. This can be achieved by testing each degree of polynomial and observing the effect on the error over various parts of the data set. Hence, we can break our data set into 3 parts that can be used to optimize the hypothesis' parameters `Θ` and its polynomial degree.
A good break-down ratio of the data set is:
- Training set: 60%
- Cross validation: 20%
- Test set: 20%
The three error values can thus be calculated by the following method (sketched in code below):<sup>1</sup>
1. Use the training set for each polynomial degree in order to optimize the parameters in `Θ`
2. Use the cross validation set to find the polynomial degree with the lowest error
3. Use the test set to estimate the generalization error
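A minimal sketch of this three-step method, assuming scikit-learn (the one-feature sine data set is invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=200)

# 60/20/20 split: training, cross-validation, test.
X_train, X_cv, X_test = X[:120], X[120:160], X[160:]
y_train, y_cv, y_test = y[:120], y[120:160], y[160:]

best_degree, best_cv_error, best_model = None, np.inf, None
for degree in range(1, 11):
    # 1. Optimize the parameters Θ on the training set for this degree.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # 2. Score this degree on the cross-validation set.
    cv_error = mean_squared_error(y_cv, model.predict(X_cv))
    if cv_error < best_cv_error:
        best_degree, best_cv_error, best_model = degree, cv_error, model

# 3. Estimate the generalization error on the untouched test set.
test_error = mean_squared_error(y_test, best_model.predict(X_test))
print(best_degree, test_error)
```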
### Ways to Fix Overfitting
These are some of the ways to address overfitting:
1. Getting more training examples
2. Trying a smaller set of features
3. Increasing the regularization parameter `λ` (lambda), as sketched below
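For the third fix, here is a hedged sketch assuming scikit-learn, where ridge regression's `alpha` parameter plays the role of the regularization strength `λ`; increasing it shrinks the parameters `Θ` and tames the variance of the fit:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=30)

# A degree-9 polynomial on only 30 points overfits; increasing alpha (λ)
# shrinks the fitted parameters Θ, reducing variance.
for alpha in (1e-6, 1e-2, 10.0):
    model = make_pipeline(PolynomialFeatures(9), Ridge(alpha=alpha))
    model.fit(X, y)
    coef = model.named_steps["ridge"].coef_
    print(f"alpha = {alpha:g}: ||Θ|| = {np.linalg.norm(coef):.3f}")
```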
#### More Information:
[Coursera's Machine Learning Course](https://www.coursera.org/learn/machine-learning)
### Sources
1. [Ng, Andrew. "Machine Learning." *Coursera*. Accessed January 29, 2018.](https://www.coursera.org/learn/machine-learning)


@ -0,0 +1,15 @@
---
title: Representable Does not Imply Learnable
---
## Representable Does not Imply Learnable
This is a stub. <a href='https://github.com/freecodecamp/guides/tree/master/src/pages/machine-learning/principles/representable-does-not-imply-learnable/index.md' target='_blank' rel='nofollow'>Help our community expand it</a>.
<a href='https://github.com/freecodecamp/guides/blob/master/README.md' target='_blank' rel='nofollow'>This quick style guide will help ensure your pull request gets accepted</a>.
<!-- The article goes here, in GitHub-flavored Markdown. Feel free to add YouTube videos, images, and CodePen/JSBin embeds -->
#### More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->


@ -0,0 +1,15 @@
---
title: Simplicity Does not Imply Accuracy
---
## Simplicity Does not Imply Accuracy
This is a stub. <a href='https://github.com/freecodecamp/guides/tree/master/src/pages/machine-learning/principles/simplicity-does-not-imply-accuracy/index.md' target='_blank' rel='nofollow'>Help our community expand it</a>.
<a href='https://github.com/freecodecamp/guides/blob/master/README.md' target='_blank' rel='nofollow'>This quick style guide will help ensure your pull request gets accepted</a>.
<!-- The article goes here, in GitHub-flavored Markdown. Feel free to add YouTube videos, images, and CodePen/JSBin embeds -->
#### More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->


@ -0,0 +1,15 @@
---
title: Theoretical Guarantees Are not What They Seem
---
## Theoretical Guarantees Are not What They Seem
This is a stub. <a href='https://github.com/freecodecamp/guides/tree/master/src/pages/machine-learning/principles/theoretical-guarantees-are-not-what-they-seem/index.md' target='_blank' rel='nofollow'>Help our community expand it</a>.
<a href='https://github.com/freecodecamp/guides/blob/master/README.md' target='_blank' rel='nofollow'>This quick style guide will help ensure your pull request gets accepted</a>.
<!-- The article goes here, in GitHub-flavored Markdown. Feel free to add YouTube videos, images, and CodePen/JSBin embeds -->
#### More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->