freeCodeCamp/guide/english/data-science-tools/pandas/pandas-data-visualization/index.md
Harikrishnan 4d6470c7c0 detailed article on pandas data visualization. (#34184)
* detailed article on pandas data visualization.

* Made updates as per the guide rules.

Proof read it and made the review changes.
2019-02-19 14:34:24 -05:00


pandas Data Visualization

Data Visualization in pandas

Data visualization is a technique that helps you understand data and draw important insights from it. The previous sections explained how to work with data in pandas. This section looks at some of the methods that help you visualize that data. Dedicated visualization libraries such as matplotlib and seaborn exist, but pandas visualization is a quick way to get a first look at the data. In fact, pandas visualization is itself built on top of matplotlib.

Pandas visualization is typically used for basic plots such as line plots, histograms, and scatter plots. For more detailed and customized visualizations, the specialized tools are recommended. The advantage of pandas visualization is that you can plot directly from a DataFrame or a Series, and the syntax is very simple.
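Because pandas draws its plots with matplotlib, every plotting call made on a DataFrame returns an ordinary matplotlib Axes object. The minimal sketch below shows this. It assumes a small hand-typed DataFrame named `sample` with hypothetical iris-like values, which lets the code run offline. It also assumes the non-interactive Agg backend, so no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.axes
import pandas as pd

# Hypothetical stand-in values, not the real iris data set
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

# One call directly on the DataFrame; pandas hands the drawing to matplotlib
ax = sample.plot.line()

# The result is a regular matplotlib Axes that can be customized further
print(isinstance(ax, matplotlib.axes.Axes))  # True
```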

As in the previous pandas operations tutorial, you are not going to create your own data set. Instead, you will use the well-known iris data set. Because it is real data, you can see how visualization helps you understand a data set. You will load it from seaborn (internet access required).

# load the iris data set from seaborn
import seaborn as sns
iris = sns.load_dataset('iris')

The data set covers three species of iris flowers: setosa, versicolor, and virginica. Its features are the length and width of both the petal and the sepal of each flower. Let's look at a few records.

print(iris.head())

Next, you can use the info() method from the previous section to get an overview of the data set.

iris.info()

The output shows 150 records and 5 columns. Four of the columns are floats, one (species) is of type object, and none of them contains null values.
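You can also check the same facts in code instead of reading them from the info() printout. The minimal sketch below uses a hypothetical hand-typed DataFrame named `sample` in the iris column layout, so it runs offline. On the full iris data set, the same calls should report 150 rows, 5 columns, 4 float columns, and no missing values.

```python
import pandas as pd

# Hypothetical stand-in rows in the iris column layout (not the full data set)
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

print(sample.shape)                 # (number of rows, number of columns)
print(sample.dtypes.value_counts()) # how many columns have each dtype
print(sample.isnull().sum().sum())  # total count of missing values
```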

import pandas as pd
import matplotlib.pyplot as plt                 # pandas plots are drawn with matplotlib; plt adds extra customization (call plt.show() outside Jupyter)
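Because pandas returns the matplotlib Axes it drew on, you can use regular matplotlib calls to add a title and axis labels and then save the figure. The minimal sketch below assumes a hypothetical hand-typed DataFrame named `sample`, the Agg backend, and a hypothetical output file name `iris_lines.png`. This is one way to use the plot from a script rather than from a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in values, not the real iris data set
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

ax = sample.plot.line(figsize=(8, 3))

# Plain matplotlib Axes methods applied to the pandas plot
ax.set_title("Iris measurements (toy sample)")
ax.set_xlabel("row index")
ax.set_ylabel("cm")

ax.figure.tight_layout()
ax.figure.savefig("iris_lines.png")  # hypothetical file name
plt.close(ax.figure)                 # free the figure once it is saved
```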

plot: In pandas, all the different plots can be accessed through a single attribute called plot. The example below draws a line plot using the line() method of plot.

iris.plot.line()
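By default, a DataFrame line plot draws one line per numeric column against the row index, and non-numeric columns such as species are skipped. The minimal sketch below checks this. It assumes a hypothetical hand-typed DataFrame named `sample` in the iris layout and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in rows in the iris column layout
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

ax = sample.plot.line()

# Each drawn line is labelled with the column it came from
labels = [line.get_label() for line in ax.get_lines()]
print(labels)  # the four numeric columns; 'species' is not plotted
```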

As mentioned above, plotting with pandas is syntactically very handy: DataFrame.plot.<kind of plot>() or Series.plot.<kind of plot>().

# plotting a single Series (one column):

iris['sepal_length'].plot.line()
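When you plot a Series instead of a DataFrame, you get a single line. Its y values are the Series values and its x values come from the index. The minimal sketch below assumes a hypothetical hand-typed DataFrame named `sample` and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in values, not the real iris data set
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

ax = sample["sepal_length"].plot.line()

line = ax.get_lines()[0]
print(list(line.get_xdata()))  # x positions taken from the index
print(list(line.get_ydata()))  # the Series values themselves
```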

Pandas passes extra keyword arguments on to matplotlib, so you can use many of the usual matplotlib options. For example, you can change the size of the figure with the figsize argument, given as (width, height) in inches.

iris.plot.line(figsize=(12, 4))
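Other options can be combined in the same call. In the minimal sketch below, figsize and title are handled by pandas, and linestyle is passed through to matplotlib's line drawing. It assumes a hypothetical hand-typed DataFrame named `sample` and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in values, not the real iris data set
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

ax = sample.plot.line(
    figsize=(12, 4),            # width, height in inches
    title="Iris measurements",  # axes title
    linestyle="--",             # forwarded to matplotlib for every line
)

print(ax.figure.get_size_inches())  # [12.  4.]
```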

hist(): To plot a histogram, use the hist() method. It also accepts the usual bins argument, which sets the number of bins.

iris.plot.hist()
iris.plot.hist(bins=20, figsize=(8, 4), alpha=0.7)     # alpha sets the transparency (0 = fully transparent, 1 = opaque)
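Since bins is a count, a DataFrame histogram draws that many bars for each numeric column, and all columns share one set of bin edges. The minimal sketch below checks this with 5 bins. It assumes a hypothetical hand-typed DataFrame named `sample` with four numeric columns and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in rows in the iris column layout
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

ax = sample.plot.hist(bins=5, alpha=0.5)

# One bar (Rectangle patch) per bin per numeric column: 4 columns x 5 bins
print(len(ax.patches))
```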

scatter: A scatter plot shows the relationship between two features. It takes two important arguments, x and y, given as column names.

iris.plot.scatter(x='sepal_length',y='petal_length')
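A scatter plot draws one marker per row, placed at that row's (x, y) values. In matplotlib, these markers are stored as a single collection. The minimal sketch below assumes a hypothetical hand-typed DataFrame named `sample` and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in values, not the real iris data set
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

ax = sample.plot.scatter(x="sepal_length", y="petal_length")

# Marker positions: one (x, y) pair per row of the DataFrame
points = ax.collections[0].get_offsets()
print(points.shape)
```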

scatter can also show a third variable through the c argument, which colors each point according to that variable's value.

iris.plot.scatter(x='sepal_length',y='sepal_width',c='petal_length')
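The values of the c column are mapped onto a colormap, and the colormap argument selects which one is used. The minimal sketch below assumes a hypothetical hand-typed DataFrame named `sample`, the Agg backend, and matplotlib's "viridis" colormap chosen only as an example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in values, not the real iris data set
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

ax = sample.plot.scatter(
    x="sepal_length",
    y="sepal_width",
    c="petal_length",    # column whose values decide each marker's color
    colormap="viridis",  # matplotlib colormap name (example choice)
)

# The numbers that were mapped to colors are the petal_length values
colors = ax.collections[0].get_array()
print(list(colors))
```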

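scatter can also show the third variable as marker size through the s argument. In the example below, s is a Series scaled by 15 so that the size differences are visible. Older pandas versions (before 1.1.0) accept only this form, not a column name.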

iris.plot.scatter(x='sepal_length',y='sepal_width',s=iris['petal_length']*15)
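The s values become each marker's area in points squared, one value per row. The minimal sketch below checks that the scaled Series ends up as the marker sizes. It assumes a hypothetical hand-typed DataFrame named `sample` and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in values, not the real iris data set
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

# Scale the third variable so the size differences are easy to see
sizes = sample["petal_length"] * 15

ax = sample.plot.scatter(x="sepal_length", y="sepal_width", s=sizes)

# Marker areas (points squared), one per row
print(ax.collections[0].get_sizes())
```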

plot(): plot can also be called directly as a method, which lets you draw all of the plots above from a single place. Its kind argument takes the type of plot, such as 'line', 'hist', or 'scatter'.

iris.plot(kind='line',figsize=(12,4))
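The kind argument is an alternative spelling of the accessor methods, and the same keyword arguments still apply (x and y are required for kind="scatter"). The minimal sketch below draws all three kinds this way. It assumes a hypothetical hand-typed DataFrame named `sample` and the Agg backend. Each call opens its own figure.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import pandas as pd

# Hypothetical stand-in rows in the iris column layout
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Same as sample.plot.line(figsize=(12, 4))
ax_line = sample.plot(kind="line", figsize=(12, 4))

# Same as sample.plot.hist(bins=5)
ax_hist = sample.plot(kind="hist", bins=5)

# Same as sample.plot.scatter(x=..., y=...)
ax_scatter = sample.plot(kind="scatter", x="sepal_length", y="petal_length")
```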

More Information:

plot.line

plot.hist

plot.scatter

pandas plot