Data Visualization in pandas
Data visualization is a technique for understanding data and drawing insights from it. Previous sections explained how to work with data in pandas. In this section you will look at some of the methods that help you visualize that data. Although dedicated visualization libraries such as matplotlib and seaborn are available, pandas visualization is a quick way to get a first look at the data. In fact, pandas visualization is itself built on top of matplotlib.
Typically, pandas visualization is used for basic plots such as line plots, histograms, and scatter plots. For more detailed and customized visualizations, a specialized tool is recommended. The advantage of pandas visualization is that you can plot directly from a DataFrame or a Series, and the syntax is very simple.
As in the previous pandas-operation tutorial, you are not going to create your own data set. Instead, you will use the well-known iris data set. It is a real data set, so you can see how visualization helps in understanding data. You will load it from seaborn (an internet connection is required).
#loading the data set from seaborn
import seaborn as sns
iris = sns.load_dataset('iris')
This data set describes three species of iris flowers. The features are the length and width of both the petal and the sepal of each flower. Let's look at some records.
print(iris.head())
Now you can use the info() method, which you used in the previous section, to get a basic understanding of the data set.
iris.info()
The output shows that there are 150 records and 5 columns, of which 4 are floats and 1 is an object column, and that there are no null values.
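The same facts that info() prints can also be read programmatically. The sketch below is only an illustration: it uses a small hand-typed DataFrame named sample (a hypothetical stand-in with iris-style column names and illustrative values, not the full data set) so that it runs without an internet connection.

```python
import pandas as pd

# Hypothetical stand-in for iris: same column names, illustrative values only.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Number of records and columns.
n_rows, n_cols = sample.shape

# Names of the numeric (float) columns.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Count of null values in each column.
null_counts = sample.isnull().sum()

print(n_rows, n_cols)
print(numeric_cols)
print(null_counts)
```

Running the same three expressions on the real iris DataFrame should agree with what info() reports.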
import pandas as pd
import matplotlib.pyplot as plt # using pandas data visualization together with matplotlib provides extra graphical effects
plot: In pandas, all the different plot types can be accessed through a single accessor called plot. The example below plots lines using the line() method of plot.
iris.plot.line()

As mentioned above, the pandas plotting syntax is very convenient: DataFrame.plot.<type of the plot> or Series.plot.<type of the plot>.
#plotting based on series:
iris['sepal_length'].plot.line()
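To make the DataFrame versus Series distinction concrete, here is a minimal sketch that draws both side by side. It assumes a small hypothetical DataFrame named sample with iris-style columns and illustrative values, and it passes explicit matplotlib Axes through the ax argument so that each plot lands in its own panel.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for iris: illustrative values only.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

# Two panels: one for the DataFrame plot, one for the Series plot.
fig, (ax_df, ax_series) = plt.subplots(1, 2, figsize=(10, 4))

# DataFrame.plot.line() draws one line per selected column.
sample[["sepal_length", "petal_length"]].plot.line(ax=ax_df)

# Series.plot.line() draws a single line.
sample["sepal_width"].plot.line(ax=ax_series)

print(len(ax_df.lines), len(ax_series.lines))
```

Each call returns the matplotlib Axes it drew on, which is what makes further customization with matplotlib possible.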

pandas passes extra keyword arguments through to matplotlib. For example, you can change the size of the figure using the figsize argument.
iris.plot.line(figsize = (12,4))
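As a hedged illustration of keyword arguments, the sketch below combines figsize with title (a pandas plotting option) and linewidth (forwarded to matplotlib), then reads the settings back from the returned Axes. The DataFrame sample and the title text are hypothetical placeholders, not part of the iris data set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for iris: illustrative values only.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

ax = sample.plot.line(
    figsize=(12, 4),           # figure size in inches (width, height)
    title="Sample lengths",    # placeholder title, set on the Axes
    linewidth=2,               # forwarded to matplotlib's line drawing
)

# The returned Axes gives access to the underlying matplotlib figure.
fig = ax.get_figure()
width, height = fig.get_size_inches()

print(width, height, ax.get_title(), ax.lines[0].get_linewidth())
```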

hist(): To plot a histogram, use the hist() method. It accepts the usual bins argument for the number of bins.
iris.plot.hist()

iris.plot.hist(bins=20,figsize=(8,4),alpha = .7) #alpha argument is used for transparency
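To see what bins controls, the following sketch plots a histogram of a single column and counts the bars that matplotlib drew. It assumes a hypothetical Series with illustrative petal_length values; the explicit ax argument keeps the plot on a fresh Axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for one iris column: illustrative values only.
petal_length = pd.Series([1.4, 1.4, 4.7, 4.5, 6.0, 5.1], name="petal_length")

fig, ax = plt.subplots(figsize=(8, 4))

# bins sets the number of equal-width intervals; alpha sets transparency.
petal_length.plot.hist(bins=5, alpha=0.7, ax=ax)

# Each bin is drawn as one rectangle; bar heights are the counts per bin.
n_bars = len(ax.patches)
total_count = sum(bar.get_height() for bar in ax.patches)

print(n_bars, total_count)
```

The bar heights add up to the number of values in the Series, since every value falls into exactly one bin.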

scatter: A scatter plot shows the relationship between two features. It takes two required arguments, x and y, given as column names.
iris.plot.scatter(x='sepal_length',y='petal_length')

scatter takes a third variable in the c argument and colors each point according to its value.
iris.plot.scatter(x='sepal_length',y='sepal_width',c='petal_length')
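A minimal sketch of color mapping follows, assuming a hypothetical DataFrame named sample with illustrative values. The colormap argument picks the matplotlib color scale ("viridis" here is just one choice), and pandas adds a colorbar as a second Axes on the figure when c is a numeric column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for iris: illustrative values only.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

ax = sample.plot.scatter(
    x="sepal_length",
    y="sepal_width",
    c="petal_length",     # column whose values drive the point colors
    colormap="viridis",   # matplotlib colormap name
)

# One Axes holds the points, a second one holds the colorbar.
n_axes = len(ax.get_figure().axes)

print(ax.get_xlabel(), ax.get_ylabel(), n_axes)
```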

scatter can also take a third variable as the marker size through the s argument. Passing a Series, as in the example below, works across pandas versions; recent releases also accept a column name.
iris.plot.scatter(x='sepal_length',y='sepal_width',s=iris['petal_length']*15)

plot(): plot can also be called directly as a method, which allows all of the above plots to be drawn from a single place. Its kind argument takes the type of plot, such as line, hist, or scatter.
iris.plot(kind='line',figsize=(12,4))
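The two calling styles are interchangeable. The sketch below, which assumes a hypothetical DataFrame named sample with illustrative values, draws the same line plot once with kind and once with the accessor method, then compares the plotted data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for iris: illustrative values only.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

fig, (ax_kind, ax_method) = plt.subplots(1, 2, figsize=(12, 4))

# Style 1: choose the plot type with the kind argument.
sample.plot(kind="line", ax=ax_kind)

# Style 2: choose the plot type with the accessor method.
sample.plot.line(ax=ax_method)

# Both panels contain the same lines with the same y values.
same_data = all(
    list(a.get_ydata()) == list(b.get_ydata())
    for a, b in zip(ax_kind.lines, ax_method.lines)
)

print(len(ax_kind.lines), len(ax_method.lines), same_data)
```

The kind form is handy when the plot type is held in a variable, while the accessor form tends to read more clearly in interactive work.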
