Pandas - Plotting
Pandas is a great tool for analyzing data, and plotting is one of the most useful ways to do it. On this page, we will introduce four kinds of plots with examples: Histogram, Scatter Plot, Hexbin Plot, and Box Plot.
1. Histogram:
One of the most frequently used plots is the histogram. You have probably seen or heard about histograms
since middle school. We can plot one using Pandas.
We have data from 1995 that contains various information about universities in the United States. We want
to see the distribution of graduation rates using a histogram, so we will use the column "Grad.Rate" to
plot it.
The resulting plot reveals that most of the universities have a graduation rate of about 70%. Note also
that some universities have a graduation rate of more than 100%. We have to figure out how those values
were obtained to avoid misinterpreting the data.
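Since the original data file is not included here, the following is only a minimal sketch of the histogram call. It assumes a DataFrame with a "Grad.Rate" column and fills it with synthetic values; the variable names `college` and `suspicious` and the choice of 20 bins are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the university data (assumption: the real file is not shown).
rng = np.random.default_rng(0)
college = pd.DataFrame({
    "Grad.Rate": rng.normal(loc=65, scale=17, size=500).round(),
})

# Histogram of the graduation-rate column; bins=20 is an arbitrary choice.
ax = college.plot(kind="hist", y="Grad.Rate", bins=20, legend=False)
ax.set_xlabel("Grad.Rate (%)")

# A rate above 100% cannot be right, so pull those rows out for inspection.
suspicious = college[college["Grad.Rate"] > 100]
print(len(suspicious), "rows have a graduation rate above 100%")

# In a script (outside a notebook), call matplotlib.pyplot.show() to display the figure.
```

Listing the suspicious rows before plotting conclusions is one simple way to check values like the ones above 100%.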
2. Scatter Plot:
When you want to compare two variables and see if there is any relationship between them, a scatter plot
can be a powerful tool. Let's keep using the data above. We want to see if schools with lower
student-to-faculty ratios have higher tuition costs. All we have to do is define the kind, x, and y values.
Since we are comparing the ratio and the tuition price, we set kind = "scatter", x = "S.F.Ratio", and
y = "Expend". Then we get the plot!
As you can see, there is a weak relationship: the lower the tuition, the lower the ratio.
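A minimal sketch of the scatter call described above, again on synthetic data. The column names "S.F.Ratio" and "Expend" come from the text; the values are random and independent, so they do not reproduce the relationship in the real data. The correlation line is an addition that puts a number on how strong a relationship is.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): random values, not the real university data.
rng = np.random.default_rng(1)
college = pd.DataFrame({
    "S.F.Ratio": rng.uniform(5, 25, size=300),
    "Expend": np.clip(rng.normal(10000, 3000, size=300), 3000, None),
})

# Scatter plot: kind selects the plot type, x and y name the columns to compare.
ax = college.plot(kind="scatter", x="S.F.Ratio", y="Expend", alpha=0.5)

# Pearson correlation: values near 0 mean a weak linear relationship.
r = college["S.F.Ratio"].corr(college["Expend"])
print(f"correlation between S.F.Ratio and Expend: {r:.2f}")
```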

3. Hexbin Plot:
It seems like many data points are clustered in one spot in the plot above. Is there a way to see
the density of the data points? Yes, you can do it using a hexbin plot. We keep exactly the same values
for x and y but change the plot kind to "hexbin". Then you can see the density of the data points.
Although it's cool that you can see the density, a scatter plot shows the individual points better,
which can matter for statistical analysis.
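The switch from scatter to hexbin can be sketched as follows. The data is synthetic and the `gridsize` value is an assumption; it controls how many hexagons are drawn along the x axis, and the text does not say which value was used.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption), clustered so that density differences are visible.
rng = np.random.default_rng(2)
college = pd.DataFrame({
    "S.F.Ratio": rng.normal(14, 4, size=700),
    "Expend": rng.normal(9000, 2500, size=700),
})

# Same x and y as the scatter plot; only the kind changes.
# Each hexagon is colored by how many points fall inside it.
ax = college.plot(kind="hexbin", x="S.F.Ratio", y="Expend", gridsize=20)
```

A smaller `gridsize` gives larger hexagons and a smoother picture; a larger one gives finer detail but more empty cells.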
I have another data set that we can use for a hexbin plot.

This data set contains crime information for different years. Now we want to examine the relationship
between Vehicle Theft and Robbery, so I wrote the plotting code for it.
It's very similar to the scatter plot, but the kind is "hexbin".
This time, there seems to be a stronger relationship between Robbery and Vehicle Theft: the more
robberies there are, the more vehicle thefts there are. You can also see the density from how saturated
the green of each hexagon is.
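The crime data itself is not shown, so here is a hedged sketch with made-up numbers. The column labels "Year", "Robbery", and "Vehicle Theft" are assumptions about how the real columns are named, and the values carry no real-world meaning. This version uses the `plot.hexbin` accessor, which is equivalent to passing kind="hexbin".

```python
import numpy as np
import pandas as pd

# Made-up crime counts per year (assumption: real column names and values may differ).
rng = np.random.default_rng(3)
robbery = rng.uniform(100_000, 600_000, size=60)
crime = pd.DataFrame({
    "Year": np.arange(1960, 2020),
    "Robbery": robbery,
    # Loosely tied to Robbery so the sketch has some structure to display.
    "Vehicle Theft": 2.0 * robbery + rng.normal(0, 80_000, size=60),
})

# Hexbin plot of the two crime columns; gridsize=15 is an arbitrary choice.
ax = crime.plot.hexbin(x="Robbery", y="Vehicle Theft", gridsize=15)
```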
4. Box Plot:
After you take a test and get your result, you might wonder how other students did. Some
educational websites show a picture like the one below:
As the picture shows, a box plot displays the minimum score, the maximum score, the range between them,
the lower and upper quartiles, and the median score.
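The numbers a box plot summarizes can also be computed directly. The sketch below uses a small set of hypothetical test scores (not from any real data) and the Pandas `quantile` method.

```python
import pandas as pd

# Hypothetical test scores, invented for illustration.
scores = pd.Series([55, 62, 68, 70, 74, 77, 81, 85, 90, 98], name="score")

# Minimum, lower quartile, median, upper quartile, maximum.
summary = scores.quantile([0, 0.25, 0.5, 0.75, 1.0])
score_range = scores.max() - scores.min()

print(summary)
print("range:", score_range)
```

These five values are what the box, the line inside it, and the whiskers represent. Note that the whiskers in a drawn box plot may stop short of the minimum and maximum when outliers are shown as separate points.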
Using a box plot, we want to see the distributions of Burglary, Violent, and Vehicle Theft from the
crime data above.
We set kind = "box" (lowercase) because we want a box plot. Since we want to see the
distributions of Burglary, Violent, and Vehicle Theft, we pass them as the y input. Then we get the desired plot:
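A minimal sketch of the box plot call, using made-up crime counts. The column labels "Burglary", "Violent", and "Vehicle Theft" are assumptions about the real column names, and the values are random.

```python
import numpy as np
import pandas as pd

# Made-up crime counts (assumption: real column names and values may differ).
rng = np.random.default_rng(4)
crime = pd.DataFrame({
    "Burglary": rng.normal(2_500_000, 500_000, size=60),
    "Violent": rng.normal(1_200_000, 300_000, size=60),
    "Vehicle Theft": rng.normal(1_100_000, 250_000, size=60),
})

# One box per column listed in y; the kind string must be lowercase "box".
columns = ["Burglary", "Violent", "Vehicle Theft"]
ax = crime.plot(kind="box", y=columns)
```

Selecting the columns first and calling `crime[columns].plot(kind="box")` should give the same picture.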
*GitHub: https://github.com/KwakSukyoung/coding/blob/master/ACME/Pandas2/pandas2.ipynb











