Pandas - Plotting

 

Plotting
Pandas is a great tool for analyzing data, and plotting is one of the most useful ways to explore a dataset. On this page, we introduce four kinds of plots with examples: the histogram, the scatter plot, the hexbin plot, and the box plot.
1. Histogram:
    One of the most frequently used plots is the histogram. You have probably seen or heard of histograms 
    since middle school. We can draw one using Pandas.
    We have a dataset with various information about universities in the United States from 1995. We want 
    to see the distribution of graduation rates using a histogram, so we pass the column "Grad.Rate" to 
    the plotting call. 
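
    The original code cell is not reproduced on this page, so here is a minimal sketch of what such a call 
    might look like. The DataFrame name colleges and the rates in it are made up for illustration; only the 
    column name "Grad.Rate" comes from the text.

```python
import pandas as pd

# Made-up graduation rates (percent), NOT the real 1995 data.
colleges = pd.DataFrame(
    {"Grad.Rate": [45, 52, 58, 60, 63, 65, 67, 68, 70, 70,
                   71, 72, 73, 75, 78, 80, 84, 90, 96, 118]}
)

# kind="hist" bins the values of the column given as y and draws one bar per bin.
ax = colleges.plot(kind="hist", y="Grad.Rate", bins=10, edgecolor="black")
ax.set_xlabel("Graduation rate (%)")

# In a notebook the figure renders inline; in a script, save it with
# ax.figure.savefig("some_name.png") or show it with matplotlib.pyplot.show().
```

    The bins argument controls how many bars are drawn; 10 is just a starting point to experiment with.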

     After that, you can see the plot, which reveals that most of the universities have a graduation rate of 
     about 70%. We can also note that a few universities show a graduation rate above 100%; we have to 
     find out how those rates were computed to avoid misinterpreting the data. 
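
     One way to start investigating such values is to filter the rows with a boolean mask. This is only a 
     sketch: the college names and rates below are invented, and colleges is a hypothetical DataFrame name.

```python
import pandas as pd

# Hypothetical rows; the names and rates are invented for illustration.
colleges = pd.DataFrame(
    {"Grad.Rate": [68, 72, 118, 55, 100]},
    index=["College A", "College B", "College C", "College D", "College E"],
)

# The comparison builds a True/False mask; indexing with it keeps only
# the rows whose graduation rate is strictly above 100 percent.
suspicious = colleges[colleges["Grad.Rate"] > 100]
print(suspicious)
```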


2. Scatter Plot:
    When you want to compare two variables and see whether there is any relationship between them, a scatter 
    plot can be a powerful tool. Let's keep using the data above. We want to see if schools with lower 
    student-to-faculty ratios have higher tuition costs. All we have to do is specify kind, x, and y. 
    Since we are comparing the ratio and the tuition price, we set kind = "scatter", x = "S.F.Ratio", and 
    y = "Expend". Then we get the plot!
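
    A minimal sketch of that call, assuming a DataFrame named colleges (a hypothetical name). The numbers 
    are made up, so they say nothing about the real relationship; only the column names and the kind, x, 
    and y arguments come from the text.

```python
import pandas as pd

# Made-up values standing in for the real columns.
colleges = pd.DataFrame(
    {
        "S.F.Ratio": [8.5, 10.2, 12.0, 13.5, 15.1, 16.8, 18.3, 21.0],
        "Expend": [19000, 15500, 11000, 9800, 8700, 8200, 6900, 6100],
    }
)

# One dot per row: x is read from "S.F.Ratio", y from "Expend".
ax = colleges.plot(kind="scatter", x="S.F.Ratio", y="Expend")

# A correlation coefficient puts a number on what the picture suggests.
r = colleges["S.F.Ratio"].corr(colleges["Expend"])
print(f"Pearson r = {r:.2f}")
```

    Computing the correlation next to the plot is an optional extra; it helps when the cloud of points is 
    too diffuse to judge by eye.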
    
    As you can see, there is a weak relationship: the lower the tuition, the lower the ratio tends to be. 
3. Hexbin Plot:
    It seems that many data points are piled up in one spot in the plot above. Is there a way to see 
    the density of the data points? Yes, you can do it with a hexbin plot. We keep exactly the same values 
    for x and y but change kind to "hexbin". Then you can see the density of the data points. 
    Although it's nice to see the density, the scatter plot shows the individual points more 
    precisely. 
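
    To compare the two views directly, both kinds can be drawn side by side on the same data. This is a 
    sketch on synthetic points (generated with NumPy, not the real dataset); colleges and the gridsize 
    value are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic points clustered around one spot, so the dots overlap heavily.
rng = np.random.default_rng(0)
colleges = pd.DataFrame(
    {
        "S.F.Ratio": rng.normal(14, 3, size=500),
        "Expend": rng.normal(9000, 2500, size=500),
    }
)

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Same x and y in both calls; only kind changes.
colleges.plot(kind="scatter", x="S.F.Ratio", y="Expend", ax=left, title="scatter")
colleges.plot(kind="hexbin", x="S.F.Ratio", y="Expend", gridsize=20, ax=right,
              title="hexbin")
```

    Each hexagon is colored by how many points fall inside it, which is why overlapping dots stop hiding 
    each other. A smaller gridsize gives larger hexagons.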

    I have another dataset that we can use for a hexbin plot. 

    This dataset contains crime statistics for different years. Now we want to examine the relationship 
    between Vehicle Theft and Robbery. So, I coded it like this:
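
    The original code cell is not shown here, so the following is only a sketch of such a call. The 
    DataFrame name crime, the synthetic counts, the choice of which column goes on which axis, and the 
    gridsize and cmap values are all assumptions; the column names follow the text.

```python
import numpy as np
import pandas as pd

# Synthetic counts, NOT the real crime data. Vehicle theft is generated
# to rise with robbery so the plot has a visible trend.
rng = np.random.default_rng(1)
robbery = rng.uniform(100_000, 700_000, size=50)
crime = pd.DataFrame(
    {
        "Robbery": robbery,
        "Vehicle Theft": 2 * robbery + rng.normal(0, 80_000, size=50),
    }
)

# cmap="Greens" shades each hexagon from light to dark green by point count.
ax = crime.plot(kind="hexbin", x="Robbery", y="Vehicle Theft",
                gridsize=15, cmap="Greens")
```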

    It's very similar to the scatter plot call, but kind is "hexbin". The resulting plot is shown below:


    This time, there seems to be a stronger relationship between Robbery and Vehicle Theft: the more 
    robberies there are, the more vehicle thefts there are. You can also read the density from how 
    saturated the green of each hexagon is. 

4. Box Plot:
    After you take a test and get your result, you might wonder how other students did. Some 
    educational websites show a picture like the one below:

    As the picture shows, a box plot displays the minimum score, the maximum score, the range between 
    them, the lower and upper quartiles, and the median score. 
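
    The same five numbers can be computed directly, which helps when reading a box plot. A small sketch 
    with hypothetical test scores (the values are invented):

```python
import pandas as pd

# Hypothetical test scores, invented for illustration.
scores = pd.Series([55, 62, 70, 75, 80, 88, 95])

summary = {
    "min": scores.min(),
    "lower quartile": scores.quantile(0.25),  # 25% of scores fall below this
    "median": scores.median(),                # the middle score
    "upper quartile": scores.quantile(0.75),  # 75% of scores fall below this
    "max": scores.max(),
}
score_range = summary["max"] - summary["min"]
print(summary, "range:", score_range)
```

    Note that quantile interpolates between neighboring values by default, so a quartile does not have to 
    be one of the actual scores.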

    Using a box plot, we want to see the distributions of Burglary, Violent, and Vehicle Theft from the 
    crime data above. So, I coded it like this:
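
    The original code cell is missing from this page, so this is a sketch under assumptions: crime is a 
    hypothetical DataFrame name and the counts are synthetic. The column names follow the text.

```python
import numpy as np
import pandas as pd

# Synthetic counts, NOT the real crime data.
rng = np.random.default_rng(2)
crime = pd.DataFrame(
    {
        "Burglary": rng.normal(2_500_000, 400_000, size=50),
        "Violent": rng.normal(1_200_000, 300_000, size=50),
        "Vehicle Theft": rng.normal(1_100_000, 250_000, size=50),
    }
)

# Passing a list as y draws one box per column on a shared axis.
# kind is written in lowercase here.
ax = crime.plot(kind="box", y=["Burglary", "Violent", "Vehicle Theft"])
```

    With matplotlib's default settings the whiskers stop at the furthest point within 1.5 times the 
    interquartile range, and anything beyond that is drawn as a separate marker, so the whisker ends are 
    not always the exact minimum and maximum.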
    We set kind to "box" because we want a box plot. Since we want to see the distributions of 
    Burglary, Violent, and Vehicle Theft, we pass them as y. Then we get the desired plot:



*GitHub: https://github.com/KwakSukyoung/coding/blob/master/ACME/Pandas2/pandas2.ipynb

