
Introduction

In this lab, you will learn how to create and interpret box plots.

What is a box plot? A box plot is a visual representation of the minimum, Q1, median (Q2), Q3, and maximum values of a variable. Just like histograms, box plots should only be used with continuous or discrete variables. Box plots are similar to histograms in that both display the distribution of a variable, but box plots make it more straightforward to identify the actual values of the minimum, Q1, median (Q2), Q3, and maximum. Finding the actual value for a variable from a histogram is not possible because of the bins. Think about it: if there are 12 observations in the 80-85 age interval, there is no way to know the specific maximum age, only that it is somewhere between 80 and 85 years.

Making Box Plots

You guessed it: we are going to use R to make some basic box plots. We will use the boxplot() function; check out RDocumentation for more information. A box plot is constructed by drawing lines that correspond to the values of Q1 (the first quartile), the median (Q2), and Q3 (the third quartile), and then “boxing” in the median. Then “whiskers” are added. In most cases the whiskers represent the min and max values, but if there are suspected outliers, the whisker stops short at the most extreme value that is not a suspected outlier, and each suspected outlier is represented by a dot or star. The definition R (and other stats software) uses for a suspected outlier is any observed value more than 1.5 times the IQR below Q1 or above Q3. So if your data has Q1 = 7 and Q3 = 10, then IQR = 3 and 1.5 × IQR = 4.5, so any observation lower than 2.5 (7 − 4.5) or higher than 14.5 (10 + 4.5) would be a suspected outlier and represented by a dot or star.
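
The 1.5 × IQR rule can be checked by hand in R. The sketch below first repeats the arithmetic for Q1 = 7 and Q3 = 10, then applies the same rule to a small vector x whose values are made up purely for illustration. It uses boxplot.stats(), the helper that boxplot() relies on to flag suspected outliers. The fences for x differ from the worked example because x has its own quartiles.

```r
# Worked example from the text: Q1 = 7 and Q3 = 10
q1 <- 7
q3 <- 10
iqr <- q3 - q1                   # IQR = 3
lower_fence <- q1 - 1.5 * iqr    # values below this are suspected outliers
upper_fence <- q3 + 1.5 * iqr    # values above this are suspected outliers
lower_fence
upper_fence

# Hypothetical data, made up for illustration only
x <- c(2, 7, 8, 9, 10, 11, 20)

# boxplot.stats() applies the same 1.5 * IQR rule and returns the flagged values
suspected <- boxplot.stats(x)$out
suspected
```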

Interpreting Box Plots

Several specific quantities can be read directly from a box plot. The image below shows the different parts of the box plot.

The skewness of the data can also be assessed with a box plot; look at the example below. In this example, the dots are not outliers. Instead, each dot represents the value of one observation.
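
A plot of this kind can be approximated in base R. This is only a sketch: the values in x are made up for illustration, and stripchart() is one of several ways to overlay the individual observations. In right-skewed data like this, the median typically sits toward the left side of the box and the right whisker is longer.

```r
# Hypothetical right-skewed values, made up for illustration only
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14, 20)

# range = 0 extends the whiskers to the min and max,
# so no points are drawn as suspected outliers
boxplot(x, horizontal = TRUE, range = 0)

# Overlay one dot per observation; jitter spreads out tied values
stripchart(x, method = "jitter", pch = 16, add = TRUE)

# A mean well above the median is one numeric hint of right skew
mean(x)
median(x)
```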

Making and Interpreting Box Plots

To make a basic box plot, we will use the boxplot() function. The general form of the code is boxplot(dataset$variablename). For example, to make a box plot of age, the code would be boxplot(HeartFailure$age).

The data are the same as in the previous labs. The data dictionary is also included.

Data Dictionary

age: age in years
sex: (1 = male; 0 = female)
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: serum cholesterol in mg/dl
fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
thalach: maximum heart rate achieved
exang: exercise induced angina (1 = yes; 0 = no)
oldpeak: ST depression induced by exercise relative to rest

#show the last 10 observations in the dataset
tail(HeartFailure,10)

In the exercises below, you will make box plots with R’s boxplot() function to answer the questions. I made the first one for you, so all you have to do is run the code and use the box plot to answer the question. For the additional problems, copy and paste the code into the next code box, replace the variable name, and click “Run Code”. Use the box plots you create to answer the questions below.

# age is a continuous/discrete variable so using a box plot is appropriate.
# adding "horizontal = TRUE" lets R know I want the box plot to be horizontal
# If you want to make the box plot vertical just delete ", horizontal = TRUE"
boxplot(HeartFailure$age, horizontal = TRUE)

# Use the summary() function to make it easier to get the exact values of the different parts of the box plot
summary(HeartFailure$age)
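
If you want to try this outside the lab, the sketch below uses mtcars, a dataset that ships with R, as a stand-in for HeartFailure (an assumption made only so the code runs in any R session). It also shows that boxplot() invisibly returns the numbers it draws, which is another way to read off exact values. The hinges in b$stats can differ slightly from the quartiles reported by summary(), because the two use slightly different quartile definitions.

```r
# mtcars ships with R; it stands in for HeartFailure here
b <- boxplot(mtcars$mpg, horizontal = TRUE)

# Rows of b$stats: lower whisker end, lower hinge (about Q1), median,
# upper hinge (about Q3), upper whisker end
b$stats

# Values of any suspected outliers (empty if there are none)
b$out

# summary() reports the min, Q1, median, mean, Q3, and max
summary(mtcars$mpg)
```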

Quiz: Questions 1-3

# make a box plot for chol.

Quiz: Questions 4-6

# make a box plot for trestbps. Use the summary() function so you can get the exact values.

Quiz: Questions 7-9

# make a box plot for oldpeak.

Quiz: Questions 10-12


Comparing Groups with Box Plots

Box plots are also very useful for making comparisons between groups. The first step is to stratify (a fancy word for divide) the data based on the value of a variable. A box plot is created for each stratum, and the box plots are drawn next to each other so you can quickly compare them. In our case, we have three categorical variables that we will use to stratify our data. See if you can find any variables that have major differences between groups.
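
Stratified box plots can also be drawn in code with the formula interface of boxplot(), written as boxplot(variable ~ group, data = dataset). The sketch below uses the built-in mtcars dataset as a stand-in, with am (coded 0/1, much like sex in the data dictionary) as the grouping variable. The commented-out HeartFailure call assumes the column names match the data dictionary.

```r
# One box per stratum: mpg split by am (0 = automatic, 1 = manual)
b <- boxplot(mpg ~ am, data = mtcars, horizontal = TRUE)

# One column of box plot statistics per group
b$names
b$stats

# Analogous call for the lab data, assuming columns named as in the data dictionary:
# boxplot(age ~ sex, data = HeartFailure, horizontal = TRUE)
```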

Instructions: Use the dropdown menu to select different variables to compare by different strata. Use the plot to answer the quiz questions.

Quiz: Questions 13-14


Summary

In this lab, you completed 5 exercises and answered 14 quiz questions.

  1. You learned how to make and interpret a box plot

  2. You used box plots to see if there were any differences between different strata

Great job! You are done with the lab! Don’t forget to record your answers and take the eLC quiz so you get credit.

from: https://xkcd.com/539/

Box Plots

Computer Lab 3