A box plot, also known as a box-and-whisker plot, is a visual representation that displays the distribution of a data set along with several key statistical measures. It provides a graphic summary of five important values in a data set: the minimum score, first quartile, median, third quartile, and maximum score.
The box plot is built around a rectangle, called the box, which spans the interquartile range. The two ends of the box mark the first quartile (25th percentile) and third quartile (75th percentile), respectively, and a line inside the box marks the median, which is not necessarily at the center of the box. The “whiskers” extending from the box show the range of the data, excluding any outliers.
The box plot was first introduced by the mathematician John Tukey in 1970.1
This article discusses what a box plot can tell you about a data set and the different parts of a box plot. It also covers how box plots can help you analyze data and how to interpret one.
What Information Do Box Plots Tell You?
A box plot highlights several key data points that can reveal important details about the distribution it describes.
These include the median, the interquartile range, and the minimum and maximum values excluding outliers.
The 7 Elements of a Box Plot
The seven key elements of a box plot are:
- Minimum score: This is the lowest data point in the data set once outliers have been excluded.
- Median: The line inside the box indicates the median of a data set. The median is the middle value of the data set when the values are lined up in order; with an even number of values, it is the average of the two middle values. This number is important because it is a measure of the central tendency of the data set.
- Lower quartile: Also known as the first quartile (Q1) or 25th percentile, this is the median of the lower half of the data set.
- Upper quartile: Also known as the third quartile (Q3) or 75th percentile, this is the median of the upper half of the data set.
- Maximum score: This is the highest data point in the data set once outliers have been excluded.
- Whiskers: The box plot’s “whiskers” are the lines extending from either end of the box to the minimum and maximum scores. They represent the range of the data without the outliers. In the most common convention, introduced by Tukey, each whisker reaches the most extreme data point that lies within 1.5 times the IQR of the box; points beyond that are plotted individually as outliers.
- The Interquartile Range (IQR): The box in a box plot represents the interquartile range of the data set. The IQR is the distance from the first quartile to the third quartile (Q3 − Q1). It contains the middle 50% of the data and is helpful for visualizing the spread of the data set.
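To make these elements concrete, here is a minimal Python sketch that computes them for a small data set. It follows the "median of each half" definition of the quartiles given above and assumes two conventions that are not fixed here: with an odd number of values, the overall median is left out of both halves, and outliers are values more than 1.5 × IQR beyond the quartiles. The function name `five_number_summary` is hypothetical, and because statistical software uses several slightly different quartile methods, other tools may report slightly different values.

```python
from statistics import median


def five_number_summary(data, k=1.5):
    """Box-plot statistics using the 'median of each half' quartile rule.

    Assumed conventions (hypothetical helper): with an odd number of values
    the overall median is excluded from both halves, and outliers are values
    more than k * IQR below Q1 or above Q3 (Tukey's 1.5 * IQR rule).
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]            # lower half of the ordered data
    upper = values[half + n % 2:]    # upper half (skips the middle value if n is odd)
    q1, med, q3 = median(lower), median(values), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "minimum": min(inside),      # end of the lower whisker
        "q1": q1,
        "median": med,
        "q3": q3,
        "maximum": max(inside),      # end of the upper whisker
        "iqr": iqr,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


summary = five_number_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary["median"], summary["iqr"], summary["outliers"])  # 6 4.5 [30]
```

In this example, the value 30 lies beyond the upper fence (8.5 + 1.5 × 4.5 = 15.25), so the upper whisker stops at 9 and 30 would be drawn as a separate point.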
Box plots are non-parametric, meaning they do not make assumptions about the underlying distribution of the data. Instead, they give a visual representation based on the median and quartiles.2
This makes them useful for comparing data across different groups or conditions. They provide a concise summary of the main characteristics of the data by quickly conveying the central tendency and spread.
Box Plot Orientation
Box plots can be presented either horizontally or vertically. They are often presented vertically in professional academic papers.
Why Are Box Plots Helpful in Analyzing Data?
Box plots have a number of important uses and benefits in conveying and analyzing data in psychology research.
These graphic representations can quickly convey important information about the spread and symmetry of a distribution. Using these tools, it is possible to create visual comparisons of different groups of data.
Researchers also suggest that box plots can help people improve their ability to interpret and understand complex information. This allows people to reason more effectively about quantitative information.3
Box plots can also be a good alternative to histograms. A histogram is generally recommended only when you have at least 50 data points, whereas a box plot can be effective with as few as five.4 Box plots can also be adapted in various ways to help improve comparison and interpretation. For example, a notch can be added to the box to show an approximate 95% confidence interval for the median.
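A widely used rule for drawing the notch, attributed to McGill, Tukey, and Larsen, is median ± 1.57 × IQR / √n. The minimal sketch below assumes that rule and uses Python's `statistics.quantiles` with `method="inclusive"` for the quartiles; the helper name `median_notch` is hypothetical, and some software uses 1.58 rather than 1.57.

```python
import math
from statistics import median, quantiles


def median_notch(data):
    """Approximate 95% interval for the median, as drawn by a notched box plot.

    Assumes the rule median +/- 1.57 * IQR / sqrt(n) and inclusive quartiles;
    both are conventions of this sketch rather than a single standard.
    """
    values = list(data)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    m = median(values)
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return m - half_width, m + half_width


low, high = median_notch([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(round(low, 2), round(high, 2))  # 2.91 7.09
```

Because n sits under a square root in the denominator, the notch narrows as the sample grows, which is why notches from very small samples can extend past the box itself.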
How to Interpret a Box Plot
When interpreting data in a box plot, researchers typically follow a set of steps or methods to gain insights and draw conclusions. Here are the common steps used to interpret data in a box plot:5
Look at Key Characteristics of the Data
The first step when interpreting a box plot is to look at some of the most important characteristics of the data:
- Identify the median: The median is represented by the line within the box in the box plot. It divides the data into two equal halves, with 50% of the data falling below and 50% above the median. It provides a measure of central tendency.
- Determine the quartiles: The box in the box plot represents the interquartile range (IQR). The lower quartile (Q1) is the median of the lower half of the data, and the upper quartile (Q3) is the median of the upper half. The IQR is the range between Q1 and Q3 and contains the middle 50% of the data.
- Consider the range: The distance from the end of the lower whisker to the end of the upper whisker shows the range of the data (excluding outliers) and gives a sense of its spread. Longer whiskers indicate a wider range, while shorter whiskers indicate a narrower one. A wide range suggests greater variability in the data.
Look for Signs of Unusual Data
It can also be helpful to look for signs that your data might be skewed or unusual in some way.
- Identify outliers: Outliers are data points that lie well above or below the ends of the whiskers; under the common 1.5 × IQR convention, these are values more than 1.5 times the IQR beyond the box. They are shown as individual points or dots outside the whiskers. Outliers may represent extreme values or data points that deviate markedly from the majority.
- Look for skewed data: Heavily skewed data can be a sign that the data is not normally distributed. You can assess skewness by looking at where the median sits inside the box and by comparing the lengths of the two whiskers. If the median is closer to Q1 (and the upper whisker is longer), the distribution is likely right-skewed; if it is closer to Q3 (and the lower whisker is longer), the distribution is likely left-skewed. If your data appears unusually skewed, it is important to investigate what might be causing it.
- Analyze symmetry: The symmetry of the box plot can indicate the symmetry of the data distribution. If the median sits near the middle of the box and the whiskers are of similar length, the distribution is likely roughly symmetrical. An asymmetrical box or unequal whiskers may suggest skewness or other features of the distribution.
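These checks can be roughed out in code. The minimal sketch below flags outliers with the common 1.5 × IQR rule and summarizes skew direction with Bowley's quartile skewness coefficient, (Q3 + Q1 − 2 × median) / IQR, which is positive when the median sits closer to Q1 (right skew) and negative when it sits closer to Q3 (left skew). Both rules, the inclusive quartile method, and the function name `unusual_data_report` are assumptions for illustration; note that this coefficient looks only at the box, not at the whiskers.

```python
from statistics import quantiles


def unusual_data_report(data, k=1.5):
    """Flag outliers and summarize skew direction from box-plot statistics.

    Assumptions of this sketch (hypothetical helper): inclusive quartiles,
    Tukey's k * IQR outlier rule, and Bowley's quartile skewness
    (Q3 + Q1 - 2 * median) / IQR.
    """
    values = list(data)
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")  # q2 is the median
    iqr = q3 - q1
    outliers = sorted(v for v in values if v < q1 - k * iqr or v > q3 + k * iqr)
    # Positive when the median sits closer to Q1 (longer upper half: right skew),
    # negative when it sits closer to Q3 (longer lower half: left skew).
    bowley = (q3 + q1 - 2 * q2) / iqr if iqr else 0.0
    if bowley > 0:
        direction = "right-skewed"
    elif bowley < 0:
        direction = "left-skewed"
    else:
        direction = "symmetric box"
    return {"outliers": outliers, "bowley_skew": bowley, "direction": direction}


report = unusual_data_report([1, 2, 2, 3, 3, 4, 6, 9, 15, 40])
print(report["outliers"], report["direction"])  # [40] right-skewed
```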
Make Comparisons
Box plots can be used to compare distributions between different groups or categories. Researchers can identify differences or similarities in the data by visually comparing the position, shape, and spread of the boxes and whiskers.
- Look at notches: Notches in the sides of the boxes give a visual comparison of the uncertainty around each group's median. If the notches of two boxes do not overlap, the medians are likely to be significantly different.
- Look at medians: If the median line of one box lies outside the other group’s box, this is a good sign that the two groups differ.
- Consider sample size: When comparing box plots, it is important to consider the sample sizes. A small sample size may not be representative of the entire population, so conclusions should be drawn with caution.
- Compare box and whisker lengths: Comparing the box lengths (the interquartile range) and the whisker lengths (the overall spread of the data) can also provide valuable information when comparing groups. Marked differences in these lengths suggest that the groups differ in variability.
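The comparison checks above can also be sketched in Python. The helpers `box_stats` and `compare_groups` are hypothetical; they assume inclusive quartiles from `statistics.quantiles` and the common notch rule median ± 1.57 × IQR / √n. Their output is only a prompt for closer inspection, not a substitute for a formal statistical test.

```python
import math
from statistics import quantiles


def box_stats(data):
    """Quartiles, median, and notch interval (median +/- 1.57 * IQR / sqrt(n))."""
    values = list(data)
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return {"q1": q1, "median": med, "q3": q3, "n": len(values),
            "notch": (med - half_width, med + half_width)}


def compare_groups(a, b):
    """Rough, hypothetical versions of the visual checks for two groups."""
    sa, sb = box_stats(a), box_stats(b)
    # Two intervals overlap when each one starts before the other ends.
    notches_overlap = sa["notch"][0] <= sb["notch"][1] and sb["notch"][0] <= sa["notch"][1]
    # True if either group's median falls outside the other group's box.
    median_outside_box = (not sb["q1"] <= sa["median"] <= sb["q3"]
                          or not sa["q1"] <= sb["median"] <= sa["q3"])
    return {
        "notches_overlap": notches_overlap,
        "median_outside_other_box": median_outside_box,
        "iqrs": (sa["q3"] - sa["q1"], sb["q3"] - sb["q1"]),
        "sample_sizes": (sa["n"], sb["n"]),
    }


result = compare_groups([1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16, 17, 18, 19])
print(result["notches_overlap"], result["median_outside_other_box"])  # False True
```

Reporting the sample sizes alongside the other checks is a reminder that small groups produce wide notches and less reliable comparisons.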