# Data analysis: range, mean, median, and mode

In the biology laboratory, we generate a ton of data. But how do we understand what it is telling us? We can use statistics! There are many ways to perform the analysis, depending on the data we collect. We’re going to start here with some basic concepts in statistics, and over the coming months we will explore more concepts in depth!

The first thing that we can do when we have a numerical data set is to perform a quick summary of the data by calculating the range, mean, median, and mode. These are all basic functions that can be done in Google Sheets (which is free to use) or any similar spreadsheet program.

Figure: Mathematical formula for the median, where X = ordered list of values in the data set and n = number of values in the data set.

• The range is the difference between the highest value and the lowest value. So, mathematically, we can think of this as range = max value – min value.
• A data set’s mean is what we generally think of as the average. All the numbers in the data set are added together, and the sum is then divided by the total number of data points. Written out, this is mean = (data1 + data2 + … + datax) / x, where x equals the total number of points in our data set.
• The number that is directly in the middle of the data set is the median. This is determined by ordering the data set from largest to smallest (or smallest to largest), and then selecting the number in the middle. For data sets with an odd number of samples, the median is easy: you just take that middle number. For data sets with an even number of samples, however, you have to average the two middle numbers, and that average becomes the median.
• The mode is the number that shows up most often in a data set. To determine this value, you have to count the number of times each number shows up in your data. Now, the interesting thing about the mode is that a data set might have one mode, multiple modes (when several values tie for the highest count), or no mode at all (when every value shows up only once)! It all depends on how frequently each data point shows up.
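
If you would rather try these four summaries outside a spreadsheet, here is a minimal sketch in Python using only the standard library. The `summarize` and `find_modes` helpers and the sample values are made up for illustration (they are not from a real experiment), and the sketch assumes the data set is not empty.

```python
from collections import Counter
from statistics import mean, median


def find_modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears only once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


def summarize(values):
    """Compute the range, mean, median, and mode(s) of a non-empty list."""
    return {
        "range": max(values) - min(values),  # max value minus min value
        "mean": mean(values),                # sum divided by number of points
        "median": median(values),            # middle value, or average of the two middle values
        "modes": find_modes(values),         # may hold zero, one, or several values
    }


# Hypothetical measurements, invented for this example
sample = [4, 8, 6, 8, 10, 6]
print(summarize(sample))
```

For this invented sample, the range is 6, the mean is 7, the median is 7 (the average of the two middle numbers, 6 and 8), and there are two modes, 6 and 8, because each shows up twice.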