Generating random datasets with a specified mean and standard deviation

When creating examples in class or on this blog, occasionally I need to generate numbers to fill a cell. Sometimes,I find myself wanting those numbers to have a certain mean and maybe a certain standard deviation. This is often the case when I’m creating data to match means and standard deviations reported in an article that I am using for the example, and I don’t have access to the actual data that the authors used (like here).

Generating data with random.org

Random.org has an easy-to-use solution for generating random datasets. The site provides a service for generating random numbers from a Gaussian (normal) distribution, with the mean, standard deviation, and number of significant digits of your choosing.

Generating data with Excel

In Excel, you can generate (pseudo-)random data using a combination of the NORMINV and RAND functions. According to Excel, NORMINV function returns the “inverse of the normal cumulative distribution for the specified mean and standard deviation.” To me, this means nothing, so I’ll do my best to talk my way through it here.

NORMINV works like this: You are interested in finding a score from a normal distribution, a score that cuts of a specific proportion in the left side of that distribution. For example, maybe you want to identify the cutoff point for the first quartile, which is the score that cuts off 25% of the left side of the distribution, leaving 75% in the right side. NORMINV takes three variables, like this =NORMINV(probability, mean, standard_dev). “Probability” refers to the proportion in the left hand tail, which would be .25 in the example here. “Mean” refers to the mean of the distribution, and standard_dev to its standard deviation. For example, if you were working with a distribution of IQ scores, with a mean of 100 points and standard deviation of 15 points, you would use =NORMINV(.25,100,15). Excel would return 89.88, the IQ score that sits at the first quartile.

Here’s another example. First-term stats students know that a z-score of 1.96 cuts off the most extreme 2.5% from the right-hand side of a standard z-score distribution, leaving 97.5% to the left of the score. Standard distributions have a mean of 0 and a standard deviation of 1. If you use =NORMINV(.975,0,1), you would expect Excel to give a value of 1.96, which it does, as the image below illustrates.

The NORMINV function in Excel.

The NORMINV function in Excel.

To get random numbers with a particular mean and standard deviation, we combine the NORMINV function with the RAND() function. Using =RAND() returns a (pseudo)random number between 0 and 1. Since proportions also vary between 0 and 1, we can just use RAND() for the first variable in the NORMINV function.

So, if you need a sample size of 25, with a mean of 72.3 and a standard deviation of 8.6, insert the following into a cell in the spreadsheet:

=NORMINV(RAND(),72.3,8.6)

Use the drag handle in the bottom right corner of the cell, and drag it down to fill 25 cells with data. It might look something like this:

Twenty-five random scores with a particular mean and standard deviation.

Twenty-five random scores with a particular mean and standard deviation.

The RAND() function generates new values every time you do anything in Excel, making it quite volatile for even modest sized data sets. So you may want to copy the scores and paste them as values only to stop the constant refreshing and create static values.

Leave a Reply

Your email address will not be published. Required fields are marked *