
Exercise 2

This exercise includes questions that cover most of the topics introduced in Chapter 3 and Section 10.8.1. It takes you through a number of frequently encountered questions, ones that you are likely to want to answer about your own data. Some of these topics are taken further in Chapters 5 and 10.

Example W3.2: Height of male students.

The height of 100 male students was recorded in cm. The raw data are in the table below. The investigator wished to check whether these data are Normally distributed.

A table to show height

150.0   166.8   172.7   178.7   185.4
154.0   167.0   172.8   179.0   185.5
154.7   167.1   173.1   179.8   185.8
157.5   167.8   173.2   181.1   187.8
158.2   167.8   173.3   181.2   187.9
158.6   168.0   173.4   181.3   188.8
159.1   168.5   174.0   181.5   189.3
159.9   168.9   174.1   182.1   189.6
160.4   169.4   174.6   182.2   189.9
160.8   169.8   175.6   182.6   190.3
161.6   169.8   175.6   182.9   191.4
162.4   169.9   176.2   183.1   191.4
163.3   170.7   176.2   183.5   193.6
163.3   171.1   176.5   183.6   194.4
164.1   171.2   176.8   183.8   195.9
165.0   171.8   177.2   184.3   196.3
165.3   172.1   177.7   184.3   196.8
165.6   172.3   177.9   185.0   199.1
165.9   172.6   177.9   185.0   199.2
166.4   172.7   177.9   185.4   201.0




Q W2.1

What information needs to be added to this table?

[If you would like to save a record of your answer, please type it into this Word document]

a) Title
b) Numbering
c) Labelling (rows and columns)
d) Title, row and column headings that indicate the units of measurement
e) Differentiation between zero and missing data
f) Simplicity
g) Appropriate classes (if it is a frequency table)
Answer: items a) to d) need to be added; items e) to g) do not apply to this table.
a) Title: yes. The current title is too vague and needs changing.
b) Numbering: yes. At present this table is not numbered.
c) Labelling: yes. None of the columns are labelled, so it is not clear what these numbers are.
d) Units: yes. The title and column heading should include the unit of measurement, i.e. cm.
e) Zero vs missing data: not applicable, as there are no zeros or missing values.
f) Simplicity: no change needed; this is already a simple table.
g) Classes: not applicable; this is not a frequency table.

Corrected table.

Table W3.3: Height (cm) of 100 male students

Height (cm)

150.0   166.8   172.7   178.7   185.4
154.0   167.0   172.8   179.0   185.5
154.7   167.1   173.1   179.8   185.8
157.5   167.8   173.2   181.1   187.8
158.2   167.8   173.3   181.2   187.9
158.6   168.0   173.4   181.3   188.8
159.1   168.5   174.0   181.5   189.3
159.9   168.9   174.1   182.1   189.6
160.4   169.4   174.6   182.2   189.9
160.8   169.8   175.6   182.6   190.3
161.6   169.8   175.6   182.9   191.4
162.4   169.9   176.2   183.1   191.4
163.3   170.7   176.2   183.5   193.6
163.3   171.1   176.5   183.6   194.4
164.1   171.2   176.8   183.8   195.9
165.0   171.8   177.2   184.3   196.3
165.3   172.1   177.7   184.3   196.8
165.6   172.3   177.9   185.0   199.1
165.9   172.6   177.9   185.0   199.2
166.4   172.7   177.9   185.4   201.0




Q W2.2

The investigator wishes to check whether these data are Normally distributed. Read BOX 3.2 in the book and use the first four criteria (a to d) to determine whether the data are apparently Normally distributed. (Other interactive exercises in Chapter 5 consider criterion e.)

Use the table in this Word document to answer this question.


Answer:

Criterion   YES   NO   MAYBE
a           x
b           x
c           x
d                      x



The full answer with calculations can be found below.


Full calculation for Q W2.2:

- using no software package

- using Excel

- using SPSS

- using Minitab

Full calculation using no software package

a) Are the data measured on an interval scale, and are they therefore quantitative and continuous (measured in units such as mm or grams)?

YES

b) Does the distribution appear to be a bell-shaped curve?

YES (Fig W3.1)

Fig W3.1: Heights of male students
c) Do about 68% of your observations fall within the range mean ± 1 s? Given the number of observations, we would encourage you to use statistical software such as Excel, SPSS or Minitab to carry out these calculations.

First, calculate the mean ($\bar{x}$):

$\sum x = 150.0 + 154.0 + 154.7 + \dots + 199.1 + 199.2 + 201.0 = 17607.9$

$n = 100$

$$\bar{x} = \frac{\sum x}{n} = \frac{17607.9}{100} = 176.079\ \text{cm}$$

To calculate the standard deviation (s):

$\sum x = 150.0 + 154.0 + 154.7 + \dots + 199.1 + 199.2 + 201.0 = 17607.9$

$(\sum x)^2 = 17607.9^2 = 310038142.4$

$n = 100$

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

$\sum x^2 = 150.0^2 + 154.0^2 + 154.7^2 + \dots + 199.1^2 + 199.2^2 + 201.0^2 = 3112760.57$

$$s = \sqrt{\frac{3112760.57 - \frac{310038142.4}{100}}{99}} = \sqrt{\frac{12379.15}{99}} = \sqrt{125.04} = 11.18\ \text{cm}$$

The range mean ± 1s therefore spans from $\bar{x} - s = 176.079 - 11.18 = 164.90$ cm to $\bar{x} + s = 176.079 + 11.18 = 187.26$ cm.

68 of the 100 observations (68%) fall within this range.

Is this criterion met?

YES
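As a cross-check on the hand calculation for criterion c, here is a minimal Python sketch (standard library only, not part of the original exercise). The data are re-typed from Table W3.3 in ascending order, and the variable names are our own. It uses the same computational formula for s as above and then counts the observations within mean ± 1 s.

```python
import math

# Heights (cm) of the 100 male students, re-typed from Table W3.3 in ascending order.
HEIGHTS_CM = [
    150.0, 154.0, 154.7, 157.5, 158.2, 158.6, 159.1, 159.9, 160.4, 160.8,
    161.6, 162.4, 163.3, 163.3, 164.1, 165.0, 165.3, 165.6, 165.9, 166.4,
    166.8, 167.0, 167.1, 167.8, 167.8, 168.0, 168.5, 168.9, 169.4, 169.8,
    169.8, 169.9, 170.7, 171.1, 171.2, 171.8, 172.1, 172.3, 172.6, 172.7,
    172.7, 172.8, 173.1, 173.2, 173.3, 173.4, 174.0, 174.1, 174.6, 175.6,
    175.6, 176.2, 176.2, 176.5, 176.8, 177.2, 177.7, 177.9, 177.9, 177.9,
    178.7, 179.0, 179.8, 181.1, 181.2, 181.3, 181.5, 182.1, 182.2, 182.6,
    182.9, 183.1, 183.5, 183.6, 183.8, 184.3, 184.3, 185.0, 185.0, 185.4,
    185.4, 185.5, 185.8, 187.8, 187.9, 188.8, 189.3, 189.6, 189.9, 190.3,
    191.4, 191.4, 193.6, 194.4, 195.9, 196.3, 196.8, 199.1, 199.2, 201.0,
]

n = len(HEIGHTS_CM)
sum_x = sum(HEIGHTS_CM)
sum_x2 = sum(x * x for x in HEIGHTS_CM)

# Mean, and sample standard deviation from the computational formula:
# s = sqrt((sum x^2 - (sum x)^2 / n) / (n - 1))
mean = sum_x / n
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Criterion c: count the observations lying within mean +/- 1 s.
lower, upper = mean - s, mean + s
within = sum(1 for x in HEIGHTS_CM if lower <= x <= upper)

print(f"sum x = {sum_x:.1f}, sum x^2 = {sum_x2:.2f}, n = {n}")
print(f"mean = {mean:.3f} cm, s = {s:.2f} cm")
print(f"mean +/- 1 s: {lower:.2f} to {upper:.2f} cm")
print(f"{within} of {n} observations ({100 * within / n:.0f}%) fall within this range")
```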

d) Does the mean = median = mode?

We have already calculated the mean for these data (176.079cm).

The median is the middle value when the data are organised in numerical order. When n is an even number, the median is half the sum of the two middle values. Here n = 100, so the median lies between the 50th and 51st values; both are 175.6 cm, so the median is 175.6 cm.
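The even-n rule for the median can be sketched in a few lines of Python. This is an illustration only: the function name `median` is our own, the data are re-typed from Table W3.3, and the standard library's `statistics.median` is used as a cross-check because it applies the same rule.

```python
import statistics

# Heights (cm) of the 100 male students, re-typed from Table W3.3 in ascending order.
HEIGHTS_CM = [
    150.0, 154.0, 154.7, 157.5, 158.2, 158.6, 159.1, 159.9, 160.4, 160.8,
    161.6, 162.4, 163.3, 163.3, 164.1, 165.0, 165.3, 165.6, 165.9, 166.4,
    166.8, 167.0, 167.1, 167.8, 167.8, 168.0, 168.5, 168.9, 169.4, 169.8,
    169.8, 169.9, 170.7, 171.1, 171.2, 171.8, 172.1, 172.3, 172.6, 172.7,
    172.7, 172.8, 173.1, 173.2, 173.3, 173.4, 174.0, 174.1, 174.6, 175.6,
    175.6, 176.2, 176.2, 176.5, 176.8, 177.2, 177.7, 177.9, 177.9, 177.9,
    178.7, 179.0, 179.8, 181.1, 181.2, 181.3, 181.5, 182.1, 182.2, 182.6,
    182.9, 183.1, 183.5, 183.6, 183.8, 184.3, 184.3, 185.0, 185.0, 185.4,
    185.4, 185.5, 185.8, 187.8, 187.9, 188.8, 189.3, 189.6, 189.9, 190.3,
    191.4, 191.4, 193.6, 194.4, 195.9, 196.3, 196.8, 199.1, 199.2, 201.0,
]


def median(values):
    """Return the middle value of the ordered data.

    When n is even there is no single middle value, so the median is half
    the sum of the two middle values (for n = 100: the 50th and 51st values).
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Python indexes from 0, so the 50th and 51st values are ordered[49] and ordered[50].
    return (ordered[mid - 1] + ordered[mid]) / 2


ordered = sorted(HEIGHTS_CM)
print(f"50th value = {ordered[49]} cm, 51st value = {ordered[50]} cm")
print(f"median = {median(HEIGHTS_CM)} cm")
print(f"statistics.median = {statistics.median(HEIGHTS_CM)} cm")
```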

The mode is the value (or category) that contains the greatest number of observations. However, these data are measured on a continuous scale, so it is more useful to consider a modal class. If you are not convinced, look back at the data in Table W3.3 to see whether you can identify a mode.

In Table W3.4 the data are organised into a frequency table. There are two modal classes, 165.0 - 169.9 and 170.0 - 174.9, each containing 17 students. The midpoint of these two classes taken together is 169.95 cm.

Table W3.4: Frequency table for number of male students at given heights (cm)

Height of male students (cm)   No. of students
150.0 - 154.9                  3
155.0 - 159.9                  5
160.0 - 164.9                  7
165.0 - 169.9                  17
170.0 - 174.9                  17
175.0 - 179.9                  14
180.0 - 184.9                  14
185.0 - 189.9                  12
190.0 - 194.9                  5
195.0 - 199.9                  5
200.0 - 204.9                  1



Is this criterion met? MAYBE

The mean (176.08 cm) is very similar to the median (175.6 cm) but not to the mode, estimated as the midpoint of the modal classes (169.95 cm). Therefore it is not clear that this criterion has been met.
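Table W3.4 and the modal-class midpoint can be reproduced with a short Python sketch, which also prints a rough text histogram as a stand-in for Fig W3.1 (criterion b). This is an illustration under our own assumptions: classes are 5 cm wide starting at 150.0 cm, as in Table W3.4, and values are handled in whole tenths of a cm to avoid floating-point binning errors. The constant and variable names are our own.

```python
from collections import Counter

# Heights (cm) of the 100 male students, re-typed from Table W3.3 in ascending order.
HEIGHTS_CM = [
    150.0, 154.0, 154.7, 157.5, 158.2, 158.6, 159.1, 159.9, 160.4, 160.8,
    161.6, 162.4, 163.3, 163.3, 164.1, 165.0, 165.3, 165.6, 165.9, 166.4,
    166.8, 167.0, 167.1, 167.8, 167.8, 168.0, 168.5, 168.9, 169.4, 169.8,
    169.8, 169.9, 170.7, 171.1, 171.2, 171.8, 172.1, 172.3, 172.6, 172.7,
    172.7, 172.8, 173.1, 173.2, 173.3, 173.4, 174.0, 174.1, 174.6, 175.6,
    175.6, 176.2, 176.2, 176.5, 176.8, 177.2, 177.7, 177.9, 177.9, 177.9,
    178.7, 179.0, 179.8, 181.1, 181.2, 181.3, 181.5, 182.1, 182.2, 182.6,
    182.9, 183.1, 183.5, 183.6, 183.8, 184.3, 184.3, 185.0, 185.0, 185.4,
    185.4, 185.5, 185.8, 187.8, 187.9, 188.8, 189.3, 189.6, 189.9, 190.3,
    191.4, 191.4, 193.6, 194.4, 195.9, 196.3, 196.8, 199.1, 199.2, 201.0,
]

START_TENTHS = 1500  # lower limit of the first class (150.0 cm), in tenths of a cm
WIDTH_TENTHS = 50    # class width (5.0 cm), in tenths of a cm

# Work in whole tenths of a cm so that a value such as 154.9 cannot be pushed
# into the wrong class by floating-point rounding.
counts = Counter((round(x * 10) - START_TENTHS) // WIDTH_TENTHS for x in HEIGHTS_CM)
n_classes = max(counts) + 1

for i in range(n_classes):
    low = (START_TENTHS + i * WIDTH_TENTHS) / 10
    high = low + 4.9
    print(f"{low:.1f} - {high:.1f}  {counts[i]:3d}  {'#' * counts[i]}")

# Modal class(es): the class or classes with the largest frequency.
top = max(counts.values())
modal = [i for i in range(n_classes) if counts[i] == top]
first_low = (START_TENTHS + modal[0] * WIDTH_TENTHS) / 10
last_high = (START_TENTHS + modal[-1] * WIDTH_TENTHS) / 10 + 4.9
modal_midpoint = (first_low + last_high) / 2
print(f"modal classes span {first_low:.1f} - {last_high:.1f} cm; midpoint = {modal_midpoint:.2f} cm")
```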

Full Calculation using Excel

Step 1: Enter data into a new spreadsheet as a single column.

Step 2: From the top tool bar select Tools, then Data Analysis from the drop down menu.

Step 3: Select Descriptive Statistics from the box which opens.

Excel: Data Analysis tool

Step 4: Click on OK and a new box opens.

Excel: Descriptive Statistics box

Here, enter the cell locations for the array. Click on the label in cell A1 and drag across the entire data set. Click the button for 'cells grouped by columns'.

Click on the box for 'labels in first row'. A tick will appear.

Step 5: Select the output location for the returned data. To put the results in a new worksheet, click the button for New Worksheet Ply. Alternatively, click the button for Output Range and click on a single cell where the output should start; Excel will work out the correct size and return the results there.

Step 6: Tick the boxes for Summary statistics and confidence level for mean which should default to 95%.

Step 7: Click on OK.

Excel: Data in spreadsheet

Step 8: Read each value from the table. The mean, median and mode values are very close, as would be expected in Normally distributed data.
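If you want to reproduce the rows of Excel's Descriptive Statistics output outside Excel, the Python sketch below computes the same summary statistics with the standard library. It is an illustration, not Excel's own code: we assume that Excel's Skewness and Kurtosis rows use the same bias-corrected formulas as its SKEW and KURT worksheet functions, and the function name `excel_style_summary` is our own. The Confidence Level row is omitted.

```python
import math
import statistics

# Heights (cm) of the 100 male students, re-typed from Table W3.3 in ascending order.
HEIGHTS_CM = [
    150.0, 154.0, 154.7, 157.5, 158.2, 158.6, 159.1, 159.9, 160.4, 160.8,
    161.6, 162.4, 163.3, 163.3, 164.1, 165.0, 165.3, 165.6, 165.9, 166.4,
    166.8, 167.0, 167.1, 167.8, 167.8, 168.0, 168.5, 168.9, 169.4, 169.8,
    169.8, 169.9, 170.7, 171.1, 171.2, 171.8, 172.1, 172.3, 172.6, 172.7,
    172.7, 172.8, 173.1, 173.2, 173.3, 173.4, 174.0, 174.1, 174.6, 175.6,
    175.6, 176.2, 176.2, 176.5, 176.8, 177.2, 177.7, 177.9, 177.9, 177.9,
    178.7, 179.0, 179.8, 181.1, 181.2, 181.3, 181.5, 182.1, 182.2, 182.6,
    182.9, 183.1, 183.5, 183.6, 183.8, 184.3, 184.3, 185.0, 185.0, 185.4,
    185.4, 185.5, 185.8, 187.8, 187.9, 188.8, 189.3, 189.6, 189.9, 190.3,
    191.4, 191.4, 193.6, 194.4, 195.9, 196.3, 196.8, 199.1, 199.2, 201.0,
]


def excel_style_summary(values):
    """Summary statistics laid out like Excel's Descriptive Statistics table.

    Assumption: skewness and kurtosis follow the bias-corrected SKEW and KURT formulas.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    z = [(x - mean) / sd for x in values]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


for label, value in excel_style_summary(HEIGHTS_CM).items():
    print(f"{label:<20}{value:.4f}" if isinstance(value, float) else f"{label:<20}{value}")
```

Note that the single most frequent value (177.9 cm, which occurs three times) is what a mode function reports here; this differs from the modal-class estimate used in the hand calculation.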

Full Calculation using SPSS

The parts of this analysis that can be done using software are the calculation of the mean, the median and the standard deviation. (If the data had not already been ordered, the software could also have been used to sort them: deciding whether about 68% of the measurements fall within one standard deviation of the mean is much easier when the data are ordered.)

Step 1. When SPSS starts, select the 'Type in data' option, and click on 'OK'.

Then choose 'Variable View' from the tabs at the bottom left. You will see a screen something like this:

SPSS: Data Editor

Each row represents a variable for the analysis.

In the Name cell for variable 1, type 'height' (SPSS won't accept capital letters as parts of a Name). Most of the other characteristics of the variable will be given default values, as below:

SPSS: Data Editor containing variables

In our data, the heights are to the nearest 0.1 cm, so we can change the 'Decimals' column to '1' by clicking in the cell, and using the 'up' and 'down' arrows as appropriate.

SPSS: changing Decimals

Change to 'Data View' (use the tabs at bottom left) and insert the data into the first column. (If the data are in Word or Excel, it is possible to 'copy and paste' as necessary - this is faster than typing, and reduces the possibility of errors.)

SPSS: Data inserted into spreadsheet

Step 2. Analyze the data.

(i) From the drop-down menus at the top of the screen, select 'Analyze', 'Descriptive Statistics' and 'Frequencies'.

SPSS: Analyzing the data

(If you select 'Descriptives', you get a smaller range of options.) You should get a dialogue box as below:

SPSS: Frequencies dialogue box

(ii) If 'height' isn't highlighted, click on it to highlight it as above. Click on the arrow to transfer the Variable 'height' into the 'Variable(s)' window. (If you have more than one variable, you can transfer one or more for analysis by highlighting and transferring them in turn.)

SPSS: Frequencies dialogue box (2)

(iii) Click on the 'Statistics' button. This will give you another window in which you can select the statistics you wish to calculate. Select mean, median and standard deviation.

SPSS: Choosing statistics to calculate

(iv) Click on the 'Continue' button. The second window will close, and return you to the 'Frequencies' screen. Click on 'OK', and your results will appear in another window:

Frequencies

SPSS: Results

(and a large table with a row for each data item...)

From this, we obtain a mean of 176.079 cm, a median of 175.600 cm, and a standard deviation of 11.1822 cm.

Full Calculation using Minitab

The parts of this analysis that can be done using software are the calculation of the mean, the median and the standard deviation. (If the data had not already been ordered, the software could also have been used to sort them: deciding whether about 68% of the measurements fall within one standard deviation of the mean is much easier when the data are ordered.)

Step 1. Enter the data.

(i) Name the first column 'Height (cm)' by typing this into the space below the heading 'C1'. You will need to widen the column to accommodate the text: this can be done by dragging the boundary between C1 and C2.

Minitab: Step 1 (i)

(ii) Type the heights of the students into column 1. (If the data are in Word or Excel, it is possible to 'copy and paste' as necessary - this is faster than typing, and reduces the possibility of errors.)

Minitab: Step 1 (ii)

Step 2.

(i) Go to the 'Stat' drop-down menu, select 'Basic Statistics' and 'Display Descriptive Statistics'.

Minitab: Step 2 (i)

You will see another window:

Minitab: Step 2 (i)

(ii) Click on 'C1 Height (cm)' to highlight it, then click on 'Select' to transfer it to the 'Variables' area.

Minitab: Step 2 (ii)

(iii) Click on 'Statistics', and you will see a window in which you can select the statistics you wish to calculate. Select the ones you want, and deselect those not needed.

Minitab: Step 2 (iii)

(iv) Click on 'OK'. Also click on 'OK' in the 'Display Descriptive Statistics' window. Your results will appear in the upper (Session) window of the Minitab screen.

Minitab: Step 2 (iv)

The mean is 176.08 cm, the standard deviation is 11.18 cm, and the median is 175.60 cm.


Q W2.3

Which summary statistics would be most appropriate to use when communicating these results?

[If you would like to save a record of your answer, please type it into this Word document]

a) Mean
b) Median
c) Mode
d) Standard deviation
e) Variance
f) Confidence limits
g) Coefficient of variation
h) Range
i) Interquartile range
j) Percentiles
k) Skew
l) Kurtosis
Answer:
a) Mean: appropriate. These are parametric data, so we may calculate a mean for a Normal distribution.
b) Median and c) Mode: less appropriate. These are more general measures of the central tendency of a distribution and should be reserved for data where the distribution is not known.
d) Standard deviation: appropriate. These are parametric data, and the standard deviation is given in the same units as the original observations.
e) Variance: could be used in place of the standard deviation, since these are parametric data; however, its units are not the same as those of the original observations.
f) Confidence limits: appropriate. We have interval data, so we may consider calculating a confidence interval.
g) Coefficient of variation: not relevant. At present we are not comparing these data with any other set of data.
h) Range, i) Interquartile range and j) Percentiles: less appropriate. These are more general measures of spread or position and should be reserved for data where the distribution is not known.
k) Skew and l) Kurtosis: useful, as they provide additional descriptive measures of your distribution.


Q W2.4

What are the 95% confidence limits for these data?

[If you would like to save a record of your answer, please type it into this Word document]

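There are 100 observations, so we can use the large-sample multiplier 1.96, and our 95% confidence limits will be determined by:

$$\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}} = 176.079 \pm 1.96 \times \frac{11.18}{\sqrt{100}} = 176.079 \pm (1.96 \times 1.118) = 176.079 \pm 2.19\ \text{cm}$$

i.e. from 173.89 cm to 178.27 cm.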
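The same large-sample calculation as a minimal Python sketch (standard library only; the data are re-typed from Table W3.3 and the variable names are our own). It computes the standard error s/√n and applies the multiplier 1.96.

```python
import math
import statistics

# Heights (cm) of the 100 male students, re-typed from Table W3.3 in ascending order.
HEIGHTS_CM = [
    150.0, 154.0, 154.7, 157.5, 158.2, 158.6, 159.1, 159.9, 160.4, 160.8,
    161.6, 162.4, 163.3, 163.3, 164.1, 165.0, 165.3, 165.6, 165.9, 166.4,
    166.8, 167.0, 167.1, 167.8, 167.8, 168.0, 168.5, 168.9, 169.4, 169.8,
    169.8, 169.9, 170.7, 171.1, 171.2, 171.8, 172.1, 172.3, 172.6, 172.7,
    172.7, 172.8, 173.1, 173.2, 173.3, 173.4, 174.0, 174.1, 174.6, 175.6,
    175.6, 176.2, 176.2, 176.5, 176.8, 177.2, 177.7, 177.9, 177.9, 177.9,
    178.7, 179.0, 179.8, 181.1, 181.2, 181.3, 181.5, 182.1, 182.2, 182.6,
    182.9, 183.1, 183.5, 183.6, 183.8, 184.3, 184.3, 185.0, 185.0, 185.4,
    185.4, 185.5, 185.8, 187.8, 187.9, 188.8, 189.3, 189.6, 189.9, 190.3,
    191.4, 191.4, 193.6, 194.4, 195.9, 196.3, 196.8, 199.1, 199.2, 201.0,
]

n = len(HEIGHTS_CM)
mean = statistics.mean(HEIGHTS_CM)
s = statistics.stdev(HEIGHTS_CM)  # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)             # standard error of the mean
half_width = 1.96 * se            # large-sample multiplier for 95% confidence
lower, upper = mean - half_width, mean + half_width

print(f"SE = {se:.3f} cm")
print(f"95% confidence limits: {mean:.2f} +/- {half_width:.2f} cm ({lower:.2f} to {upper:.2f} cm)")
```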

You may also like to see how we would carry out this calculation using the following software packages:

- Excel

- SPSS

- Minitab

Calculation of Q W2.4 using Excel

This can be calculated indirectly in Excel.

Step 1: Enter data into a new spreadsheet as a single column.

Step 2: From the top tool bar select Tools, then Data Analysis from the drop down menu.

Step 3: Select Descriptive Statistics from the box which opens.

Excel: Data analysis box

Step 4: Click on OK and a new box opens.

Excel: Descriptive Statistics box

Enter the cell locations for the array. Click on the label in cell A1 and drag across the entire data set. Click the button for 'cells grouped by columns'.

Click on the box for 'labels in first row'. A tick will appear.

Step 5: Select the output location for the returned data. To put the results in a new worksheet, click the button for New Worksheet Ply. Alternatively, click the button for Output Range and click on a single cell where the output should start; Excel will work out the correct size and return the results there.

Step 6: Tick the boxes for Summary statistics and confidence level for mean which should default to 95%.

Step 7: Click on OK.

Excel: Data in spreadsheet

Step 8: Read each value from the table. The standard error can be taken from the output table and has a value of 1.118 (to three decimal places).

Step 9: To obtain the 95% confidence limits, multiply the standard error by 1.96. Click on a new cell to select it and type '='. Click on the cell containing the standard error (so that Excel uses the unrounded value), then type '*' for multiplication, followed by 1.96. Press Return and the answer, 2.1917, will be given in the cell.

The 95% confidence limits would be reported as mean ± (1.96 × 1.118), i.e. mean ± 2.19 cm (173.89 cm to 178.27 cm).

Calculation of Q W2.4 using SPSS

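Continuing from the answer to Q W2.2 (above): SPSS doesn't do one-sample z tests, but it does offer the one-sample t test, which is appropriate when the standard deviation is estimated from the sample and gives a slightly wider (more conservative) interval. Go to 'Analyze', 'Compare means', 'One-sample t-test'.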

SPSS: One-Sample T Test

Click on 'height' to highlight it, then click on the arrow to transfer it to the 'test variable(s)' window. Enter the test value (our calculated mean, 176.08 in this case) into the 'test value' window.

SPSS: One-Sample T Test (2)

Click on 'options', and check that the confidence interval percentage is suitable (95% is the default).

SPSS: One-Sample T Test (3)

Click on 'continue', and then on 'OK'. The output will appear in a separate window.

t-test

SPSS: T Test results

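This tells us that the mean is 176.079 cm. SPSS gives the 95% confidence interval for the difference between the mean and the test value (176.08 cm): it extends from 2.220 cm below the test value to 2.218 cm above it, i.e. from 173.86 cm to 178.30 cm. This interval is slightly wider than the z-based interval because the t test uses a critical value from the t distribution with 99 degrees of freedom (about 1.984) rather than 1.96.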
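The SPSS one-sample t test output can be approximated with the Python sketch below. Python's standard library has no t-distribution quantile function, so the two-sided 95% critical value for 99 degrees of freedom (1.9842) is hard-coded here as an assumption taken from t tables; `TEST_VALUE` mirrors the test value typed into SPSS. Variable names are our own.

```python
import math
import statistics

# Heights (cm) of the 100 male students, re-typed from Table W3.3 in ascending order.
HEIGHTS_CM = [
    150.0, 154.0, 154.7, 157.5, 158.2, 158.6, 159.1, 159.9, 160.4, 160.8,
    161.6, 162.4, 163.3, 163.3, 164.1, 165.0, 165.3, 165.6, 165.9, 166.4,
    166.8, 167.0, 167.1, 167.8, 167.8, 168.0, 168.5, 168.9, 169.4, 169.8,
    169.8, 169.9, 170.7, 171.1, 171.2, 171.8, 172.1, 172.3, 172.6, 172.7,
    172.7, 172.8, 173.1, 173.2, 173.3, 173.4, 174.0, 174.1, 174.6, 175.6,
    175.6, 176.2, 176.2, 176.5, 176.8, 177.2, 177.7, 177.9, 177.9, 177.9,
    178.7, 179.0, 179.8, 181.1, 181.2, 181.3, 181.5, 182.1, 182.2, 182.6,
    182.9, 183.1, 183.5, 183.6, 183.8, 184.3, 184.3, 185.0, 185.0, 185.4,
    185.4, 185.5, 185.8, 187.8, 187.9, 188.8, 189.3, 189.6, 189.9, 190.3,
    191.4, 191.4, 193.6, 194.4, 195.9, 196.3, 196.8, 199.1, 199.2, 201.0,
]

T_CRIT_99_DF = 1.9842  # assumed from t tables: two-sided 95%, 99 degrees of freedom
TEST_VALUE = 176.08    # the value entered in SPSS's 'Test Value' box

n = len(HEIGHTS_CM)
mean = statistics.mean(HEIGHTS_CM)
se = statistics.stdev(HEIGHTS_CM) / math.sqrt(n)

# SPSS reports the interval for (mean - test value), so do the same here.
mean_diff = mean - TEST_VALUE
t_stat = mean_diff / se
diff_lower = mean_diff - T_CRIT_99_DF * se
diff_upper = mean_diff + T_CRIT_99_DF * se

print(f"t = {t_stat:.3f}, df = {n - 1}, mean difference = {mean_diff:.3f}")
print(f"95% CI of the difference: {diff_lower:.3f} to {diff_upper:.3f}")
print(f"95% CI for the mean: {TEST_VALUE + diff_lower:.2f} to {TEST_VALUE + diff_upper:.2f} cm")
```

Adding the interval for the difference back onto the test value recovers the interval for the mean itself.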

Calculation of Q W2.4 using Minitab

Continuing from the answer to Q W2.2: we need to perform a 1-sample z test. Go to 'Stat', 'Basic Statistics', '1-sample z'.

Minitab: 1-Sample Z test

Click in the 'samples in columns' window, and then click on 'c1 Height (cm)' in the left-hand window to highlight it. Click on 'select' to transfer it to the 'samples in columns' window. Enter the standard deviation and mean into the appropriate windows.

Minitab: 1-Sample Z test (2)

Click on 'options', and check that the confidence level is what you need (95% is the default).

Minitab: 1-Sample Z test (3)

Click on 'OK', and click on 'OK' again. The output will appear in the session window.

One-Sample Z: height (cm)


From this, we see that the 95% confidence interval is from 173.888 cm to 178.270 cm.