
Exercise 4

In Chapters 6 and 7 we introduce you to some parametric statistical tests. These tests assume that the data you are using are Normally distributed. We explain in 3.8 some of the ways to tell if your data are Normally distributed. The most comprehensive method however is described in 5.1.3 and requires the use of a chi-squared goodness of fit test. Here we provide another example of how to determine if your data are Normally distributed. The example we use is taken from Chapter 7.

Example 7.1: The evolution of Littorina littoralis at Aberystwyth 2002

Several years ago it was suggested that a species of periwinkle, Littorina littoralis, was evolving sympatrically into two species, L. obtusata and L. mariae, through niche partitioning. L. obtusata apparently grazes on the brown alga, Ascophyllum nodosum, on the mid shore, whilst putative L. mariae feeds on the epiphytes growing on Fucus serratus on the lower shore. In a study of the sympatric evolution of L. littoralis, a representative sample of the two groups of individuals was collected from Aberystwyth in 2002 and their shell height (mm) recorded (see Table 7.1 in the text).

The investigators wished to test the hypothesis that there is no difference between shell height (mm) of the two groups of periwinkles from the mid and lower shore. When planning their investigation they had decided to use a z test. One criterion for using this test is that the data are parametric. In this exercise we will check that the data for shell heights (mm) from the lower shore periwinkles (Table W5.9) are parametric (Normally distributed) using a goodness of fit chi-squared test.

Table W5.9: Height (mm) of shells of one putative species of periwinkle from the lower shore at Aberystwyth, 2002

Shell height of periwinkles (mm) from the lower shore:

5.3, 4.3, 6.5, 8.7, 8.0, 6.8, 5.3, 10.2, 5.0, 5.3,
5.3, 5.1, 5.9, 7.8, 4.9, 2.8, 7.0, 3.7, 8.7, 6.1,
2.8, 2.0, 5.0, 3.8, 5.3, 5.4, 3.0, 6.5, 5.7, 4.2
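If you would like to check the summary statistics used later in this exercise, the short Python sketch below recomputes them from the 30 shell heights in Table W5.9. It is only an illustration: the variable names are our own, and it assumes the variance quoted from BOX 7.1 is the sample variance (dividing by n - 1). It should reproduce the mean of 5.54667 mm and the variance of 3.59016 mm² used in this exercise.

```python
from statistics import mean, variance

# Shell heights (mm) from Table W5.9, lower shore periwinkles
heights = [
    5.3, 4.3, 6.5, 8.7, 8.0, 6.8, 5.3, 10.2, 5.0, 5.3,
    5.3, 5.1, 5.9, 7.8, 4.9, 2.8, 7.0, 3.7, 8.7, 6.1,
    2.8, 2.0, 5.0, 3.8, 5.3, 5.4, 3.0, 6.5, 5.7, 4.2,
]

n = len(heights)
sample_mean = mean(heights)          # arithmetic mean
sample_variance = variance(heights)  # sample variance (divides by n - 1)

print(f"n = {n}")
print(f"mean = {sample_mean:.5f} mm")
print(f"variance = {sample_variance:.5f} mm^2")
```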



1

Q W4.1

Organise the data into a frequency table, centred around the mean ($\bar{x}_1 = 5.54667$). Use the table in this Word document to do this. Have you completed your table?

a) Yes
b) No

Table W5.10: Frequency table for numbers of periwinkles for a given shell height (mm)

| Heights of shells (mm) | Number of periwinkles observed |
|---|---|
| 0.5 - 2.4 | 1 |
| 2.5 - 4.4 | 7 |
| 4.5 - 6.4 | 13 |
| 6.5 - 8.4 | 6 |
| 8.5 - 10.4 | 3 |
| Total | 30 |
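The frequency table can also be built by counting how many shell heights fall into each size class. The sketch below is one way of doing this in Python. The names `classes` and `observed` are our own, and the class limits are treated as inclusive, which is an assumption that works here because the data are recorded to one decimal place.

```python
# Shell heights (mm) from Table W5.9
heights = [
    5.3, 4.3, 6.5, 8.7, 8.0, 6.8, 5.3, 10.2, 5.0, 5.3,
    5.3, 5.1, 5.9, 7.8, 4.9, 2.8, 7.0, 3.7, 8.7, 6.1,
    2.8, 2.0, 5.0, 3.8, 5.3, 5.4, 3.0, 6.5, 5.7, 4.2,
]

# Size classes as (lower limit, upper limit), both limits inclusive
classes = [(0.5, 2.4), (2.5, 4.4), (4.5, 6.4), (6.5, 8.4), (8.5, 10.4)]

# Count the shells whose height lies within each class
observed = [
    sum(1 for h in heights if lower <= h <= upper)
    for lower, upper in classes
]

for (lower, upper), count in zip(classes, observed):
    print(f"{lower} - {upper}: {count}")
print(f"Total: {sum(observed)}")
```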



Finish your table before proceeding!

2

Q W4.2

What hypotheses will be tested?

[If you would like to save a record of your answer, please type it into this Word document instead of the text box below]

H0: There is no difference between the numbers of periwinkles in frequency classes for shell height (mm) and that expected if the data have a Normal distribution.

H1: There is a difference between the numbers of periwinkles in frequency classes for shell height (mm) and that expected if the data have a Normal distribution.


3

Q W4.3a

Carry out a chi-squared goodness of fit test. First, use the Gaussian equation (3.2.1) to calculate the expected values (5.1.3).

Enter your answers into the table in this Word document.

Have you completed your table?

a) Yes
b) No

Table W5.11: Expected values for shell height (mm) in lower shore periwinkles assuming a Normal distribution

| Heights of shells (mm) | Observed numbers of periwinkles | Expected numbers of periwinkles |
|---|---|---|
| 0.5 - 2.4 | 1 | 1.2204 |
| 2.5 - 4.4 | 7 | 6.8508 |
| 4.5 - 6.4 | 13 | 12.6168 |
| 6.5 - 8.4 | 6 | 7.6278 |
| 8.5 - 10.4 | 3 | 1.5132 |
| Total | 30 | (29.829) |



Complete the table before proceeding!

Full Calculation for Q W4.3a

To calculate expected values we use the Gaussian equation which is the mathematical equation that describes the Normal distribution.

Gaussian equation:

$$y = \frac{1}{\sqrt{2\pi s^2}} \, e^{-\frac{(x - \bar{x})^2}{2s^2}}$$

As explained in 3.2.1 and appendix c, e is a constant and approximately equals 2.72. Pi ($\pi$) is also a constant and approximately equals 3.14159. The mean for these data is $\bar{x}_1 = 5.54667$ and the variance is $s^2 = 3.59016$ (BOX 7.1). x will be the mid point of each class, as follows:
   

| Size class: heights of shells (mm) | Mid point of size class (mm) |
|---|---|
| 0.5 - 2.4 | 1.45 |
| 2.5 - 4.4 | 3.45 |
| 4.5 - 6.4 | 5.45 |
| 6.5 - 8.4 | 7.45 |
| 8.5 - 10.4 | 9.45 |



Two parts of this calculation remain the same throughout and can be worked out first.

$\frac{1}{\sqrt{2\pi s^2}} = \frac{1}{\sqrt{2 \times 3.14159 \times 3.59016}} = \frac{1}{\sqrt{22.55764}} = \frac{1}{4.74949} = 0.21055$

$2s^2 = 2 \times 3.59016 = 7.18032$

Therefore, using the mid points of the classes, when

x = 1.45

$h = \frac{(1.45 - 5.54667)^2}{7.18032} = \frac{(-4.09667)^2}{7.18032} = \frac{16.78267}{7.18032} = 2.33732$

$e^{-h} = e^{-2.33732} = 0.09659$

(For more information about using the e and power function buttons on your calculator see appendix c and the instruction manual for your calculator.)

$y = 0.21055 \times 0.09659 = 0.02034$

x = 3.45

$h = \frac{(3.45 - 5.54667)^2}{7.18032} = \frac{(-2.09667)^2}{7.18032} = \frac{4.39603}{7.18032} = 0.61223$

$e^{-h} = e^{-0.61223} = 0.54214$

$y = 0.21055 \times 0.54214 = 0.11418$

x = 5.45

$h = \frac{(5.45 - 5.54667)^2}{7.18032} = \frac{(-0.09667)^2}{7.18032} = \frac{0.00935}{7.18032} = 0.00130$

$e^{-h} = e^{-0.00130} = 0.99869$

$y = 0.21055 \times 0.99869 = 0.21028$

x = 7.45

$h = \frac{(7.45 - 5.54667)^2}{7.18032} = \frac{(1.90333)^2}{7.18032} = \frac{3.62267}{7.18032} = 0.50453$

$e^{-h} = e^{-0.50453} = 0.60379$

$y = 0.21055 \times 0.60379 = 0.12713$

x = 9.45

$h = \frac{(9.45 - 5.54667)^2}{7.18032} = \frac{(3.90333)^2}{7.18032} = \frac{15.2359}{7.18032} = 2.12191$

$e^{-h} = e^{-2.12191} = 0.11981$

$y = 0.21055 \times 0.11981 = 0.02522$

These y values are worked out as proportions per unit class size interval and our final step is to calculate expected numbers from these proportions. Since the sample size for our periwinkles is 30 and the width of each class is 2 mm, then the expected numbers are:

x = 1.45: expected number = 0.02034 × 30 × 2 = 1.2204

x = 3.45: expected number = 0.11418 × 30 × 2 = 6.8508

x = 5.45: expected number = 0.21028 × 30 × 2 = 12.6168

x = 7.45: expected number = 0.12713 × 30 × 2 = 7.6278

x = 9.45: expected number = 0.02522 × 30 × 2 = 1.5132
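The same steps can be sketched in a few lines of Python, which is a convenient way of checking your hand calculation. The function name `gaussian_density` and the variable names are our own. Because the code does not round the intermediate steps, its expected numbers differ very slightly (in the third decimal place) from the hand-calculated ones above.

```python
import math

mean = 5.54667       # sample mean (mm)
variance = 3.59016   # sample variance (mm^2), from BOX 7.1
n = 30               # total number of periwinkles
class_width = 2.0    # width of each size class (mm)
midpoints = [1.45, 3.45, 5.45, 7.45, 9.45]

def gaussian_density(x, mean, variance):
    """Height of the Normal curve (proportion per unit interval) at x."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    h = (x - mean) ** 2 / (2.0 * variance)
    return coefficient * math.exp(-h)

# Expected number = density x sample size x class width
expected = [
    gaussian_density(x, mean, variance) * n * class_width
    for x in midpoints
]

for x, e in zip(midpoints, expected):
    print(f"x = {x}: expected number = {e:.4f}")
```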

In this example, two of the five expected values are less than 5. However, if we combined one of these classes with its neighbour we would have only four classes against which to test our hypotheses. We have therefore decided to continue with our analysis, but we are aware that we are making an assumption here. [How might you re-design the investigation to deal with this problem?]

4

Q W4.3b

What is chi-squared (calculated)?

[If you would like to save a record of your answer, please type it into this Word document]

1.86

Full calculation:

chi-squared calculated = $\sum \frac{(O - E)^2}{E}$ = 0.0398 + 0.0032 + 0.0116 + 0.3474 + 1.4609 = 1.8629
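As a check on the arithmetic, the goodness of fit statistic can be computed directly from the observed and expected numbers in Table W5.11. The sketch below uses our own variable names and the hand-calculated expected values. With these inputs the sum comes to about 1.863, in line with the value of about 1.86 used in the later questions.

```python
# Observed and expected numbers of periwinkles from Table W5.11
observed = [1, 7, 13, 6, 3]
expected = [1.2204, 6.8508, 12.6168, 7.6278, 1.5132]

# One (observed - expected)^2 / expected term per size class
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Chi-squared (calculated) is the sum of these terms
chi_squared = sum(terms)

for o, e, t in zip(observed, expected, terms):
    print(f"O = {o}, E = {e}: term = {t:.4f}")
print(f"chi-squared (calculated) = {chi_squared:.4f}")
```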

There are also instructions on how to perform this calculation using the following software packages:

Excel

SPSS

Minitab

How to calculate Q W4.3b in Excel

Step 1: Enter the data into the Excel spreadsheet using suitable column headings. The '(o)' means 'observed' - to distinguish it from the expected values we are going to calculate later.

Excel: Step 1 (i)

Calculate the total number of periwinkles: in a suitable (labelled) cell, enter the formula '=sum(b2:b6)'.

Excel: Step 1 (ii)

Enter the mean (5.54667 mm) and the variance (3.59016 mm²).

Excel: Step 1 (iii)

The standard deviation is the square root of the variance: use the formula '=sqrt(b10)'.

Excel: Step 1 (iv)

Step 2: Calculate the expected values. To do this, we first need to find the centre of each range (class) interval. These can be calculated manually and entered separately.

Excel: Step 2 (i)

Next, we use Excel to calculate the probability density for a Normal distribution at each of these midpoints. In cell d2, type the formula '=normdist(c2,$b$9,$b$11,false)'. In this formula:

'c2' is a relative reference to the cell immediately to the left, which contains the value at which we want the frequency: this will change as we drag the formula down the column;

'$b$9' is an absolute reference to the cell containing the mean of the distribution: this will NOT change as we drag the formula down the column;

'$b$11' is another absolute reference, this time to the cell containing the standard deviation of the distribution;

and 'false' is a logical argument telling the computer that we want the probability density function, not the cumulative distribution function.

Click on the green tick, or press 'return'.

Excel: Step 2 (ii)

Drag this down into cells d3 to d6 to find the other probabilities.

Excel: Step 2 (iii)

The final step is to calculate the expected numbers. To do this, we multiply the probability densities by the total number of periwinkles (as stored in cell b7) and by the width of the range interval (2 in all cases here). Put a suitable title (e.g. 'number (e)' - where the '(e)' means 'expected') in cell e1, then click in cell e2. Type in the formula '=d2*$b$7*2', where 'd2' is a relative reference to the cell to the left, and '$b$7' is an absolute reference to the cell containing the total number of periwinkles. Click on the green tick, or press 'return'.

Excel: Step 2 (iv)

Now drag the formula down to cells e3-e6.

Excel: Step 2 (v)
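If you do not have Excel to hand, the difference between the density and cumulative options can be illustrated with Python's standard library, whose `NormalDist` class offers a `pdf` method (the height of the Normal curve, comparable to using 'false' above) and a `cdf` method (the area to the left of x, comparable to using 'true'). This is only a sketch with our own variable names, not a description of how Excel works internally.

```python
import math
from statistics import NormalDist

mean = 5.54667
sd = math.sqrt(3.59016)   # standard deviation = square root of the variance
total = 30                # total number of periwinkles
width = 2                 # width of each size class (mm)
midpoints = [1.45, 3.45, 5.45, 7.45, 9.45]

dist = NormalDist(mu=mean, sigma=sd)

# Height of the Normal curve at each midpoint (the 'false' option)
densities = [dist.pdf(x) for x in midpoints]

# Area under the curve to the left of each midpoint (the 'true' option)
cumulative = [dist.cdf(x) for x in midpoints]

# Expected numbers, as in the 'number (e)' column
expected = [d * total * width for d in densities]

for x, d, c, e in zip(midpoints, densities, cumulative, expected):
    print(f"x = {x}: density = {d:.5f}, cumulative = {c:.5f}, expected = {e:.4f}")
```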

In this example, two of the five expected values are less than 5. However, if we combined one of these classes with its neighbour we would have only four classes against which to test our hypotheses. We have therefore decided to continue with our analysis, but we are aware that we are making an assumption here. How might you re-design the investigation to deal with this problem?

Step 3: Work out the goodness of fit chi-squared.

First, calculate (obs - exp)²/exp for each range (class). Do this in a new column, using the formula '=(b2-e2)^2/e2' in the first result cell.

Excel: Step 3 (i)

Drag this down to fill the results space.

Excel: Step 3 (ii)

Now add these all together, and put the total in a suitable (labelled) cell. Use the formula '=sum(f2:f6)'.

Excel: Step 3 (iii)

Therefore, the value of chi-squared (calculated) is 1.862.

How to calculate Q W4.3b in SPSS

Step 1. Set up the variables.

(i) When SPSS starts, select the 'Type in data' option.

(ii) Then choose 'variable view' from the tabs at the bottom left.

SPSS: Step 1 (i)

You will see a screen something like this:

SPSS: Step 1 (ii)

Each row represents a variable for the analysis.

(iii) In the name for variable 1, type 'height' (SPSS won't accept capital letters as parts of a Name, and Names can be no more than eight characters long). Most of the other characteristics of the variable will be given default values as below:

SPSS: Step 1 (iii)

(iv) To input the values for the ranges, click in the 'values' cell for 'height'. This will produce a grey area at the right of the cell.

SPSS: Step 1 (iv)

Clicking on this grey area will give a dialogue box for inputting the values.

SPSS: Step 1 (v)

Type in a value of 1 and a value label of '0.5-2.4'. Then click on 'Add'. This will add your new value-label pair to the window at the bottom of the dialogue box.

SPSS: Step 1 (vi)

Repeat for the other four value-label pairs, then click on 'OK'. You should finish up with something like this:

SPSS: Step 1 (vii)

(If you click on the 'decimals' cell, and change the number of decimal places to zero using the arrows, the '1.00' at the start of the Values becomes simply '1'.)

SPSS: Step 1 (viii)

(v) In the 'label' column, enter a description of the ranges, for example 'Height of periwinkle shells (mm)'. Note that the cell width expands to fit the text.

SPSS: Step 1 (ix)

(vi) Next set up the variable to contain the actual counts of periwinkles. Give it the name 'number', and set decimals to zero (we can't have fractions of a periwinkle). Put in the label 'Number of periwinkles'.

SPSS: Step 1 (x)

Step 2. Enter the Data

(i) Transfer to Data View using the tab at the bottom left of the screen. You should get something like this:

SPSS: Step 2 (i)

(ii) Another quirk of SPSS is that you have to input the numbers before it lets you put in the labels. Put the numbers of periwinkles into the second column:

SPSS: Step 2 (ii)

(iii) Check that value labels are enabled by going to 'View' and ensuring that 'Value Labels' is selected.

SPSS: Step 2 (iii)

Now click in the first 'height' cell. You will get a drop-down menu of the values you put in while in variable view.

SPSS: Step 2 (iv)

Select the first one (0.5-2.4). Repeat for the other ranges.

SPSS: Step 2 (v)

Step 3. Perform the test.

(i) The ranges are our test variables, and the weightings for each variable are the number of periwinkles. The first thing to do is assign the weightings.

Go to 'Data' and select 'Weight Cases'. This brings up a dialogue box.

SPSS: Step 3 (i)

Make sure that the 'weight cases by' radio button is clicked. In the left-hand window, there will be a list of variables. Select 'Number of periwinkles' from the list, and click on the arrow to transfer it to the box labelled 'Frequency Variable'.

SPSS: Step 3 (ii)

Click on 'OK'.

(ii) We now need to calculate our expected values. For this, we need some descriptive statistics of our data, in particular the mean and standard deviation.

The mean and variance have been found in Box 7.1 to be 5.54667 mm and 3.59016 mm² respectively. The standard deviation is the square root of the variance, which is 1.89477 mm.

We will also need to input the midpoints of our size ranges. Go back to 'variable view' and create a new variable called 'midpoint'. The default properties (numeric; two decimal places) are OK. Return to 'data view', and enter the midpoints of each range.

SPSS: Step 3 (iii)

(iii) Next, we use these to calculate our expected frequencies. Go to 'Transform' and 'Compute'. You will get a window like this:

SPSS: Step 3 (iv)

In the 'Target Variable' window, type the name of the quantity to be computed: in this case, 'Normal' would be a good name. Then scroll down the list of available functions, and select PDF.NORMAL(q,mean,stddev). PDF stands for 'probability density function', and q is the value at which the function is to be calculated. Click on the up arrow to move this into the top right window.

SPSS: Step 3 (v)

When the first question mark is highlighted, select 'midpoint of height range' from the bottom left-hand window, and click on the right-arrow to transfer it into the expression we are building.

SPSS: Step 3 (vi)

The second question mark represents the mean (5.54667), and the third represents the standard deviation (1.89477), so we will have to type them in by hand.

SPSS: Step 3 (vii)

Finally, click on 'OK'.

SPSS: Step 3 (viii)

We have a list of probability densities (proportion of periwinkles per unit size interval) as if the distribution were Normal. To find the expected values, we need to multiply these by the width of the size interval (2 mm) and the total number of periwinkles (30). However, SPSS doesn't need the actual expected numbers: it treats the values we enter as relative proportions, so it can work with the probability densities directly.

It might be an idea to improve the precision of our probability densities: go back to 'variable view' using the tabs at bottom-left, and change the 'decimals' property of 'normal' to 5. Return to data view.

SPSS: Step 3 (ix)

(iv) Now we go back to the test. Go to 'Analyze' (sic), 'Nonparametric Tests' and 'Chi-Square'.

SPSS: Step 3 (x)

Select 'height of periwinkle shells' and click on the right-pointing arrow to transfer it to the 'Test Variable List' window.

SPSS: Step 3 (xi)

Next, we need to enter our expected values. Select the 'Values' radio button in the 'Expected Values' box. At this stage, it would help to pull the 'Chi-Square Test' box down (click and drag on its title bar) so that the data behind can be seen.

SPSS: Step 3 (xii)

It is important that the values are entered in the correct order. In the text box next to the label 'Values', enter the first value from the 'Normal' column (0.0203 - you can only enter 4 decimal places). Click on 'Add', and it will appear in the window below.

SPSS: Step 3 (xiii)

Repeat the process for all the other values in the 'Normal' column.

SPSS: Step 3 (xiv)

Click on 'OK'. The results will appear in a separate window.

NPar Tests

Chi-Square Test

Frequencies

chi-square test frequency results

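From the final table, the value of chi-squared is 1.853. This is slightly lower than the value of about 1.86 found by the other methods, because SPSS rescales the values entered so that the expected numbers add up to the observed total of 30, and because the values could only be entered to 4 decimal places.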
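The small difference between the SPSS result and the other methods can be explored with the Python sketch below. It rests on two assumptions of ours: that SPSS treats the entered values as relative proportions and rescales them so that the expected numbers sum to the observed total, and that the five values entered are the probability densities rounded to 4 decimal places (only the first, 0.0203, is quoted above). Under those assumptions the rescaled calculation gives about 1.853, while the unscaled calculation gives about 1.862.

```python
import math
from statistics import NormalDist

observed = [1, 7, 13, 6, 3]
midpoints = [1.45, 3.45, 5.45, 7.45, 9.45]
dist = NormalDist(mu=5.54667, sigma=math.sqrt(3.59016))

def chi_squared(obs, exp):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

densities = [dist.pdf(x) for x in midpoints]

# Unscaled: expected number = density x total x class width
total = sum(observed)
expected_unscaled = [d * total * 2 for d in densities]
chi_unscaled = chi_squared(observed, expected_unscaled)

# Rescaled (assumed SPSS behaviour): round the densities to 4 decimal
# places, then scale them so that the expected numbers sum to the total
entered = [round(d, 4) for d in densities]
expected_rescaled = [total * v / sum(entered) for v in entered]
chi_rescaled = chi_squared(observed, expected_rescaled)

print(f"unscaled: {chi_unscaled:.3f}")
print(f"rescaled: {chi_rescaled:.3f}")
```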

How to calculate Q W4.3b in Minitab

Step 1: Enter the raw numbers into the spreadsheet part of the Minitab window in the form of a summary results table. The two columns will be 'Length (mm)' and 'Number (o)', where the '(o)' means 'observed'.

Minitab: Step 1

Step 2: Next, we need to calculate the expected values. To do this, we will need to calculate the Normal distribution values based on the mean and standard deviation of our sample.

The mean and variance are given in the question (5.54667 mm and 3.59016 mm² respectively), and the standard deviation can be found by taking the square root of the variance. Enter these as constants. Go to 'Calc', 'Calculator', put 'k1' in the 'Store results in variable' window, and put '5.54667' in the expression window.

Minitab: Step 2 (i)

Click on 'OK'. The main display should be unchanged.

Go to the project manager (tab at bottom left), and open the 'constants' folder. Right-click on the word 'unnamed' to the left of k1, select 'rename', and type in the name 'mean'.

Minitab: Step 2 (ii)

Minimise the project manager.

Repeat for the standard deviation using constant k2, the expression 'sqrt(3.59016)' and the name 'standard deviation'.

Minitab: Step 2 (iii)

Next, we need the total number of observations. This is simply the sum of the numbers in column 2. Go to 'Calc', 'Calculator', put 'k3' in the 'Store result in variable' window, and put 'sum(c2)' in the expression window.

Minitab: Step 2 (iv)

Click on 'OK'. Open the project manager, and give k3 the name 'total'.

Minitab: Step 2 (v)

Find the midpoints of the ranges (classes). These are given by half the sum of the top and bottom limits, so they can be calculated manually and entered by hand.

Minitab: Step 2 (vi)

The expected values can be found as follows: go to 'Calc', 'Probability Distributions', 'Normal'.

Minitab: Step 2 (vii)

Select 'Probability density'.

Type in 'k1' for the mean and 'k2' for the standard deviation.

Type in c3 for the input column, and 'probability' for the optional storage.

(Alternatively, many of these can be entered by clicking in the windows, selecting from the left-hand window and clicking on 'Select'.)

Minitab: Step 2 (viii)

Click on 'OK'.

Minitab: Step 2 (ix)

To find the actual expected values, we need to multiply the probabilities by the total number of observations, N, and by the width of the frequency class, which is 2 in this case. Go to 'Calc', 'Calculator', enter 'number (e)' in the 'Store results in variable' window, and type 'c4*k3*2' in the 'Expression' window.

Minitab: Step 2 (x)

Click on 'OK'.

Minitab: Step 2 (xi)

Step 3: We can now do the test to compare the actual values (in column 2) and the expected ones from a Normal distribution (in column 5). Calculate chi-squared by going to 'Calc', 'Calculator', typing 'chi-squared' in the 'store results in variable' window, and 'sum((c2-c5)**2/c5)' in the 'Expression' window.

Minitab: Step 3 (i)

Then click on 'OK'.

Minitab: Step 3 (ii)

The value of chi-squared (calculated) is therefore 1.86214.

5

Q W4.3c

What are the degrees of freedom?

[If you would like to save a record of your answer, please type it into this Word document instead of the text box below]

When using the goodness of fit chi-squared test to confirm a particular distribution, the degrees of freedom = a - 2 (BOX 5.2), where a is the number of categories. For this example, degrees of freedom = 5 - 2 = 3.

6

Q W4.3d

What is chi-squared (critical) at p = 0.05?

[If you would like to save a record of your answer, please type it into this Word document]

7.81


7

Q W4.3e

Therefore, do you reject the null hypothesis?

[If you would like to save a record of your answer, please type it into this Word document]

a) Yes
b) No
The answer is b) No. chi-squared (calculated) (1.86) is less than chi-squared (critical) (7.81) at p = 0.05 and therefore we do not reject the null hypothesis.

8

Q W4.3f

What does this mean in real terms?

[If you would like to save a record of your answer, please type it into this Word document instead of the text box below]

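There is no significant difference (chi-squared (calculated) = 1.86, p = 0.05) between the numbers of periwinkles in the frequency classes for shell height (mm) and that expected if the data have a Normal distribution. The shell heights of the lower shore periwinkles can therefore be treated as Normally distributed, which meets this criterion for using the z test.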

(If you are reporting the results for this calculation having used one of the statistical software packages your p value will be different as you will be reporting the exact p value at which the decision was made not the threshold value).
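To give a feel for the exact p value a software package would report, the sketch below evaluates the upper-tail probability of the chi-squared distribution with 3 degrees of freedom. It uses a closed-form expression that holds only for 3 degrees of freedom, and the function name `chi2_upper_tail_df3` is our own. A statistics package would normally do this for you. For the values in this exercise it gives a p value of roughly 0.6 for chi-squared (calculated), and very nearly 0.05 for chi-squared (critical).

```python
import math

def chi2_upper_tail_df3(x):
    """P(chi-squared >= x) for 3 degrees of freedom (closed form for df = 3 only)."""
    return (
        1.0
        - math.erf(math.sqrt(x / 2.0))
        + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    )

# Exact p value for the calculated statistic
p_calculated = chi2_upper_tail_df3(1.86)

# The critical value 7.81 should sit very close to the 0.05 threshold
p_at_critical = chi2_upper_tail_df3(7.81)

print(f"p for chi-squared = 1.86: {p_calculated:.3f}")
print(f"p for chi-squared = 7.81: {p_at_critical:.3f}")
```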
