Exercise 3
In Chapter 3 we introduced a number of different distributions, including the Normal distribution and the Poisson distribution. Each distribution has a set of characteristics. For example, in data with a Poisson distribution (3.5.3iii) the mean will approximately equal the variance. To confirm that data have a particular distribution you can use the chi-squared goodness of fit test (5.1.3). In this exercise we take you through the steps needed to confirm that your data have a Poisson distribution, and in Exercise 4 we examine data that we believe are Normally distributed.
Example W5.3: Seed dispersal in Taxus baccata (yew)
An undergraduate study of seed dispersal in the vicinity of a single yew tree generated the data that were included in Table 3.3.
Table 3.3. The distance (m) seeds are dispersed from the canopy edge of one Taxus baccata (yew) tree
| Distance (m) | Number of seeds |
|---|---|
| 0 | 13 |
| 1 | 27 |
| 2 | 27 |
| 3 | 18 |
| 4 | 9 |
| 5 | 4 |
| 6 | 1 |
| 7 | 0 |
These data clearly have a skewed leptokurtic distribution (3.4) and the mean (1.99 m) approximately equals the variance (1.87 m²). This may be a Poisson distribution.
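As a quick check on the summary statistics quoted above, the mean and variance can be recomputed from Table 3.3. This is a minimal sketch, not part of the original exercise: the variable names are illustrative, and it assumes the variance was calculated with a divisor of n (the number of seeds), which is the form that reproduces 1.87 m².

```python
# Frequency table from Table 3.3: distance (m) and number of seeds
distances = [0, 1, 2, 3, 4, 5, 6, 7]
counts = [13, 27, 27, 18, 9, 4, 1, 0]

n = sum(counts)  # total number of seeds

# Weighted mean: each distance counts once per seed found there
mean = sum(d * c for d, c in zip(distances, counts)) / n

# Assumption: divisor n (not n - 1), which matches the quoted 1.87
variance = sum(c * (d - mean) ** 2 for d, c in zip(distances, counts)) / n

print(n, round(mean, 2), round(variance, 2))  # prints: 99 1.99 1.87
```

With a divisor of n - 1 the variance would round to 1.89 m² instead, which is still close to the mean.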
Q W3.2a Carry out a chi-squared goodness of fit test. The first step is to use the Poisson equation (3.2.3) to calculate the expected values (5.1.3). Use this table to enter your expected values. Have you completed the table?
Full calculation for Q W3.2a
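The Poisson equation is:

$$y = \frac{e^{-\mu}\,\mu^{x}}{x!}$$

where y is the expected proportion at distance x, μ is the mean of the distribution (estimated here by the sample mean, 1.98990 m) and x! is the factorial of x.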
This can be used as outlined in 5.1.3 to calculate expected numbers. The first stage is to use this formula to calculate expected proportions and then calculate expected numbers.
We know that e is a constant with the approximate value of 2.72. With a mean of 1.98990 m, e raised to the power -1.98990 is 0.13671, which is the multiplier used in every row below. In each row, y = 0.13671 × (1.98990 raised to the power x, divided by x!).

| x | x! | y |
|---|---|---|
| x = 0 m | 0! = 1 | y = 0.13671 × 1/1 = 0.13671 |
| x = 1 m | 1! = 1 | y = 0.13671 × (1.98990/1) = 0.27204 |
| x = 2 m | 2! = 2 | y = 0.13671 × (3.95970/2) = 0.27066 |
| x = 3 m | 3! = 6 | y = 0.13671 × (7.87940/6) = 0.17953 |
| x = 4 m | 4! = 24 | y = 0.13671 × (15.67921/24) = 0.08931 |
| x = 5 m | 5! = 120 | y = 0.13671 × (31.20004/120) = 0.03554 |
| x = 6 m | 6! = 720 | y = 0.13671 × (62.08493/720) = 0.01179 |
| x = 7 m | 7! = 5040 | y = 0.13671 × (123.5427/5040) = 0.00335 |
These y values are worked out as proportions and assume that each class is only 1 unit in size. Our final step is to calculate expected numbers from these proportions. Since the total number of observations is 99 and the class size is 1 m, the expected numbers are found by multiplying each proportion by 99. (The products below were calculated from the unrounded proportions, so they differ slightly in the last decimal places from the rounded values shown.)
when:
x = 0, y = 0.13671 × 99 = 13.53421
x = 1, y = 0.27204 × 99 = 26.93172
x = 2, y = 0.27066 × 99 = 26.79570
x = 3, y = 0.17953 × 99 = 17.77358
x = 4, y = 0.08931 × 99 = 8.84191
x = 5, y = 0.03554 × 99 = 3.51890
x = 6, y = 0.01179 × 99 = 1.16704
x = 7, y = 0.00335 × 99 = 0.33176
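The hand calculation above can be reproduced in a few lines of Python. This is a sketch for checking your own working, not part of the original exercise: the function name `poisson_proportion` is hypothetical, and the mean is taken as 197/99 (1.98990 m), the unrounded value used in the worked table.

```python
import math

distances = [0, 1, 2, 3, 4, 5, 6, 7]  # distance classes (m)
total = 99                            # total number of seeds
mean = 197 / 99                       # unrounded mean, about 1.98990 m


def poisson_proportion(x, mean):
    """Expected proportion at distance x: e^(-mean) * mean^x / x!"""
    return math.exp(-mean) * mean ** x / math.factorial(x)


# Stage 1: expected proportions; stage 2: expected numbers
proportions = [poisson_proportion(x, mean) for x in distances]
expected = [p * total for p in proportions]

for x, p, e in zip(distances, proportions, expected):
    print(f"x = {x}: y = {p:.5f}, expected number = {e:.5f}")
```

The printed values should agree with the two lists above to about five decimal places.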
There are also instructions on how to perform this calculation using the following software packages:
How to calculate Q W3.2b in Excel
Step 1: Enter the data into the Excel spreadsheet using suitable column headings. The '(o)' means 'observed' - to distinguish it from the expected values we are going to calculate later.

Step 2: Calculate the expected values. To do this, we need to find the total number of seeds, the mean distance and the standard deviation of the distance. The total number can be found by using the formula '=sum(b3:b10)' in a suitable cell (make sure it is labelled so that you know what the number means). Type in the label, click in the cell where you want the number to appear, and type the formula into the formula bar. Click on the green tick, or press 'return'.

The mean and variance are given in the question: mean = 1.99 m; variance = 1.87 m². Add these values to the spreadsheet in suitable (labelled) cells.

The standard deviation is simply the square root of the variance, and can be calculated using the formula '=sqrt(b14)'.

Next, we use Excel to calculate the frequency for a Poisson distribution for each distance. In cell c3, type the formula '=$b$12*poisson(a3,$b$13,false)'. In this formula:
'$b$12' is an absolute reference to the cell containing the total number of seeds;
'poisson' is an Excel function that gives the Poisson frequency distribution;
'a3' is a relative reference to the cell two to the left, which contains the value at which we want the frequency: this will change as we drag the formula down the column;
'$b$13' is an absolute reference to the cell containing the mean of the distribution: this will NOT change as we drag the formula down the column;
and 'false' is a logical argument telling the computer that we want the probability mass function, not the cumulative distribution function.

Drag this down into cells c4 to c10 to find the other expected values.

Here we hit a problem: several of these expected values are less than 5, and this violates one of the criteria for a chi-squared test. We can overcome this by combining all distances greater than or equal to 5 m into a single class. The expected value for this class will be 'total - (expected values for all other classes)', to include even larger distances.
Create two new columns, with headings indicating that they are to store values used for testing.

Copy across the values we are going to use unchanged. Before we can do this, we need to make a change to the formula for the expected values. We assumed that the formula would always be in the same column, so we didn't bother to make the column for the distance an absolute reference. If we simply copy it across, it will refer to column C, which is not what we want. Click in cell c3, and insert a '$' sign just in front of the 'A' in the first argument of the 'poisson' function.

Click on the green tick, or press 'return'. Now copy this new formula down into the rest of column C: there should be no visible change.
Click in cell b3, hold down the left mouse button, and drag the cursor to cell c7 before releasing the mouse button.

Either go to 'Edit', 'Copy', or press ctrl-c. This copies the selected cells to the clipboard.
Click in cell d3, and either go to 'Edit', 'Paste', or press ctrl-v.

The value to go in cell d8 is the sum of all the counts for distances greater than or equal to 5 m: use the formula '=sum(b8:b10)'.

The expected value for distances greater than or equal to 5 m is the total number of seeds minus the sum of all the other expected values. In cell e8, use the formula '=b12-sum(e3:e7)'.
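The pooling of the tail classes can also be sketched outside Excel. The Python below is an illustrative check only: the names `cutoff`, `test_observed` and `test_expected` are assumptions chosen to mirror the two new spreadsheet columns, and the mean is the rounded value (1.99) entered in the spreadsheet.

```python
import math

counts = [13, 27, 27, 18, 9, 4, 1, 0]  # observed seeds at 0..7 m
total = sum(counts)
mean = 1.99                            # rounded mean, as in the spreadsheet

# Expected numbers for each 1 m class, before any pooling
expected = [total * math.exp(-mean) * mean ** x / math.factorial(x)
            for x in range(len(counts))]

cutoff = 5  # classes at 5 m and beyond are pooled into one class

# Observed: copy the first five classes, then sum the rest
test_observed = counts[:cutoff] + [sum(counts[cutoff:])]

# Expected: copy the first five classes, then total minus their sum,
# so the last class also covers distances beyond 7 m
test_expected = expected[:cutoff]
test_expected.append(total - sum(test_expected))

print(test_observed)
print([round(e, 2) for e in test_expected])
```

After pooling, every expected value is at least 5, so the requirement for the chi-squared test is met.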

Step 3: Work out chi-squared.
First, work out (o-e)²/e for each class. Create a new column; label it, and type the formula '=(d3-e3)^2/e3' into the third cell. Click on the green tick, or press 'return'.

Now drag this down to fill the column.

Chi-squared is the sum of all these values. Into a suitable (labelled) cell, type the formula '=sum(f3:f8)'. Click on the green tick, or press 'return'.

The value of chi-squared is 0.0314.
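As a cross-check on the spreadsheet, Step 3 can be sketched in Python. This is a minimal illustration under the same assumptions as the Excel walkthrough (mean entered as 1.99, distances of 5 m or more pooled into one class); the variable names are hypothetical.

```python
import math

# Observed counts for 0, 1, 2, 3, 4 m and the pooled ">= 5 m" class
observed = [13, 27, 27, 18, 9, 5]
total = sum(observed)
mean = 1.99

# Expected numbers for 0..4 m from the Poisson equation
expected = [total * math.exp(-mean) * mean ** x / math.factorial(x)
            for x in range(5)]
# Pooled class: total minus all the other expected values
expected.append(total - sum(expected))

# (o - e)^2 / e for each class, then the sum gives chi-squared
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(terms)

print(round(chi_squared, 4))
```

The result should be close to the value of 0.0314 reported above; small differences in the last decimal place can arise from how the mean is rounded.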
How to calculate Q W3.2b in SPSS
Step 1. Set up the variables.
When SPSS starts, select the 'Type in data' option. Then choose 'variable view' from the tabs at the bottom left.

You will see a screen something like this:

Each row represents a variable for the analysis. In the name for variable 1, type 'distance' (SPSS won't accept capital letters as parts of a Name, and Names can be no more than eight characters long). Most of the other characteristics of the variable will be given default values as below:

Repeat for the variable 'number', which we will use to record the number of yew seeds.
In both cases, our values are integers, so we can adjust the property 'decimals' to zero by clicking in the 'decimals' cell, and then using the 'up-and-down' arrows to make the change.

Step 2. Enter the Data
Transfer to Data View using the tab at the bottom left of the screen. You should get something like this:

Type in the data.

Step 3: Perform the test
The distances are our test variables, and the weightings for each variable are the number of seeds. The first thing to do is assign the weightings.
Go to 'Data' and select 'Weight Cases'.

This brings up a dialogue box. Make sure that the 'weight cases by' radio button is clicked. In the left-hand window, there will be a list of variables. Select 'number' from the list, and click on the arrow to transfer it to the box labelled 'Frequency Variable'.

Click on 'OK'.
We now need to calculate our expected values. For this, we need some descriptive statistics of our data, in particular the mean.
Go to 'Analyze' (sic), 'Descriptive Statistics' and 'Frequencies'.

Select 'distance', and click on the arrow to transfer it to the 'Variable(s)' window.

Click on 'Statistics', and check that 'mean' has been selected (because this is what we will need to calculate the frequencies for a Poisson distribution).

Click on 'continue', and then on 'OK'. The results will appear in a separate window.
Frequencies

(The 'warning' is because the number of seeds at 7m is zero, and we have left this in the analysis.)
This tells us that the mean distance travelled is 1.99m. We can now calculate the Poisson probabilities. Go to 'Transform', 'Compute'.

You will get a dialogue box. In the 'Target Variable' window, type the name of the quantity to be computed: in this case, 'Poisson' would be a good name. Then scroll down the list of available functions, and select PDF.POISSON(q,mean). PDF stands for 'probability density function', and q is the value at which the function is to be calculated. Click on the up arrow to move this into the top right window.

When the first question mark is highlighted, select 'distance' from the bottom left-hand window, and click on the right-arrow to transfer it into the expression we are building.

The second question mark represents the mean (1.99), and we will have to type this in by hand.

Finally, click on 'OK'.

We have generated a list of the expected relative frequencies (proportions) if the distribution were Poisson. SPSS is able to use these instead of actual expected numbers in the chi-squared test.
However, note that some of the actual measurements are less than 5, which violates one of the requirements for a chi-squared test. (Specifically, the requirement deals with the expected values, but they are supposed to be similar.) It would be good to combine all measurements for distances of 5 m or more to overcome this.
Delete rows 7 and 8 by clicking in the row label at the left-hand side of the screen: this should highlight the whole row. Now press the 'delete' key on the keyboard, or go to 'edit', 'cut'.
The number in row 6 now represents all seeds that travelled 5 m or more, and so needs changing from 4 to 5.
The value in the 'Poisson' column is best obtained by adding all the other values together, and subtracting the result from 1. We probably need more than two decimal places to do this accurately, so change the 'decimals' property of 'Poisson' by going to variable view using the tabs at bottom-left and performing the change (as described at the beginning for 'distance' and 'number'); five is probably adequate. This gives a pdf value for our last class of 0.05176.
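The six proportions for the pooled classes can be checked with a short Python sketch. This is illustrative only and does not use SPSS: the variable names are assumptions, and the mean is the 1.99 typed into the Compute dialogue.

```python
import math

mean = 1.99  # mean distance typed into the Compute dialogue

# Poisson proportions for distances 0..4 m
proportions = [math.exp(-mean) * mean ** x / math.factorial(x)
               for x in range(5)]

# Last class (5 m or more): 1 minus the sum of all the other values
proportions.append(1 - sum(proportions))

for label, p in zip(["0", "1", "2", "3", "4", "5 or more"], proportions):
    # Five decimal places, matching the 'decimals' setting suggested above
    print(f"{label} m: {p:.5f}")
```

Because the last value is defined as the remainder, the six proportions always sum to exactly 1, whatever the mean.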
Now we go back to the test. Go to 'Analyze' (sic), 'Nonparametric Tests' and 'Chi-Square'.

Select 'distance' and click on the right-pointing arrow to transfer it to the 'Test Variable List' window.

Next, we need to enter our expected values. Select the 'Values' radio button in the 'Expected Values' box. At this stage, it would help to pull the 'Chi-Square Test' box down (click and drag on its title bar) so that the data behind can be seen.

It is important that the values are entered in the correct order. In the text box next to the label 'Values', enter the first value from the 'Poisson' column (0.1367 - the system will only accept four decimal places). Click on 'Add', and it will appear in the window below.

Repeat the process for all the other values in the 'Poisson' column.

Click on 'OK', and the output will appear in a separate window.
NPar Tests
Chi-Square Test
Frequencies

The value of chi-squared is 0.032.
How to calculate Q W3.2b in Minitab
Step 1: Using the data for the yew seeds, enter the raw numbers into the spreadsheet part of the Minitab window in the form of a summary results table.

Step 2: Next, we need to calculate the expected values. To do this, we will need to calculate the Poisson distribution values based on the mean of our sample. The mean and variance are given in the question (1.99 m and 1.87 m² respectively), so we enter them as constants.
Go to 'Calc', 'Calculator', enter 'k1' in the 'store result in variable' window, and put '1.99' in the expression window.

Click on 'OK'. The main display should be unchanged.
Go to the project manager (tab at bottom left), open the 'constants' folder, and right-click on the word 'unnamed' to the left of k1. Type in the name 'mean'.

Minimise the project manager, then repeat the process to put the standard deviation in k2. However, this time, we have the variance, so the expression will be 'sqrt(1.87)'.

Give k2 the name 'standard deviation'.

We will also need the total number of yew seeds: calculate this using the expression 'sum(c2)', place the result in k3, and give k3 the name 'total'.

The expected values can be found as follows: go to 'Calc', 'Probability Distributions', 'Poisson'.

Select 'Probability', put 'k1' in the 'mean' window, and put 'c1' in the 'input column' window, and 'probability' in the 'optional storage' window.

Click on 'OK'.

To find the expected numbers of seeds, we multiply the probabilities by the total number of seeds, as stored in k3. Go to 'Calc', 'Calculator', put 'number (e)' in the 'Store result in variable' window, and type in the expression 'c3*k3'.

Click on 'OK'.

Step 3: We can now do the test to compare the actual values (in column 2) and the expected ones from a Poisson distribution (in column 4).
First, we note that several of the expected values are less than 5, which violates one of the requirements of a chi-squared test. To mitigate this problem, we combine all measurements for distances greater than or equal to 5 m. First, we are going to create two new columns to contain the observed and expected values to be used in the test: call them 'test (o)' and 'test (e)', and fill them with the numbers from columns 2 and 4 respectively. Go to 'Calc', 'Calculator', put 'test (o)' in the 'Store result in variable' window, and put 'c2' in the expression window.

Click on 'OK'.

Repeat for 'test (e)' and c4.

Remove the last three entries from columns 5 and 6: highlight the cells, and press 'delete'.

The last class in column 5 contains all seeds further than 4 m from the canopy edge.
First, enable commands by clicking in the session (upper) Minitab window, then going to 'Editor' and selecting 'Enable commands'.

At the 'MTB >' prompt, type 'let c5(6) = c2(6) + c2(7) + c2(8)', and press 'return'.

In cell c6(6), the expected value for distances greater than or equal to 5 m is the total (in k3) minus all the other expected values. Type in the command 'let c6(6) = k3 - sum(c6)', and press 'return'.

Next, calculate chi-squared. This is done by going to 'Calc', 'Calculator', typing 'k4' in the 'store results in variable' window, and 'sum((c5-c6)**2/c6)' in the 'Expression' window.

Then hit 'OK'.
Go to the project manager, open the constants folder, and rename k4 as 'chi-squared'.

The value of chi-squared is therefore 0.0313634.
Q W3.2e Therefore, do you reject the null hypothesis? [If you would like to save a record of your answer, please type it into this Word document]