Skip to main content

Exercise 4

The aim of this interactive exercise is to illustrate how useful some statistical software packages can be when you have a large data set. Therefore, the calculations in this exercise are only illustrated using Excel, Minitab and SPSS. This example is also used in Chapter 6 interactive exercise 3 so you may already have it as a spreadsheet. If not, we suggest you save the spreadsheet to use in this other exercise.

Example W6.1: Gender and the girth of yew trees

As part of a larger investigation of seedling recruitment in Taxus baccata (yew) trees in the Wyre Forest, Shropshire, an undergraduate measured the height (m) and DBH (diameter at breast height, m) of all yew trees within the Wyre Forest. Yews are dioecious and found as separate male and female trees. Table W6.2 includes the full data set given in relation to the area within the forest indicated by the stand code, the gender of the tree, height (m) and DBH (m).

Table W6.2: DBH (m), height (m) and gender of Taxus baccata in the Wyre Forest, Shropshire 2005

Stand code

DBH (m) of male trees

Height (m)

DBH (m) of female trees

Height (m)

8065

0.43

11.8

0.47

11.3

 

0.23

5.2

-

-

 

0.25

6.7

-

-

Total number of trees

3

 

1

 

8066

0.72

13.2

0.42

4.7

 

0.51

14.8

0.35

9.3

 

0.68

12.6

0.34

5.2

 

0.59

14.5

0.66

10.4

 

0.49

9.7

0.40

8.5

 

0.57

9.9

0.33

9.0

 

0.38

7.6

0.37

7.9

 

0.79

7.4

0.66

8.4

 

0.72

8.0

0.77

8.8

 

0.61

6.4

0.26

8.9

 

0.79

9.3

0.41

8.2

 

0.82

10.0

0.43

8.3

 

0.48

8.9

0.15

1.8

 

0.48

14

0.30

7.9

 

0.50

10.4

0.54

15.0

 

0.53

10.2

0.39

9.8

 

0.45

8.5

0.63

9.7

 

0.72

9.3

0.37

9.7

 

0.60

11.6

0.64

9.9

 

0.64

13.5

0.64

9.0

 

0.69

6.6

0.53

9.0

 

0.20

7.4

-

-

 

0.45

10.0

-

-

 

0.41

9.7

-

-

Total number of trees

24

 

21

 

8067

0.55

16.8

0.73

13.5

 

0.58

13.0

0.83

16.0

 

0.39

17.0

0.39

6.5

 

0.41

8.8

0.29

8.6

 

0.30

5.7

0.95

9.7

 

0.45

7.5

0.24

4.7

 

0.46

5.8

0.53

7.2

 

0.32

4.8

0.69

8.4

 

0.35

7.0

0.49

7.7

 

0.32

6.7

0.35

8.2

 

0.37

11.2

0.28

8.8

 

-

-

0.42

9.8

Total number of trees

11

 

12

 

8068

0.72

13.4

0.90

15.5

 

0.49

8.5

0.67

8.2

 

0.45

8.5

0.74

12.1

 

0.49

16.5

0.73

11.0

 

0.52

11.2

0.58

8.7

 

0.51

9.6

0.96

14.4

 

0.56

12.2

0.90

11.6

 

0.22

8.5

0.28

11.6

 

0.35

10.0

0.60

11.8

 

0.36

11.8

0.52

12.0

 

0.48

10.4

0.41

9.8

 

0.31

10.0

0.57

14.5

 

0.38

12.5

0.58

13.5

 

0.26

10.6

0.31

10.1

 

0.41

10.7

0.62

10.5

 

0.33

11.5

0.27

7.3

 

0.17

6.3

-

-

 

0.30

13.5

-

-

 

0.48

7.0

-

-

 

0.67

15.0

-

-

 

0.56

13.9

-

-

 

0.64

9.6

-

-

 

0.40

10.5

-

-

 

0.35

11.5

-

-

 

0.29

7.0

-

-

 

0.54

12.4

-

-

 

0.36

11.0

   

Total number of trees

27

 

16

 

8069

1.02

13.0

0.21

6.0

 

0.55

6.5

0.41

6.0

 

0.28

5.5

0.55

6.9

 

0.46

8.5

0.62

10.1

 

0.53

7.6

0.89

9.5

 

0.50

10.0

0.53

7.6

 

0.78

8.5

0.57

10.7

 

0.53

11.5

0.59

10.9

 

0.43

11.5

0.58

12.5

 

0.59

12.4

0.76

12.4

 

0.19

10.4

0.59

8.0

 

0.62

11.4

0.67

13.0

 

0.42

16.8

0.70

12.2

 

0.35

8.1

0.61

14.6

 

0.73

10.8

0.58

13.0

 

0.74

10.5

0.71

9.6

 

0.40

10.0

0.60

6.7

 

0.78

11.0

-

-

 

0.44

8.6

-

-

 

0.40

9.2

-

-

 

0.66

12.7

-

-

 

0.37

13.0

-

-

 

0.10

9.6

-

-

 

0.60

16.2

-

-

 

0.97

11.2

-

-

 

0.60

11.4

-

-

 

0.81

11.0

-

-

 

0.76

9.7

-

-

 

0.42

11.0

-

-

 

0.50

6.2

-

-

 

0.39

10.0

-

-

 

0.50

11.8

-

-

 

0.29

8.2

-

-

 

0.79

14.0

-

-

Total number of trees

34

 

0.17

 

8070

0.60

10.9

0.26

3.6

 

0.46

11.9

0.47

6.5

 

0.69

12.8

0.62

8.5

 

0.53

12.5

0.59

12.5

 

0.75

17.0

0.44

9.0

 

0.85

18.3

0.28

6.8

 

0.51

8.5

0.54

8.0

 

0.33

10.5

0.28

9.9

 

0.82

8.0

0.56

14.0

 

0.52

12.0

0.63

16.0

 

0.63

10.5

-

-

 

0.74

7.6

-

-

 

0.27

7.5

-

-

 

0.21

6.5

-

-

 

0.66

11.0

-

-

 

0.12

0.7

-

-

 

0.62

8.7

-

-

 

0.73

10.0

-

-

Total number of trees

18

 

10

 

8071

0.19

15.2

0.48

10.5

 

0.49

13.1

0.49

13.0

 

0.78

14.9

-

-

Total no. of trees

3

 

2

 

8072

0.70

12.4

0.73

10.0

 

0.36

8.5

0.41

9.7

 

0.16

4.5

0.33

15.0

 

0.46

8.5

0.22

4.5

 

0.48

9.5

0.28

9.0

 

0.88

12.0

0.55

12.7

 

0.65

13.0

-

-

 

0.43

15.0

-

-

 

0.35

11.0

-

-

 

0.35

12.3

-

-

 

0.75

12.0

-

-

Total number of trees

11

 

6

 

8073

0.57

7.0

0.61

12.5

 

0.73

12.0

0.43

5.8

 

0.44

14.0

0.34

11.0

 

0.39

11.8

0.56

7.0

Total number of trees

4

 

4

 

8074

-

-

0.70

10.6

 

-

-

0.90

11.8

 

-

-

1.01

10.9

 

-

-

0.58

6.9

Total number of trees

0

 

4

 

8075

1.15

16.0

0.82

13.8

 

-

-

0.53

13.8

Total number of trees

1

 

2

 

8076

0.44

12.0

0.68

11.4

Total number of trees

1

 

1

 

8077

1.00

13.5

0.60

5.5

 

0.62

6.0

-

-

 

0.94

15.5

-

-

Total number of trees

3

 

1

 

8080

0.63

12.5

0.52

5.5

 

0.42

10.5

0.88

15.5

 

-

-

0.77

13.0

Total number of trees

2

 

3

 


1

Q W4.1

Clearly this is an extensive data set. As part of the evaluation of these data the student had planned to compare the DBH for the male and female trees using a z test (7.2). Using statistical software carry out a z test to test the hypotheses:

H0: There is no difference between the mean diameter at breast height (DBH, m) between the male and female yew trees in the Wyre forest.

H1: There is a difference between the mean diameter at breast height (DBH, m) between the male and female yew trees in the Wyre forest.

We have explained how to carry out a z test using Excel. In Minitab and SPSS we have used a t test which approximates to z when sample sizes are large.

Enter your answers into this table. Have you completed the table?

a) Yes
b) No

We have shown how to calculate the answers using the following software packages:

Excel

SPSS

Minitab

Complete the table before proceeding!
Check your answer

How to calculate Q W4.1 in Excel

Step 1: Enter the data set as given above into a new Excel spreadsheet (as shown below).

Excel: Step 1

Step 2: Find the variances of the two data sets using fx function.

Click in a cell beneath the data set for the DBH for male trees. Click on fx on the top tool bar and a dialogue box will open. Click on the top cell B2 and drag down over the entire data set and the cell locations will be shown in the dialogue box. The variance value is shown in the dialogue box. Click on OK and the value will be returned in the chosen cell.

Excel: Step 2

Step 3: Repeat this procedure for the DBH female tree data. Note that Excel can accommodate the empty cells.

Alternatively, Clicking on the cell where the result has just been returned, click on the small square on the right hand lower corner and drag across the row. The variance of each of the data sets will be given under each column respectively.

Step 3

F DBH represents the calculated F value for DBH

f ht represents the calculated F value for height

Step 4: The variances can then be checked for similarity using the F test.

Select an empty cell, click on it to highlight it. Put a '=' in the cell then click on the cell with the larger variance for DBH and then insert a division sign '/' and click on the cell with the smaller variance for DBH and press return. The Fcalculated value will be returned in the cell. Check this with the Fcritical value in the statistical table. The calculated value is less than the critical value so you may proceed with a z test.

Step 5: On the top tool bar, click on Tools, then Data Analysis and select z-Test: two Sample for Means from the tools list. Click OK.

Excel: Step 5

Step 6: Complete the dialogue box which opens.

Excel: Step 6

Ensure that the cursor is flashing in the Variable 1 Range: Input box. Input the cell references of the data by clicking on cell B1 (this includes the data or sample label) and dragging down the column to the last cell containing data. The area on the spreadsheet will now be highlighted and the cell references shown in the input box.

Step 7: Click in the box marked Variable 2 Range: and repeat the process with the second column of DBH data.

Step 8: The hypothesised mean difference under the null hypothesis is zero so enter 0 in the box below or leave it blank (as 0 will be assumed).

Step 9: The box marked Labels should be clicked, this will put a tick in the box which shows that the first cell for each data set contains a label and not data. If this box is not ticked, Excel will treat the material in the first cell as data and will not be able to complete the calculation. Note that it is useful to use the labels to identify your data. These labels are used by Excel to identify the output data.

Step 10: Copy in the variance values for the respective data sets into the boxes for Variance - Variable 1 variance (known).

Step 11: The default value for alpha (p) is 0.05 but this can be changed if required, but you will not normally require to do so.

Step 12: Next, select the output options. To return the output data below the input data, select 'Output Range:' and then click in the box (the cursor will now flash in the box). Scroll over an area where you want the results to be displayed. Note that you could just select a couple of cells - Excel will determine the actual size that it requires for the results table. Note too, that it is essential to click in the box as well as selecting the 'Output Range' button. If this is not done the location is entered into the 'Input Range' box and the analysis cannot be completed.

Step 13: You can choose to have the results entered on to a 'New Worksheet Ply', in which case the results will be given on a fresh sheet, accessed by the tabs at the bottom of the current sheet. Alternatively a 'New Workbook' can be selected. We have opted to put the results table beneath the original data on the spreadsheet.

Step 14: Click OK and the Results table will be returned.

Excel: Step 14

The results table shows first some summary data and then gives the calculated value for z (-0.375). Note this is a negative figure but the sign can be ignored as it only reflects the order in which the data was entered into the calculation.

Check the critical value for a two-tailed z test at p = 0.05 = 1.96. The result is not significant (NS) as zcalculated is less than zcritical and the null hypothesis cannot be rejected. The probability given as P(Z<=z) two-tailed also confirms this as the probability is 0.7076. We therefore conclude that there is no significant difference (z = 0.375, p = 0.05) in the mean DBH of male and female Taxus baccata in the Wyre Forest.

How to calculate Q W4.1 in SPSS

The current version of SPSS (11.5) does not appear to provide an easy method for performing a z test. However, SPSS will do an unmatched student's t-test. t approximates to z when sample sizes are large.

Step 1: Set up the variables.

When SPSS starts up, select 'variable view' using the tabs at bottom-left. You should get something like this:

SPSS: Step 1 (i)

For the first variable name, type in 'DBH', and for the second 'sex'. Default properties are set for each variable.

SPSS: Step 1 (ii)

DBH is measured in m to the nearest 0.01 m, so the decimals setting is OK.

Gender is going to be either 'male or 'female', so it needs to be a 'string' variable. Click in the 'type' cell in row 2, and a grey area should appear at the right-hand side of the cell. Clicking on this gives a dialogue box, in which you should click on the 'string' radio button.

SPSS: Step 1 (iii)

Click on 'OK'. Change to data view using the tabs at bottom-left, and enter the data. Lots of copying and pasting would seem to be a good idea in the second column.

SPSS: Step 1 (iv)

Step 2: Perform the test.

Go to 'Analyze', 'Compare Means', 'Independent samples T-Test'.

SPSS: Step 1 (v)

Click on 'dbh' to highlight it, the click on the appropriate arrow to transfer it to the 'test variable(s)' window. Similarly transfer 'sex' to the 'Grouping variable' window.

SPSS: Step 1 (v)

Click on 'Define groups', and enter 'male' in group 1, and 'female' in group 2.

SPSS: Step 1 (vi)

Click on 'continue'. Click on 'OK'. The results will appear in a separate window.

T-Test

SPSS: T-Test results

Step 3: Decide what the results mean.

Levene's test for equality of variances produces a test statistic of 0.200 and a p-value of 0.655, which means that there is no reason to assume unequal variances.

The value of t, assuming equal variances, is 0.372.

The column headed 'Sig. (2-tailed)' gives the p-value for this result. The value here is 0.710. Since this is greater than 0.05, we accept the null hypothesis, and conclude that there is no significant difference (t = 0.372, p = 0.05) in the mean DBH of male and female Taxus baccata in the Wyre Forest.

How to calculate Q W4.1 in Minitab

There does not seem to be an easy way for Minitab to perform a z-test.

However, Minitab will do an unmatched student's t-test. t approximates to z when sample sizes are large.

Step 1: Enter your data into the worksheet part of the Minitab display, using sensible headings for the columns.

Minitab: Step 1

Step 2: Perform the test. Go to 'Stat', 'Basic statistics', '2-sample t'.

Minitab: Step 2 (i)

Click on the radio button 'Samples in two different columns'. Click in the window for the first column, click on the column heading in the left-hand window, and click on 'select' to transfer it across. Repeat for the second column.

Minitab: Step 2 (ii)

Click on 'OK'. The results of the test will appear in the session window.

Two-Sample T-Test and CI: DBH (m) of female trees, DBH (m) of male trees

Minitab results

Step 3: Decide what the results mean

The value of t is given as 0.38, and the p-value is 0.708, which is much greater than 0.05. Therefore the data support the conclusion that there is no significant difference (t = 0.38, p = 0.78) between the DBH of male and female Taxus baccata in the Wyre Forest.

(This test does not assume equal variances, and so gives the same values as the SPSS test that doesn't make this assumption.)