
Exercise 1

This exercise uses data from a real undergraduate research project. It tests your understanding of the topics covered in Sections 5.3 and 5.6 of Chapter 5.

Example W5.1: The movement of European house dust mites (Dermatophagoides pteronyssinus) on different types of carpet

A final-year honours student was investigating the distribution of house dust mites in relation to different types of carpet. As part of her investigation she wanted to see whether dust mites moved at the same rate through these various carpets. She obtained four different carpet tiles: short pile wool, deep pile wool, short pile synthetic and deep pile synthetic. The carpet tiles were inoculated with dust mites and examined after four weeks to see how far the mites had travelled (Table W5.1).

Table W5.1: The distance travelled by European house dust mites on four types of carpets after four weeks. Values are the number of dust mites found in each distance band.

| Carpet type          | 0-4.9 cm | 5-9.9 cm | 10-14.9 cm |
|----------------------|----------|----------|------------|
| Short pile wool      | 1280     | 11       | 0          |
| Deep pile wool       | 840      | 34       | 0          |
| Short pile synthetic | 877      | 33       | 5          |
| Deep pile synthetic  | 995      | 9        | 0          |




Q W1.1

The student wishes to test the hypothesis that there is no association between the types of carpet and the distance travelled, and she is planning to use the r × c chi-squared test for association (see Chapter 5, Section 5.3). Does the data fit the criteria for using this test?


a) Yes
b) No

Answer: b) No. See the table below for a full explanation:

| Criteria for using this test | Does the data meet these criteria? |
|---|---|
| 1. Wish to test for an association between two treatment variables. | Yes, the two variables are 'distance travelled by dust mites' and 'types of carpet'. |
| 2. Have data that is organised into more than two categories for at least one of the variables and into two or more categories for the second variable. | Yes, there are three categories for the distance travelled and four types of carpet. |
| 3. Have data that is counts or frequencies and is not percentages or proportions. | Yes, the unit of measurement is the number of dust mites. |
| 4. Have observations that are independent of each other. | Yes, each dust mite was recorded only once, so each observation is independent of all other observations in the sample. |
| 5. Have expected values that are more than 5. | No. Table W5.2 is the contingency table, with each expected value calculated as (row total × column total) / grand total, as in the r × c chi-squared test for association. All the expected values in the 10-14.9 cm column are below 5, so this criterion is not met. |

Table W5.2: Contingency table for the distance travelled by dust mites in four types of carpet in four weeks. Values are numbers of dust mites.

| Carpet type                |          | 0-4.9 cm  | 5-9.9 cm | 10-14.9 cm | Total number of dust mites |
|----------------------------|----------|-----------|----------|------------|----------------------------|
| Short pile wool            | Observed | 1280      | 11       | 0          | 1291                       |
| Short pile wool            | Expected | 1261.9177 | 27.50171 | 1.58056    |                            |
| Deep pile wool             | Observed | 840       | 34       | 0          | 874                        |
| Deep pile wool             | Expected | 854.31146 | 18.61851 | 1.07003    |                            |
| Short pile synthetic       | Observed | 877       | 33       | 5          | 915                        |
| Short pile synthetic       | Expected | 894.38786 | 19.49192 | 1.12023    |                            |
| Deep pile synthetic        | Observed | 995       | 9        | 0          | 1004                       |
| Deep pile synthetic        | Expected | 981.38296 | 21.38786 | 1.22919    |                            |
| Total number of dust mites |          | 3992      | 87       | 5          | 4084                       |

Each expected value = (column total × row total) / grand total. For example, for short pile wool, 3992 × 1291 / 4084 = 1261.9177 and 87 × 1291 / 4084 = 27.50171; for deep pile wool, 3992 × 874 / 4084 = 854.31146.
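As a cross-check on Table W5.2, the short Python sketch below recomputes every expected value with the (row total × column total) / grand total rule and lists the cells that fail criterion 5. The names `OBSERVED`, `expected_counts` and `low_cells` are illustrative choices, not part of the original exercise.

```python
# Observed counts from Table W5.1 (rows: carpet types; columns: distance bands)
CARPETS = ["Short pile wool", "Deep pile wool",
           "Short pile synthetic", "Deep pile synthetic"]
BANDS = ["0-4.9 cm", "5-9.9 cm", "10-14.9 cm"]
OBSERVED = [
    [1280, 11, 0],
    [840, 34, 0],
    [877, 33, 5],
    [995, 9, 0],
]


def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(OBSERVED)

# Criterion 5: every expected value should be more than 5
low_cells = [
    (CARPETS[i], BANDS[j], round(e, 5))
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e <= 5
]
for cell in low_cells:
    print(cell)
```

All four offending cells fall in the 10-14.9 cm column, matching the conclusion for criterion 5.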




Q W1.2

What can she do?


It is biologically sensible, as well as mathematically helpful, for her to combine the data from the last two columns (5-9.9 cm and 10-14.9 cm) into a single 5-14.9 cm category; see Chapter 5 (Section 5.6). The revised contingency table shows what the data now look like, together with the new expected values (Table W5.3).

Table W5.3: Revised contingency table for the distance travelled by dust mites in four types of carpet in four weeks. Values are numbers of dust mites.

| Carpet type                |          | 0-4.9 cm  | 5-14.9 cm | Total number of dust mites |
|----------------------------|----------|-----------|-----------|----------------------------|
| Short pile wool            | Observed | 1280      | 11        | 1291                       |
| Short pile wool            | Expected | 1261.9177 | 29.08227  |                            |
| Deep pile wool             | Observed | 840       | 34        | 874                        |
| Deep pile wool             | Expected | 854.31146 | 19.68854  |                            |
| Short pile synthetic       | Observed | 877       | 38        | 915                        |
| Short pile synthetic       | Expected | 894.38786 | 20.61214  |                            |
| Deep pile synthetic        | Observed | 995       | 9         | 1004                       |
| Deep pile synthetic        | Expected | 981.38296 | 22.61704  |                            |
| Total number of dust mites |          | 3992      | 92        | 4084                       |
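The merge can be checked with a small Python sketch: it adds the 5-9.9 cm and 10-14.9 cm counts for each carpet, recomputes the expected values, and confirms that the smallest one is now well above 5. The function and variable names are illustrative only.

```python
# Observed counts from Table W5.1: (0-4.9 cm, 5-9.9 cm, 10-14.9 cm)
OBSERVED_3_BANDS = [
    [1280, 11, 0],   # short pile wool
    [840, 34, 0],    # deep pile wool
    [877, 33, 5],    # short pile synthetic
    [995, 9, 0],     # deep pile synthetic
]


def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Combine the 5-9.9 cm and 10-14.9 cm bands into a single 5-14.9 cm band
merged = [[near, middle + far] for near, middle, far in OBSERVED_3_BANDS]
expected = expected_counts(merged)
smallest = min(min(row) for row in expected)

print(merged)
print(f"smallest expected value: {smallest:.5f}")
```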




Q W1.3a

Use the data in Table W5.3 to carry out the r × c chi-squared test for association. What hypotheses are being tested?


H0: There is no association between the distance travelled by the house dust mites (cm) and the types of carpet.

H1: There is an association between the distance travelled by the house dust mites (cm) and the types of carpet.


Q W1.3b

What is chi-squared (calculated)?


45.54

The full calculation is as follows:

[Figure: chi-squared calculation]
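The same calculation can be written out from the observed (O) and expected (E) values in Table W5.3. There is one (O - E)²/E term per cell, eight in all; expected values and terms are rounded to four decimal places here, so the total is given only to two.

```latex
\begin{aligned}
\chi^2 &= \sum_{\text{all cells}} \frac{(O - E)^2}{E} \\
&= \frac{(1280 - 1261.9177)^2}{1261.9177} + \frac{(11 - 29.0823)^2}{29.0823}
 + \frac{(840 - 854.3115)^2}{854.3115} + \frac{(34 - 19.6885)^2}{19.6885} \\
&\quad + \frac{(877 - 894.3879)^2}{894.3879} + \frac{(38 - 20.6121)^2}{20.6121}
 + \frac{(995 - 981.3830)^2}{981.3830} + \frac{(9 - 22.6170)^2}{22.6170} \\
&= 0.2591 + 11.2429 + 0.2397 + 10.4029 + 0.3380 + 14.6679 + 0.1889 + 8.1984 \\
&\approx 45.54
\end{aligned}
```

Almost all of the total comes from the 5-14.9 cm column, where the expected values are small.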

There are also instructions on how to perform this calculation using the following software packages:

Excel

SPSS

Minitab

How to calculate Q W1.3b in Excel

Step 1: Put the data into the spreadsheet using appropriate row and column headings. In this example the observed counts occupy cells b3 to c6, with the carpet types in column a.

Label the observed columns with '(o)', meaning 'observed', to distinguish them from the expected values to be calculated later.

Calculate the totals for the rows and columns. Start with the rows: in cell d3, type the formula '=sum(b3:c3)', and click on the green tick or press 'return'.


Click on cell d3 to highlight it, then hover the cursor over the bottom-right corner of the cell. The cursor should change from a thick white cross into a thin black plus sign. When it does, hold down the left mouse button, drag down to cell d6, and release the button. The formula will be copied, and the totals for the other rows calculated.


Now for the columns: start in cell b7, and enter the formula '=sum(b3:b6)'. Click on the green tick or press 'return', then drag this across into cells c7 and d7. Cell d7 now holds the grand total.


Step 2: Calculate the expected values.

Create four new rows for the expected values, using '(e)' to indicate expected values. (In this example, rows 9 to 12 will be used.)

You could type a separate formula into each cell, but it is quicker to create one formula that can be dragged across the whole results area. Go to cell b9 and enter the formula '=b$7/$d$7*$d3'. The 'b$7' always refers to a column total in row 7; the '$d$7' always refers to the grand total in cell d7; and the '$d3' always refers to a row total in column d. Click on the green tick or press 'return', then drag the formula across into cell c9. You can now drag the formula down to cover the whole results area (rows 9 to 12).


Step 3: Calculate chi-squared for this test for association. First, calculate the value of (obs - exp)²/exp for every cell of the table, that is, for every combination of carpet type and distance travelled. Use rows 14 to 17, and enter the formula '=(b3-b9)^2/b9' into cell b14. Click on the green tick, or press 'return'.


Because the observed and expected values are in identically arranged arrays, this formula can be dragged across and down to cell c17 to calculate all the other values of (obs - exp)²/exp. (Drag across first, then drag the whole row down.)


Next, add them all up to find the value of chi-squared. They are in a nice rectangular array, so a single 'sum' function will do the job. We shall use cell b19, and the formula '=sum(b14:c17)'.


Therefore, the value of chi-squared is 45.537953.
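The spreadsheet steps can be mirrored in a few lines of Python, as a sketch for checking the Excel result. Each step is commented with the spreadsheet formula it corresponds to; the nested-list layout (one inner list per carpet type, as in rows 3 to 6) is an illustrative choice.

```python
# Observed counts from Table W5.3, laid out as in the spreadsheet:
# one row per carpet type; columns are 0-4.9 cm (o) and 5-14.9 cm (o)
observed = [
    [1280, 11],   # short pile wool
    [840, 34],    # deep pile wool
    [877, 38],    # short pile synthetic
    [995, 9],     # deep pile synthetic
]

# Step 1: row totals (column d) and column totals (row 7)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Step 2: expected values, mirroring '=b$7/$d$7*$d3'
expected = [[col_totals[j] / grand_total * row_totals[i]
             for j in range(len(col_totals))]
            for i in range(len(row_totals))]

# Step 3: (obs - exp)^2 / exp for each cell, mirroring '=(b3-b9)^2/b9'
contributions = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(observed, expected)]

# Add all the contributions, mirroring '=sum(b14:c17)'
chi_squared = sum(sum(row) for row in contributions)
print(f"chi-squared = {chi_squared:.6f}")
```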

How to calculate Q W1.3b in SPSS

SPSS does not work directly with contingency tables like this one. By default it expects raw data, with each dust mite entered on a separate row: each mite would have a categorical variable for carpet type, and another for distance moved (perhaps 'near' and 'far'). The problem here is that we have over 4000 dust mites, so our results table would need over 4000 rows. This would involve an awful lot of typing (or copying and pasting - another feature that SPSS doesn't seem to support very well). SPSS's 'Data > Weight Cases' option, which lets each row stand for a cell count, offers a way round this, but it is probably still easier to use another package for this type of analysis.
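To show what the one-row-per-mite layout involves, the Python sketch below expands Table W5.3 into one record per dust mite, using the 'near' (0-4.9 cm) and 'far' (5-14.9 cm) labels suggested above, and checks that cross-tabulating the records recovers the original counts. It only illustrates the data layout; it does not produce an SPSS file.

```python
from collections import Counter

# Observed counts from Table W5.3, keyed by (carpet type, distance category)
TABLE = {
    ("Short pile wool", "near"): 1280,
    ("Short pile wool", "far"): 11,
    ("Deep pile wool", "near"): 840,
    ("Deep pile wool", "far"): 34,
    ("Short pile synthetic", "near"): 877,
    ("Short pile synthetic", "far"): 38,
    ("Deep pile synthetic", "near"): 995,
    ("Deep pile synthetic", "far"): 9,
}

# Long format: one (carpet, distance) record per dust mite
records = [
    (carpet, distance)
    for (carpet, distance), count in TABLE.items()
    for _ in range(count)
]

# Cross-tabulating the records recovers the original cell counts
crosstab = Counter(records)

print(f"{len(records)} rows needed")
print(records[:3])
```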

If you really want to work through this example using SPSS, follow the instructions for SPSS Box 5.4 in the Statistics Software section of the Online Resource Centre.

How to calculate Q W1.3b in Minitab

Step 1: Enter the data into the worksheet section of the Minitab screen, with the carpet types in c1 and the observed counts for 0.0-4.9 cm and 5.0-14.9 cm in c2 and c3. The '(o)' at the end of the column names indicates observed values.


Step 2: Calculate the totals for the rows. Go to 'Calc', 'Calculator'; type 'c2 + c3' in the expression box, and type 'total (o)' in the 'Store result in variable' box.


Now click on 'OK'.


Step 3: Calculate the totals for the columns. In c1(6), enter 'total' (leave row 5 blank to avoid confusion between data and totals).

Click in the 'Session' (top) window, go to 'Editor' and select 'Enable Commands'.


At the 'MTB >' prompt, type 'let c2(6) = sum(c2)'. This will add all the numbers in column 2, and place the result in cell 6 in column 2.


Repeat the process for columns 3 and 4.


Step 4: Calculate the expected values. Go to 'Calc', 'Calculator' and enter '0.0-4.9 cm (e)' into the 'Store result in variable' box. Enter 'c2(6)/c4(6)*c4' (column total ÷ grand total × row total) into the expression box.


Now click on 'OK'.


Repeat the process for the expected values for 5.0-14.9 cm, using the column heading '5.0-14.9 cm (e)' and the expression 'c3(6)/c4(6)*c4'.


Step 5: Calculate the test value of chi-squared. This is the sum of terms of the form (observed - expected)²/expected, and in this case there are eight of them, one for each cell of the table. The easiest way to do this is to calculate the eight terms separately, in two columns, and then add them all up.

Go to 'Calc', 'Calculator' and enter 'chi-sq. (near)' ('near' meaning 0.0-4.9 cm) into the 'Store result in variable' box, and type '(c2-c5)**2/c5' into the expression box.


Now click on 'OK'.


Repeat the process for the far (5.0-14.9 cm) data, using the column heading 'chi-sq. (far)' and the expression '(c3-c6)**2/c6'.


The total chi-squared is found by adding all these together. Go to 'Calc', 'Calculator' and type in the expression 'sum(c7)+sum(c8)'. Place this in a variable called simply 'chi-sq.'. (Strictly, only the first four values in each of these columns are needed, but the values in row 6 are both zero, so including them does not change the total.)


Click on 'OK'.


Therefore, the value of chi-squared (calculated) is 45.5380.


Q W1.3c

What are the degrees of freedom?


The degrees of freedom (df) = (rows - 1)(columns - 1). You do not count the 'expected' rows or the totals. So there are four rows and two columns. Therefore, df = (4 - 1)(2 - 1) = 3 × 1 = 3.


Q W1.3d

What is chi-squared (critical) at p = 0.05?


chi-squared (critical) at p = 0.05 with 3 degrees of freedom = 7.81

Q W1.3e

Therefore, do you reject the null hypothesis?


a) Yes
b) No

Answer: a) Yes. chi-squared (calculated) (45.53795) is greater than chi-squared (critical) (7.81) at p = 0.05, and therefore we reject the null hypothesis. In fact, at p = 0.001 chi-squared (critical) = 16.27, so we may reject the null hypothesis at this higher level of significance.
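The table look-ups can be reproduced in Python without a statistics library, as a sketch that works only because df = 3 here: for 3 degrees of freedom the upper-tail probability has the closed form erfc(√(x/2)) + √(2x/π)·e^(-x/2). The helper names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2.ppf` (critical values) or `scipy.stats.chi2.sf` (exact p values) would normally be used instead.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for chi-squared with 3 df (closed form, df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_critical_df3(alpha, lo=0.0, hi=100.0, iterations=100):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > alpha:
            lo = mid  # tail area still too large: critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


rows, cols = 4, 2
df = (rows - 1) * (cols - 1)   # as in Q W1.3c
chi_calc = 45.537953           # from Q W1.3b
crit_05 = chi2_critical_df3(0.05)
crit_001 = chi2_critical_df3(0.001)
p_value = chi2_sf_df3(chi_calc)

print(f"df = {df}")
print(f"critical value at p = 0.05:  {crit_05:.2f}")
print(f"critical value at p = 0.001: {crit_001:.2f}")
print(f"reject H0 at p = 0.05? {chi_calc > crit_05}")
print(f"exact p value: {p_value:.2e}")
```

The exact p value printed at the end is the kind of figure a software package would report instead of a threshold.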

Q W1.3f

What does this mean in real terms?


There is a very highly significant association (p < 0.001) between the distance travelled by the house dust mites and the types of carpet.

(If you used one of the statistical software packages for this calculation, the p value you report will be different, because you will be reporting the exact p value rather than the threshold value.)

Although this test does not tell you where the association lies, you can see from the data in Table W5.1 that the dust mites travelled furthest in the short pile synthetic and deep pile wool carpets. It is difficult to see why this should be, and further investigation is required.
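One descriptive way to check this reading of Table W5.1 is to compare, for each carpet, the percentage of mites that travelled 5 cm or more. The Python sketch below does this; it only summarises the data and is not a further significance test.

```python
# Observed counts from Table W5.1: (0-4.9 cm, 5-9.9 cm, 10-14.9 cm)
OBSERVED = {
    "Short pile wool": (1280, 11, 0),
    "Deep pile wool": (840, 34, 0),
    "Short pile synthetic": (877, 33, 5),
    "Deep pile synthetic": (995, 9, 0),
}

# Percentage of mites on each carpet that travelled 5 cm or more
far_percent = {
    carpet: 100 * (counts[1] + counts[2]) / sum(counts)
    for carpet, counts in OBSERVED.items()
}

# Rank carpets from the highest to the lowest percentage
ranking = sorted(far_percent.items(), key=lambda item: item[1], reverse=True)
for carpet, pct in ranking:
    print(f"{carpet:<22} {pct:5.2f}% travelled 5 cm or more")
```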
