Exercise 2

This exercise uses data from a real undergraduate research project. The aim of the exercise is to enable you to integrate many of the topics we covered in chapters 1 to 5, as well as the summary included in appendix b: Which statistical test should I choose?

Example W5.2. Attitudes to over the counter genetic tests.

An undergraduate carrying out her third year honours research project investigated students' attitudes to 'over the counter' genetic tests. She asked a group of Biology students and a group of Education (non Biology) students a series of closed questions. The first question made it clear whether the participant knew what was meant by 'genetic testing' (Table W5.4). Having then provided some additional information, the second question asked whether the participants agreed with the idea of over the counter genetic testing (Table W5.5).

Table W5.4: The number of Biology and Education (non-Biology) students who understood the term 'genetic testing'

 

| | Yes, the term was understood | No, the term was not understood |
|---|---|---|
| Biology students | 16 | 4 |
| Education (non Biology) students | 8 | 12 |



Table W5.5: The number of students that agreed with the idea of offering over the counter genetic tests

 

| | Agreed | Disagreed |
|---|---|---|
| Biology students | 6 | 14 |
| Education (non Biology) students | 2 | 18 |



Q W2.1

What is the type of hypothesis that the student is going to test when analysing the data in Table W5.4?

The general hypothesis is: Does my data fit an expected ratio?

The more specific hypotheses are:

H0: There is no association between the number of students understanding the term 'genetic testing' and the course the students are taking.

H1: There is an association between the number of students understanding the term 'genetic testing' and the course the students are taking.

Q W2.2

What would be an appropriate statistical test? Why?

A 2 × 2 chi-squared test for association using a Yates correction or a G test for association.

If you are not sure, read appendix b and the section at the beginning of Chapter 5 called 'How to choose the correct test'.

1. What type of investigation am I designing?

This is an experiment and the student is starting out with a question (hypothesis).

2. Which type of hypotheses am I testing?

We have already decided in our answer to Q W2.1 that the general hypothesis being tested is: Does my data fit an expected ratio?

How to choose the correct test.

If the student intends to test this type of hypothesis then from the information provided in B2.1 it is clear that she should consider using either the chi-squared test for association with a Yates correction (5.4.2) or the G test for association.

To check that the criteria for using these tests are met we refer to 5.3, 5.3.1 and 5.4.2.

| Criteria for using this test. You: | Do the data meet these criteria? |
|---|---|
| 1. Wish to test for an association between two treatment variables. | Yes. The two treatment variables are the groups of students and their responses to the question. |
| 2. Have two categories for each variable. If yes, then the chi-squared test should be modified with a Yates correction. | There are two categories for each variable (Biology/Education and yes/no). |
| 3. Have data that are counts or frequencies and are not percentages or proportions. | The data are numbers of students (counts). |
| 4. Have observations that are independent of each other. | Each student is only included once. Therefore each observation is independent of all the others. |
| 5. Have expected values that are more than 5. | Yes. We have calculated these in Table W5.6. On the 'expected' rows all these values are greater than 5. |



Table W5.6: Contingency table for the number of students who understood the term 'genetic testing'

 

| | | Yes | No | Total number of students |
|---|---|---|---|---|
| Biology students | Observed | 16 | 4 | 20 |
| Biology students | Expected | 24/40 × 20 = 12 | 16/40 × 20 = 8 | |
| Education (non Biology) students | Observed | 8 | 12 | 20 |
| Education (non Biology) students | Expected | 24/40 × 20 = 12 | 16/40 × 20 = 8 | |
| Total number of students | | 24 | 16 | 40 |
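The expected values in Table W5.6 can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original exercise; the function name `expected_counts` and the variable names are illustrative assumptions. It applies the same rule used in the table (column total / grand total × row total) and then checks criterion 5.

```python
# Observed counts from Table W5.4 (rows: Biology, Education; columns: yes, no).
observed = [[16, 4],
            [8, 12]]


def expected_counts(table):
    """Return the expected count for every cell of a contingency table.

    Each expected value is row total x column total / grand total, which is
    the same quantity as (column total / grand total) x row total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
print(expected)  # [[12.0, 8.0], [12.0, 8.0]]

# Criterion 5: every expected value must be more than 5.
all_above_five = all(e > 5 for row in expected for e in row)
print(all_above_five)  # True
```

Because the rule only uses the row and column totals, the same function works for any r × c table of counts.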



Q W2.3a

Use the chi-squared test for association with a Yates correction to test the hypothesis that there is no association between the number of students understanding the term 'genetic testing' and the course the students are taking. What is chi-squared (calculated)?

chi-squared (calculated) = 5.10

Full calculation:

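chi-squared (calculated) = Σ (|O - E| - 0.5)²/E, where O is an observed value and E is the corresponding expected value

(The straight brackets indicate an absolute value. If the answer within the straight brackets is negative you can ignore the minus sign.)

= (|16 - 12| - 0.5)²/12 + (|4 - 8| - 0.5)²/8 + (|8 - 12| - 0.5)²/12 + (|12 - 8| - 0.5)²/8

= 1.02083 + 1.53125 + 1.02083 + 1.53125

= 5.10417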
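For readers who prefer to check the arithmetic in code, here is a minimal Python sketch of the same calculation. It is an illustration rather than part of the original answer; the helper name `yates_term` is an assumption, and the observed and expected values are copied from Table W5.6.

```python
# Observed and expected counts from Table W5.6, in the order
# Biology/yes, Biology/no, Education/yes, Education/no.
observed = [16, 4, 8, 12]
expected = [12, 8, 12, 8]


def yates_term(obs, exp):
    """One cell's contribution: (|obs - exp| - 0.5)^2 / exp."""
    return (abs(obs - exp) - 0.5) ** 2 / exp


terms = [yates_term(o, e) for o, e in zip(observed, expected)]
chi_squared = sum(terms)

print([round(t, 5) for t in terms])  # [1.02083, 1.53125, 1.02083, 1.53125]
print(round(chi_squared, 5))         # 5.10417
```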

We have also shown how to calculate this using the following software packages:

Excel

SPSS

Minitab

How to calculate Q W2.3a in Excel

Step 1: Enter the data into the spreadsheet using appropriate row and column headings.

Excel: Step 1 (i)

Calculate the row totals: use the formula '=b3+c3' in cell d3, and drag it down into cell d4.

Excel: Step 1 (ii)

Calculate the column totals: use the formula '=b3+b4' in cell b5, and drag it across into cells c5 and d5.

Excel: Step 1 (iii)

Step 2: Calculate the expected values.

In this case, the expected values are given by (column total/grand total) x row total.

Use two new rows (rows 7 and 8), and in cell b7 type the formula '=b$5/$d$5*$d3'. In this formula, 'b$5' always refers to a column total in row 5; '$d$5' always refers to the grand total in cell d5; and '$d3' always refers to a row total in column d.

Excel: Step 2 (i)

Drag this formula across and then down to populate the results space.

Excel: Step 2 (ii)

Step 3: Calculate the values of (obs-exp)²/exp, but remember to use the Yates' correction: (|obs-exp| - 0.5)²/exp, where the vertical bars mean 'take the absolute value of'. The 'absolute value' of a number (sometimes called its modulus) is the number, but ignoring any minus sign. In Excel, this is done using the 'abs' function.

Use rows 10 and 11, and in cell b10 type in the formula '=(abs(b3-b7)-0.5)^2/b7'. Click on the green tick, or press 'return'.

Excel: Step 3 (i)

Drag this across into cell c10, then drag the row down to fill the result space.

Excel: Step 3 (ii)

Add these together by typing the formula '=sum(b10:c11)' into cell b13. Click on the green tick, or press 'return'.

Excel: Step 3 (iii)

The value of chi-squared calculated is 5.104167.

How to calculate Q W2.3a in SPSS

Step 1: Set up the variables

When SPSS starts, select the 'Type in data' option. Then choose 'variable view' from the tabs at the bottom left.

SPSS: Step 1 (i)

You will see a screen something like this:

SPSS: Step 1 (ii)

Each row represents a variable for the analysis. We need one variable for the subject studied, and another to indicate whether the student understood 'genetic testing'. Enter 'subject' in the 'Name' field in row 1, and 'underst' in the 'Name' field in row 2.

SPSS: Step 1 (iii)

Both these are categoric variables, so we need to use value labels. Click in the 'values' cell in row 1, and then click in the grey area that appears at the right-hand side of the cell. You will get a dialogue box. Enter a value of 1, and a value label of 'biology', then click on 'add' to register this pair with the system.

SPSS: Step 1 (iv)

Repeat with the value-label pair of 2 and education, then click on 'OK'.

Since our values are integers, we should reduce the 'decimals' property to zero. Click in the 'decimals' cell of row 1, and use the 'up-and-down' arrows that appear to change the 'decimals' property to zero.

SPSS: Step 1 (v)

Repeat the process for the variable 'underst', using value labels of 'yes' and 'no', and also reducing the number of decimal places to zero.

SPSS: Step 1 (vi)

Change to data view by clicking on the tab at bottom left.

Step 2: Enter the data.

Check that value labels are enabled by going to 'view' and, if necessary, ensuring that there is a tick next to 'value labels'.

SPSS: Step 2 (i)

Click in the first cell of column 1, and a drop-down menu will become accessible from an arrow at the right-hand side of the cell. Use this to enter the word 'biology'.

SPSS: Step 2 (ii)

Repeat the process until the first 20 cells in the column have been filled. (It may be easier to do some copying and pasting.)

SPSS: Step 2 (iii)

Repeat for the next 20 cells, but insert the word 'education'. Then go to the second column, and insert 'yes' and 'no' in the appropriate locations.

SPSS: Step 2 (iv)

Go to 'Analyze', 'Descriptive Statistics', 'Crosstabs'. In the following dialogue box, click on 'subject' to highlight it, then click on the appropriate arrow to transfer it to the 'row(s)' window.

SPSS: Step 2 (v)

Repeat the process to transfer 'underst' to the 'column(s)' window.

Click on 'Statistics', and make sure that 'Chi-square' is selected.

SPSS: Step 2 (vi)

Click on 'continue' and then on 'OK', and the results will appear in a separate window.

Crosstabs

SPSS results

The final table gives the results of the chi-squared test. The uncorrected value of chi-squared is 6.667, but with the Yates' correction it is 5.104.
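Both values reported by SPSS can be reproduced with the standard shortcut formula for a 2 × 2 table, chi-squared = N(|ad - bc|)²/(row totals × column totals), where the Yates version subtracts N/2 from |ad - bc| before squaring. The sketch below is a hedged Python illustration; the function name `chi_squared_2x2` is an assumption, and clamping the corrected difference at zero is a common convention rather than something stated in the text.

```python
def chi_squared_2x2(a, b, c, d, yates=False):
    """Chi-squared for the 2 x 2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' correction: reduce |ad - bc| by N/2 (never below zero).
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Table W5.4: Biology 16 yes / 4 no, Education 8 yes / 12 no.
uncorrected = chi_squared_2x2(16, 4, 8, 12)
corrected = chi_squared_2x2(16, 4, 8, 12, yates=True)

print(round(uncorrected, 3))  # 6.667
print(round(corrected, 3))    # 5.104
```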

How to calculate Q W2.3a in Minitab

Step 1: Enter the observed data into the worksheet window of Minitab.

Minitab: Step 1 (i)

Calculate the totals for the rows. Go to 'Calc', 'Calculator', enter 'total' in the 'Store result in variable' window, and type '=c2+c3' in the expression window.

Minitab: Step 1 (ii)

Click on 'OK'.

Minitab: Step 1 (iii)

Now for the column totals. Click in the Session (upper) window in Minitab, go to 'Editor' and select 'Enable Commands'.

Minitab: Step 1 (iv)

In column 1 cell 4, write 'total'.

At the 'MTB >' prompt, enter the command 'let c2(4) = sum(c2)'.

Minitab: Step 1 (v)

Repeat for columns 3 and 4, using the commands 'let c3(4) = sum(c3)' and 'let c4(4) = sum(c4)'.

Minitab: Step 1 (vi)

Step 2: Calculate the expected values. Go to 'Calc', 'Calculator'; enter 'Bio (e)' in 'Store result in variable', and type 'c2(4)/c4(4)*c4' in the expression window.

Minitab: Step 2 (i)

Click on 'OK'.

Minitab: Step 2 (ii)

Repeat the process for the Education students, using the variable 'Edu (e)' and the formula 'c3(4)/c4(4)*c4'.

Minitab: Step 2 (iii)

Step 3: Calculate the values of chi-squared using the Yates' correction. This involves calculating the absolute value of the difference between the observed and expected values using the 'absolute' operator in Minitab.

Go to 'Calc', 'Calculator', and insert 'chi-sq (Bio)' in the 'Store results in variable' window. Type '(absolute(c2-c5)-0.5)**2/c5' in the expression window.

Minitab: Step 3 (i)

Click on 'OK'.

Minitab: Step 3 (ii)

Repeat for the Education students, using the variable name 'chi-sq. (Edu)' and the formula '(absolute(c3-c6)-0.5)**2/c6'.

Minitab: Step 3 (iii)

Add the individual values of chi-squared together. The easiest way to do this is to add the column totals for columns 7 and 8, but both of them have a meaningless entry in row 6. Delete this by highlighting it and pressing 'delete'. Now go to 'Calc', 'Calculator'. In the 'Store result as variable' window, type 'chi-sq. total', and in the expression window type 'sum(c7)+sum(c8)'.

Minitab: Step 3 (iv)

Click on 'OK'.

Minitab: Step 3 (v)

Therefore, the value of chi-squared calculated is 5.10417.

Q W2.3b

What are the degrees of freedom?

The degrees of freedom are (rows - 1)(columns - 1). Remember to only include the observed rows and columns. Therefore degrees of freedom = (2 - 1)(2 - 1) = 1 × 1 = 1.

Q W2.3c

What is chi-squared critical at p = 0.05?

3.84
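The critical value is normally read from a statistical table, but for one degree of freedom it can be checked in code, because a chi-squared variable with 1 degree of freedom is the square of a standard normal variable. The sketch below is a Python illustration using only the standard library; the variable names are assumptions, and the shortcut applies to 1 degree of freedom only.

```python
from statistics import NormalDist

alpha = 0.05  # significance level used in the exercise

# For 1 degree of freedom, P(chi-squared > c) = P(|Z| > sqrt(c)),
# so the critical value is the square of the two-tailed normal quantile.
z = NormalDist().inv_cdf(1 - alpha / 2)
chi_squared_critical = z ** 2

print(round(chi_squared_critical, 2))  # 3.84
```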

Q W2.3d

Therefore, do you reject the null hypothesis?

a) Yes
b) No

The correct answer is a) Yes. chi-squared calculated (5.10) is greater than chi-squared critical (3.84) at p = 0.05 and therefore we reject the null hypothesis.

Q W2.3e

What does this mean in real terms?

There is a significant association (chi-squared calculated = 5.10, p = 0.05) between the number of students understanding the term 'genetic testing' and the course the students are taking.

(If you are reporting the results of this calculation having used one of the statistical software packages, your p value will be different, as you will be reporting the exact p value at which the decision was made, not the threshold value.)
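To see what such an exact p value looks like, the sketch below computes it for chi-squared calculated = 5.10417 with 1 degree of freedom. It is a Python illustration under stated assumptions: the function name `p_value_one_df` is hypothetical, and the formula p = erfc(√(x/2)) holds only for 1 degree of freedom.

```python
import math


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


p_value = p_value_one_df(5.10417)
print(round(p_value, 3))  # 0.024
print(p_value < 0.05)     # True, so the null hypothesis is rejected
```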

Although this test does not tell us anything about the nature of this association it is clear from Table W5.4 that more Biology students understood the term 'genetic testing' than Education (non Biology) students. We would have been embarrassed if this hadn't been the outcome!

Q W2.4a

Carry out an r × c G test for association on the same data (Table W5.4). Do you come to the same conclusion?

Again, this calculation has been broken down into steps so you can check your work as you go.

Firstly, what is G?

6.90

| Step | Calculation |
|---|---|
| i. | 16 ln 16 = 16 × 2.77259 = 44.36142 |
| | 4 ln 4 = 4 × 1.38629 = 5.54518 |
| | 8 ln 8 = 8 × 2.07944 = 16.63553 |
| | 12 ln 12 = 12 × 2.48491 = 29.81888 |
| ii. | total = 96.36101 |
| iii. | N ln N = 40 ln 40 = 40 × 3.68888 = 147.55518 |
| iv. | 20 ln 20 = 20 × 2.99573 = 59.91465 |
| | 20 ln 20 = 20 × 2.99573 = 59.91465 |
| | 24 ln 24 = 24 × 3.17805 = 76.27329 |
| | 16 ln 16 = 16 × 2.77259 = 44.36142 |
| | total = 240.464 |
| v. | 96.36101 + 147.55518 - 240.464 = 3.45219 |
| vi. | G = 2 × 3.45219 = 6.90437 |
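Steps i to vi can be sketched in a few lines of Python. This is an illustration only; the helper name `sum_x_ln_x` is an assumption, and skipping zero counts (treating 0 ln 0 as 0) is a common convention that is not needed for these data.

```python
import math

# Observed counts from Table W5.4 (rows: Biology, Education; columns: yes, no).
observed = [[16, 4],
            [8, 12]]


def sum_x_ln_x(values):
    """Sum of x ln(x) over the values, skipping zeros."""
    return sum(x * math.log(x) for x in values if x > 0)


cells = [x for row in observed for x in row]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(cells)

# G = 2 x (cells + grand total - row totals - column totals), all as x ln(x).
g = 2 * (sum_x_ln_x(cells) + n * math.log(n)
         - sum_x_ln_x(row_totals) - sum_x_ln_x(col_totals))

print(round(g, 5))  # 6.90437
```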



Q W2.4b

What is W?

1.04

| Step | Calculation |
|---|---|
| i. | [(1/20 + 1/20) × 40] - 1 = [(0.05 + 0.05) × 40] - 1 = (0.1 × 40) - 1 = 4 - 1 = 3 |
| ii. | [(1/24 + 1/16) × 40] - 1 = [(0.04167 + 0.0625) × 40] - 1 = (0.10417 × 40) - 1 = 4.16667 - 1 = 3.16666 |
| iii. | 3 × 3.16666 = 9.49999 |
| iv. | 6 × 40 × (2 - 1)(2 - 1) = 240 × 1 × 1 = 240 |
| v. | W = 1 + (9.49999/240) = 1 + 0.03958 = 1.03958 |



Q W2.4c

What is Gcalculated?

6.64

Gcalculated = G / W = 6.9043732 / 1.0395833 = 6.6414814
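The Williams correction and the final division can be sketched as follows. This is a hedged Python illustration; the function name `williams_correction` is an assumption, the totals are taken from Table W5.6, and the uncorrected G is the rounded value from the worked answer to Q W2.4a.

```python
row_totals = [20, 20]   # Biology, Education
col_totals = [24, 16]   # yes, no
n = 40                  # grand total
g = 6.90437             # uncorrected G from Q W2.4a


def williams_correction(row_totals, col_totals, n):
    """W = 1 + (rows term x columns term) / (6 n (r - 1)(c - 1))."""
    rows_term = n * sum(1 / r for r in row_totals) - 1
    cols_term = n * sum(1 / c for c in col_totals) - 1
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 1 + rows_term * cols_term / (6 * n * df)


w = williams_correction(row_totals, col_totals, n)
g_calculated = g / w

print(round(w, 5))             # 1.03958
print(round(g_calculated, 2))  # 6.64
```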

There are also instructions on how to perform this calculation using the following software packages:

Excel

SPSS

Minitab

How to calculate Q W2.4c in Excel

Step 1: Enter the data into the spreadsheet using appropriate row and column headings.

Excel: Step 1 (i)

Calculate the row totals: use the formula '=b3+c3' in cell d3, and drag it down into cell d4.

Excel: Step 1 (ii)

Calculate the column totals: use the formula '=b3+b4' in cell b5, and drag it across into cells c5 and d5.

Excel: Step 1 (iii)

Step 2: Calculate o ln(o) for all these values, where 'o' is an observed value, and 'ln' means 'the natural logarithm of'. This is done by the Excel function 'ln'. The easiest way to do this is to create another table of identical dimensions to the one we already have, and populate it with the numbers we need.

Excel: Step 2 (i)

Now calculate the values of o ln(o). We can do this by typing a formula into one cell and dragging it into all the others, because our grids are the same shape and size. In cell b9, type '=b3*ln(b3)', then click on the green tick or press 'return'.

Excel: Step 2 (ii)

Now drag this cell across the rest of the table. (You may have to drag across a row, and then drag the row down.)

Excel: Step 2 (iii)

Following the outline in the book, we next add together all the values of o ln(o) for the individual measurements, and place them in a convenient (labelled) cell, say b13. Type in the formula '=sum(b9:c10)', then click on the green tick or press 'return'.

Excel: Step 2 (iv)

o ln(o) for the grand total is stored in cell d11, so we simply note that it is there: we will need it soon. The next thing to do is to add the values of o ln(o) for the rows and columns together, and put them somewhere convenient (b14). Type the formula '=sum(b11:c11)+sum(d9:d10)' into cell b14, then click on the green tick or press 'return'.

Excel: Step 2 (v)

Taking values of o ln(o), we now need to find G = 2 x (measurements + grand total - rows & columns). Using cell b15, type '=2*(b13+d11-b14)', then click on the green tick or press 'return'.

Excel: Step 2 (vi)

Step 3: To find the Williams' correction, first work out 1/each row total and add these values together. Multiply by the grand total. Subtract 1. To do this, use a formula in a convenient cell, say, e14. The formula is '=(1/d3+1/d4)*d5-1'.

Excel: Step 3 (i)

Do the same thing for the columns. The formula is '=(1/b5+1/c5)*d5-1'.

Excel: Step 3 (ii)

Multiply these two together: the formula is '=e14*e15'.

Excel: Step 3 (iii)

Now we calculate 6n(rows-1)(columns-1), where n is the total number of observations, and 'rows' and 'columns' are the numbers of rows and columns. Use the formula '=6*d5*(2-1)*(2-1)'.

Excel: Step 3 (iv)

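The Williams correction factor is W = 1 + (rows term x columns term)/(6n(r-1)(c-1)), where the rows term and columns term are the two values just calculated and their product is in cell e16. Use the formula '=1+e16/e17'.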

Excel: Step 3 (v)

Gcalculated = G/W. The formula is '=b15/e18'.

Excel: Step 3 (vi)

The value of Gcalculated is 6.64.

How to calculate Q W2.4c in SPSS

SPSS does not do G tests directly.

How to calculate Q W2.4c in Minitab

Step 1: Enter the observed data into the worksheet window of Minitab.

Minitab: Step 1

Step 2: Calculate the totals for the rows. Go to 'Calc', 'Calculator', enter 'total' in the 'Store result in variable' window, and type '=c2+c3' in the expression window.

Minitab: Step 2 (i)

Click on 'OK'.

Minitab: Step 2 (ii)

Step 3: Now for the column totals. Click in the Session (upper) window in Minitab, go to 'Editor' and select 'Enable Commands'.

Minitab: Step 3 (i)

In column 1 cell 4, write 'total'.

At the 'MTB >' prompt, enter the command 'let c2(4) = sum(c2)'.

Minitab: Step 2 (ii)

Repeat for columns 3 and 4, using the commands 'let c3(4) = sum(c3)' and 'let c4(4) = sum(c4)'.

Minitab: Step 2 (iii)

Step 4: Find the values of O ln (O), where 'ln' means 'the natural logarithm of'.

Go to 'Calc', 'Calculator', and enter 'O ln O (yes)' in the 'Store result in variable' window. In the expression window, type 'c2*loge(c2)', where 'loge' means 'logarithm to base e', or natural logarithm.

Minitab: Step 4 (i)

Click on 'OK'.

Minitab: Step 4 (ii)

Repeat the process for the 'no' students, remembering that their data is in column 3.

Minitab: Step 4 (iii)

Next, calculate O ln O for the grand total (from cell 4 of column 4). Go to 'Calc', 'Calculator', enter 'k1' in the 'Store result as variable' window, and in the expression window type 'c4(4)*loge(c4(4))'.

Minitab: Step 4 (iv)

Click on 'OK'. There should be no change to the main screen.

Go to 'Project manager' (at bottom left) and open the 'constants' folder. Right-click on the word 'unnamed' next to 'k1', select 'rename' and name it 'o ln(o) (total)'.

Minitab: Step 4 (v)

(Note the typing error in the image above)

Minimise project manager.

Now calculate o ln(o) for the column totals. Click in the upper (Session) window, go to 'Editor' and select 'Enable commands'.

Minitab: Step 4 (vi)

In column 1 cell 6, type 'o ln(o)'.

At the 'MTB >' prompt, type 'let c2(6) = c2(4)*loge(c2(4))', and press 'return'.

Minitab: Step 4 (vi)

Repeat for column 3.

Minitab: Step 4 (vii)

Add up the o ln(o) for the columns, and store it in a variable. Go to 'Calc', 'Calculator', type 'k2' in the 'Store results in variable' window, and put 'c2(6)+c3(6)' in the expression window.

Minitab: Step 4 (viii)

Click on 'OK', and open the project manager. Rename K2 as 'o ln(o) (cols)'.

Minitab: Step 4 (ix)

Minimise project manager.

Calculate o ln(o) for the total of each row. Go to 'Calc', 'Calculator', enter 'o ln(o) (row total)' in the 'Store result in variable' window, and type in the expression 'c4*loge(c4)'.

Minitab: Step 4 (x)

Click on 'OK'.

Minitab: Step 4 (xi)

Next, we add the two row totals together, and store them as a constant. Go to 'Calc', 'Calculator', and type 'k3' in the 'store result in variable' window. Type 'c7(1) + c7(2)' in the expression window.

Minitab: Step 4 (xii)

Click on 'OK', go to project manager, and rename k3 as 'o ln(o) (rows)'.

Minitab: Step 4 (xiii)

Finally, we find the sum of all the o ln(o) for the individual measurements, and store that in another constant. This is quickest by summing columns, but there are meaningless numbers in row 4 of columns 5 to 7. Remove these by highlighting them and pressing 'delete'.

Go to 'Calc', 'Calculator', put k4 in the 'store result in variable' window, and type 'sum(c5) + sum(c6)' in the expression window.

Minitab: Step 4 (xiv)

Click on 'OK'.

Open the project manager, and rename k4 as 'o ln(o) (individuals)'.

Minitab: Step 4 (xv)

G = 2 x [o ln(o) (total) + o ln(o) (individuals) - o ln(o) (rows) - o ln(o) (columns)]

Go to 'Calc', 'Calculator', put k5 in the 'Store result in variable' window, and type '2*(k1+k4-k3-k2)' in the expression window.

Minitab: Step 4 (xvi)

Click on 'OK'.

Go to project manager, and rename k5 as 'G'.

Minitab: Step 4 (xvii)

Step 5: Now we need to apply the Williams correction.

First, we find 1/row total for each row, add them all together, multiply by the grand total, and subtract 1. This is slightly messy, but go to 'Calc', 'Calculator', put k6 in the 'store result in variable' window, and type '(1/c4(1)+1/c4(2))*c4(4)-1' in the expression window.

Minitab: Step 5 (i)

Click on 'OK'. Go to project manager, and rename k6 as 'W (rows)'.

Minitab: Step 5 (ii)

Repeat for the column totals, using k7 and the formula '(1/c2(4)+1/c3(4))*c4(4)-1'.

Minitab: Step 5 (iii)

W is 1 + (W (rows) x W (cols))/(6 x total x (cols-1) x (rows-1)). We have a 2 x 2 results table, and the total number of observations is in c4(4).

Go to 'Calc', 'Calculator', put k8 in the 'Store result in variable' window, and type '1+k6*k7/(6*c4(4)*(2-1)*(2-1))'.

Minitab: Step 5 (iv)

Click on 'OK'. Go to project manager, and rename k8 as W.

Minitab: Step 5 (v)

Gcalculated = G/W. This is k5/k8: put the result in G(calculated).

Minitab: Step 5 (vi)

Click on 'OK'.

Minitab: Step 5 (vii)

The value of Gcalculated is therefore 6.64148.

Q W2.4d

What are the degrees of freedom?

The degrees of freedom are (rows - 1)(columns - 1). Therefore degrees of freedom = (2 - 1)(2 - 1) = 1 × 1 = 1.

Q W2.4e

What is Gcritical at p = 0.05?

3.84

Q W2.4f

Therefore, do you reject the null hypothesis?

a) Yes
b) No

The correct answer is a) Yes. Gcalculated (6.64) is greater than Gcritical (3.84) at p = 0.05 and therefore we reject the null hypothesis.

Q W2.4g

What does this mean in real terms?

There is a significant association (G calculated = 6.64, p = 0.05) between the number of students understanding the term 'genetic testing' and the course the students are taking.

(If you are reporting the results of this calculation having used one of the statistical software packages, your p value will be different, as you will be reporting the exact p value at which the decision was made, not the threshold value.)

Q W2.5

Compare the calculated values for the chi-squared test and the G test. How do they differ? Has this altered the outcome?

When analysing the same data from Table W5.4, chi-squared calculated (5.10) is smaller than G calculated (6.64). On this occasion the outcomes are the same. This illustrates that the G test is less conservative than the chi-squared test with a Yates correction and will tend to result in the rejection of more null hypotheses.

Q W2.6

Look at Table W5.5. At first glance it would appear that this data from the second question asked by the student can be analysed in the same way as the results from the first question (Table W5.4). Why is this not true?

If you work out the expected values (Table W5.5) as you would if you were using a 2 × 2 chi-squared test for association you will see that two of the four values are less than 5 (Table W5.7 below). Since this is only a 2 × 2 contingency table it is not possible to combine a row or a column to increase the values (see 5.7 in the book).

This is a frequent difficulty with analysing questionnaire answers. For the first question (Table W5.4) the sample size of 40 was adequate since there were sufficient respondents in all categories. For the second question this is not the case, so the same sample size is not adequate and the data cannot be analysed using this test. A G test may be used to test these data, or refer to Sokal & Rohlf, 1981.

Table W5.7: A contingency table for the number of students who agreed with the idea of offering over the counter genetic tests

 

| | | Agreed | Disagreed | Total number of students |
|---|---|---|---|---|
| Biology students | Observed | 6 | 14 | 20 |
| Biology students | Expected | 8/40 × 20 = 4 | 32/40 × 20 = 16 | |
| Education (non Biology) students | Observed | 2 | 18 | 20 |
| Education (non Biology) students | Expected | 8/40 × 20 = 4 | 32/40 × 20 = 16 | |
| Total number of students | | 8 | 32 | 40 |
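The check that rules out the chi-squared test for these data can be automated. The sketch below is a Python illustration, not part of the original answer; the function name `cells_failing_criterion` and the default threshold argument are assumptions that encode the criterion 'expected values that are more than 5'.

```python
# Observed counts from Table W5.5 (rows: Biology, Education; columns: agreed, disagreed).
observed = [[6, 14],
            [2, 18]]


def cells_failing_criterion(table, threshold=5):
    """Return (row, column, expected) for every cell whose expected value is not above the threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    failing = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand_total
            if expected <= threshold:
                failing.append((i, j, expected))
    return failing


too_small = cells_failing_criterion(observed)
print(too_small)       # [(0, 0, 4.0), (1, 0, 4.0)]
print(len(too_small))  # 2 of the 4 expected values are too small
```

Running the same function on the counts in Table W5.4 returns an empty list, which matches the conclusion reached for the first question.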


