Exercise 4

This exercise is based on research carried out by an undergraduate. The topics covered are those in 4.1 and Appendix B.

Example W.4.4: The inheritance of coat colour in horses

One undergraduate had access to previously unanalysed stud records that included detailed descriptions of the colour and patterning of the horse coats at birth and at maturity. The data were examined and a number of genetic explanations proposed.

Q W4.1

Which statistical test(s) would be suitable for analysing these data?

A chi-squared or G goodness of fit test, with or without the Yates correction.

To arrive at this conclusion, follow the steps outlined in Appendix B:

B.1 What type of investigation am I designing?

This is an experiment: you are starting out with a question (a hypothesis). Go to B.2.

B.2 Which type of hypotheses am I testing?

There are three types of hypotheses to choose between. If you are not sure which type you will be testing, read the information in B.2.1 to B.2.3 before deciding. For more information about hypotheses and hypothesis testing, read Chapter 4.

In this example the student developed a number of genetic proposals against which she wished to test the data. Each genetic proposal provided an expected set of values, so this student wished to test the first type of hypotheses.

B.2.1 Does the data match an expected ratio?

The choice of statistical test is determined by the number of variables and number of categories for each variable.

In this example there is one treatment variable (coat colour), and there is a genetic (a priori) reason for expecting certain ratios. The stud book listed a number of different crosses and, not surprisingly, the ratio of coat colours in the offspring varied between them. When the records were grouped into similar crosses, it was clear that some crosses produced more than two coat colours in the offspring, some produced only two, and some produced only one. From the table below we can see that for most of these data a chi-squared or G goodness of fit test, with or without the Yates correction, appears to be the most appropriate choice. (For crosses where only one coat colour is expected, any deviation from this automatically indicates that the 'expected ratio' is not correct; you do not need statistics to tell you this.)

| Experimental design | Test |
| --- | --- |
| You have one variable and you have an a priori reason for expecting certain outcomes from your investigation. The variable has more than two categories. | Chi-squared goodness of fit test (5.1) or G goodness of fit test (5.5.1) |
| You have one variable and you have an a priori reason for expecting certain outcomes from your investigation. The variable has only two categories. | Chi-squared goodness of fit test with Yates correction (5.4.1) or G goodness of fit test (5.5.5) |
| You have one variable with two or more categories and you do not have an a priori expectation. You have more than two samples in your data set and wish to know if the samples are similar or different from each other. | Chi-squared test for heterogeneity (5.2) |
| You do not have an a priori expectation. You have two variables. At least one of these variables has more than two categories. You wish to test for an association between the variables. | Chi-squared test for association (5.3) or G test for association (5.5.2) |
| You do not have an a priori expectation. You have two variables. Both variables have only two categories. You wish to test for an association between the variables. | Chi-squared test for association with Yates correction (5.4.2) or G test for association (5.5.2) |
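
The selection rules in the table above can be written out as a small lookup, which may help when checking a design against each row in turn. This is only a sketch: the function name `suggest_test` and its parameters are hypothetical and do not come from Appendix B.

```python
def suggest_test(n_variables, a_priori, max_categories, n_samples=1):
    """Return the test(s) the table suggests for a design.

    n_variables    -- number of variables (1 or 2)
    a_priori       -- True if there is an a priori expected outcome
    max_categories -- largest number of categories in any one variable
    n_samples      -- number of samples in the data set
    All names here are illustrative assumptions, not part of Appendix B.
    """
    if n_variables == 1 and a_priori:
        if max_categories > 2:
            return ["Chi-squared goodness of fit test",
                    "G goodness of fit test"]
        if max_categories == 2:
            return ["Chi-squared goodness of fit test with Yates correction",
                    "G goodness of fit test"]
        return []  # one expected category: no statistical test is needed
    if n_variables == 1 and not a_priori and n_samples > 2:
        return ["Chi-squared test for heterogeneity"]
    if n_variables == 2 and not a_priori:
        if max_categories > 2:
            return ["Chi-squared test for association",
                    "G test for association"]
        return ["Chi-squared test for association with Yates correction",
                "G test for association"]
    return []  # design not covered by the table


# A cross with two expected coat colours and an a priori ratio:
print(suggest_test(n_variables=1, a_priori=True, max_categories=2))
```
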



To use a chi-squared or G goodness of fit test you:

  1. Wish to compare your observed values to those predicted by an a priori expectation.
  2. Have one treatment variable.
  3. Have only one sample.
  4. Have data that falls into two or more categories.
  5. Have data that is counts or frequencies and is not percentages or proportions.
  6. Have observations that are independent.
  7. Have expected values that are more than 5.

From this list it appears that most of these criteria are likely to be met. Criterion 7 cannot be confirmed until the data are collected. In this example stud records were pooled across a number of similar genetic crosses to ensure, where possible, that criterion 7 was met. Where criterion 7 was not met, the hypotheses could not be tested.
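
Criterion 7 can be checked as soon as the total number of offspring and the proposed ratio are known. The sketch below shows one way to do this. The 3:1 ratio and the totals of 12 and 40 offspring are illustrative assumptions, not values from the stud records, and the function names are hypothetical.

```python
def expected_counts(ratio, total):
    """Split a total count into expected counts according to a ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def meets_criterion_7(expected, minimum=5):
    """True if every expected value is more than the minimum (criterion 7)."""
    return all(e > minimum for e in expected)


# Illustrative totals only: a small group of records and a pooled group.
for total in (12, 40):
    expected = expected_counts((3, 1), total)
    print(total, expected, meets_criterion_7(expected))
```

With 12 offspring the smaller expected value is 3, so the criterion fails. Pooling similar crosses to reach 40 offspring raises it to 10, which is the reason for pooling described above.
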

Q W4.2

One explanation for some of the crosses predicted that the offspring should be present in the ratio 2 black: 1 chestnut. Write suitable hypotheses for this test.

H0: There is no difference between the observed numbers of offspring with black and chestnut coats and the numbers predicted by the ratio 2 black:1 chestnut.

H1: There is a difference between the observed numbers of offspring with black and chestnut coats and the numbers predicted by the ratio 2 black:1 chestnut.
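
To show how these hypotheses would be tested, here is a minimal sketch of a chi-squared goodness of fit test with the Yates correction, which the table recommends when there are only two categories. The observed counts (50 black, 40 chestnut) are invented for illustration and are not taken from the stud records. The function names are hypothetical. The p value uses the fact that, for one degree of freedom, the upper tail of the chi-squared distribution equals erfc(sqrt(x / 2)).

```python
import math


def chi_squared_gof(observed, ratio, yates=False):
    """Chi-squared goodness of fit statistic against an expected ratio."""
    total = sum(observed)
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            # Yates correction: reduce each |O - E| by 0.5 (never below 0)
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / e
    return stat, expected


def p_value_df1(stat):
    """Upper-tail p value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


observed = [50, 40]  # hypothetical counts: black, chestnut
stat, expected = chi_squared_gof(observed, (2, 1), yates=True)
p = p_value_df1(stat)
print(expected)        # expected counts under 2 black:1 chestnut
print(round(stat, 4))  # corrected chi-squared statistic
print(p < 0.05)        # True would mean H0 is rejected at the 5% level
```

With two categories there is one degree of freedom, so `p_value_df1` applies here. For crosses with more than two coat colours the degrees of freedom increase, and the statistic should be compared with a chi-squared table instead.
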
