Have a go at choosing what might be the correct test to analyse the data from your experiment. Explain your choice.
This is invariably the step that students find the hardest, so we return to it in the interactive exercises in Chapters 4 to 8.
In this experiment there is one treatment variable (type of growth medium) and three samples (green compost only, green compost with 25% perlite and green compost with 50% perlite). The student recorded several parameters for each treatment: time to emergence (when the cotyledons were distinguishable above the soil surface), percentage emergence, fresh and dry weight of above-ground material (g), the number of true leaves and leaf area. Using the information in Chapter 4 and Appendix B we can decide which statistical tests are most likely to be appropriate.
B.1. What type of investigation am I designing?
In this investigation we are starting out with a question, so we will be testing a hypothesis.
B.2. Which type of hypotheses am I testing?
There are three types of hypotheses to choose between. If you are not sure which type you will be testing, read the information in B.2.1 - B.2.3 of the book before deciding. For more information about hypotheses and hypothesis testing read Chapter 4.
In this example the student wishes to compare samples. He does not have an 'expectation', nor does he wish to test for an association between two or more variables. Therefore the general type of hypothesis for all his observations is:
Do samples come from the same or different populations?
B.2.3. Do samples come from the same or different populations?
Many tests address this type of hypothesis. They fall into parametric tests (Chapter 7), to be used when you have Normally distributed data, and non-parametric tests (Chapter 8), to be used when your data are not Normally distributed or you do not know the underlying distribution.
The next step is to determine whether the data are Normally distributed (see Box 3.2 in the book and in the Statistical Software section of the Online Resource Centre). There are five criteria to help you tell if your data are likely to be Normally distributed. Only the first can be applied at this planning stage. The remaining criteria are used to confirm your decision when you have actual data.
The first criterion is: (a) Are the data measured on an interval scale, and therefore quantitative and continuous, with units such as mm or grams?
With these various measures the answers are:
| Data measured | Quantitative and continuous? |
| --- | --- |
| time to emergence | Yes |
| percentage emergence | No |
| fresh weight | Yes |
| dry weight | Yes |
| number of true leaves | No |
| leaf area | Yes |
First let us consider those measures where the data may be parametric, though this will need confirming later when the data are available. If these measures turn out to produce non-parametric data, then either the tests identified for non-parametric data may be used or the data may be transformed to Normalise them.
B.2.3.1. Parametric tests
From the table below it appears that these data may be analyzed using a one-way parametric ANOVA and Tukey's test (7.5 and 7.6).
| Experimental design | Test |
| --- | --- |
| You have one treatment variable. You are going to compare two samples. The data are unmatched. | t or z test for unmatched data (7.1 or 7.2) |
| You have one treatment variable. You are going to compare two samples. The data are matched. | t or z test for matched data (7.3) |
| You have one treatment variable. You are going to compare two or more samples. You wish to test general and specific hypotheses. | One-way parametric ANOVA and Tukey's test (7.5 and 7.6) |
| You have two treatment variables. Each variable has at least two categories or classes and all categories from one variable are combined with all categories from the second variable. You wish to test general and specific hypotheses. | Two-way parametric ANOVA and Tukey's test (7.7 and 7.8) |
| You have two treatment variables. Each variable has at least two categories. One variable is randomised or nested with respect to the second variable. You wish to test general hypotheses. | Two-way nested ANOVA (7.9) |
| You have three treatment variables. Each variable has at least two categories and all categories from each variable are combined with all other categories from the other variables. You wish to test general and specific hypotheses. | Three-way parametric ANOVA (7.10) |
| None of the above | Chapter 8 and Sokal & Rohlf, 1981 |
If you wish to use the one-way parametric ANOVA the following criteria need to be met. You:
- Wish to test for differences in population means.
- Have one treatment variable.
- Have parametric data.
- Have an experimental design, which means that each item is assigned at random to the samples.
- Have samples where the variation is similar (homogeneous).
- Have the same number of replicates (observations) in each sample.
At the planning stage we can confirm that our current design meets every criterion except the fifth (homogeneous variation), which will need to be checked when the data have been collected.
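To make these criteria concrete, the following is a minimal sketch of how the F statistic for a one-way ANOVA is built up from the between-sample and within-sample variation. The dry weights are invented for illustration and are not the student's data. The ratio of largest to smallest sample variance is included only as a rough screen for the homogeneity criterion. It is our assumption here, not the procedure described in Chapter 7.

```python
from statistics import mean, variance

# Hypothetical dry weights (g), invented for illustration only.
samples = {
    "compost only": [2.0, 3.0, 4.0],
    "25% perlite": [4.0, 5.0, 6.0],
    "50% perlite": [6.0, 7.0, 8.0],
}

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the sample means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own sample mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = list(samples.values())

# Criterion: the same number of replicates in each sample.
equal_replicates = len({len(g) for g in groups}) == 1

# Rough screen for similar variation: a ratio near 1 suggests similar variances.
variances = [variance(g) for g in groups]
variance_ratio = max(variances) / min(variances)

f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"equal replicates: {equal_replicates}")
print(f"variance ratio: {variance_ratio:.2f}")
print(f"F = {f_stat:.2f} with {df_between} and {df_within} degrees of freedom")
```

With real data the F value would be compared against a critical value for the stated degrees of freedom, and a significant result followed up with Tukey's test to identify which samples differ.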
B.2.3.2. Non-parametric tests
In our design there are two measurements that we know are made on non-parametric scales (percentage emergence and number of true leaves). For the latter (number of true leaves), the experimental design is the same as that for the parametric measures, so it is not surprising that the most appropriate analysis is the non-parametric equivalent, i.e. either the non-parametric one-way ANOVA or the Scheirer-Ray-Hare test. Given the amount of data we recommend the latter.
| Experimental design | Test |
| --- | --- |
| You have one treatment variable. You are going to compare two samples. The data are unmatched. You have 20 observations or fewer in each sample. | Mann-Whitney U test (8.1) |
| You have one treatment variable. You are going to compare two samples. The data are unmatched. The data are measured on a continuous scale and you have more than 30 observations in each sample. | z test for unmatched data (7.1) |
| You have one treatment variable. You are going to compare two samples. The data are unmatched. You have more than 20 observations in each sample. | Sokal & Rohlf, 1981 |
| You have one treatment variable. You are going to compare two samples. The data are matched. You have fewer than 30 pairs of observations. | Wilcoxon's rank paired test (8.2) |
| You have one treatment variable. You are going to compare two samples. The data are matched. You have more than 30 pairs of observations. | z test for matched data (Chapter 7 (7.2)) |
| You have one treatment variable. You are going to compare two or more samples. You wish to test general and specific hypotheses. | One-way ANOVA (Kruskal-Wallis test) (8.3 and 8.4) |
| You have more than one treatment variable. You are going to compare two or more samples. You wish to test general and specific hypotheses. You will be using a calculator. | Two-way non-parametric ANOVA (8.5 and 8.6) |
| You have more than one treatment variable. You are going to compare two or more samples. You wish to test general hypotheses. You want to use a computer. | Scheirer-Ray-Hare test (8.7) |
To use the Scheirer-Ray-Hare test you:
- Wish to test for differences in population medians.
- Have two treatment variables each with at least two categories.
- Have an orthogonal design.
- Have non-parametric data that can be ranked.
At this stage in our planning we can confirm that all these criteria are met. Therefore, unless our design changes we should be able to use this test to examine the hypotheses relating to the number of true leaves.
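The last criterion asks for data that can be ranked. As a minimal sketch of what that involves, the code below pools some leaf counts, ranks them from smallest to largest and gives tied values the mean of the ranks they occupy. The counts and the helper name `tied_ranks` are invented for illustration. Only the ranking step shared by rank-based tests is shown, not the full Scheirer-Ray-Hare calculation in 8.7.

```python
# Hypothetical numbers of true leaves, invented for illustration only.
leaf_counts = {
    "compost only": [2, 3, 3],
    "25% perlite": [3, 4, 5],
    "50% perlite": [5, 6, 6],
}

def tied_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' to cover every value tied with the one at 'start'.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

# Pool all observations, rank them together, then total the ranks per sample.
pooled = [x for counts in leaf_counts.values() for x in counts]
ranks = tied_ranks(pooled)

rank_sums = {}
position = 0
for medium, counts in leaf_counts.items():
    rank_sums[medium] = sum(ranks[position:position + len(counts)])
    position += len(counts)

for medium, total in rank_sums.items():
    print(f"{medium}: rank sum = {total}")
```

Counts such as the number of true leaves produce many ties, which is why the shared-rank rule matters for this measure.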
For the percentage emergence we will have three samples, with data on the number of seeds emerging in growing medium A and the number of seeds not emerging in growing medium A (and likewise for media B and C). These data are most usefully analyzed using a chi-squared or G test for association.
To use the chi-squared or G test for association (5.3.1) you:
- Wish to test for an association between two treatment variables.
- Have data that are organised into more than two categories for at least one of the variables and into two or more categories for the second variable.
- Have data that are counts or frequencies, not percentages or proportions.
- Have observations that are independent of each other.
- Have expected values that are more than 5.
At this stage in our planning we can confirm that all the criteria other than the fifth (expected values of more than 5) are met. This final criterion can only be checked when the data are collected. However, in the current design each type of compost should be present in 34 cells excluding the guard rows, and therefore it is likely that this criterion will be met.
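The expected-value criterion can be illustrated with a minimal sketch. Assuming 34 seeds per growing medium as in the current design, and using emergence counts that are invented for illustration, the code below calculates the expected count for each cell of the 3 x 2 table (row total x column total / grand total), checks that every expected value is more than 5, and computes the chi-squared statistic.

```python
# Hypothetical counts (emerged, not emerged) out of 34 seeds per medium.
# These numbers are invented for illustration only.
observed = {
    "compost only": (30, 4),
    "25% perlite": (25, 9),
    "50% perlite": (20, 14),
}

def expected_counts(table):
    """Expected count per cell: row total x column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for the table."""
    rows = list(table.values())
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(rows) - 1) * (len(rows[0]) - 1)
    return stat, df

expected = expected_counts(observed)
all_expected_above_5 = all(e > 5 for row in expected for e in row)
stat, df = chi_squared(observed)

print(f"expected counts: {expected}")
print(f"all expected values more than 5: {all_expected_above_5}")
print(f"chi-squared = {stat:.2f} with {df} degrees of freedom")
```

Note that the test is run on the counts themselves, not on the percentages, in line with the third criterion above.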