Exercise 2

This exercise is based on research carried out by an undergraduate. The topics covered are those in 4.1 and appendix b.

Example W4.2: Levels of bioavailable heavy metals in sediments of a rural and an urban stream

The sediment from two similar streams, one of which ran through an urban area and one through a rural area, was sampled. Ten samples were taken from the centre of each stream at 1 m intervals. The sediments were dried and, after preparation, were analysed using an atomic absorption spectrophotometer. The concentrations (µg metal g⁻¹ sediment) of lead, zinc, copper, nickel and cadmium were established in each sample.

Q W2.1

Which statistical test(s) should be suitable for analysing these data?

t test for unmatched data.

To come to this conclusion you should follow the steps outlined in appendix b.

B.1 What type of investigation am I designing?

This is an experiment: you are starting out with a question (hypothesis), so go to B.2.

B.2 Which type of hypotheses am I testing?

There are three types of hypotheses which you need to choose between. If you are not sure which type of hypotheses you will be testing, read the information in B.2.1 to B.2.3 before deciding. For more information about hypotheses and hypothesis testing, read Chapter 4.

In our example the student does not have an expectation. She has only one treatment variable for each metal tested (streams) and so does not want to test for an association. Therefore she wishes to test for differences (Hypotheses type 3).

B.2.3 Do samples come from the same or different populations?

This is the type of hypothesis most frequently investigated by undergraduates, and many tests are available for it. These tests fall into parametric tests (Chapter 7), to be used when you have Normally distributed data, and non-parametric tests (Chapter 8), to be used when your data are not Normally distributed. To tell whether your data are Normally distributed, refer to BOX 3.2.

In this example the observations recorded will be 'concentration of a particular metal' (µg metal g⁻¹ sediment). Although this is a derived variable, it is measured on an interval scale, and therefore at this stage we should consider parametric tests.

B.2.3.1 Parametric tests

These are largely selected on the basis of the experimental design: how many variables, how many categories in each variable, and how many replicates in each category. In this example there is one treatment variable (streams) and two samples (an urban stream and a rural stream), each with 10 observations. From the table it would appear that a t or z test for unmatched data may be appropriate.

| Experimental design | Test |
| --- | --- |
| You have one treatment variable. You are going to compare two samples. The data is unmatched. | t or z test for unmatched data (7.1 or 7.2) |
| You have one treatment variable. You are going to compare two samples. The data is matched. | t or z test for matched data (7.3) |
| You have one treatment variable. You are going to compare two or more samples. You wish to test general and specific hypotheses. | One-way parametric ANOVA and Tukey's test (7.5 and 7.6) |
| You have two treatment variables. Each variable has at least two categories or classes and all categories from one variable are combined with all categories from the second variable. You wish to test general and specific hypotheses. | Two-way parametric ANOVA and Tukey's test (7.7 and 7.8) |
| You have two treatment variables. Each variable has at least two categories. One variable is randomised or nested with respect to the second variable. You wish to test general hypotheses. | Two-way nested ANOVA (7.9) |
| You have three treatment variables. Each variable has at least two categories and all categories from each variable are combined with all other categories from the other variables. You wish to test general and specific hypotheses. | Three-way parametric ANOVA (7.10) |
| None of the above | Chapter 8 and Sokal & Rohlf, 1981 |
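
The selection logic in the table can be sketched as a small lookup. This is only an illustration: the function name `choose_parametric_test` and its parameters are hypothetical, and it encodes nothing beyond the rows of the table above.

```python
def choose_parametric_test(treatment_variables, samples, matched=False, nested=False):
    """Return the table row matching a design (hypothetical helper, not from the book)."""
    if treatment_variables == 1 and samples == 2:
        if matched:
            return "t or z test for matched data (7.3)"
        return "t or z test for unmatched data (7.1 or 7.2)"
    if treatment_variables == 1 and samples > 2:
        return "One-way parametric ANOVA and Tukey's test (7.5 and 7.6)"
    if treatment_variables == 2:
        if nested:
            return "Two-way nested ANOVA (7.9)"
        return "Two-way parametric ANOVA and Tukey's test (7.7 and 7.8)"
    if treatment_variables == 3 and not nested:
        return "Three-way parametric ANOVA (7.10)"
    return "None of the above: see Chapter 8 and Sokal & Rohlf, 1981"


# The stream study: one treatment variable (streams), two unmatched samples.
print(choose_parametric_test(treatment_variables=1, samples=2))
```

The lookup only narrows the choice; the criteria for the selected test still need to be checked against the design and, later, the data.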



When we examine the criteria for the t and z tests for unmatched data, it is clear that, with only 10 observations in each sample, we should consider the t test. To use the t test for unmatched data (7.2.1) you:

  1. Wish to test for differences in population means.
  2. Have one treatment variable and two samples.
  3. Have unmatched data.
  4. Have parametric data.
  5. Have fewer than 30 observations in each sample but the sample sizes need not be equal.
  6. Have homogeneous variances.

We can see that criteria 1, 2, 3 and 5 are met by the current design. It is not possible to confirm with any confidence that the data are parametric until they are available for checking. Similarly, you cannot carry out an F test to confirm that the variances are homogeneous until you have the data. At this point however it appears that a t test for unmatched data may be suitable. This test would be used to compare the concentration of each metal in the two streams in turn.
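
Once data are available, the two outstanding checks and the test itself might be sketched as follows. The concentrations below are invented purely for illustration and are not the student's results. The sketch assumes the pooled-variance form of the t test and a variance-ratio F statistic, which may be laid out differently from the calculations in Chapter 7. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical lead concentrations (µg lead per g sediment); NOT real data.
urban = [52.1, 48.3, 55.0, 60.2, 47.8, 51.5, 58.9, 49.7, 53.4, 56.1]
rural = [31.4, 35.2, 29.8, 33.0, 36.5, 30.1, 34.7, 32.3, 28.9, 35.8]


def f_ratio(a, b):
    """Larger sample variance divided by the smaller (variance-ratio F statistic)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def unmatched_t(a, b):
    """t statistic and degrees of freedom for two unmatched samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


f = f_ratio(urban, rural)
t, df = unmatched_t(urban, rural)
print(f"F = {f:.2f}, t = {t:.2f}, df = {df}")
```

Both calculated values would then be compared with critical values from statistical tables at the chosen significance level. The same calculation would be repeated for each metal in turn.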

Q W2.2

When the data were collected it was found that they were not Normally distributed. What might be an appropriate non-parametric statistical test to use instead?

Mann Whitney U test.

Full explanation of this answer:

We know from A2.1 that the student wishes to test for differences: she has one treatment variable (streams) and two samples (an urban and a rural stream), each with 10 observations. If these data are non-parametric then we should refer to B.2.3.2 in appendix b.

B.2.3.2 Non-parametric tests

The choice of these tests is similar to choosing the parametric tests. Your selection depends on how many treatment variables you are planning to examine, how many categories in each variable, and how many replicates in each category.

In this example, if the data are non-parametric, then a Mann Whitney U test may be appropriate.

| Experimental design | Test |
| --- | --- |
| You have one treatment variable. You are going to compare two samples. The data is unmatched. You have 20 observations or fewer in each sample. | Mann Whitney U test (8.1) |
| You have one treatment variable. You are going to compare two samples. The data is unmatched. The data is measured on a continuous scale and you have more than 30 observations in each sample. | z test for unmatched data (7.1) |
| You have one treatment variable. You are going to compare two samples. The data is unmatched. You have more than 20 observations in each sample. | Sokal & Rohlf, 1981 |
| You have one treatment variable. You are going to compare two samples. The data is matched. You have fewer than 30 pairs of observations. | Wilcoxon's rank paired test (8.2) |
| You have one treatment variable. You are going to compare two samples. The data is matched. You have more than 30 pairs of observations. | z test for matched data (Chapter 7, 7.2) |
| You have one treatment variable. You are going to compare two or more samples. You wish to test general and specific hypotheses. | One-way ANOVA (Kruskal Wallis test) (8.3 and 8.4) |
| You have more than one treatment variable. You are going to compare two or more samples. You wish to test general and specific hypotheses. You will be using a calculator. | Two-way non-parametric ANOVA (8.5 and 8.6) |
| You have more than one treatment variable. You are going to compare two or more samples. You wish to test general hypotheses. You want to use a computer. | Scheirer-Ray-Hare test (8.7) |



The criteria for using a Mann Whitney U test (8.1.1) are that you:

  1. Wish to test for differences in population medians.
  2. Have one treatment variable and two samples.
  3. Have data that is non-parametric and unmatched.
  4. Have data that can be ranked (3.1 and 3.8.2).
  5. Have two samples which both have a similar shaped distribution. For example if one distribution is skewed to the left and the other to the right (3.4.4) then you should not use this test. If this does arise you could try transforming the data (3.9).
  6. Should not use this test if one sample has only one observation or if both samples have less than 5 observations each.
  7. Need not have equal sample sizes.

In the current design all criteria other than 5 are met. You must examine the data to confirm that this final criterion is met. Assuming this is the case then this appears to be an appropriate test to use. This test would be used to compare the concentrations of each metal in the two streams in turn.
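
To show how ranking leads to the test statistic, here is a minimal sketch of the U calculation. It assumes the common rank-sum form of the statistic, with tied values given their average rank, and the layout in 8.1 may differ. The function names and the concentrations are hypothetical and serve only as an illustration.

```python
def rank_with_ties(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics for unmatched samples a and b."""
    ranks = rank_with_ties(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Hypothetical cadmium concentrations (µg cadmium per g sediment); NOT real data.
urban = [0.9, 1.4, 0.7, 2.8, 1.1, 0.8, 3.5, 1.0, 0.6, 1.9]
rural = [0.5, 0.4, 1.2, 0.3, 0.6, 0.2, 1.5, 0.4, 0.7, 0.3]
print("U =", mann_whitney_u(urban, rural))
```

The calculated U would be compared with the critical value for the two sample sizes from statistical tables. Criterion 5 (similar shaped distributions) still has to be checked separately, for example by plotting each sample.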

Q W2.3

Write suitable hypotheses assuming these data are parametric and that you are comparing the concentrations of lead.

H0: There is no difference between the mean concentration of lead (µg lead g⁻¹ sediment) in 10 samples each from an urban and a rural stream.

H1: There is a difference between the mean concentration of lead (µg lead g⁻¹ sediment) in 10 samples each from an urban and a rural stream.

Q W2.4

In what way would these hypotheses differ if the test were a non-parametric one instead?

The alternative non-parametric test was the Mann Whitney U test (A2.2). The hypotheses would therefore be very similar, but 'mean' would be replaced with 'median'.

H0: There is no difference between the median concentration of lead (µg lead g⁻¹ sediment) in 10 samples each from an urban and a rural stream.

H1: There is a difference between the median concentration of lead (µg lead g⁻¹ sediment) in 10 samples each from an urban and a rural stream.
