Zuur AF, Ieno EN, Elphick CS. 2010. A protocol for data exploration to avoid common statistical problems. Methods in Ecology & Evolution doi: 10.1111/j.2041-210X.2009.00001.x
These authors present a step-by-step protocol, with recommendations, for data exploration: the stage of statistical analysis that should be carried out before primary techniques such as regression. The point of data exploration is to catch errors in measurement, calculation or data entry, to identify outliers, and to check that no critical assumptions of the planned analysis are violated. Data exploration is not a quick process, and it may take up to 50% of the time spent on data analysis.
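To make the screening step concrete, here is a minimal Python sketch. It assumes numeric measurements held in a plain list, and it flags values rather than deleting them: (a) values outside the boxplot whiskers, using Tukey's 1.5 × IQR rule as a numeric stand-in for inspecting a boxplot, and (b) entries outside a plausible range, which often points to data-entry slips. The function names, the `k=1.5` default, the example bounds and the example heights are illustrative assumptions, not taken from the paper.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles (Python 3.8+) returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def out_of_range(values, lower, upper):
    """Return (index, value) pairs outside [lower, upper], e.g. impossible entries."""
    return [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]

# Hypothetical plant heights in cm, listed in sampling order
heights = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 35.0, 12.3]
print(boxplot_outliers(heights))                  # [35.0]
print(out_of_range([12.1, -3.0, 12.4], 0, 100))   # [(1, -3.0)]
```

Flagged values still need a human decision: a typo should be corrected, while a genuine extreme observation may be kept, transformed, or analysed with a method that is less sensitive to it.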
Their Figure 1 shows the steps in data exploration. Not every step is needed for every dataset. For example, PCA does not require normally distributed data, so constructing histograms to evaluate normality is unnecessary before a PCA. On the other hand, almost all statistical techniques are very sensitive to violations of the assumption of independence.
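Independence can be checked in several ways. One common numeric check, sketched below in Python as an illustration rather than as the paper's exact procedure, is the lag-1 autocorrelation of the residuals (or of the response) ordered by time or sampling sequence. A value well outside roughly ±1.96/√n, the usual approximate 95% white-noise band drawn on an ACF plot, suggests that consecutive observations are not independent. The function names and example series are hypothetical.

```python
import math

def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1 (the first non-trivial bar of an ACF plot)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom

def outside_white_noise_band(series):
    """True if |r1| exceeds the approximate 95% band 1.96 / sqrt(n)."""
    return abs(lag1_autocorrelation(series)) > 1.96 / math.sqrt(len(series))

# Hypothetical residuals in sampling order
alternating = [1, -1] * 10
print(lag1_autocorrelation(alternating))      # -0.95
print(outside_white_noise_band(alternating))  # True
```

Only lag 1 is computed here for brevity; a full ACF repeats the same calculation at larger lags, and spatial data would call for a spatial tool such as a variogram instead.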
(To avoid potential copyright issues, I have not reproduced Fig. 1 from the paper here.)
Figure 1 from Zuur et al. (2010). The procedures in italics are described in detail in this paper.
This paper was assigned reading for a course I am taking, Plant Sciences 813, Statistical Methods in the Life Sciences. I think the advice and instructions here will be useful.