Tuesday, September 29, 2009

Martin 2002

Martin AP. 2002. Phylogenetic approaches for describing and comparing the diversity of microbial communities. Applied and Environmental Microbiology 68: 3673-3682.

Martin presents a synthesis of statistical techniques for detailed analysis of biodiversity in microbial communities. One new test, the P test (for phylogenetics), is combined with the FST test to infer how strongly community composition differs when multiple microbial communities are compared.

A review of existing methods for quantifying diversity is provided first, rapidly pointing out the not-unlikely circumstances under which inter-community differences would be either under- or over-estimated in the absence of explicit phylogenetic inference. Other types of phylogenetic inference in this context are examined, but one main problem with techniques such as the Shannon-Wiener index is their dependency on accurate information about the frequency of taxa. The P test, novel to this paper as far as I can tell, avoids this pitfall, and is instead based on an examination of the covariance between a phylogeny and the distribution of taxa in communities.

Figure 3 from Martin (2002). The basis of the P test is the covariance between which community a sequence was found in, and the positions of sequences on the phylogenetic tree.

The P test is combined with the FST test to examine the partitioning of sequence variation between communities. A P test on its own is not particularly informative, because it says little about how variation is partitioned between communities vs. the total pool.

The 2x2 grid of comparison of P test and FST test results, from Figure 4 of Martin (2002). Each possible outcome of significance for the two tests allows inference about the evolutionary and ecological history of a particular situation of microbial communities.

The raw data for the P test are sequence data, typically 16S rDNA. The author advocates whole-gene sequences for comparison, to provide the maximum data and maximum compatibility between different studies, but acknowledges the trade-off between sequence length and the number of sequences that can be produced. These are also the raw data for FST, but how those raw data are treated before going into each test varies.

Under the P test, the sequence data are used to construct a single phylogeny incorporating the sequences from all communities. The phylogeny is scaled so that the total branch length from the root to each tip is equal (the tips being the currently-measured sequences), and a null model of branching through time (lineage-per-time) is built. Then the community occurrence of each sequence is mapped onto the phylogeny, and the covariance is calculated.

The FST test is calculated from theta values. Theta is the total genetic variation in a sample, and in FST the grand total theta for all communities combined is compared to the average within-community theta for the communities under consideration.
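As a rough sketch of that comparison, assuming the usual form FST = (theta_total - theta_within) / theta_total and assuming theta is estimated as the mean number of pairwise differences between aligned sequences (both are my assumptions, and the function names are hypothetical):

```python
from itertools import combinations

def pairwise_diff(a, b):
    """Count the sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def theta(seqs):
    """Mean pairwise difference; needs at least two sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def fst(communities):
    """Compare pooled theta with the average within-community theta."""
    pooled = [s for seqs in communities.values() for s in seqs]
    theta_total = theta(pooled)  # must be non-zero
    theta_within = sum(theta(seqs) for seqs in communities.values()) / len(communities)
    return (theta_total - theta_within) / theta_total

if __name__ == "__main__":
    # Hypothetical aligned sequences from two communities.
    example = {"site1": ["AAAA", "AAAT"], "site2": ["TTTT", "TTTA"]}
    print(fst(example))
```

Values near 1 would indicate that most of the variation lies between communities, and values near 0 that communities are about as variable as the pooled sample. The significance test itself is not shown here.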

This combined approach is intended to be complementary to existing methods of examining microbial diversity, such as methods for estimating species richness and methods for examining microbial phylogenies. I think the author’s own words at the beginning of the discussion section provide a good summary:

“In this study I used standard quantitative methods of analysis borrowed from population genetics and systematics for describing and comparing microbial communities. Information gained from analysis of DNA sequences provided the basis for statistical analysis of communities in ways that advance inferences about the processes that may govern the compositions and functions of microbial communities. Furthermore, the analytical approaches advocated here make it possible to accomplish broad comparisons of ecological communities. For instance, a comparison of lineage-per-time plots across a diverse set of ecosystems might reveal differences in the phylogenetic compositions of ecological communities that would be invisible with standard ecological statistics that ignore the magnitude of genetic differences among sampled sequences.”

I think I would like to use this approach in the analysis of microbial communities I will conduct based on soil samples from the polar desert. At this point the method seems like a useful way to quantify diversity across the latitudinal gradient I will be covering.
