Signature genes as a phylogenomic tool.
Source: Molecular Biology and Evolution, 25, 8, (2008), pp. 1659-1667
Article / Letter to editor
Subject: IGMD 8: Mitochondrial medicine; NCMLS 2: Metabolism, transport and motion; NCMLS 4: Energy and redox metabolism; UMCN 5.3: Cellular energy metabolism
Gene content has been shown to contain a strong phylogenetic signal, yet its use for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss, and until now it has required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition. We identify 8,362 signature genes specific to 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that approximately 92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. In summary, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
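The general idea of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact criteria for calling a gene a signature gene, so the sketch assumes the simplest possible rule (a gene family found in the genomes of exactly one taxon), uses a flat set of taxa instead of multiple taxonomic depths, and assumes that genes have already been grouped into families. All names and data below (`find_signature_genes`, `taxonomic_profile`, `TaxonA`, `g1`, and so on) are hypothetical.

```python
from collections import Counter


def find_signature_genes(genomes, taxon_of):
    """Map each gene family to its taxon if it occurs in exactly one taxon.

    genomes:  dict of genome name -> set of gene family ids (assumed input)
    taxon_of: dict of genome name -> taxon label (assumed input)
    """
    # Record every taxon in which each gene family is observed.
    taxa_with_gene = {}
    for genome, genes in genomes.items():
        for gene in genes:
            taxa_with_gene.setdefault(gene, set()).add(taxon_of[genome])
    # Assumed criterion: a family restricted to a single taxon is a signature.
    return {
        gene: next(iter(taxa))
        for gene, taxa in taxa_with_gene.items()
        if len(taxa) == 1
    }


def taxonomic_profile(sample_genes, signatures):
    """Fraction of signature-gene hits in a sample that point to each taxon."""
    hits = Counter(signatures[g] for g in sample_genes if g in signatures)
    total = sum(hits.values())
    return {taxon: n / total for taxon, n in hits.items()} if total else {}


# Toy reference genomes (hypothetical data).
genomes = {
    "genome_A1": {"g1", "g2", "g5"},
    "genome_A2": {"g1", "g3", "g5"},
    "genome_B1": {"g4", "g5", "g6"},
}
taxon_of = {"genome_A1": "TaxonA", "genome_A2": "TaxonA", "genome_B1": "TaxonB"}

signatures = find_signature_genes(genomes, taxon_of)

# An unspecified sample: g5 is shared by both taxa and g9 is unknown,
# so neither contributes to the profile.
sample = {"g1", "g2", "g4", "g5", "g9"}
profile = taxonomic_profile(sample, signatures)
print(profile)
```

In this toy case two of the three informative hits point to `TaxonA` and one to `TaxonB`. The sketch ignores horizontal gene transfer and parallel gene loss, which the abstract names as the main difficulties for gene content analyses, so a realistic criterion would need to tolerate some absences within a taxon and some presences outside it.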