Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes—including bacteria, archaea, viruses, and microbial eukaryotes— provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing.
Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact.
The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed.
The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin, since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort.
Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives.
Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and achieving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on Large-Scale Sequencing: The Future of Genomic Sciences? 2 the genome sequences and increase both the societal and scientific benefit of genomics work.
Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented by other experimental data, particularly transcriptomics and metabolomics data, all of which should be gathered and curated continuously.
Systematic genomics efforts like the ones outlined in this document would significantly broaden our view of biological diversity and have major effects on science. This has to be backed up with examples. Considering these potential impacts and the need for acquiescence from both the public and scientists to get such projects funded and functioning, education and training will be crucial. New collaborations within the scientific community will also be necessary.