Biodefense research is a high national priority, not only for the sake of improving security in the face of threats from infectious disease, but also for the insights that biodefense and its related bioinformatics offer to the basic science of understanding pathogens. A colloquium was convened to bring together experts in biodefense, bioinformatics, molecular biology, microbiology, information technology, and other fields to discuss ways in which bioinformatics practice and support can be changed to improve the speed of discovery for the benefit of national security and of scientific progress. Participants made specific recommendations for improving Data Description and Repositories, Algorithm and Software Resources, and Infrastructure Resources.
Many features of databases, including their funding, consistency, and usability, pose problems for bioinformaticists. It is critical that bioinformatics database resources remain in the government-funded domain. Moreover, a plan for data sharing should be defined for any new bioinformatics database resources that are developed, and researchers who make use of such a database should be held to a contract of collaboration with the database.
Efforts to develop improved ontologies (constructs for annotating data in a semantic and computationally usable form) to meet the special needs of biodefense should be nurtured and intensified to fill gaps in the breadth of pathogen-specific ontologies. While additional work is needed on ontologies in many aspects of biological science, those relevant to biodefense are needed to accelerate the construction of integrated databases that serve both biodefense and human health programs.
Researchers working in biodefense and bioinformatics need more and better software tools for making phylogenetic assignments, for strain typing, and for detecting engineered organisms. As sequencing technology improves and the rate at which data are produced continues to climb, all of our current big, centralized, monolithic systems for handling and storing data will inevitably break. We should implement scalable new approaches and move to more distributed systems now.
Bioinformaticists receive too little statistical training, a deficit that represents a significant, discreditable weakness in the field. Educators must begin to provide appropriate statistical training for all experimentalists.
The submission tools of the National Center for Biotechnology Information (NCBI) are not adequate for the size and complexity of the data sets users want to submit, which makes the submission of sequence data to NCBI’s GenBank a bottleneck in the process of publishing bioinformatics results. Although it is clear that sequence management needs a new paradigm, the characteristics of that paradigm are not apparent. There may be a role for Google and its various tools and resources in enabling information to be shared in a more open and scalable manner. Scaling NCBI to meet the needs of biological data for the next decade should be a high priority.