
Future of biology rests in harnessing data avalanche

Sep 04, General Science



(PhysOrg.com) -- Like most sciences, biology is inundated with data. However, a group of researchers warns in a Nature feature that the avalanche of biological information has reached the point where the discipline may be unable to reach its full potential without improvements in curating data into online databases. The commentary appears in the September 4 issue of the journal and outlines specific remedies to harness the information overload.

By July 2008, data extractors, or curators, had indexed more than 18 million articles in PubMed and entered sequences from more than 260,000 organisms into GenBank. Both are examples of databases where biological information is stored for public access. Data curation is very labor intensive.

“There is a lack of standardization or consistency in the way scientists report their findings in different journals,” remarked corresponding author Sue Rhee of the Carnegie Institution’s Department of Plant Biology and principal investigator of The Arabidopsis Information Resource (TAIR). “In some cases the researchers don’t even specify the species of a gene under study. That leaves biocurators, who have advanced degrees in biology, and expertise with databases and scripting languages, to read the full text and transfer the essence of the information into specific fields in the database. They spend a lot of time just figuring out the basics. And that leaves a lot of room for error.”

Curation is not just a data organization tool. Such input has become essential to biological research. The authors note that eleven different databases had three-quarters of a million visitors who viewed 20 million pages in just one month. And with inference programs that feed on the curated data, researchers can now tap into other work that relates to theirs and use that data in their own experiments, a huge advancement that is accelerating the pace of biology. “With this vast universe of information, the whole nature of experimentation is changing,” continued Rhee. “But the field is being held back with the curation backlog.”

The group of authors outlined a series of solutions to the problem. The first is to have authors input their data directly into databases upon acceptance in refereed journals. This step has already begun with Plant Physiology and TAIR. When a manuscript is accepted, researchers now fill in a web form about Arabidopsis genes. Second, the commentators urge the biological community to adopt standard reporting formats that are universally agreed upon. And third, curation needs to be elevated by academic institutions and funding agencies. There should also be incentives for researchers to curate their own data, such as increases in academic recognition, career advancement, and funding. They additionally suggest that “community annotation” could be modeled after large-scale astronomy projects like the Sloan Digital Sky Survey, or the Galaxy Zoo, where 80,000 astronomers and interested amateurs classified one million galaxies in less than three weeks.

“The effort and cost required to curate the data is small compared with the cost of carrying out the research in the first place, yet this additional step adds tremendously to the value of the research results to society,” commented Eva Huala, director of TAIR.

Wolf Frommer, acting director of Carnegie’s Department of Plant Biology, noted that “advances in our understanding of biology will affect our food supply, our health-care system, the development of remedies for climate change, and many other aspects of daily life. Basic and applied research have to go hand in hand with curation of databases so that humanity can adapt to the quickly changing world as fast as possible.”

Provided by Carnegie Institution

