Bioinformatics key to using genetic and metabolite data for breeding

As scientists collaborate on a global effort to map RTB genes and metabolites and link that information to traits, they are increasingly reliant on bioinformatics – the use of computers to organize, analyze and share biological data. The effort requires the creation of web platforms that will allow researchers in different countries to upload, access, and analyze data, and facilitate the use of that information for crop improvement.

The field of bioinformatics is relatively young and rapidly evolving, driven largely by advances in genomics (the use of DNA sequencing to map a species’ genome and document genetic variation). Gene sequencing produces vast amounts of data, which researchers need significant computer power to manage, clean, and assemble before they can even begin to associate it with traits. They then need interactive platforms to share the information with other scientists.

The bioinformatics component of the current RTB genomic research is especially challenging because the genomics are being complemented by metabolomics (the study of the metabolites involved in cellular processes), and data from both areas will be used for phenomics (the study of how genes, the metabolic processes they control, and the environment determine phenotype, or traits).

The effort involves vast amounts of data – for example, there are approximately 31,000 genes in cassava, 37,000 in banana and 39,000 in potato, and each of those crops may contain as many as 20,000 metabolites. Genetic sequences and metabolite profiles will be completed for between 1,000 and 2,000 different accessions by the end of 2014, and those data will then be associated with phenotype information from greenhouses and different field environments.

“Having large amounts of data will allow us to link as much of the genomic and metabolomic information as possible to phenotype,” said Luis Augusto Becerra Lopez-Lavalle, a molecular geneticist at the International Center for Tropical Agriculture (CIAT) and the theme leader for RTB’s theme two (development of improved varieties). “The challenge is that we are generating all this data, but we need to manage it. We need to convert big data into smart data, so that we can translate them into breeding gain. To do that, we need many brains working together.”

One of the principal brains focused on RTB genomic data management is that of Manuel Ruiz, a researcher at the French Agricultural Research Centre for International Development (CIRAD) who is a visiting scientist at CIAT. His team was involved in the first complete sequencing of the banana genome at CIRAD and is supporting the data management side of ongoing gene sequencing and metabolite profiling of different banana varieties.

One of Ruiz’s missions at CIAT is to transfer some of the lessons learned from banana to the other RTB crops. He explained that one of his first projects is to develop bioinformatics tools for correlating the metabolite and genetic data being generated for banana.

“The idea is to develop something generic while working on banana that can be used for cassava and other crops,” said Ruiz, adding that for cassava, he will need to collaborate with the Next Generation Cassava Breeding Project, based at Cornell University.

“One of our ideas is to develop platforms to facilitate association studies using information from genes and metabolites. There currently aren’t many tools to do that, so we need to develop some,” he said.

Ruiz helped to develop, and serves as the scientific manager of, the South Green bioinformatics platform – a clearinghouse of tools and databases for genomic research in Mediterranean and Southern Hemisphere crops that was created by CIRAD, Bioversity International, the French National Institute for Agricultural Research (INRA), Montpellier SupAgro university and the Institute of Research for Development (IRD).

“We didn’t try to reinvent the wheel, or develop everything ourselves,” Ruiz said of South Green, explaining that they usually tried to adapt existing software. He said that even though metabolic association studies are relatively new, significant progress has been made on crops such as rice. He and Becerra identified advanced research institutions working on rice as potential strategic partners for RTB’s genomic and metabolomic work.

Ruiz observed that the initial challenges are managing all the data and helping scientists in different parts of the world to access and analyze it. He noted that one big issue is disk storage capacity, since the ongoing genetic sequencing and metabolite profiling are producing terabytes of data. He added that cloud technology could be one solution for helping researchers access data.

“We need high-power capacity for data storage and analysis. We need to use multiple servers with nodes in order to perform many tasks in parallel. We are now more focused on the genetic information, but we have to prepare for the metabolic information that is being generated,” he said.

Once the initial data management issues are resolved, and researchers begin to link genes and metabolites to crop traits, Ruiz will be involved in the development of web platforms to help breeders use that information for crop improvement.

Ruiz observed that the way to deal with all these challenges is to collaborate with scientists at other institutions involved in similar research. He hopes that such collaboration will not only help RTB meet the current bioinformatic challenges, but that the different genomic platforms will eventually all become connected and that the tools developed for one will be shared with the others.

“For bioinformaticians to survive, we need to collaborate,” he said. “We can’t work alone. It’s impossible.”

See also: Bioversity International and CIRAD organized a crash course in banana bioinformatics in November 2013