Purdue-USDA team develops fast-track process for genetic improvement of plant traits

Researchers interested in improving a given trait in plants can now identify the genes that regulate the trait’s expression without doing any experiments.

Purdue University’s Kranthi Varala and 10 co-authors published the details of the new web-based regulatory gene discovery tool in the April 23 issue of Proceedings of the National Academy of Sciences. Varala has a patent pending on results related to economically important seed oil biosynthesis.

The Purdue-USDA team sought to build a resource that learns from large amounts of publicly available data to quickly identify which special genes, called transcription factors, regulate the expression of a given trait in various plant species.

“Every study focuses on a handful of them,” said Varala, assistant professor of horticulture and landscape architecture. “Our premise was that if we can put all of it into a single analysis, then we can use this data to build something global.”

Arabidopsis served as the PNAS study’s model plant, “but this approach has nothing specific to Arabidopsis,” Varala said. “The approach is general enough that you could start with a corn dataset. You could do it with rice, with tomato, whatever crop you’re working on as long as you have thousands of gene expression measurements that people have done. And there are over a dozen species now where we have tens of thousands of gene-expression studies.”

To prove the system works, the team focused on a genetic pathway that regulates how plants make and store oil in their seeds. The team picked that trait because of its importance in food and biofuel production, and because more than 300 of the genes involved are already known. 

By genetically manipulating a plant’s transcription factors, researchers can increase or decrease the amount of oil produced in its seeds.

Arabidopsis seedlings being cultivated for research to study the effects of specific genes on traits such as rate of growth, plant size etc.

Like other researchers, Varala has pursued many projects over the years where his goal was to identify the genes and regulators involved in solving one problem. This meant conducting careful, time-consuming experiments. But the data generated fell short of providing all the answers he sought. He compared it to working an equation knowing only three of the 10 factors involved.

“You can’t solve the equation,” he said. Likewise, Varala often wanted to ask more questions than the data could answer. That motivated him to build a framework that uses all available data to produce a list of candidate genes for genetic validation, without first having to run all the relevant experiments.

“I’m trying to short-circuit the initial data collection phase,” Varala said, so that scientists can focus on conducting the genetic validations. But to do so, his team had to begin with a dataset based on 18,000 individual studies.

Varala and his team analyzed this massive dataset using the Bell and the now-retired Brown supercomputers at Purdue’s Rosen Center for Advanced Computing. The team built a machine-learning framework to speed the process for others.

It would be impossible for one person to do this manually. A team could do it, but that would introduce biases in how group members process the data. The machine-learning classifier operates without bias.

The novelty of the approach is that instead of pulling data related to all organs, it focuses on organ-specific datasets. Independent gene networks regulate these organs — leaves, roots, shoots, flowers and seeds.

“Instead of using all organs, we said, within the seed experiments that people have done over the years, can we use all the data to learn something that’s happening in the seed and not necessarily the root or the leaf or the flower? That improved our approach a lot.”

The team used a computational method called the inference approach to predict what transcription factors were going to regulate the seed oil biosynthesis process in Arabidopsis. 

“The ones we know help us validate that our approach is working correctly. The ones that we don’t know are good candidates for finding out new biology,” Varala said. “This purely computational approach knows nothing about seeds or oil or anything like that. We gave it a list of genes and it was able to rediscover the known ones without knowing any biological context.”
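The idea of ranking candidate regulators from organ-specific data and then checking whether known regulators reappear can be illustrated with a toy sketch. This is not the team's framework: the scoring rule (mean absolute Pearson correlation), the gene names `TF_A`, `TF_B`, `OIL_1`, `OIL_2`, and all expression values are hypothetical and chosen only to make the example runnable.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_regulators(samples, tf_names, pathway_genes, organ):
    """Rank transcription factors by mean |correlation| with pathway genes,
    using only the samples taken from the requested organ."""
    subset = [s["expr"] for s in samples if s["organ"] == organ]
    scores = {}
    for tf in tf_names:
        tf_levels = [e[tf] for e in subset]
        corrs = [abs(pearson(tf_levels, [e[g] for e in subset]))
                 for g in pathway_genes]
        scores[tf] = sum(corrs) / len(corrs)
    return sorted(tf_names, key=lambda tf: scores[tf], reverse=True)


# Hypothetical expression measurements; the leaf samples are ignored for a seed query.
samples = [
    {"organ": "seed", "expr": {"TF_A": 1.0, "TF_B": 5.0, "OIL_1": 2.1, "OIL_2": 1.0}},
    {"organ": "seed", "expr": {"TF_A": 2.0, "TF_B": 4.0, "OIL_1": 3.9, "OIL_2": 2.2}},
    {"organ": "seed", "expr": {"TF_A": 3.0, "TF_B": 5.0, "OIL_1": 6.2, "OIL_2": 2.9}},
    {"organ": "seed", "expr": {"TF_A": 4.0, "TF_B": 4.0, "OIL_1": 8.0, "OIL_2": 4.1}},
    {"organ": "leaf", "expr": {"TF_A": 9.0, "TF_B": 1.0, "OIL_1": 0.1, "OIL_2": 0.2}},
    {"organ": "leaf", "expr": {"TF_A": 1.0, "TF_B": 9.0, "OIL_1": 0.3, "OIL_2": 0.1}},
]

ranking = rank_regulators(samples, ["TF_A", "TF_B"], ["OIL_1", "OIL_2"], "seed")

# Validation step: which already-known regulators (hypothetical set) appear at the top?
known = {"TF_A"}
recovered = [tf for tf in ranking[:1] if tf in known]
print(ranking, recovered)
```

In this toy data the method sees only gene names and numbers, yet the known regulator surfaces at the top of the seed ranking, which mirrors the validation logic Varala describes.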

The lead author, Rajeev Ranjan, a postdoctoral researcher in the department of horticulture and landscape architecture at Purdue, took the 12 candidates among the top 20 that were not already known regulators and tested whether those predictions hold. “We were able to generate mutant lines for 11 of those 12. Five of those 11 do change the seed oil content,” he said. “Further, we also showed that overexpression of one factor increases seed oil up to 12%.”

Rajeev Ranjan, a postdoctoral researcher in horticulture and landscape architecture, analyzes genetically modified Arabidopsis seeds that have higher oil content to confirm that other agronomically important traits, including seed size and seed per fruit, are not negatively affected.

The eight known regulatory genes, added to the five newly validated ones, showed that the inference approach accurately identified 13 of the top 20 candidates. The strength of the approach is that, working only from a list of genes, it can predict with high accuracy which ones will regulate a trait of interest.
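The tally can be checked from the counts quoted above: 20 top candidates, 12 of them not previously known, 11 mutant lines generated, and five of those changing seed oil content. The variable names below are illustrative only.

```python
top_candidates = 20
previously_unknown = 12
known_regulators = top_candidates - previously_unknown  # 8 already-known regulators
mutant_lines = 11       # mutants generated for 11 of the 12 unknown candidates
changed_oil = 5         # mutants that changed seed oil content

confirmed = known_regulators + changed_oil
summary = f"{confirmed} of {top_candidates} confirmed ({confirmed / top_candidates:.0%})"
print(summary)  # 13 of 20 confirmed (65%)
```

One of the 12 candidates could not be tested because no mutant line was generated for it, so 13 of 20 is a lower bound on the number of true regulators in the top 20.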

“It took a long time to do because it’s a long, complicated process, and there was no guarantee that it would work,” said Varala of the four-year project. “Nothing on this scale had been attempted before.”

Varala has disclosed the innovation to the Purdue Innovates Office of Technology Commercialization, which has applied for a patent to protect his intellectual property.

This research was supported by the U.S. Department of Energy Office of Science.
