The most valuable commodity of the 21st century might just be data. Those who hold the most data will run the most successful businesses, produce the most effective social benefits and make the most transformative discoveries. That tremendous value, however, can only be extracted by people with the skills to transform vast and complex data into insight and application.
At Purdue’s Data Mine, the focus is on creating this critical next generation of data scientists who will explore data for industry, government and academia. Launched in 2018, The Data Mine provides an immersive educational environment where data science students across majors live, learn and build this growing discipline together.
The College of Agriculture has participated in The Data Mine from the beginning. Agriculture is one of 20 Data Mine learning communities available to the approximately 1,000 undergraduate and graduate students in the program, and students representing many of the college’s majors have joined The Data Mine to bolster their data science skills. Large agricultural firms have partnered with The Data Mine to give students real-world opportunities to expand and deploy their data toolkit.
“I think there’s a big opportunity for data science in agriculture,” said Mark Daniel Ward, director of The Data Mine and professor of statistics and (by courtesy) of agricultural and biological engineering, computer science, mathematics and public health. “We sense the excitement from the companies and the faculty, and across all of the different agricultural domains. And I think the students see themselves in a space where they want to study food science or agronomy and still have a very data-centric kind of career. It’s an opportunity to carve out something that other potential job candidates aren’t going to have.”
NEW WAYS OF LEARNING
Farmers used data to manage production long before “data science” was a buzzword. They make decisions based on weather reports, market prices and crop yields, not to mention many qualitative observations that aren’t easily described in numbers. But the proliferation of new technologies and companies providing informatics services has made data literacy a critical skill, from the family farm to larger operations and retailers.
“As resource managers, farmers are constantly making management decisions, but where do they get the information to make those decisions?” asks Bruce Erickson, agronomy education distance and outreach director. “What’s changed now is that farmers can extract more information from their own production. They may use yield monitors, variable rate fertilizer maps, soil test information, aerial imagery from satellites or drones. This has all been feeding the need for agricultural data scientists.”
To create the educational experiences students need to become data experts, the College of Agriculture developed a new cross-disciplinary minor in data-driven agriculture, as well as multiple courses in data science for students interested in these topics. In the first of these courses, developed by Jeffrey Holland, professor of entomology, students learn how to code, clean data, look for trends and patterns, and generate and test their hypotheses. The Data Science for Agriculture course, which Erickson leads, centers those methods in the context of agriculture applications, including units in food science, animal science, forestry, entomology and economics.
THE DATA SCIENCE LABOR GAP
Since 1997, the Departments of Agricultural Economics and Agronomy, working with CropLife, have led a regular survey of agricultural retailers, assessing their views on the use of technology and data in their businesses. The results track the growth of interest in these approaches, with the adoption of technologies such as yield monitoring, satellite imagery and auto-steering vehicles growing five- to twenty-fold over the last two decades.
“Since 2017 the use of data to make decisions by agricultural retailers (the people who sell fertilizer, seeds, feed and pesticides to farmers) has doubled,” says Erickson, who leads this effort. “We want to make sure that we’re adequately preparing our students for what they will face in the workplace.”
Erickson also led a team that surveyed agribusiness companies on their labor needs in data science: How easy is it to find qualified candidates with these skills? Respondents said they typically require two to three months or more to fill precision farming positions for agronomists, technical support and precision sales specialists.
Filling these workforce gaps could help build acceptance of agricultural technologies that assist with crop management decisions, such as site-specific pest and nutrient management, on-the-go soil and plant sensors, robotics and drone imagery, whose adoption has grown slowly in the U.S. and globally. Students get hands-on exposure to many of these approaches in Erickson’s data-driven agriculture course.
“Many people have expressed that they’re disappointed in how little adoption there has been of some of these technologies,” Erickson says. “Large, mechanized farms around the world are often the first to adopt new technology, but a good portion of the world’s agriculture is accomplished on smaller farms, common in Africa and Asia and parts of Central and South America. Many of these smallholder farms are not mechanized, which usually introduces digital agriculture to a farm, and so they’re missing out. There’s been a real frustration as to how you get those farmers to be able to use some of the benefits of digital agriculture, and data scientists can help make these tools more accessible.”
Holland’s and Erickson’s courses are both enriched by the model of The Data Mine, where Holland advises the agriculture learning community. With the students living and learning together, education goes beyond the classroom, seeping into daily conversations between students studying digital agriculture and data science students from other disciplines in the program. The program, which started with 20 statistics students in 2013, grew to 600 last year and has approximately 1,000 students enrolled this academic year, Ward says.
“They have this community of like-minded students working on the same sorts of problems in the same sorts of classes,” Holland says. “It’s pushing the cohort model one step further, in that they’re not just taking the same classes in the same year, they’re actually living in the same building. So they can go off to dinner together, where they’re solving data problems while they’re eating, or they can have meetings in the lobby areas and lounges. It really immerses them a lot more.”
The course offerings are also supported by a speaker series on topics such as data pipelines and data-driven agriculture, presented virtually last year due to the pandemic. Now that in-person instruction has resumed, faculty and outside speakers will deliver talks in the Data Mine dining court, offering informal opportunities to hear about the latest research and meet experts in the field.
“Part of our education needs to be asking the question, What is the future of agriculture?” says Dennis Buckmaster, professor of agricultural and biological engineering and Dean’s Fellow for digital agriculture. “We still have students coming to college where their vision of agriculture is the way that we’ve always done things. And the only way they’ll know a different way is if we teach them the different way.”
One student who came to The Data Mine with few preconceptions about agriculture is Cai Chen, a third-year undergraduate who grew up in suburban Long Island in New York. His parents own a hibachi restaurant, and Chen became interested in the business of growing and supplying the food that arrived at his parents’ kitchen. Combined with his hobbies of gardening and hydroponics (he once tried to grow spinach and tomatoes in his basement), that interest led Chen to choose Purdue and agriculture for his college education.
Chen also paired his agricultural economics major with a minor in computer science, sensing an opportunity for those skills in the agriculture industry. Learning about The Data Mine before his second year turned that hunch into reality, giving him the opportunity to immediately join a project with corporate partner Beck’s Hybrids while he started absorbing data science basics such as Python and SQL through required one-credit-hour seminars.
Though his first-year experience with The Data Mine was virtual, Chen still felt welcomed into a community that combined and supported his interests, with regular team check-ins and online office hours with employees from Beck’s Hybrids, Ward, Holland and other faculty. Chen was part of a Beck’s team that transformed a spreadsheet of numbers into an interactive dashboard to monitor shipping logistics for its seed operations. He spent this summer working with Archer-Daniels-Midland in their grain merchandising division.
“I think data is definitely a huge part of the agriculture industry, and whatever I do, I’m going to take my experience in data science with me,” Chen says. “Doing computer science and data science will put me in a different category, where I’m not competing with people who grew up on a farm. That’s the thing with data science — you might not be from an agriculture background, but being part of The Data Mine really helps me differentiate my application from other candidates.”
- The Purdue Data Mine is an innovative, immersive learning community of approximately 1,000 undergraduate and graduate students studying data science in a variety of majors, including agriculture, and living and learning in a shared environment.
- The College of Agriculture is a core participant in The Data Mine as the college expands its academic programs for data science and computational approaches.
- Students take new courses on data science basics and data-driven agriculture and hear seminars from visiting speakers based in academia and industry. They also work on real-world projects with students from other data science fields and companies from the corporate partners program, including agriculture leaders such as Beck’s Hybrids, Bayer Crop Science, John Deere, Caterpillar, Elanco, Corteva and Halderman Companies.
- Projects include helping visualize supply chain logistics, building new models for predicting crop phenotypes from genetic and environmental information, and extracting insights from drone and satellite images.
- Corporate participants praise The Data Mine for helping students develop the data science skills that are in urgent demand by today’s agriculture industry.
Douglas Abney, who graduated in 2021 in agricultural economics and is now pursuing a master’s degree in the same field, arrived at data science from the other direction. Abney grew up in agriculture, via his family’s beef cattle farm in Bargersville, Indiana, and his grandfather, Scott Abney, professor emeritus of botany and plant pathology in the College of Agriculture. Abney initially thought he wanted to follow his father into engineering with an agriculture focus, but he quickly found himself drawn more to the business side of the field, as well as the coding and data analysis skills he learned in his first year of classes — even writing his own software at home to track his fishing results.
When Abney was looking for a capstone project for his senior year, he came across The Data Mine and joined Chen on the Beck’s Hybrids team.
“It was an opportunity for me to get industry experience while also learning something,” Abney says. “Before I get into my career, I want to learn all these technical skills so I don’t have to learn them on the job. If I could really get deeply involved in data science and agriculture, it helps me for what my future career goals are after school.”
Last summer, he worked on a web scraping project collecting data across the internet on the agriculture jobs market, assessing what skills are in high demand in today’s industry. The research has given Abney insight into the labor landscape he hopes to join after graduate school.
A PATHWAY TO INDUSTRY
The value of these collaborations between Data Mine students and companies cuts both ways, says Emma Alexander (BS ’18, mechanical engineering), who now works as a digital product owner at John Deere and has advised several projects.
“We exist within a kind of siloed industry, and taking these very specific problems to students who have no idea about the business context drives them to ask questions that even cause us to kind of rethink, ‘Well, what is the real problem here? How can we address that?’” Alexander says. “Leveraging the diversity of background and ways of thinking in those teams is really eye-opening for us, especially since a lot of the problems we presented to the students are problems that our teams could pick up and continue to address internally.”
The corporate partners program has been one of The Data Mine’s biggest successes, growing to 50 corporate partner projects with companies in manufacturing, transportation, shipping, drug development and communications, as well as national laboratories and the Wabash Heartland Innovation Network. But the largest representation is from the agriculture industry, including companies such as Beck’s Hybrids, Bayer Crop Science, John Deere, Caterpillar, Elanco, Corteva and Halderman Companies, which compete with Big Tech for data science students.
“Every industry is touched by the needs of data science, and it can be challenging to compete with large companies like Apple and Facebook and Google to recruit talent,” says Maggie Betz (BS ’18, actuarial science), managing director of corporate partnerships for The Data Mine. “Students know of these large tech firms and think they want to work there until they come to The Data Mine, where they spend months or years working with agriculture companies. They fall in love with the culture and the types of projects and realize they can have a successful career in an industry they weren’t initially familiar with.”
Brad Fruth, director of innovation at Beck’s Hybrids, agrees. “It has been really cool to walk alongside these students and show them opportunities in an area that they’re normally not thinking about,” he says. “We’re showing them we have huge, exciting problems in agriculture, and you can make a huge impact in Indiana and the world.”
When they join The Data Mine, students can select projects sponsored by corporate partners, forming teams that may include others from agriculture but also students from science, business, engineering and polytechnic majors. Projects range from farm-specific topics such as finding the best soil environments to test seed growth to broader challenges such as using satellite images to detect the boundaries of golf courses, dynamically mapping supply chain logistics, and using machine learning to detect patterns in employee absenteeism.
Most projects use a combination of internal company data and public data sources to address the challenge laid out by the partner. Students are encouraged to find new data sources that could help the research and do the hard work of wrangling the datasets to work together — one of the most demanding and important steps in any data science project. Brian Dilkes, professor of biochemistry and co-advisor on a Data Mine project with Adam Scott of Bayer Crop Science, says the opportunity to sharpen data science skills with real proprietary data is immensely valuable for students.
“First, you have an incredible in-kind donation of data that would be way too expensive to generate for the purpose of doing the experiment,” Dilkes says. “Then the students go into satellite data, weather station data, soil data, all sorts of pieces of information to try and get estimates of what the environment was in those locations, so we can determine whether any of these predict a performance in that year. Eventually, we may even be able to go in and see how well adding this information might have improved decision-making processes at Bayer, which I think would be really cool.”
At the end of the spring semester, when the students presented the results of their work to a team of Bayer employees, Dilkes knew that it was a success from the number of screenshots he could hear participants taking on the web conference.
“The slides were going to be available anyway; it’s not like they were going to disappear,” Dilkes says. “But they were so excited. That kind of immediate haptic feedback was awesome. They nailed it.”
The success of The Data Mine has drawn notice from other schools and funders, and Ward is helping expand the model across the state, beginning with Purdue University Fort Wayne and Purdue Polytechnic Institute. But even as the program grows larger, the focus remains on the individual students, the inspiration for Ward’s alternate explanation of the name Data Mine.
“It’s the sense of, I can take the data and make it mine. I can take this experience and make it very much my own,” Ward says. “There is that sense of ownership — student-oriented, student-facing, student-driven data.”