Published in Macalester Today
BY ERIN PETERSON
ILLUSTRATIONS BY THOM SEVALRUD/i2iART.COM
BIG DATA IS GIVING SCIENTISTS ENTIRELY NEW WAYS to understand ourselves and the world around us. Thanks to a new grant, Macalester students will get a jump on this important scientific movement.
When the U.S. Human Genome Project began in 1990, it took scientists 13 years to determine the sequence of all 3 billion chemical base pairs that make up human DNA. Last February, a company in the United Kingdom announced that its technology could finish the same task in an afternoon.
The speed at which we are generating new data is staggering. And there’s no question that our world is awash in data in ways that would have seemed inconceivable just a decade or two ago, from Google searches that turn up billions of results to millions of newly digitized health records. These large, complex data sets are too big to handle with typical database tools such as spreadsheets, but they offer tantalizing opportunities for new discoveries and innovation. Indeed, the U.S. government recently committed $200 million to focus on big data projects. But the data itself means nothing without the ability to understand and interpret it.
Unlocking the power of these numbers, whether finding trends or sussing out quirks, requires programs and tools that go beyond the quantitative skills traditionally taught in the college classroom. That gap in teaching needs to close, says biology professor Paul Overvoorde. “To handle these increasingly complex and big data sets, students need to be able to manipulate, reformat, recategorize, and explore them,” he says. “This is going to be a fundamental skill they’re going to need as they move forward in their careers.”
Howard Hughes Medical Institute (HHMI) is betting on Macalester to develop curricula and programs that do just that—and not only for its own students but potentially for schools across the nation. HHMI, known for providing science education grants to schools that pursue large and challenging ideas, recently awarded Macalester a four-year, $1.3 million grant to develop ambitious teaching and research programs linked to big data. “You could call it venture capital,” says Helen Warren, director of corporate and foundation relations. “With fewer restrictions [than other grants], we have the chance to experiment. But we’ll do so using ways of learning that have been proven over time.”
Macalester will be one of the first schools in the nation to bring cutting-edge data computation classes to all its science majors. And through additional fellowships, more than 50 students will be able to use those skills on real-world research projects located around the globe. With several other top liberal arts schools looking on, Macalester’s work has the potential to have national impact.
Teaching the Grammar of Data
Twenty years ago, science students could get by with a working knowledge of a spreadsheet program. Those days are long gone, says Danny Kaplan, DeWitt Wallace Professor of Mathematics and Computer Science. “Excel isn’t going to cut it,” he says. “In today’s world, students can’t escape big data. Though it won’t be easy to teach it, it will only get harder as they move into their professional training.”
To that end, Kaplan and computer science professor Libby Shoop have developed a one-credit class called Data Computation Fundamentals, which is being offered beginning this semester. Though Kaplan doesn’t pretend the course can address all the complexities of specific software packages, he does hope it will provide a framework that students can apply when they come across databases or data-reliant programs in biology, chemistry, and physics. “We believe we can give students that grammar of data that they need to use these modern capabilities,” he says.
For example, a student who wants to compare the results from a genome-scale experiment done with yeast to those in a large, public database will need an understanding of data structure to do so. With that understanding, the student can download the data, analyze the information, and then draw conclusions based on the comparisons.
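To give a rough sense of what such a comparison might look like in practice, here is a minimal sketch in Python. It is not taken from the Macalester course. The gene IDs, column names, and numbers are made-up placeholders, and the two small tables stand in for a student's own results and a downloaded extract from a public database.

```python
import csv
import io

# Made-up stand-ins for two data sources. The gene IDs, the column
# names, and every value here are hypothetical placeholders.
EXPERIMENT_CSV = """gene_id,fold_change
gene_A,2.0
gene_B,0.5
gene_C,1.0
"""

REFERENCE_CSV = """gene_id,fold_change
gene_A,1.8
gene_B,0.6
gene_D,3.0
"""


def load_table(text):
    """Read CSV text into a dict mapping gene_id -> fold_change (float)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["gene_id"]: float(row["fold_change"]) for row in reader}


def compare(experiment, reference):
    """Pair up genes found in both tables and compute the difference."""
    shared = sorted(experiment.keys() & reference.keys())
    return [
        (gene, experiment[gene], reference[gene], experiment[gene] - reference[gene])
        for gene in shared
    ]


experiment = load_table(EXPERIMENT_CSV)
reference = load_table(REFERENCE_CSV)
rows = compare(experiment, reference)

for gene, mine, theirs, diff in rows:
    print(f"{gene}: experiment={mine}, reference={theirs}, difference={diff:+.2f}")
```

The key step is matching records on a shared identifier. Only genes that appear in both tables can be compared, so a gene measured in just one source drops out of the result.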
Building a course like this may seem only logical, but so far, few have tried. Most computational courses today are graduate-level courses designed for specialists, so Macalester’s smaller and more generalized approach is unique, according to provost Kathleen Murray. “The way we’re thinking about approaching the course could become a model for other schools,” she says.
Indeed, five other schools, including Smith and Grinnell Colleges, have signed on as members of the Computation and Visualization Consortium, a project led by Macalester that will help faculty from all the colleges develop, refine, and share computation course ideas and materials. Kaplan is also developing the Data Computation Fundamentals course with an eye toward online learning, so that students at Macalester and elsewhere could take the class when it’s most convenient for them.
The goal, says Overvoorde, is to develop courses that will last long beyond the four years of grant funding. “When you build a network of colleagues like this, it not only increases the innovation, but improves the chances of sustainability,” he says.
Big Data, Bold Research
To help students make the most of new coursework, Macalester has developed four new research opportunities with the help of HHMI grant funding. As a result, dozens of students will have the opportunity to apply their computational knowledge to research on campus, in the Twin Cities, and halfway around the world.
Perhaps the most ambitious is the Global Health Scholars Program, led by biology professors Liz Jansen and Devavani Chatterjea. It began last fall with an upper-level biology and chemistry course called Projects in Global Health, in which a dozen students studied topics ranging from tuberculosis diagnostics to meningitis. They each tackled a specific project—such as differing rates of tuberculosis infection among HIV-positive and non-HIV-positive patients—by analyzing current data and primary literature. They hope to come up with new insights based on their research.
In January, the students headed to Uganda, where they presented reports to a group of physicians, scientists, and government officials. Jansen believes the students’ work will have a real impact. “Our collaborators may have access to the same literature we do, but our students have the time to do a deep dive into the technical literature and synthesize it in a way that hasn’t been done before,” she says. In future years, students may tackle some of the reams of data that these same experts have on paper but have not yet committed to electronic databases.
Meanwhile, the Macalester–HHMI Data Researchers program will allow rising juniors and seniors to do big data summer research. Although the work can be done at Macalester or any major research institution, says Overvoorde, some students may choose to pursue, for example, work already underway at Macalester’s Katharine Ordway Natural History Study Area in Inver Grove Heights. There biology professors Mark Davis, Jerald Dosch, Mike Anderson, and Sarah Boyer are working on a collaborative project. “Students collect data about tree size and growth to look at the rates of carbon sequestration and the impact, potentially, on global warming,” Overvoorde says. Combined with the work of researchers from more than 50 other schools in the Ecological Research as Education Network database, those measurements provide a rich source of information that can be mined in many ways.
Another program, the Hughes Young Investigators program, is designed to give on-campus research opportunities to a diverse group of students the summer after their freshman year. The Academic Health Sciences Summer Research Program will send students planning to attend medical school to the University of Minnesota, where they’ll team up with physicians who also run independent research labs.
Creating and exploring big data sets offers one of the most irresistible opportunities for innovation and discovery in the sciences today. But it’s clear that the work being done now is just scratching the surface of possibility. From the Census to Facebook to Amazon, the ways we can learn from the deluge of data are growing.
Today, the focus on teaching big data skills is intended primarily for science students, but as time goes on there will likely be interest from other fields. Social scientists, including sociologists, economists, and anthropologists, are starting to use these tools as well. And the small but growing field of the “digital humanities” may also find increasing value in big data projects, says Murray. “The notion of how we think about data is going to permeate all of higher ed,” she says. “And the work we’re doing through the grant will prepare a whole cohort of students for the kind of analysis that will be necessary in the work they do beyond Macalester.”
Minneapolis writer Erin Peterson is a regular contributor to Macalester Today.