Data Science course encourages non-data-science students to apply skills to real-world data sets

A new course co-sponsored by the computer science and statistics departments, Compsci/Stats 190F: Foundations of Data Science, isn’t open to computer science or statistics students. The class is designed to teach non-computer-science and non-math/statistics students the basics of data science, and encourage them to think about how data analysis can solve problems in their own field of study.

“There’s a huge forecasted gap in data science expertise,” said Benjamin Marlin, associate professor in the College of Information and Computer Sciences, who co-taught the class along with Pat Flaherty, assistant professor in the Department of Mathematics and Statistics. “There’s a push towards encouraging students with diverse backgrounds to increase their familiarity with data science topics.” 


“For students in economics or biology, for example, there’s a lot of [data science work] that they will see as they go forward through their courses or their careers,” Marlin elaborated. Many CS190F students come from the sciences and social sciences.


Instead of focusing on a single piece of software, students learn to build analytical code from scratch. This not only enables students to perform custom analyses, but also helps them understand how those analyses are carried out.


The class moves quickly. Students begin by learning Python from scratch, along with the statistical analysis techniques that correspond to their new coding skills. By the midpoint of the course, students have more than enough coding skill to get by and shift their focus to statistical analysis techniques, including confidence intervals, regression, and classification. Students in last year’s class worked with nearest-neighbor classification, which, according to Marlin, is typically taught to junior-year computer science students.



Students in CS190F also have the opportunity to use their skills on real-world data sets. According to Marlin, students work with data relevant to a variety of fields, including finance, political science and public health. In one exercise, they analyze the relationship between smoking and birth weight. Working with premade databases of real and “fake” news, students also learn to identify similarities and differences between the two.


The class, which ran for the first time in the fall of 2018, is based on the “Data 8” course developed at the University of California, Berkeley. Professors Marlin and Flaherty have worked since 2016 to bring the class to UMass.


The course materials are housed entirely online. A grant from Google for cloud compute time allows students to do coursework from anywhere. Access to UMass computer labs means students don’t even need their own laptops.


The Data 8 format provides companion courses in other majors to help students “connect the dots” more quickly. Marlin is working with professors in other departments to bring those companion courses to UMass.


CS190F is geared towards freshmen, who will have a full three years of college to consider data science applications to their chosen field. Upperclassmen, who may have begun to consider the data science connections to their field, can still benefit from the Python instruction and advanced statistical techniques taught in the course.


There is a tentative plan to offer the course again in the fall semester.