The University of Massachusetts Amherst



Data Science for the Common Good

The Center for Data Science partners with public interest organizations to help solve problems through Data Science for the Common Good (DS4CG).

Jump to this year's projects.


How DS4CG Works

For each DS4CG project, computer science graduate students are selected to participate in a collaborative engagement, fully funded by CDS. The students work in small teams at UMass on a defined project under the supervision of CDS faculty and communicate regularly with partner organizations. 


For Potential Partners

Is my organization a good fit for a data science project? 

There are no fixed criteria for determining whether an organization is right for the program. If you are looking for innovative ways to address a challenging problem using data, we can probably help.


What exactly do you mean by ‘data science’? 

Data science is the practice of collecting, manipulating, and distilling data to produce insight and drive decision making. DS4CG projects can take many forms, with a variety of deliverables; common tasks include predictive modeling, data clustering, and anomaly detection.


How can I find out more? 

The first step is to set up a conversation with us to discuss your organization, the types of data you collect, and the strategic challenges you face. Working together, we can help you identify and scope potential project ideas. Once we have identified a project outline, we will identify a team and map out project milestones and deliverables.  


Contact us: 



Current Projects

For the summer of 2019, we are sponsoring projects with the following partners:



The Nature Conservancy in Massachusetts

This project applies automated image classification methods to filter out non-useful images captured by wildlife cameras. Drawing on a corpus of previously tagged, or “labeled,” images, the team will construct a statistical model that captures the concept of a “useful” image using supervised learning techniques commonly applied in computer vision.


Once an accurate model has been learned, it can be used to probabilistically classify new images and determine (within reasonable accuracy bounds) whether each image is useful for further study or inclusion. By filtering the captured wildlife images in this manner, we hope to drastically reduce the amount of manual inspection required to process the automatically captured images going forward. Lastly, we will work with The Nature Conservancy's domain experts to determine the most useful way to retrain and deploy the image classifier for future use. This might include the creation of a web-based tool or other user interface that will facilitate bulk classification of large groups of new images.
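As a rough illustration of how probabilistic output could reduce manual inspection, the sketch below sorts images into "keep", "discard", and "review" groups. It is not the project's implementation: the function name, the two thresholds, and the file names are hypothetical, and the probabilities are assumed to come from whatever classifier the team eventually trains.

```python
from typing import Dict, List


def triage_images(
    scores: Dict[str, float],
    keep_above: float = 0.8,      # hypothetical threshold, to be tuned
    discard_below: float = 0.2,   # hypothetical threshold, to be tuned
) -> Dict[str, List[str]]:
    """Sort images into keep / discard / review groups by predicted probability.

    `scores` maps an image name to the model's estimated probability
    that the image is "useful".
    """
    if not 0.0 <= discard_below <= keep_above <= 1.0:
        raise ValueError("need 0 <= discard_below <= keep_above <= 1")
    bins: Dict[str, List[str]] = {"keep": [], "discard": [], "review": []}
    for name, p in sorted(scores.items()):
        if p >= keep_above:
            bins["keep"].append(name)
        elif p < discard_below:
            bins["discard"].append(name)
        else:
            # Uncertain cases are the only ones a person still has to inspect.
            bins["review"].append(name)
    return bins


# Made-up probabilities for three made-up image files.
example_scores = {
    "cam1_001.jpg": 0.97,
    "cam1_002.jpg": 0.05,
    "cam2_001.jpg": 0.55,
}
example_bins = triage_images(example_scores)
```

With thresholds like these, only the images in the "review" group would need a human look; where to set the thresholds depends on how costly it is to miss a useful image.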


Springfield Public Schools

The UMass team will perform exploratory data analysis for the benefit of Springfield Public Schools, including the creation of predictive models, data clustering, and anomaly detection methods. The team will focus on understanding the factors that contribute to post-secondary school success, as well as identifying possible interventions throughout student careers.


Charles River Watershed Association

The Charles River Watershed Association (CRWA) works with government agencies and private citizens toward the restoration and protection of the Charles. Through their work, CRWA’s scientists inform policies that promote responsible watershed management and a healthy river ecosystem. This project will use statistical analysis to predict different aspects of water quality, based on historical measurements taken by a network of citizen scientists. The team will explore different approaches to longitudinal analysis that illustrate how water quality measures in different localities are changing over time, as well as the statistical relationships between locations.
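One very simple way to see how a measure is changing over time at each location is to fit a straight-line trend per site. The sketch below does that with ordinary least squares. It is a stand-in for illustration only, not the longitudinal methods the team will choose; the site names and readings are made up.

```python
from typing import Dict, List, Tuple


def trend_slope(points: List[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of measured value against time."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    # Spread of the time values; zero means every reading shares one time.
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("all measurements were taken at the same time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return sxy / sxx


# Made-up (year, measured value) pairs for two hypothetical sampling sites.
readings: Dict[str, List[Tuple[float, float]]] = {
    "site_A": [(2015, 4.0), (2016, 3.5), (2017, 3.0), (2018, 2.5)],
    "site_B": [(2015, 2.0), (2016, 2.0), (2017, 2.0), (2018, 2.0)],
}
slopes = {site: trend_slope(pts) for site, pts in readings.items()}
```

A negative slope means the measured value is falling over time at that site, and a slope near zero means it is roughly flat. Real water quality data would also need handling for seasonality, missing readings, and correlation between nearby sites.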


Greater Holyoke YMCA

As the population of Holyoke has changed over time, the Greater Holyoke YMCA (Holyoke Y) has sought to evolve with it. For this project, we will investigate the drivers of Holyoke Y membership and program participation through exploratory data analysis and visualization. One of the key areas of investigation will involve modeling of different membership subgroups with the goal of predicting membership churn and identifying possible interventions.


After gaining some insight into the drivers of membership and participation for the Holyoke Y, the team will expand its analysis to include data from other YMCA localities, in order to compare and contrast YMCA offerings in different communities and gauge the effectiveness of different program offerings and membership structures.


Metropolitan Area Planning Council

The Metropolitan Area Planning Council (MAPC) is the regional planning agency for the 101 cities and towns that make up Greater Boston. Recently, MAPC collaborated with Northeastern University to develop innovative sample reweighting techniques to make better use of data resources available through the US Census Bureau. This project will build on that previous work to create a robust and versatile scenario analysis tool, in order to better forecast characteristics of the future population and workforce and their effects on housing and transportation needs.


Massachusetts Department of Public Health

Using a variety of public health data maintained by the Massachusetts Department of Public Health, we will help create robust and versatile risk assessment scores for different communities. These scores might be risk-specific (e.g., diabetes, HIV), or rolled into a “community health index” to indicate the overall health assessment of the locality. The primary goal of this effort is to improve current scoring techniques and facilitate the discovery of actionable insights.
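To make the idea of rolling risk-specific scores into a single index concrete, the sketch below standardizes each indicator across communities and averages the results. This is only one possible construction, assumed here for illustration; it is not the scoring technique used by the Department of Public Health, and the community names, indicator names, and rates are made up.

```python
from statistics import mean, pstdev
from typing import Dict


def community_health_index(
    rates: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Average of per-indicator z-scores for each community.

    `rates` maps community -> {indicator -> rate}. Every community is
    assumed to report the same indicators, with higher rates being worse.
    """
    if not rates:
        return {}
    communities = list(rates)
    indicators = list(rates[communities[0]])
    index = {c: 0.0 for c in communities}
    for ind in indicators:
        values = [rates[c][ind] for c in communities]
        mu, sigma = mean(values), pstdev(values)
        for c in communities:
            # An indicator with no variation carries no information.
            z = 0.0 if sigma == 0 else (rates[c][ind] - mu) / sigma
            index[c] += z / len(indicators)
    return index


# Made-up rates for two hypothetical communities.
example_rates = {
    "Town A": {"diabetes": 10.0, "hiv": 1.0},
    "Town B": {"diabetes": 20.0, "hiv": 3.0},
}
example_index = community_health_index(example_rates)
```

In this construction a higher index means higher combined risk relative to the other communities in the comparison. Equal weighting of indicators is an assumption; a real index would likely weight indicators by severity or data quality.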