DS4CG 2022 Projects

MassChallenge

Partner: MassChallenge

MassChallenge is a non-profit startup accelerator with over 10 years of experience connecting innovative entrepreneurs with community partnerships and financial resources. In this partnership, students create models to predict applicant success and provide feedback to entrepreneurs to improve their odds of a successful application. Another goal of the project is to develop a recommendation model that provides “dating app-style” mentor-mentee matching for startups registered in a MassChallenge program.

Media Ecosystems Analysis Group

Partner: Media Ecosystems Analysis Group

The Media Ecosystems Analysis Group performs quantitative media research across digital spaces and helps multiple non-profits leverage media insights. Students on this team are developing a pipeline for detecting hate speech in videos using text transcribed from YouTube, which has become notorious for harboring online hate speech and harmful content. Integrating state-of-the-art natural language processing and information retrieval techniques, they will design an annotation process and a classifier pipeline that categorizes videos containing anti-immigrant extreme speech, and present the results in highly interpretable, interactive dashboards.

UMass Rescue Lab

Partner: UMass Rescue Lab

UMass Rescue Lab is the premier computer science research group focused on rescuing children from internet-based victimization. Students on this team will create a web application that helps parents protect their children from technology-driven exploitation. The project applies state-of-the-art NLP techniques and interactive data visualizations to application reviews in order to identify potential dangers to children, such as exploitation and abuse.

Data Profiling for Fairness

Algorithmic fairness has been one of the most important topics in data science over the last decade and has attracted significant attention from industry and academia. Students on this team will investigate potential issues in data-driven systems that affect algorithmic fairness, in order to help data practitioners understand and identify these issues in their data and the related machine learning tasks.


Partner: Red Cross Netherlands

Timely, reliable damage assessment of buildings and infrastructure in the wake of natural disasters is crucial for enabling governments to make emergency declarations and organize response and recovery efforts. To aid this effort, students on this project use modern machine learning techniques to rapidly analyze before-and-after satellite images of affected areas and assess damage. Together with the Red Cross, the goal is to build a robust tool that can be adapted to detect buildings and the damage inflicted on them across the globe.


Herring Project

Partner: MIT Sea Grant group

Accurate and efficient stock assessment methods for commercially relevant fish species are extremely important for sustainable fisheries management. The manual techniques currently in use are inefficient, time-consuming, and not very accurate. Students in this group will partner with the MIT Sea Grant group to automate the detection and counting of herring in image and video data for efficient fishery management. The goal is to build an end-to-end platform that takes video inputs, applies state-of-the-art computer vision techniques, and outputs a count of the herring moving upstream.

Scholarship America

Partner: Scholarship America

Scholarship America manages the scholarship process by mediating between donors and recipients. Students on this project will apply supervised machine learning methods to answer questions about the graduation outcomes of scholarship recipients.

DS4CG 2021 Projects

Georeferencing of Historical Imagery

Partner: UMass Libraries and UMass Department of Environmental Conservation

Aerial images can provide useful insights into how land and waterways change over time. To be able to compare images from different sources and time periods, georeferencing (identifying the exact latitude and longitude) is required. This time-consuming task can currently only be done by trained human analysts. Master’s students Collin Giguere and Sowmya Vasuki Jallepalli developed an image-processing system to automatically georeference historical images.

Save the Whales

Partners: Aarhus Institute of Advanced Studies, UMass Amherst Biology department, and Woods Hole Oceanographic Institution

When whales are encountered near the surface of the ocean, it can be useful to be able to quickly assess their health, for conservation, evaluation, and other purposes. Master’s students Chhandak Bagchi and Gizem Cicekli developed a tool for identifying a whale from an aerial image, automatically marking its head and tail, and estimating its size.

Analyzing the Influence of Social Determinants on COVID-19 Indicators in Massachusetts

Partner: MA Department of Public Health

The MA Department of Public Health is responsible for overseeing the COVID-19 response in Massachusetts, including vaccination rates. Master’s students Anushka Basu, Disha Singh, and Ian Birle created a tool that takes disparate data and merges it into a comprehensive dataset that the Massachusetts Department of Public Health can use to analyze vaccine uptake trends.

Improving the Regional Transport Network

Partner: Pioneer Valley Transit Authority

The Pioneer Valley Transit Authority (PVTA) is the largest regional transport authority in Massachusetts. Master’s students Anushka Basu, Ian Birle, and Disha Singh developed models to help the PVTA determine how micro-transit (smaller-scale vehicles that run on demand) can more effectively serve areas with low ridership.

Supporting Electric Vehicle Planning in African Cities

Partner: World Resources Institute

The World Resources Institute is a global research organization that works on development issues such as energy efficiency and climate change. There is often a lack of critical data for developing countries on topics such as vehicle purchases and usage patterns. Master’s student Tanmay Agrawal and doctoral student Bob Muhwezi developed a tool to analyze aerial images to determine vehicle usage and density.

Tracking Threats to Digital Security

Partner: The Center for Digital Resilience

The Center for Digital Resilience (CDR) is a nonprofit organization that provides cybersecurity assistance to activists and nongovernmental organizations. Master’s students Paige Gulley and Virginia Partridge developed models to automatically analyze and categorize reports of cybersecurity incidents, as well as redact sensitive information, to aid CDR in sharing threat intelligence data with a global network of collaborators.

DS4CG 2020 Projects

Appalachian Mountain Club

AMC is one of the leading science-based environmental conservation organizations on the East Coast and has ambitious goals to reduce its carbon footprint. The AMC project focuses on developing and testing new methods of measuring and predicting carbon emissions associated with guest travel and the operation of AMC facilities, such as lodges, camps, cabins, and staff quarters. The fellows will develop models and visualizations of guest usage and staff operations, and implement prototype software modules that support decision-making related to energy savings, enabling AMC to both reduce and offset its carbon footprint in a data-driven manner.


AuCoDe

AuCoDe is a startup company that automatically detects and analyzes online controversies. The AuCoDe team will explore applications of controversy detection technology to misinformation surrounding the COVID-19 pandemic found in public forums. By examining news coverage and public social media discourse using AuCoDe’s controversy detection technology, the DS4CG team will identify signals to detect, track, and understand the dynamics of coronavirus-related misinformation online and better inform the public.

Department of Veterans Affairs

The Department of Veterans Affairs (VA) operates one of the largest integrated healthcare systems in the United States. While it maintains a large repository of electronic health records, this raw data does not lend itself to analysis and research purposes. The VA project will focus on developing and validating algorithms to automatically extract characteristics and features from the data, such as diseases, treatments, and biomarkers. The project may also involve processing narrative data from physician notes. This work will help the VA generate insights into the treatment and care of patients.

UMass Classics: Pompeii

Associate Professor Eric Poehler of the UMass Classics department has thousands of photographic images of frescoed walls in Pompeii, the ancient city that was buried by the eruption of Mt. Vesuvius nearly 2,000 years ago. Each image has captions describing the objects included and other features. The Pompeii team will develop models to identify objects in the images, and then search images for objects that may not be mentioned in the captions. The team will also work on detecting unlabeled objects in images, such as scaffolding or unwanted signage. The project results will vastly increase researchers’ ability to analyze and understand the archeology of Pompeii.

DS4CG 2019 Projects

The Charles River Watershed Association (CRWA) team analyzed 25 years’ worth of water-quality data collected by citizen scientists to identify patterns and trends in the levels of E. coli, phosphorus, and chlorophyll. The results will inform policies that promote responsible watershed management and a healthy river ecosystem.

The Greater Holyoke YMCA team analyzed membership and program participation data in order to predict membership churn. The results will help identify members at risk of dropping their membership, giving the YMCA the opportunity to proactively assess and respond.

The Massachusetts Department of Public Health (DPH) team aggregated data from disparate sources to develop risk assessment scores at the city/town level. The results will help the DPH deploy resources more effectively and efficiently, in areas that need them most.

The Metropolitan Area Planning Council (MAPC) team worked on a forecasting technique, originally developed at Northeastern University, for projecting scenarios about future populations, enhancing its efficiency, ease of use, and breadth of applicability. The results will help MAPC better serve the cities and towns that rely on its expertise for municipal planning.

The Nature Conservancy (TNC) team devised a tool using a computer-vision algorithm to automatically detect whether or not a photograph contains an animal. TNC can apply this tool to photographs captured by remote motion-sensitive cameras placed in the wild, in order to monitor wildlife corridors and better guard against animal-vehicle collisions on roadways.

The Springfield Public Schools (SPS) team combined student data with college-enrollment data from a national clearinghouse to identify factors contributing to post-secondary school success. The results will help SPS fulfill its mission of graduating students who are college and career ready.

Thank you to our program sponsors!