RECENT DS4CG PROJECTS
Analyzing Political News Framing: A Dashboard for Cross-Spectrum News Insights
iNaturalist GeoModel Annotation Tool
Generating Metrics for High-Performance Computing Clusters
Detecting Extreme Speech in YouTube Videos
Extracting Bylines from Media in Multiple Languages
Analyzing Energy Usage with Predictive Modeling
Analysis of #StopAsianHate on Twitter
Tracking Bird Migration Patterns with Machine Learning & AI
Integrating DISCount for Disaster Relief
Red Cross: Satellite Imagery for Disaster Assessment
Save the Whales
Supporting Electric Vehicle Planning in African Cities
The Nature Conservancy: Animal Image Detection for Wildlife Cameras
Georeferencing of Historical Imagery
Detecting and Counting of Herring Fish Species in Image and Video Data
ALL DS4CG PROJECTS
DS4CG 2024
iNaturalist
Partner: iNaturalist
iNaturalist enables people to record and share observations of nature, such as occurrences of any living organisms, with other users. This team helped improve Geomodels that predict species’ geographic ranges and deploy them in a way that is easy to use for biologists, ecologists, and citizen scientists.
Doctors Without Borders
Partner: Doctors Without Borders
Doctors Without Borders delivers emergency medical aid to people in crisis, with humanitarian projects in more than 70 countries. This team of DS4CG students explored how artificial intelligence tools such as computer vision, optical character recognition, and semantic search can aid in data collection from clinics and hospitals, assist in improving data entry quality, and help remove obstacles to data entry.
MEDIA CLOUD
Partner: Media Cloud
Media Cloud, an open source media analysis platform, leveraged students’ NLP expertise to improve ways to study news, journalism, and social media across different languages and cultures.
DS4CG 2023
MEDIA CLOUD EXTREME SPEECH
Partner: Media Cloud
The team analyzed ways that extreme and hateful speech could be detected on YouTube, creating an evaluation dataset of YouTube videos from specific types of content creators and comparing how features from the audio and transcribed text in a video can be used to flag extreme speech using machine learning.
MEDIA CLOUD BYLINE EXTRACTION
Partner: Media Cloud
Media Cloud’s Search tool provides researchers with a way to analyze digital news content and see what topics are being covered around the globe, but due to the varied style and content of online news articles, the platform has not yet been able to include author information. The DS4CG project takes the first step toward adding that feature by creating a dataset and an evaluation framework, and by benchmarking the performance of existing tools for author and byline extraction.
CO-INSIGHTS
Partner: Co-Insights
This project completed a longitudinal analysis of the #StopAsianHate Twitter hashtag, investigating who used it and what topics of discussion and offline events it was associated with. The work supports Co-Insights’ broader goal of studying the spread of misinformation in Asian-American digital communities and developing culturally appropriate interventions.
RED CROSS
Partner: Red Cross Netherlands
The focus of the second summer of our partnership with the Red Cross was to integrate DISCount, UMass Computer Vision Lab’s new approach to estimating counts in object detection, into the Red Cross’ workflow for counting damaged buildings after a disaster, saving time and effort in determining the disaster’s impact and severity, and ultimately helping the Red Cross deliver aid more quickly.
ROOST CANADA
Partner: Environment and Climate Change Canada
The Roost Canada project stemmed from an ongoing effort in the US to scale a specialized detection algorithm that identifies roosts of birds from RADAR data. This research plays a crucial role in studying declining bird populations. Supported by Environment and Climate Change Canada, which supplied the RADAR datasets, the project’s primary challenge was to adapt the Canadian data formats so the detection algorithm could process them, and to qualitatively analyze the algorithm’s performance on that data.
MASSACHUSETTS DIVISION OF CAPITAL ASSET MANAGEMENT AND MAINTENANCE
Partner: Massachusetts Division of Capital Asset Management and Maintenance (DCAMM)
DCAMM is responsible for managing resources in state buildings such as state hospitals, prisons, universities, community colleges, and office buildings. This project analyzed five years of energy usage data from 279 utility meters in 23 academic buildings. Using this data, the team developed time-series models that predict 12-month energy consumption of each utility (electricity, steam, natural gas, water) by building. Prediction is the first step toward data-driven, efficient management of energy resources and energy conservation.
UNITY
Partner: Unity
Unity is a collaborative, multi-institutional high-performance computing cluster, primarily used for research computing. The Unity project focused on generating useful metrics and analysis for Unity by building a pipeline to a database that could power a live dashboard for Unity’s admin staff. Metrics included unnecessarily idle GPUs, daily and weekly node usage, total resource usage, and wait time. Additionally, a prediction model for wait time at job submission time was built.
DS4CG 2022
MASS CHALLENGE
Partner: MassChallenge
MassChallenge is a non-profit startup accelerator with over 10 years of experience connecting innovative entrepreneurs with community partnerships and financial resources. In this partnership with MassChallenge, students create models to predict applicant success and provide feedback to the entrepreneurs in order to improve their odds of a successful application. Another goal of the project is to develop a recommendation model that provides a “dating app-style” mentor-mentee matching for startups registered in a MassChallenge program.
MEDIA ECOSYSTEMS ANALYSIS GROUP
Partner: Media Ecosystems Analysis Group
The Media Ecosystems Analysis Group performs quantitative media research across digital spaces and helps multiple non-profits leverage media insights. Students on this team are developing a pipeline for hate speech detection in videos using text transcribed from YouTube, which has become notorious for harboring online hate speech and harmful content. Integrating state-of-the-art natural language processing and information retrieval techniques, they will design an annotation process and a classifier model pipeline for categorizing videos for anti-immigrant extreme speech, with highly interpretable, interactive dashboards.
UMASS RESCUE LAB
Partner: UMass Rescue Lab
UMass Rescue Lab is the premier computer science research group focused on rescuing children from internet-based victimization. Students on this team will create a web application that helps parents protect their children from technology-driven exploitation. The project uses application reviews to identify potential dangers to children in the forms of exploitation and abuse through state-of-the-art NLP techniques and interactive data visualizations.
DATA PROFILING FOR FAIRNESS
Algorithmic fairness has been one of the most important topics in data science over the last decade, and has attracted significant attention from industry and academia. Students on this team will investigate potential issues in data-driven systems that contribute to algorithmic unfairness, in order to help data practitioners understand and identify these issues in their data and the related machine learning tasks.
RED CROSS
Partner: Red Cross Netherlands
Timely, reliable damage assessment of buildings and infrastructure in the wake of natural disasters is crucial to enable governments to make emergency declarations and organize response and recovery efforts. To aid this effort, students on this project leverage modern machine learning techniques to rapidly analyze before-and-after satellite images of affected areas to assess damage. Together with the Red Cross, the goal is to build a robust tool that is adaptable to detecting buildings and inflicted damage across the globe.
HERRING PROJECT
Partner: MIT Sea Grant group
Accurate and efficient stock assessment methods for commercially relevant fish species are extremely important for sustainable fisheries management. The manual techniques currently in use are inefficient, time-consuming, and not especially accurate. Students in this group will partner with the MIT Sea Grant group to automate the detection and counting of herring in image and video data for efficient fishery management. The goal is to build an end-to-end platform that takes video inputs, applies state-of-the-art computer vision techniques, and outputs a count of herring moving upstream.
SCHOLARSHIP AMERICA
Partner: Scholarship America
Scholarship America manages the scholarship process by mediating between donors and recipients. Students on this project will apply supervised machine learning methods to answer questions about the graduation outcomes of scholarship recipients.
DS4CG 2021
Georeferencing of Historical Imagery
Partner: UMass Libraries and UMass Department of Environmental Conservation
Aerial images can provide useful insights into how land and waterways change over time. To be able to compare images from different sources and time periods, georeferencing (identifying the exact latitude and longitude) is required. This time-consuming task can currently only be done by trained human analysts. Master’s students Collin Giguere and Sowmya Vasuki Jallepalli developed an image-processing system to automatically georeference historical images.
Save the Whales
Partners: Aarhus Institute of Advanced Studies, UMass Amherst Biology department, and Woods Hole Oceanographic Institution
When whales are encountered near the surface of the ocean, it can be useful to be able to quickly assess their health, for conservation, evaluation, and other purposes. Master’s students Chhandak Bagchi and Gizem Cicekli developed a tool for identifying a whale from an aerial image, automatically marking its head and tail, and estimating its size.
Analyzing the Influence of Social Determinants on COVID-19 Indicators in Massachusetts
Partner: MA Department of Public Health
The MA Department of Public Health is responsible for overseeing the COVID-19 response in Massachusetts, including vaccination rates. Master’s students Anushka Basu, Disha Singh, and Ian Birle created a tool that takes disparate data and merges it into a comprehensive dataset that the Massachusetts Department of Public Health can use to analyze vaccine uptake trends.
Improving the Regional Transport Network
Partner: Pioneer Valley Transit Authority
The Pioneer Valley Transit Authority (PVTA) is the largest regional transport authority in Massachusetts. Master’s students Anushka Basu, Ian Birle, and Disha Singh developed models to help the PVTA determine how micro-transit (smaller scale vehicles that run on demand) can more effectively serve areas with low ridership.
Supporting Electric Vehicle Planning in African Cities
Partner: World Resources Institute
The World Resources Institute is a global research organization that works on development issues such as energy efficiency and climate change. There is often a lack of critical data for developing countries on topics such as vehicle purchases and usage patterns. Master’s student Tanmay Agrawal and doctoral student Bob Muhwezi developed a tool to analyze aerial images to determine vehicle usage and density.
Tracking Threats to Digital Security
Partner: The Center for Digital Resilience
The Center for Digital Resilience (CDR) is a nonprofit organization that provides cybersecurity assistance to activists and nongovernmental organizations. Master’s students Paige Gulley and Virginia Partridge developed models to automatically analyze and categorize reports of cybersecurity incidents, as well as redact sensitive information, to aid CDR in sharing threat intelligence data with a global network of collaborators.
DS4CG 2020 Projects
Appalachian Mountain Club
AMC is one of the leading science-based environmental conservation organizations on the East Coast and has ambitious goals to reduce its carbon footprint. The AMC project focuses on developing and testing new methods of measuring and predicting carbon emissions associated with guest travel and operation of AMC facilities, such as lodges, camps, cabins, and staff quarters. The fellows will develop models and visualizations of guest usage and staff operations, and implement prototype software modules that support decision-making related to energy savings, enabling AMC to both reduce and offset their carbon footprint in a data-driven manner.
AuCoDe
AuCoDe is a startup company that automatically detects and analyzes online controversies. The AuCoDe team will explore applications of controversy detection technology on misinformation surrounding the COVID-19 pandemic found in public forums. By examining news coverage and public social media discourse using AuCoDe’s controversy detection technology, the DS4CG team will identify signals to detect, track, and understand the dynamics of coronavirus-related misinformation online and better inform the public.
Department of Veterans Affairs
The Department of Veterans Affairs (VA) operates one of the largest integrated healthcare systems in the United States. While it maintains a large repository of electronic health records, this raw data does not lend itself to analysis and research purposes. The VA project will focus on developing and validating algorithms to automatically extract characteristics and features from the data, such as diseases, treatments, and biomarkers. The project may also involve processing narrative data from physician notes. This work will help the VA generate insights into the treatment and care of patients.
UMass Classics: Pompeii
Associate Professor Eric Poehler of the UMass Classics department has thousands of photographic images of frescoed walls in Pompeii, the ancient city that was buried after the eruption of Mt. Vesuvius nearly 2,000 years ago. Each image has captions describing the objects included and other features. The Pompeii team will develop models to identify objects in the images, and then search images for objects that may not be mentioned in the captions. The team will also work on detecting unlabeled objects in images such as scaffolding or unwanted signage. The project results will vastly increase researchers’ ability to analyze and understand the archeology of Pompeii.
DS4CG 2019 Projects
Charles River Watershed
The Charles River Watershed Association (CRWA) team analyzed 25 years’ worth of water-quality data collected by citizen scientists to identify patterns and trends in the levels of E. coli, phosphorus, and chlorophyll. The results will inform policies that promote responsible watershed management and a healthy river ecosystem.
Greater Holyoke YMCA
The Greater Holyoke YMCA team analyzed membership and program participation data in order to predict membership churn. The results will help identify members at risk of dropping their membership, giving the YMCA the opportunity to proactively assess and respond.
Massachusetts Department of Public Health
The Massachusetts Department of Public Health (DPH) team aggregated data from disparate sources to develop risk assessment scores at the city/town level. The results will help the DPH deploy resources more effectively and efficiently, in areas that need them most.
Metropolitan Area Planning Council
The Metropolitan Area Planning Council (MAPC) team worked on a forecasting technique, originally developed at Northeastern University, for projecting scenarios about future populations, improving its efficiency, ease of use, and breadth of applicability. The results will help MAPC better serve the cities and towns that rely on its expertise for municipal planning.
The Nature Conservancy
The Nature Conservancy (TNC) team devised a tool using a computer-vision algorithm to automatically detect whether or not a photograph contains an animal. TNC can apply this tool to photographs captured by remote motion-sensitive cameras placed in the wild, in order to monitor wildlife corridors and better guard against animal-vehicle collisions on roadways.
Springfield Public Schools
The Springfield Public Schools (SPS) team combined student data with college-enrollment data from a national clearinghouse to identify factors contributing to post-secondary school success. The results will help SPS fulfill their mission of graduating students who are college and career ready.