Detecting Extreme Speech in YouTube Videos

  • Post category: Projects

Partner: Media Cloud DS4CG 2023. With the surge in multimodal media shared online, particularly on platforms like YouTube and Instagram, the need for multimodal hate speech detection systems has grown. We created an evaluation dataset of YouTube videos from specific types of content creators and compared how features from a video's audio and transcribed text can be used to flag extreme speech using machine learning.


Extracting Bylines from Media in Multiple Languages


Partner: Media Cloud DS4CG 2023. As media around the globe becomes increasingly digital, the need for content moderation, misinformation detection, and bias analysis has risen. In partnership with Media Cloud, we evaluated existing tools for extracting author names from news articles across 10 languages, which allow researchers to analyze media by author. We designed a pipeline that runs byline extraction with the evaluated tools and determined that a multistep approach combining heuristics and machine learning models will lead to the best byline extraction tool.


Analysis of #StopAsianHate on Twitter


Partner: Co-Insights DS4CG 2023. This project takes a longitudinal approach to analyzing the #StopAsianHate hashtag on Twitter in order to understand how usage of the hashtag changed over time. We developed a model that converts text into embeddings, clusters the embeddings into groups, and links similar groups to reveal the context surrounding #StopAsianHate. By analyzing the main accounts driving the conversation and identifying how hashtags transition into discussions, we can better understand the mechanisms of social media discussion.
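The embed, cluster, and link steps described above can be illustrated with a minimal sketch. It is not the project's model: word counts stand in for learned text embeddings, a greedy similarity threshold stands in for the clustering method, and every function name, threshold, and example post below is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (an assumed stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z#]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy clustering: a vector joins the first cluster it is similar enough to."""
    clusters = []
    for index, vector in enumerate(vectors):
        for group in clusters:
            if cosine(vector, group["centroid"]) >= threshold:
                group["members"].append(index)
                # The summed vector acts as a centroid; cosine ignores its scale.
                group["centroid"] = group["centroid"] + vector
                break
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return clusters


def link_clusters(earlier, later, threshold=0.3):
    """Pair up clusters from two time windows whose centroids are similar."""
    return [
        (i, j)
        for i, a in enumerate(earlier)
        for j, b in enumerate(later)
        if cosine(a["centroid"], b["centroid"]) >= threshold
    ]


# Hypothetical posts from two consecutive time windows.
window_1 = [
    "#StopAsianHate rally downtown today",
    "join the rally downtown #StopAsianHate",
    "new restaurant review pizza",
]
window_2 = [
    "rally downtown again #StopAsianHate",
    "pizza restaurant review again",
]

clusters_1 = cluster([embed(post) for post in window_1])
clusters_2 = cluster([embed(post) for post in window_2])
links = link_clusters(clusters_1, clusters_2)
```

Here `links` pairs each cluster in the first window with the similar clusters in the second, which is one simple way to follow a topic across time in a longitudinal analysis.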


Reddit Map


Reddit Map is an open-source tool that makes navigating Reddit data easier by displaying clusters of communities with overlapping membership.


Data Profiling for Fairness


As part of the Data Science for the Common Good summer program, students investigated potential issues in data-driven systems that contribute to algorithmic unfairness, in order to help data practitioners understand and identify these issues in their data and in the related machine learning tasks.


Detecting Objects in the Murals of Pompeii


Associate Professor Eric Poehler of the UMass Classics department has thousands of photographic images of frescoed walls in Pompeii, the ancient city that was buried by the eruption of Mt. Vesuvius nearly 2,000 years ago. Each image has captions describing the objects it shows and other features. The Pompeii team will develop models to identify objects in the images, and then search the images for objects that may not be mentioned in the captions. The team will also work on detecting unlabeled objects, such as scaffolding or unwanted signage, in the images. The project results will vastly increase researchers’ ability to analyze and understand the archeology of Pompeii.
