Generating Metrics for High-Performance Computing Clusters

  • Post category: Projects

Partner: Unity (DS4CG 2023). Unity is a collaborative, multi-institutional high-performance computing cluster used primarily for research computing. This project generated metrics and analyses for Unity by building a data pipeline into a database that can power a live dashboard for Unity's administrative staff. The metrics include unnecessarily idle GPUs, daily and weekly node usage, total resource usage, and job wait time. The team also built a model that predicts a job's wait time at submission.

Detecting Extreme Speech in YouTube Videos

Partner: Media Cloud (DS4CG 2023). With the current surge in multimodal media shared online, particularly on platforms like YouTube and Instagram, the need for multimodal hate speech detection systems has grown. We created an evaluation dataset of YouTube videos from specific types of content creators and used machine learning to compare how well features from a video's audio and from its transcribed text can flag extreme speech.

Extracting Bylines from Media in Multiple Languages

Partner: Media Cloud (DS4CG 2023). As news media around the globe become increasingly digital, the need for content moderation, misinformation detection, and bias analysis has grown. In partnership with Media Cloud, we evaluated existing tools for extracting author names (bylines) from news articles in 10 languages; bylines allow researchers to analyze media coverage by author. We designed a pipeline that runs byline extraction with each of the evaluated tools and concluded that a multistep approach combining heuristics and machine learning models would produce the best byline extraction tool.

Analyzing Energy Usage with Predictive Modeling

Partner: Massachusetts Division of Capital Asset Management & Maintenance (DCAMM) (DS4CG 2023). DCAMM is responsible for managing resources in a variety of state buildings, such as state hospitals, prisons, universities, community colleges, and office buildings. This project analyzed five years of energy usage data from 279 utility meters across 23 academic buildings. Using this data, the team developed time-series models that predict each building's energy consumption over a 12-month horizon for several utilities (electricity, steam, natural gas, and water). Prediction is a first step toward data-driven, efficient management of energy resources and toward energy conservation.

Analysis of #StopAsianHate on Twitter

Partner: Co-Insights (DS4CG 2023). This project takes a longitudinal approach to analyzing the #StopAsianHate hashtag on Twitter in order to understand how its usage changed over time. We developed a model that converts tweet text into embeddings, clusters the embeddings into groups, and links similar groups to reveal the context surrounding #StopAsianHate. By analyzing the main accounts driving the conversation and identifying transitions from hashtags to discussions, we can better understand how discussions develop on social media.

Tracking Bird Migration Patterns with Machine Learning

Partner: Environment Canada (DS4CG 2023). Populations of aerial insectivores, birds that feed on insects caught in flight, have been declining in Canada, creating a need for a model that can predict bird migration patterns. When a roost of birds takes off together, the birds appear as a distinct shape in weather radar data. By adapting existing algorithms to Canadian weather radar data, we created a model that accurately predicts migration patterns. The model can even detect roosts that Environment Canada (EC) had previously missed.