Partner: Media Cloud

Over the past few decades, the rapid expansion of digital media has transformed how information is shared and consumed. This growth also creates challenges, including content moderation, misinformation detection, and media bias. Categorizing articles by author or agency is a critical step in tackling these issues, in both high- and low-resource settings.

This DS4CG project evaluated existing and newly implemented tools for extracting author names from news articles. From Media Cloud’s article archive, 100 documents in 10 languages were sampled and annotated by volunteers fluent in each language, following guidelines developed with the Media Cloud team. A pipeline was designed to run these tools against the annotated sample, and their performance was assessed using five NLP metrics.
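
As an illustration of what such an evaluation can look like, the sketch below scores one extraction tool against volunteer annotations. It is a minimal example under stated assumptions, not the project's pipeline: the summary does not name the five metrics, so set-based precision, recall, and F1 stand in as common choices, and the names `normalize`, `score_article`, and `evaluate`, along with the `(text, gold_authors)` input shape, are hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Apply NFKC normalization, case-fold, and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


def score_article(predicted: list[str], gold: list[str]) -> dict[str, float]:
    """Set-based precision, recall, and F1 for one article's author names.

    Assumption: exact match after normalization. A real pipeline might
    also want partial or fuzzy matching.
    """
    pred = {normalize(n) for n in predicted if n.strip()}
    ref = {normalize(n) for n in gold if n.strip()}
    if not pred and not ref:
        # Both empty: the tool correctly reported that there is no author.
        return {"precision": 1.0, "recall": 1.0, "f1": 1.0}
    tp = len(pred & ref)  # names found by the tool and present in the annotation
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(tool, articles) -> dict[str, float]:
    """Macro-average the per-article scores for one extraction tool.

    `tool` is any callable mapping article text to a list of author names;
    `articles` is a sequence of (text, gold_authors) pairs (hypothetical shape).
    """
    scores = [score_article(tool(text), gold) for text, gold in articles]
    if not scores:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    return {key: sum(s[key] for s in scores) / len(scores) for key in scores[0]}
```

Wrapping each tool as a plain callable keeps the comparison uniform: the same `evaluate` call can be repeated per tool and per language, so that differences between high- and low-resource languages show up in the averaged scores.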