The University of Massachusetts Amherst


Graduate Students Spend the Summer Tackling Common Good Data Science Problems

Once again this summer, teams of graduate students participated in the center’s Data Science for the Common Good (DS4CG) program and tackled a variety of interesting and challenging data science problems. Working remotely proved no hindrance to the teams’ abilities to deliver quality analysis and insight for our partners. CDS is grateful to our industry affiliates, who provided volunteer professional data scientists to advise our teams throughout the summer. It was a great opportunity for our industry partners to participate in common good projects while being exposed to domains and data science problems outside the normal scope of their day-to-day business problems.


The Appalachian Mountain Club (AMC) was faced with the problem of measuring the carbon footprint of its operations, including guest travel and stays at its 24 eco-lodging facilities scattered throughout the mid-Atlantic states. The team, led by CDS Executive Director Brant Cheikes, drew on the experience of CICS graduate students, as well as a graduate student from the UMass Clean Energy Extension who provided domain expertise in quantifying greenhouse gas (GHG) emissions. The team built and delivered a prototype software system that can ingest data from AMC’s guest reservation system to estimate the GHG footprint associated with guest travel. In addition, the team built a second analytic tool prototype in Excel for AMC to use to estimate the GHG footprint of its core operations. Together, these tools will help AMC develop new ways to offset its GHG emissions, for example, by offering guests the option to purchase carbon offsets. The analysis and results provide a novel framework for customer-facing institutions to begin taking steps toward net-zero carbon emissions while keeping the customer an active participant in the process.
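For readers curious about what a reservation-based travel estimate might involve, here is a minimal sketch in Python. It is not the team's prototype: the `Reservation` fields, the one-vehicle-per-party default, and the emission factor (roughly 0.4 kg of CO2 per vehicle-mile, in line with the figure the U.S. EPA commonly cites for a typical passenger vehicle) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed emission factor, for illustration only: about 0.4 kg CO2 per
# vehicle-mile for a typical passenger vehicle.
KG_CO2_PER_VEHICLE_MILE = 0.4


@dataclass
class Reservation:
    """Hypothetical reservation record; field names are illustrative."""
    one_way_miles: float  # driving distance from the guest's home to the lodge
    party_size: int       # number of guests on the reservation
    vehicles: int = 1     # vehicles assumed to be used by the party


def travel_emissions_kg(r: Reservation) -> float:
    """Round-trip driving emissions for one reservation, in kg CO2."""
    return 2 * r.one_way_miles * r.vehicles * KG_CO2_PER_VEHICLE_MILE


def per_guest_emissions_kg(r: Reservation) -> float:
    """Share of the reservation's travel emissions attributed to each guest."""
    return travel_emissions_kg(r) / r.party_size


def total_travel_emissions_kg(reservations: Iterable[Reservation]) -> float:
    """Sum of round-trip travel emissions over all reservations."""
    return sum(travel_emissions_kg(r) for r in reservations)
```

A per-guest figure of this kind is what would allow an organization to quote a carbon offset amount at booking time. A real estimate would also need to account for travel mode, carpooling, and vehicle type.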


Funded by an NSF grant, a DS4CG team engaged with AuCoDe, a startup that is developing technology to identify and analyze online controversy and misinformation. Led by CDS Technical Director of Community Initiatives Matthew Rattigan, the team worked to develop prototype predictive models for detecting COVID-19-related misinformation found in public forums such as Twitter. Drawing on cutting-edge natural language processing techniques paired with AuCoDe’s previously developed methodologies, the team successfully modeled several common subjects of pandemic-related misinformation. Additionally, the group helped to develop semi-supervised methodologies for labeling data in domains where no ground truth exists. The work will continue in the form of an independent study overseen by Rattigan and CICS Faculty Chair James Allan.


The team supporting the Boston branch of the U.S. Department of Veterans Affairs adapted to changing data availability related to COVID-19. Unable to obtain the required in-person data clearance, the team, led by CDS Senior Data Scientist Tom Bernardin, quickly pivoted to a readily available oncology dataset. They developed a range of predictive models for disease progression in multiple myeloma patients. Using recently developed neural network architectures for proportional hazards models, the team was able to improve on existing benchmarks in the literature. They delivered a set of models along with an early draft of a paper that is currently being revised for publication.


The Pompeii Artistic Landscape Project, headed by UMass Classics Associate Professor Eric Poehler, worked with a team of graduate students with expertise in computer vision. The team, also led by Senior Data Scientist Tom Bernardin, solved several critical data challenges that were inhibiting progress on the publication and dissemination of thousands of images of the ancient city of Pompeii for use by researchers around the world. Using convolutional neural networks, they developed a novel approach to match thousands of high-resolution images that lacked metadata to low-resolution images containing the missing information. Separately, the team also trained a neural network to detect images that were not suitable for publication. The results of the team’s models have been incorporated into the larger project’s back-end system and workflow.


The DS4CG program is actively soliciting proposals from common good partners for projects to be performed in Summer 2021. If you would like to submit a proposal, or know of an organization that might be interested in working with us, please send us an email. We are looking forward to another round of challenging projects, talented students, and lively participation from our industry partners next summer.