University of Massachusetts Amherst



Demographic Dialectal Variation in Social Media and Structured Prediction Models for RNN-based Sequence Labeling in Clinical Text

DS Tea
October 24, 4:00pm
Computer Science Building, Room 150/151

**** Please note that this event is on MONDAY rather than DS Tea's typical Tuesday meeting time ****
Where: Computer Science Building Rooms 150 & 151

Speaker: Su Lin Blodgett (MS/PhD student advised by Professor Brendan O'Connor) 
Title: Demographic Dialectal Variation in Social Media: A Case Study of African-American English 
Abstract: Though dialectal language is increasingly abundant on social media, few resources exist for developing NLP tools to handle such language. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter. We propose a distantly supervised model to identify AAE-like language from demographics associated with geo-located messages, and we verify that this language follows well-known AAE linguistic phenomena. In addition, we analyze the quality of existing language identification and dependency parsing tools on AAE-like text, demonstrating that they perform poorly on such text compared to text associated with white speakers. We also provide an ensemble classifier for language identification that eliminates this disparity, and we release a new corpus of tweets containing AAE-like language. 
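To make the idea of distant supervision concrete, here is a deliberately simplified sketch. It is not the speaker's model: it assumes each geo-located message comes with the demographic proportions of its area and simply averages those proportions per word, so the "labels" come from location metadata rather than from human annotation. The group names and the functions `word_group_scores` and `message_scores` are hypothetical.

```python
from collections import defaultdict

# Two hypothetical demographic groups; placeholder names, not the study's categories.
GROUPS = ("group_a", "group_b")


def word_group_scores(messages):
    """For each word, average the demographic proportions of the areas where
    it was used. Supervision comes from location metadata, not from human
    labels on the text (a toy form of distant supervision)."""
    totals = defaultdict(lambda: [0.0] * len(GROUPS))
    counts = defaultdict(int)
    for tokens, demographics in messages:
        for tok in set(tokens):  # count each word at most once per message
            for i, group in enumerate(GROUPS):
                totals[tok][i] += demographics[group]
            counts[tok] += 1
    return {w: tuple(v / counts[w] for v in totals[w]) for w in totals}


def message_scores(tokens, scores):
    """Average word-level scores over the known words of a new message.
    Returns None if none of its words were seen during estimation."""
    known = [scores[t] for t in tokens if t in scores]
    if not known:
        return None
    return tuple(sum(col) / len(known) for col in zip(*known))


if __name__ == "__main__":
    # Toy geo-located messages: (tokens, demographic proportions of the area).
    messages = [
        (["hello", "there"], {"group_a": 0.9, "group_b": 0.1}),
        (["hello", "friend"], {"group_a": 0.1, "group_b": 0.9}),
    ]
    scores = word_group_scores(messages)
    print(scores["hello"])
    print(message_scores(["there", "unseen"], scores))
```

A word used evenly across areas ends up near the uniform score, while words concentrated in one kind of area inherit that area's profile; a real model would also need smoothing and a proper probabilistic treatment of mixed membership.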


Speaker: Abhyuday Jagannatha (PhD student advised by Professor Hong Yu)
Title: Structured prediction models for RNN-based sequence labeling in clinical text
Abstract: Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling is the extraction of medical entities such as medications, indications, and side effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work, we experimented with various CRF-based structured learning models combined with recurrent neural networks (RNNs). We extend previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these structured prediction methods to improve exact phrase detection of various medical entities.
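As a rough illustration of how pairwise potentials enter decoding, the sketch below runs exact Viterbi decoding for a plain linear-chain CRF. It assumes the per-token emission scores (for example, produced by an LSTM) and a label-transition score matrix are already computed. It is not the speaker's model: training, the explicit pairwise-potential parameterization, and the approximate skip-chain inference are not shown, and the BIO label set is hypothetical.

```python
def viterbi(emissions, transitions):
    """Exact MAP decoding for a linear-chain CRF.

    emissions[t][y]: unary score of label y at position t (e.g. from an RNN).
    transitions[p][y]: pairwise score for label p followed by label y.
    Returns (best label sequence as indices, its total score).
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for y in range(n_labels):
            # Best previous label given that the current label is y.
            best_prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            ptrs.append(best_prev)
        score = new_score
        backptrs.append(ptrs)
    # Follow back-pointers from the best final label.
    last = max(range(n_labels), key=lambda y: score[y])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    labels = ["O", "B-Med", "I-Med"]  # hypothetical BIO tag set
    emissions = [[1.0, 0.8, 0.0], [0.0, 0.0, 1.0]]
    # A large negative O -> I-Med score discourages invalid phrase fragments.
    transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
    path, score = viterbi(emissions, transitions)
    print([labels[y] for y in path])  # ['B-Med', 'I-Med']
```

Taking the per-token argmax of these emissions alone would give `O, I-Med`, which is not a well-formed phrase; the pairwise scores let the decoder recover the complete span `B-Med, I-Med`, which is the kind of exact-phrase improvement structured prediction aims at.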


Speaker Bios:
 
Su Lin Blodgett is a second-year MS/PhD student advised by Dr. Brendan O’Connor in the Statistical Social Language Analysis Lab. Her research explores the use of statistical text analysis to answer social science questions, with a current focus on social media text. Before coming to UMass, she earned a B.A. in mathematics from Wellesley College. 

Abhyuday Jagannatha is a PhD student advised by Dr. Hong Yu in the BioNLP Lab at UMass CS. His work focuses on modeling various learning problems in natural language understanding and information extraction. He currently works on medical text corpora, using machine learning to extract usable structured information from noisy, unstructured text and to build predictive models of patient health.