Computer science graduate students Haw-Shiuan Chang, Amol Agrawal, Ananya Ganesh, Anirudha Desai and Vinayak Mathur, with professor Andrew McCallum, director of the Center for Data Science, and Alfred Hough, lead artificial intelligence (AI) researcher at Lexalytics Inc. of Boston, presented a paper at the annual conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies on June 6 in New Orleans.
Chang delivered the paper, “Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings,” which describes a set of algorithms for natural language processing that match or exceed the state of the art on several evaluation tasks. The algorithms are much more computationally efficient at discovering the distinct senses of words with several meanings, such as “bank,” from a large body of unannotated text, the researchers point out.
The algorithms can also assign each sense a “word embedding,” that is, a numeric representation of its meaning, and provide related words for each sense to make the meanings clear. Finally, the new algorithms can identify which sense of a word with many meanings is intended when the word appears in a new sentence.
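To make the last two steps concrete, here is a minimal sketch of how sense embeddings could be used to list related words and to pick a sense in a new sentence. It is not the method from the paper: the vectors are hand-made toy values, the names `WORD_VECTORS`, `SENSE_VECTORS`, `pick_sense` and `related_words` are hypothetical, and it assumes a simple cosine-similarity comparison against the average of the context words.

```python
import math

# Hypothetical toy word vectors; the three dimensions loosely stand for
# (finance, river, other). Real embeddings are learned from unannotated text.
WORD_VECTORS = {
    "money": [0.9, 0.0, 0.1],
    "loan": [0.8, 0.1, 0.1],
    "deposit": [0.7, 0.2, 0.1],
    "river": [0.0, 0.9, 0.1],
    "water": [0.1, 0.8, 0.1],
    "fishing": [0.0, 0.7, 0.3],
}

# Hypothetical sense embeddings: one vector per induced sense of "bank".
SENSE_VECTORS = {
    "bank": {
        "bank/finance": [0.9, 0.05, 0.05],
        "bank/river": [0.05, 0.9, 0.05],
    }
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def context_vector(words):
    """Average the vectors of the context words we have vectors for."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def pick_sense(target, sentence):
    """Return the sense of `target` whose embedding is closest to the context."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    ctx = context_vector([w for w in words if w != target])
    if ctx is None:
        return None  # no usable context words, so no decision is made
    senses = SENSE_VECTORS[target]
    return max(senses, key=lambda s: cosine(senses[s], ctx))


def related_words(target, sense, k=2):
    """List the k words whose vectors are closest to the given sense embedding."""
    vec = SENSE_VECTORS[target][sense]
    ranked = sorted(WORD_VECTORS, key=lambda w: cosine(WORD_VECTORS[w], vec), reverse=True)
    return ranked[:k]


print(pick_sense("bank", "She took a loan from the bank."))
print(pick_sense("bank", "We went fishing on the river bank."))
print(related_words("bank", "bank/river"))
```

In this toy setup, the context words “loan” and “fishing … river” pull the choice toward the financial and riverside senses respectively, and the related-word list gives a human-readable gloss for each sense.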
Lexalytics says the new approach to word-sense induction comes from its Magic Machines AI Labs, launched in 2017 in partnership with the Center for Data Science and Northwestern University’s Medill School of Journalism, Media and Integrated Marketing Communications.
The Center for Data Science, begun in 2015 in the College of Information and Computer Sciences, is nationally recognized for its research activities and has one of the highest-ranked and most competitive graduate research programs in the nation. Industry partners include companies such as Amazon, Google, IBM, MassMutual, Microsoft, Oracle and Pratt & Whitney, plus a growing number of local startups. Its research initiatives span areas including health care, education, workforce analytics, energy, agriculture, conservation, machine learning, information integration, scalable systems, computer vision and human language technology.
Lexalytics, which processes billions of words every day in more than 20 languages, is a leader in translating text for social media monitoring, reputation management and voice of the customer programs. Its Intelligence Platform uses leading AI and natural language processing to allow users to quickly, easily and cost-effectively create custom analytics solutions to address their data problems.