University of Massachusetts Amherst


Data Science Foundations

Machine learning (the synthesis of computer science algorithms, statistics, and optimization) addresses the problem of using data to automatically improve a program’s behavior, and is at the heart of many information technologies. Our faculty have invented state-of-the-art ML methods for sequence labeling, face recognition, recommendation systems, causality, and decision-making under uncertainty. Our machine learning software is used by hundreds of companies, including Google, Oracle, Yahoo, and IBM. Current work includes deep learning, interactive learning, decision processes, graphical models, topic models, probabilistic databases, and large-scale parallel-distributed machine learning. We have strong connections to Math & Statistics.


Data gathering and interpretation. Gathering and interpreting data is a challenging prerequisite to data analysis in many applications. Our faculty have been critically important to the development of methods for leveraging wearable health sensors, smart metering, radar, RFID, web spidering, and social media. We have also invented state-of-the-art methods for understanding and interpreting complex data in human natural language, images, video, social networks, and temporal and relational data. Our open-source software for natural language processing is used by Oracle and hundreds of other companies.


Data quality and cleaning. Poor data quality, caused by faulty sensing, unreliable sources, and integration conflicts, is estimated to cost the US economy more than $600 billion per year. UMass researchers have developed methods to boost the effectiveness and reliability of data-driven systems by automatically identifying and repairing data errors and diagnosing their causes, as well as methods for disambiguation and entity resolution, data fusion, and managing uncertainty in data, all at large scale.


Data protection and privacy. Because much of data science focuses on data collected about individuals, data protection and privacy are key concerns. If data is used improperly or released inappropriately, it can cause significant harm to individuals. UMass faculty have developed security tools for managing big data, as well as state-of-the-art algorithms for studying collections of data in a manner that respects individual privacy.


Big and fast data analytics. As data is generated at an unprecedented rate in enterprise businesses and scientific applications, there is strong demand for analyzing data at scale while producing timely (real-time) answers and insights. Our faculty have developed new data processing infrastructures that leverage new hardware trends, such as multicore CPUs and inexpensive computer clusters, and new software solutions that range from parallel-distributed processing to low-latency analytics to advanced operators that support data exploration and visualization.


Theory for data science helps us understand the fundamental limitations, as well as the surprising provable guarantees and capabilities, of new algorithms for processing data. UMass faculty have developed precise theory for processing massive data sets and data streams, clustering, approximation algorithms, and coding. We also develop algorithms for large-scale analytics, including pattern matching, graph problems, and resource allocation.