University of Massachusetts Amherst

Search Google Appliance


Masters Concentration in Data Science

The Computer Science Masters with a Concentration in Data Science was created to help meet the need for expanded and enhanced training in the area of data science. It requires coursework in Theory for Data Science, Systems for Data Science, Data Analysis and Statistics.


Aerial photo of computer science buildingThe Masters Concentration in Data Science teaches you to develop and apply methods to collect, curate, and analyze large-scale data, and to make discoveries and decisions using those analyses.


Requirements and Admissions


Who should apply?

Students require a bachelor’s degree and a solid undergraduate background in computer science.




The Masters Degree is a total of 30 credits and is usually completed in two years.  Four Data Science core courses (12 credits) including one each from the areas of Theory for Data Science, Systems for Data Science, and Data Analysis, and one additional core course from any area. Two courses (6 credits) taken from among a set of courses designated as satisfying the Data Science Elective requirement. One course (3 credits) taken from among a set of courses satisfying the Data Science Probability and Statistics requirement.  



Useful Links

The full-time graduate program admission deadlines are:

  • October 1 for Spring enrollment (Master's Program only)
  • December 15 for Fall enrollment

Courses offered Fall 2018

COMPSCI 514: Algorithms for Data Science

With the advent of social networks, ubiquitous sensors, and large-scale computational science, data scientists must deal with data that is massive in size, arrives at blinding speeds, and often must be processed within interactive or quasi-interactive time frames. This course studies the mathematical foundations of big data processing, developing algorithms and learning how to analyze them. We explore methods for sampling, sketching, and distributed processing of large scale databases, graphs, and data streams for purposes of scalable statistical description, querying, pattern mining, and learning. Was COMPSCI 590D. Undergraduate Prerequisites: COMPSCI 240 and COMPSCI 311. 3 credits

COMPSCI 520/620: Software Engineering: synthesis and development

Introduces students to the principal activities involved in developing high-quality software systems in a variety of application domains. Topics include: requirements analysis, formal and informal specification methods, process definition, software design, testing, and risk management. The course will pay particular attention to differences in software development approaches in different contexts.

COMPSCI 585: Introduction to Natural Language Processing

Natural Language Processing (NLP) is the engineering art and science of how to teach computers to understand human language.  NLP is a type of artificial intelligence technology, and it's now ubiquitous -- NLP lets us talk to our phones, use the web to answer questions, map out discussions in books and social media, and even translate between human languages.  Since language is rich, subtle, ambiguous, and very difficult for computers to understand, these systems can sometimes seem like magic -- but these are engineering problems we can tackle with data, math, machine learning, and insights from linguistics.  This course will introduce NLP methods and applications including probabilistic language models, machine translation, and parsing algorithms for syntax and the deeper meaning of text.  During the course, students will (1) learn and derive mathematical models and algorithms for NLP; (2) become familiar with basic facts about human language that motivate them, and help practitioners know what problems are possible to solve; and (3) complete a series of hands-on projects to implement, experiment with, and improve NLP models, gaining practical skills for natural language systems engineering

COMPSCI 589: Machine Learning

This course will introduce core machine learning models and algorithms for classification, regression, clustering, and dimensionality reduction. On the theory side, the course will focus on understanding models and the relationships between them. On the applied side, the course will focus on effectively using machine learning methods to solve real-world problems with an emphasis on model selection, regularization, design of experiments, and presentation and interpretation of results. The course will also explore the use of machine learning methods across different computing contexts including desktop, cluster, and cloud computing. The course will include programming assignments, a midterm exam, and a final project. Python is the required programming language for the course.

COMPSCI 590S: Systems for Data Science

In this course, students will learn the fundamentals behind large-scale systems in the context of data science. We will cover the issues involved in scaling up (to many processors) and out (to many nodes) parallelism in order to perform fast analyses on large datasets. These include locality and data representation, concurrency, distributed databases and systems, performance analysis and understanding. We will explore the details of existing and emerging data science platforms, including map-reduce and graph analytics systems like Hadoop and Apache Spark.

COMPSCI 611: Advanced Algorithms

Principles underlying the design and analysis of efficient algorithms. Topics to be covered include: divide-and-conquer algorithms, graph algorithms, matroids and greedy algorithms, randomized algorithms, NP-completeness, approximation algorithms, linear programming.

COMPSCI 646: Information Retrieval

The course will cover basic and advanced techniques for text-based information systems. Topics covered include retrieval models, indexing and text representation, browsing and query reformulation, data-intensive computing approaches, evaluation, and issues surrounding implementation. The course will include a substantial project such as the implementation of major elements of search engines and applications.

COMPSCI 682: Neural Networks

This course will focus on modern, practical methods for deep learning. The course will begin with a description of simple classifiers such as perceptrons and logistic regression classifiers, and move on to standard neural networks, convolutional neural networks, and some elements of recurrent neural networks, such as long short-term memory networks (LSTMs). The emphasis will be on understanding the basics and on practical application more than on theory. Most applications will be in computer vision, but we will make an effort to cover some natural language processing (NLP) applications as well, contingent upon TA support. The current plan is to use Python and associated packages such as Numpy and TensorFlow. Prerequisites include Linear Algebra, Probability and Statistics, and Multivariate Calculus. Some assignments will be in Python and some in C++. 3 credits.

COMPSCI 689: Machine Learning

Machine learning is the computational study of artificial systems that can adapt to novel situations, discover patterns from data, and improve performance with practice. This course will cover the popular frameworks for learning, including supervised learning, reinforcement learning, and unsupervised learning. The course will provide a state-of-the-art overview of the field, emphasizing the core statistical foundations. Detailed course topics: overview of different learning frameworks such as supervised learning, reinforcement learning, and unsupervised learning; mathematical foundations of statistical estimation; maximum likelihood and maximum a posteriori (MAP) estimation; missing data and expectation maximization (EM); graphical models including mixture models, hidden-Markov models; logistic regression and generalized linear models; maximum entropy and undirected graphical models; nonparametric models including nearest neighbor methods and kernel-based methods; dimensionality reduction methods (PCA and LDA); computational learning theory and VC-dimension; reinforcement learning; state-of-the-art applications including bioinformatics, information retrieval, robotics, sensor networks and vision.

COMPSCI 701: Advanced Topics in Computer Science

This is a 6 credit reading course corresponding to the masters project. The official instructor is the GPD although the student does the work with and is evaluated by the readers of his or her master s project. 6 credits.