The University of Massachusetts Amherst

Masters Concentration in Data Science

The Computer Science Masters with a Concentration in Data Science was created to help meet the need for expanded and enhanced training in the area of data science. It requires coursework in Theory for Data Science, Systems for Data Science, Data Analysis and Statistics.


The Masters Concentration in Data Science teaches you to develop and apply methods to collect, curate, and analyze large-scale data, and to make discoveries and decisions using those analyses.


Requirements and Admissions


Who should apply?

Students require a bachelor’s degree and a solid undergraduate background in computer science.




The master's degree requires a total of 30 credits and is usually completed in two years. The concentration requirements include:

  • Four Data Science core courses (12 credits): one each from the areas of Theory for Data Science, Systems for Data Science, and Data Analysis, plus one additional core course from any area.
  • Two courses (6 credits) from the set of courses designated as satisfying the Data Science Elective requirement.
  • One course (3 credits) from the set of courses satisfying the Data Science Probability and Statistics requirement.



The full-time graduate program admission deadlines are:

  • October 1 for Spring enrollment (Master's Program only)
  • December 15 for Fall enrollment

Courses offered Spring 2019

COMPSCI 514: Algorithms for Data Science

With the advent of social networks, ubiquitous sensors, and large-scale computational science, data scientists must deal with data that is massive in size, arrives at blinding speeds, and often must be processed within interactive or quasi-interactive time frames. This course studies the mathematical foundations of big data processing, developing algorithms and learning how to analyze them. We explore methods for sampling, sketching, and distributed processing of large-scale databases, graphs, and data streams for purposes of scalable statistical description, querying, pattern mining, and learning. Formerly COMPSCI 590D. Undergraduate Prerequisites: COMPSCI 240 and COMPSCI 311. 3 credits.

COMPSCI 574/674: Intelligent Visual Computing

Intelligent visual computing is an emerging field that seeks to combine modern trends in machine learning, computer graphics, and computer vision to intelligently process, analyze, and synthesize 2D/3D visual data. The course will start by covering 2D image and 3D shape representations, classification and regression techniques, and the fundamentals of deep learning. The course will then provide an in-depth background on analysis and synthesis of images and shapes with deep learning, in particular convolutional neural networks, recurrent neural networks, memory networks, auto-encoders, adversarial networks, reinforcement learning methods, and probabilistic graphical models. Students will complete 5 programming assignments in Matlab/Octave and work on a course project related to visual computing with machine learning. This course counts as a CS Elective toward the CS major (BA/BS).

COMPSCI 589: Machine Learning

This course will introduce core machine learning models and algorithms for classification, regression, clustering, and dimensionality reduction. On the theory side, the course will focus on understanding models and the relationships between them. On the applied side, the course will focus on effectively using machine learning methods to solve real-world problems with an emphasis on model selection, regularization, design of experiments, and presentation and interpretation of results. The course will also explore the use of machine learning methods across different computing contexts including desktop, cluster, and cloud computing. The course will include programming assignments, a midterm exam, and a final project. Python is the required programming language for the course.

COMPSCI 590V: Data Visualization and Exploration

In this course, students will learn the fundamental principles of exploring and presenting complex data, both algorithmically and visually. We will cover systems infrastructure for collating large data, basic visualization of summary statistics, algorithms for exploring patterns in the data (such as rule-mining, graph analysis, clustering, topic models, and dimensionality reduction), and artistic and cognitive aspects of data presentation (including interactive visualization, human perception, and decision-making). Domains will include numeric data, relational data, geographic data, graphs, and text. Hands-on labs and projects will be performed in Python and D3.

COMPSCI 611: Advanced Algorithms

Principles underlying the design and analysis of efficient algorithms. Topics to be covered include: divide-and-conquer algorithms, graph algorithms, matroids and greedy algorithms, randomized algorithms, NP-completeness, approximation algorithms, linear programming.

COMPSCI 677: Distributed and Operating Systems

This course provides an in-depth examination of the principles of distributed systems in general, and distributed operating systems in particular. Covered topics include processes and threads, concurrent programming, distributed interprocess communication, distributed process scheduling, virtualization, distributed file systems, security in distributed systems, distributed middleware and applications such as the web and peer-to-peer systems. Some coverage of operating system principles for multiprocessors will also be included. A brief overview of advanced topics such as cloud computing, green computing, and mobile computing will be provided, time permitting.


COMPSCI 683: Artificial Intelligence

In-depth introduction to Artificial Intelligence focusing on techniques that allow intelligent systems to reason effectively with uncertain information and cope with limited computational resources. Topics include: problem-solving using search, heuristic search techniques, constraint satisfaction, local search, abstraction and hierarchical search, resource-bounded search techniques, principles of knowledge representation and reasoning, logical inference, reasoning under uncertainty, belief networks, decision theoretic reasoning, planning under uncertainty using Markov decision processes, multi-agent planning, and computational models of bounded rationality.

COMPSCI 690D: Deep Learning for Natural Language Processing

This course offers an introduction to the models and principles behind state-of-the-art deep learning techniques applied to natural language processing problems. It is intended for graduate students in computer science and linguistics who are (1) interested in learning about cutting-edge research progress in NLP and (2) familiar with machine learning fundamentals. We will cover a variety of models, including vector-based word representations, basic neural network architectures (e.g., convolutional, recurrent), and more advanced variants of these networks that are especially useful for NLP (e.g., attention-based or memory-augmented). We will also see these models in action on a variety of NLP tasks, including text classification, question answering, and text generation. Coursework includes reading recent research papers, programming assignments, and a final project. 3 credits.

COMPSCI 690OP: Optimization in Computer Science

Much recent work in computer science in a variety of areas, from game theory to machine learning and sensor networks, exploits sophisticated methods of optimization. This course is intended to give students an in-depth background in both the foundations and some recent trends in the theory and practice of optimization for computer science. There is currently no other course in the department that covers these topics, yet they are critical to a large number of research projects done within the department.

COMPSCI 701: Advanced Topics in Computer Science

This is a 6-credit reading course corresponding to the master's project. The official instructor is the GPD, although the student does the work with, and is evaluated by, the readers of his or her master's project. 6 credits.

COMPSCI 745: Advanced Systems for Big Data Analytics

This course covers advanced topics in big data analytics systems. The course first covers the design and implementation of scalable, low-latency data analytics systems including parallel databases, MapReduce, BigTable, distributed stream systems, and Spark for unified data analytics. It then covers advanced analytics including online analytics, online aggregation, data exploration, automated insight discovery, and explanation discovery. Finally, the course examines special topics including systems for machine learning and machine learning for systems. The course workload includes paper reviews for each class, one paper presentation, programming exercises using Spark, and a research project related to the course material. The prerequisites are a graduate course on the principles and implementation of data management systems, equivalent to COMPSCI 645, and a basic understanding of machine learning techniques. Students with other backgrounds are asked to contact the instructor for approval to enroll. This is an online course in which the instructor and students will meet through video-conferencing.