University of Massachusetts Amherst



Certificate in Statistical and Computational Data Science

There are three pillars of data science: statistical skills, computer science, and domain expertise. This certificate is offered jointly through the Statistics and Computer Science departments. The program blends topics in statistical methods, statistical computing, machine learning, and algorithm development to train students to become effective data scientists in any domain. Students will also develop the ability to work with large databases, to manage and evaluate data sets, and to create meaningful output that supports effective decision making. More information here. Questions about this program should be directed to



The certificate requires a total of 15 credits and can be completed in one year. It consists of at least two computer science courses and two statistics courses.


Certificate Courses offered Spring 2019

COMPSCI 514: Algorithms for Data Science

With the advent of social networks, ubiquitous sensors, and large-scale computational science, data scientists must deal with data that is massive in size, arrives at blinding speeds, and often must be processed within interactive or quasi-interactive time frames. This course studies the mathematical foundations of big data processing, developing algorithms and learning how to analyze them. We explore methods for sampling, sketching, and distributed processing of large-scale databases, graphs, and data streams for purposes of scalable statistical description, querying, pattern mining, and learning. Formerly COMPSCI 590D. Undergraduate prerequisites: COMPSCI 240 and COMPSCI 311. 3 credits.

COMPSCI 590V: Data Visualization and Exploration

In this course, students will learn the fundamental principles of exploring and presenting complex data, both algorithmically and visually. We will cover systems infrastructure for collating large data sets, basic visualization of summary statistics, algorithms for exploring patterns in the data (such as rule mining, graph analysis, clustering, topic models, and dimensionality reduction), and artistic and cognitive aspects of data presentation (including interactive visualization, human perception, and decision making). Domains will include numeric data, relational data, geographic data, graphs, and text. Hands-on labs and projects will be performed in Python and D3.

COMPSCI 690D: Deep Learning for Natural Language Processing

This course offers an introduction to the models and principles behind state-of-the-art deep learning techniques applied to natural language processing problems. It is intended for graduate students in computer science and linguistics who are (1) interested in learning about cutting-edge research progress in NLP and (2) familiar with machine learning fundamentals. We will cover a variety of models, including vector-based word representations, basic neural network architectures (e.g., convolutional, recurrent), and more advanced variants of these networks that are especially useful for NLP (e.g., attention-based or memory-augmented). We will also see these models in action on a variety of NLP tasks, including text classification, question answering, and text generation. Coursework includes reading recent research papers, programming assignments, and a final project. 3 credits.