University of Massachusetts Amherst


Certificate in Statistical and Computational Data Science

There are three pillars to Data Science: statistical skills, computer science, and domain expertise. This certificate is offered jointly through the Statistics and Computer Science departments. The program blends topics in statistical methods, statistical computing, machine learning, and algorithm development to train students to become effective data scientists for any domain. Students will also develop the ability to work with large databases, to manage and evaluate data sets, and to create meaningful output that supports effective decision making. More information here. Questions about this program should be directed to



The Certificate requires a total of 15 credits and can be completed in one year. It consists of at least two computer science courses and two statistics courses.


Certificate Courses offered Spring 2018

COMPSCI 589: Machine Learning

This course will introduce core machine learning models and algorithms for classification, regression, clustering, and dimensionality reduction. On the theory side, the course will focus on understanding models and the relationships between them. On the applied side, the course will focus on effectively using machine learning methods to solve real-world problems with an emphasis on model selection, regularization, design of experiments, and presentation and interpretation of results. The course will also explore the use of machine learning methods across different computing contexts including desktop, cluster, and cloud computing. The course will include programming assignments, a midterm exam, and a final project. Python is the required programming language for the course.

COMPSCI 590D: Algorithms for Data Science

Big data promises to transform society, from business to government and from healthcare to academia. As the volume of digitized data explodes, there is increasing demand for unified toolkits for data processing and analysis. The main goal of this course is to rigorously study the mathematical foundations of big data processing, develop algorithms, and learn how to analyze them. Specific topics to be covered include: 1) clustering; 2) estimating statistical properties of data; 3) near neighbor search; 4) algorithms over massive graphs and social networks; 5) learning algorithms; 6) randomized algorithms. This course counts as a CS Elective toward the CS major. 3 credits.

COMPSCI 590V: Data Visualization and Exploration

In this course, students will learn the fundamental principles of exploring and presenting complex data, both algorithmically and visually. We will cover systems infrastructure for collating large data, basic visualization of summary statistics, algorithms for exploring patterns in the data (such as rule mining, graph analysis, clustering, topic models, and dimensionality reduction), and artistic and cognitive aspects of data presentation (including interactive visualization, human perception, and decision making). Domains will include numeric data, relational data, geographic data, graphs, and text. Hands-on labs and projects will be performed in Python and D3.

COMPSCI 690N: Advanced Natural Language Processing

This course covers a broad range of advanced-level topics in natural language processing. It is intended for graduate students in computer science who are familiar with machine learning fundamentals. It may also be appropriate for computationally sophisticated students in linguistics and related areas. Topics include probabilistic models of language, computationally tractable linguistic representations for syntax and semantics, neural network models for language, and selected topics in discourse and text mining. After completing the course, students should be able to read and evaluate current NLP research papers. Coursework includes homework assignments and a final project.

STATISTC 535: Statistical Computing

The course will introduce the computing tools needed for statistical analysis, including data acquisition from databases, data exploration and analysis, numerical analysis, and presentation of results. Advanced topics include parallel computing, simulation and optimization, and package creation. The class will be taught in a modern statistical computing language.