University of Massachusetts Amherst

Amit Sharma - Causal data mining: Estimating causal effects at scale

DS Seminar
March 1
Computer Science Building, Room 150/151

Title: Causal data mining: Estimating causal effects at scale

Abstract: Identifying causal effects is an integral part of scientific inquiry, spanning a wide range of questions such as understanding behavior in online systems, the effects of social policies, or risk factors for diseases. In the absence of a randomized experiment, however, traditional methods such as matching or instrumental variables fail to provide robust estimates because they depend on strong assumptions that are never tested.

In this talk, I first show that many of the strong assumptions are testable and propose a data mining framework for causal inference from observed data: instead of relying on untestable assumptions, we develop tests for valid experiment-like data (a "natural" experiment) and estimate causal effects only from subsets of data that pass those tests. I will present two such methods. The first utilizes auxiliary data from large-scale systems to automate the search for natural experiments. Applying it to estimate the additional activity caused by Amazon's recommendation system, I find over 20,000 natural experiments, an order of magnitude more than those in past work. These experiments indicate that fewer than half of the click-throughs typically attributed to the recommendation system are causal; the rest would have happened anyway. The second method proposes a general Bayesian test that can be used for validating natural experiments in any dataset. For instance, I find that a majority of natural experiments used in recent studies in a premier economics journal are likely invalid. More generally, the proposed framework presents a viable way of doing causal inference in large-scale datasets with minimal assumptions.

Bio: Amit Sharma is a postdoctoral researcher at Microsoft Research, New York. His research focuses on understanding the underlying mechanisms that shape people's activities online, with a particular emphasis on the effect of recommendation systems and social influence. More generally, his work contributes to methods for causal inference from large-scale data, combining principles from Bayesian graphical models, data mining and machine learning. He completed his Ph.D. in computer science at Cornell University. He is a recipient of the 2012 Yahoo! Key Scientific Challenges Award and the 2009 Honda Young Engineer and Scientist Award.