Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions about individual people (such as criminal recidivism prediction, lending, and sequential drug trials), exploration corresponds to explicitly sacrificing the well-being of one individual for the potential future benefit of others. This raises a fairness concern. In such settings, one might like to run a “greedy” algorithm, which always makes the (myopically) optimal decision for the individuals at hand — but doing this can result in a catastrophic failure to learn. In this paper, we consider the linear contextual bandit problem and revisit the performance of the greedy algorithm. We give a smoothed analysis, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary’s choices suffice for the algorithm to achieve “no regret”, perhaps (depending on the specifics of the setting) with a constant amount of initial training data. This suggests that “generically” (i.e. in slightly perturbed environments), exploration and exploitation need not be in conflict in the linear setting.
Steven Wu is currently a Post-Doctoral Researcher at Microsoft Research in New York City, where he is a member of the Machine Learning and Algorithmic Economics groups. He will be joining the Department of Computer Science and Engineering at the University of Minnesota as an Assistant Professor starting in fall 2018. He received his Ph.D. in Computer Science from the University of Pennsylvania in 2017, under the supervision of Michael Kearns and Aaron Roth. His doctoral dissertation “Data Privacy Beyond Differential Privacy” received the 2017 Morris and Dorothy Rubinoff Dissertation Award. His research focuses on algorithm design under different social constraints. In particular, his primary research interest is on data privacy, specifically differential privacy, where he builds tools for data analysis under the constraint of privacy preservation. His recent research also studies algorithmic fairness, especially in the context of machine learning, where he investigates how we can prevent bias and unfairness in algorithmic decision making. He examines problems in these areas using methods and models from machine learning theory, economics, optimization, and beyond.