University of Massachusetts Amherst

Search Google Appliance


Alessandro Epasto (Google NYC) - Mining Graphs at Scale: Ego-Networks, Clusters and Privacy

Data Science Tea
September 19
Computer Science Building, Room 150/151



Graph analysis is a fundamental primitive in several applications of data mining and machine learning. We present recent advances in this area based on the analysis of local structures in graphs known as ego-nets (i.e. the subgraph induced by the neighborhood of each node). In particular, we present ego-splitting, a novel framework for detecting clusters in complex networks which leverages ego-network analysis to de-couple overlapping clusters. Ego-splitting is a highly scalable and flexible framework, with provable theoretical guarantees that allows to reduce the complex overlapping clustering problem to a simpler and more amenable non-overlapping (partitioning) problem. This allows us to scale overlapping community detection to graphs with tens of billions of edges outperforming previous solutions based on ego-nets analysis.


We will also show how similar ego-net based techniques allow improvements in the important task of graph ranking with applications to friend suggestions. Finally, time permitting, we will briefly discuss some interesting issues in social networks analysis raised by user privacy concerns and some potential algorithmic approaches on how to address them.


Based on joint work with Silvio Lattanzi, Renato Paes Leme, KDD’17 and other papers VLDB'16, KDD’15 (best paper award)


Bio: Alessandro Epasto is a research scientist at Google NYC in the graph mining team (part of the Google NYC Algorithms and Optimization team) lead by Vahab Mirrokni. Before joining Google in 2016 he was a postdoc at Brown University advised by prof. Eli Upfal. He received a Ph.D. in computer science from Sapienza University of Rome where he was advised by prof. Alessandro Panconesi and supported by the Google European Doctoral Fellowship. His research interests include algorithmic problems in large-scale data mining.