Abstract: Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. Exponential family embeddings extend the idea of word embeddings to other types of high-dimensional data, such as count data from a recommendation system or real-valued data from neural recordings. Exponential family embeddings have three ingredients: embeddings as latent variables, a predefined conditioning set for each observation called the context, and a conditional likelihood from the exponential family. The embeddings are inferred with a scalable algorithm based on stochastic gradient descent. In this talk, I discuss three highlights of the exponential family embeddings model class: (A) The approximations used for existing methods such as word2vec can be understood as a biased stochastic gradient procedure on a specific type of exponential family embedding model. (B) By choosing different likelihoods from the exponential family, we can generalize the task of learning distributed representations to different application domains. (C) Finally, the probabilistic modeling perspective allows us to incorporate structure and domain knowledge in the latent space. With dynamic embeddings, we can study how word usage changes over time, and structured embeddings allow us to learn embeddings that vary across related groups of data. Key to the success of our method is that the groups share statistical information, and we develop three sharing strategies: dynamic modeling, hierarchical modeling, and amortization.
Speaker Bio: As a computer science PhD student at Columbia University, Maja Rudolph studies probabilistic modeling and approximate inference. Together with her advisor David Blei, she works on embedding models and explores how they can be used to find rich, interpretable structure in large data sets. In 2013, she obtained a BS in mathematics from MIT.