Abstract: Word embeddings are the most widely used kind of distributional meaning representation in both industrial and academic NLP systems, and they can make a dramatic difference in system performance. However, the absence of a reliable intrinsic evaluation metric makes it hard to choose among dozens of models and their parameters. This work presents Linguistic Diagnostics (LD), a new methodology for the evaluation, error analysis, and development of word embedding models, implemented in an open-source Python library. In a large-scale experiment with 14 datasets, LD successfully highlights differences in the output of the GloVe and word2vec algorithms that correlate with their performance on different NLP tasks.
Bio: Anna Rogers is a post-doctoral associate in the Text Machine Lab of the Computer Science Department at the University of Massachusetts Lowell. Prior to that, she received her Ph.D. in computational linguistics from the University of Tokyo (Japan). Her research interests lie at the intersection of linguistics, natural language processing, and machine learning. Her current projects span the intrinsic evaluation of word embeddings, sentiment analysis, and temporal and analogical reasoning.