Abstract: The performance of recognition systems has grown by leaps and bounds over the last five years. However, modern recognition systems still require thousands of labeled examples per class to train. Furthermore, extending a system to new visual concepts again requires collecting thousands of examples for each new concept. In contrast, humans are known to learn new visual concepts quickly from as little as a single example, and indeed require very little labeled data to build their powerful visual systems from scratch. The need for large training sets also makes current machine vision systems impractical for rare or hard-to-annotate visual concepts and for new imaging modalities.
I will talk about some of our work on reducing the need for large labeled training sets. I will describe novel loss functions for training convolutional-network-based feature representations so that new concepts can be learned from a few examples, as well as ways of hallucinating additional examples for data-starved classes. I will also discuss our attempt to learn feature representations without any labeled data by leveraging motion-based grouping cues. I will end with a discussion of where we stand and some thoughts on the way forward.
Bio: Bharath Hariharan is an assistant professor at Cornell. Before joining Cornell, he spent two years as a postdoc at Facebook AI Research, after obtaining his PhD from UC Berkeley under Jitendra Malik. At Berkeley, he was the recipient of the Microsoft Research Fellowship. He is interested in all things visual recognition. Lately, he has become concerned about the reliance on massive labeled datasets and whether such datasets can scale to harder problems such as visual reasoning. His current work focuses on building recognition systems that learn with less data and/or produce a much deeper understanding of images.