Doctoral students in the Computer Vision Lab at the UMass Amherst College of Information and Computer Sciences have had three papers accepted to CVPR 2020 (Computer Vision and Pattern Recognition), the country’s premier computer vision conference, organized by The Computer Vision Foundation and the IEEE Computer Society.
Doctoral student Zezhou Cheng, along with alumnus Pankaj Bhambhani (‘18MS), advisors Daniel Sheldon and Subhransu Maji, and collaborators from the University of Washington, University of North Carolina Asheville, and Cornell University, authored a paper titled “Detecting and Tracking Communal Bird Roosts in Weather Radar Data.” Each night, certain species of birds roost together in large numbers at communal roosting sites, and their morning departure is often visible in weather radar images. The paper describes a machine learning system to detect and track roost signatures in weather radar data. A significant challenge was that labels collected from previous studies had systematic differences in labeling style. The researchers contribute a latent variable model and an EM (expectation-maximization) algorithm that jointly learns a detection model and a model of each annotator’s labeling style. By properly accounting for these variations, the researchers developed a significantly more accurate detector. The resulting system detects previously unknown roosting locations and provides comprehensive spatio-temporal data about roosts across the US. This data will give biologists important information about the poorly understood phenomena of broad-scale habitat use and movements of communally roosting birds during the non-breeding season.
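The idea of jointly estimating a latent quantity and per-annotator labeling styles can be illustrated with a toy sketch. This is not the paper's model; it assumes a hypothetical setup where each annotator scales the "true" roost radius by a personal style factor, and an EM-style alternating estimation recovers both:

```python
import random

random.seed(0)

# Hypothetical setup: each annotator labels roost radii with a personal
# style factor (a stand-in for the paper's labeling-style model).
true_radii = [random.uniform(5, 20) for _ in range(50)]
styles = [0.7, 1.0, 1.4]  # annotator-specific scaling of the true radius

# Observations: (annotator, roost index, noisy scaled label)
labels = [(a, j, styles[a] * r * random.gauss(1, 0.05))
          for a in range(3) for j, r in enumerate(true_radii)]

# EM-style alternation: given style estimates, least-squares the latent
# radii; given radii, re-estimate each annotator's style factor.
est_styles = [1.0, 1.0, 1.0]
for _ in range(20):
    est_radii = []
    for j in range(len(true_radii)):
        obs = [(a, y) for a, jj, y in labels if jj == j]
        num = sum(est_styles[a] * y for a, y in obs)
        den = sum(est_styles[a] ** 2 for a, _ in obs)
        est_radii.append(num / den)
    for a in range(3):
        obs = [(j, y) for aa, j, y in labels if aa == a]
        num = sum(est_radii[j] * y for j, y in obs)
        den = sum(est_radii[j] ** 2 for j, _ in obs)
        est_styles[a] = num / den

# Styles are identifiable only up to a global scale, so normalize by
# one reference annotator before comparing.
norm = [s / est_styles[1] for s in est_styles]
print([round(s, 2) for s in norm])
```

After normalization, the recovered factors closely match the simulated annotator styles, which is the sense in which accounting for style differences cleans up the training signal.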
Chenyun Wu, a doctoral student, authored a paper titled “PhraseCut: Language-based Image Segmentation in the Wild” along with advisor Subhransu Maji and collaborators from Adobe Research. Language-based image segmentation refers to the ability to detect an object or region within an image from a natural language description -- for example, locating “the girl holding an umbrella” in a photo and cropping out the corresponding region. This task is interesting because it requires an understanding of both the image and the language description, as well as the correlations between them. It also has a wide range of applications, such as human-robot interaction and automatic photo editing. When you tell a robot to “pick up the plastic bottle on the left,” or you want a photo editing tool to automatically “remove the blue car in the background,” the first task for the robot or the editing tool is to understand where “the plastic bottle” or “the car” is. While past research in this area has covered only around 80 categories, this research proposes a new model on a large-scale dataset of over a thousand object categories. The new model covers foreground objects (such as people and animals) as well as background regions (such as trees, clouds, and streets), and supports referencing multiple targets in one description (such as “all players in white shirts”). The model also includes an “attention mechanism” that learns to leverage predictions on frequent concepts (such as “person”) to improve predictions on rare concepts (such as “policeman”).
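The frequent-to-rare transfer can be sketched with a minimal, hypothetical example: a rare concept borrows mask evidence from frequent concepts, weighted by softmax attention over embedding similarity. The embeddings and scores below are hand-set stand-ins, not the paper's learned quantities:

```python
import math

# Hypothetical word vectors (stand-ins for learned embeddings).
emb = {
    "person":    [1.0, 0.1, 0.0],
    "policeman": [0.9, 0.2, 0.1],
    "cloud":     [0.0, 0.1, 1.0],
}

# Pretend per-concept mask scores at one pixel, available only for
# frequent concepts.
scores = {"person": 0.92, "cloud": 0.05}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, temperature=0.5):
    """Score a rare query concept by softmax attention over frequent
    concepts, keyed by embedding similarity."""
    sims = {k: dot(emb[query], emb[k]) / temperature for k in keys}
    z = sum(math.exp(s) for s in sims.values())
    weights = {k: math.exp(s) / z for k, s in sims.items()}
    return sum(weights[k] * scores[k] for k in keys)

# "policeman" is rare, but its embedding is close to "person", so it
# inherits most of that concept's strong evidence at this pixel.
print(round(attend("policeman", scores.keys()), 3))
```

Because “policeman” sits near “person” in the embedding space, the attention weight concentrates there and the rare concept gets a high score despite having little direct training data.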
Doctoral student Matheus Gadelha, along with advisors Rui Wang and Subhransu Maji, and collaborators from Adobe Research, authored a paper titled “Learning Generative Models of Shape Handles.” Editing a three-dimensional (3D) shape requires skills and training that few people have, because shapes are typically represented in a manner that is not amenable to editing and creation tasks. In this paper, the researchers propose an algorithm that teaches computers to generate 3D data represented as a set of shape handles -- simple shape proxies that users can easily edit. For example, a user can create a complete chair by describing just one of its parts and letting the algorithm propose multiple options that satisfy the user's constraints while producing a reasonable output. Similarly, a user who wants to edit an existing airplane does not have to change every single part of the object; changing a single part lets the system automatically modify the rest of the object to fit the edit. The researchers also present state-of-the-art results for shape parsing -- given a shape scanned from a 3D sensor, their technique can process this raw representation and turn it into one amenable to editing and manipulation tasks.
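The handle representation itself is easy to illustrate. In this toy sketch (not the paper's method), a shape is a set of simple box proxies, a tiny hand-built library stands in for the learned generative model, and "completion" picks the shape whose handles best match a partial user specification:

```python
# Toy shape handles: each handle is (center, size) of an axis-aligned box.
chair = {
    "seat": ((0.0, 0.0, 0.5), (1.0, 1.0, 0.1)),
    "back": ((0.0, -0.5, 1.0), (1.0, 0.1, 1.0)),
    "leg":  ((0.4, 0.4, 0.25), (0.1, 0.1, 0.5)),
}
stool = {
    "seat": ((0.0, 0.0, 0.6), (0.8, 0.8, 0.1)),
    "leg":  ((0.3, 0.3, 0.3), (0.1, 0.1, 0.6)),
}
# A hypothetical shape library standing in for a learned generative model.
library = {"chair": chair, "stool": stool}

def handle_distance(h1, h2):
    """Squared distance between two handles' center+size parameters."""
    (c1, s1), (c2, s2) = h1, h2
    return sum((a - b) ** 2 for a, b in zip(c1 + s1, c2 + s2))

def complete(partial):
    """Return the library shape whose handles best fit the partial spec."""
    def cost(shape):
        return sum(min(handle_distance(h, cand) for cand in shape.values())
                   for h in partial.values())
    return min(library, key=lambda name: cost(library[name]))

# Describing just a tall backrest is enough to retrieve a full chair.
print(complete({"back": ((0.0, -0.5, 1.0), (1.0, 0.1, 1.0))}))  # -> chair
```

The paper's model generates novel handle sets rather than retrieving them, but the workflow is the same: the user pins down a few handles and the system fills in the rest of the shape consistently.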
The CVPR conference is scheduled to take place June 14-19 in Seattle, Washington, or virtually.