Andrei Barbu - Home

This work combines different sources of knowledge in order to understand, manipulate, and describe the world. This is what humans do; the difference between us and machines is that we are somehow able to grasp the big picture when we look at an image, rather than getting swamped in the details.

To demonstrate our ability to combine knowledge from multiple sources and modalities, we develop an approach to recognizing, discussing, and manipulating 3D part-based objects. We combine knowledge from

multiple shape detectors,
possible parts,
physics,
multiple images from different viewpoints,
multiple images from different stages of assembly, and
natural language sentences.

In the process we reason about physical concepts such as occlusion and support, and about high-level human concepts such as windows, doors, and walls. A single unified approach to this problem is used in multiple ways: to recognize a structure, to describe it in natural language, to understand natural-language descriptions of it, to assess our confidence in our estimate of the structure, and to plan how we might increase that confidence (by moving the robot camera, partially disassembling the structure, or asking a question).
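As a rough illustration of what combining sources and measuring confidence can look like, here is a minimal Python sketch. It is not the method from the papers listed below. It assumes that each knowledge source assigns an independent log score to every candidate structure, and it uses Shannon entropy as a stand-in for confidence. The candidate names, source names, and scores are all made up for the example.

```python
import math

def combine_sources(log_scores_by_source):
    """Sum per-source log scores per candidate, then normalize to a posterior.

    Assumption (not from the papers): sources are treated as independent,
    so their log scores simply add.
    """
    first_source = next(iter(log_scores_by_source.values()))
    totals = {
        cand: sum(scores[cand] for scores in log_scores_by_source.values())
        for cand in first_source
    }
    # Log-sum-exp normalization for numerical stability.
    peak = max(totals.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in totals.values()))
    return {cand: math.exp(v - log_z) for cand, v in totals.items()}

def entropy_bits(posterior):
    """Shannon entropy in bits; lower entropy means higher confidence."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)

# Hypothetical candidates and scores: the detector alone cannot tell the two
# structures apart, while a sentence favors the one with a window.
detector = {"wall_with_window": math.log(0.5), "wall_with_door": math.log(0.5)}
sentence = {"wall_with_window": math.log(0.9), "wall_with_door": math.log(0.1)}

vision_only = combine_sources({"detector": detector})
vision_and_language = combine_sources({"detector": detector, "sentence": sentence})
```

In this toy setting, adding the sentence lowers the entropy of the posterior, which is the sense in which asking a question (or moving the camera) could increase confidence in the estimated structure.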

This work has been published in:

Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Mark Siskind, 'A visual language model for estimating object pose and structure in a generative visual domain', IEEE International Conference on Robotics and Automation, May 2011.
Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Mark Siskind, 'Seeing unseeability to see the unseeable', Advances in Cognitive Systems, December 2012.