This work combines different sources of knowledge in order to understand, manipulate, and describe the world. This is what humans do. The difference between us and machines is that we are somehow able to grasp the big picture when we look at an image, rather than getting swamped in the details.
To demonstrate the ability to combine knowledge from multiple sources and modalities, we develop an approach to recognizing, discussing, and manipulating 3D part-based objects. We combine knowledge from
- multiple shape detectors,
- possible parts,
- physics,
- multiple images from different viewpoints,
- multiple images from different stages of assembly, and
- natural language sentences.
This work has been published in:
- Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Mark Siskind, 'A visual language model for estimating object pose and structure in a generative visual domain', IEEE International Conference on Robotics and Automation, May 2011.
- Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Mark Siskind, 'Seeing unseeability to see the unseeable', Advances in Cognitive Systems, December 2012.