0xAB

Andrei Barbu

Picture of Andrei Barbu

Andrei is a research scientist at MIT working on natural language processing, computer vision, and robotics, with a touch of neuroscience.


I’m most interested in finding new AI and ML problems and in refining our understanding of existing problems that might look solved but are rarely so. Often this means formalizing pre-theoretic problems that we only vaguely understand, for example, creating benchmarks for social interactions so that we can make progress on social robotics. Of course, this also leads to exciting new algorithms; I tend to focus on zero-shot learning through compositionality. Other times, it means reevaluating our progress, for example, by creating a new object recognition benchmark that shows there is a wide gulf between humans and machines. My main forte as a researcher is a wide cross-disciplinary background: I’ve worked on natural language processing, computer vision, robotics, machine learning, artificial intelligence, and neuroscience.

Below are a few of the topics I’ve worked on. Much more is available from my Google Scholar page.

The measurement problem in AI/ML #

We have no idea how well we are progressing toward human-level ML/AI. We don’t even have a quantitative way to ask this question. This is the source of the hype cycle in AI and of much of the uncertainty about what ML can and cannot do.

For computer vision tasks we can now measure objectively how difficult an image is for a human to recognize (Mayo et al. 2023); we call this measure MVT, the minimum viewing time required to recognize the image. We discovered that current datasets are too simple and mostly lack images that humans find hard to recognize. Image difficulty has many potential applications, from discovering what the brain is doing for hard images, to creating better datasets, to helping industry understand performance before deployment. See more at objectnet.dev.
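As a concrete illustration, computing a per-image MVT from behavioral data can be as simple as the sketch below; the duration grid, accuracy threshold, and data format here are assumptions for illustration rather than the exact protocol from the paper.

# Illustrative sketch: MVT as the shortest presentation time at which enough
# subjects recognize the image. Durations, threshold, and data format are
# assumptions, not the exact protocol from the paper.
def minimum_viewing_time(accuracy_by_duration, threshold=0.9):
    """accuracy_by_duration: {presentation time in ms: fraction of subjects correct}."""
    for duration in sorted(accuracy_by_duration):
        if accuracy_by_duration[duration] >= threshold:
            return duration
    return None  # never reliably recognized at any tested duration

# Example: this image needs at least 150 ms to be recognized reliably.
image_results = {17: 0.20, 50: 0.55, 100: 0.80, 150: 0.93, 250: 0.97, 10000: 1.0}
print(minimum_viewing_time(image_results))  # 150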

Image captioning looks like a mostly solved problem from the point of view of current metrics; machines generally outperform humans. Our work shows that this isn’t so: humans strongly prefer human captions and judge them to be of far higher quality. We put forward a new metric, HUMANr, that solves these problems and enables future research on captioning. This paper is coming soon!

Object recognition datasets often allow machines to cheat in ways that humans do not. Objects are correlated with their backgrounds, and they’re often posed to look nice for online photos. This helps machines tremendously and overstates their performance. We developed ObjectNet, a new benchmark, to highlight this problem, and it revealed a massive human-machine performance gap in object recognition, far larger than anyone expected (Barbu et al. 2019). See objectnet.dev for more.

Social interactions #

Throughout machine learning social interactions are largely overlooked. We don’t know what social actions are, how to interpret them, or how robots can engage socially. We took a first step by creating the first social simulator, PHASE (Netanyahu et al. 2020).

We then created a new class of models, Social MDPs, that perform zero-shot social interactions (Tejwani et al. 2022). Psychologists had long proposed that social interactions might rely on a few core skills that are then reused. We formalized several of these hypothesized core skills to enable robots to interact socially (Tejwani et al. 2022). Psychologists had also proposed that these core social interactions might combine to express more complex behaviors. We created the first computational mechanism for combining social interactions, showing that a rich algebra of interactions exists (Tejwani et al. 2023). A rough sketch of the underlying recursion appears below.
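At a high level, the recursion behind these models can be sketched as follows; the notation is a simplification of mine rather than the papers’ exact formulation. An agent’s reward at social reasoning level k adds its physical goal to a weighted estimate of the other agent’s reward at level k−1, bottoming out in a purely physical reward, and Linear Social MDPs let several such social terms combine with their own weights:

$$R_i^{(k)}(s,a) = R_i^{\mathrm{phys}}(s,a) + \beta_{ij}\,\mathbb{E}\big[R_j^{(k-1)}(s,a_j)\big], \qquad R_i^{(0)} = R_i^{\mathrm{phys}}$$

$$R_i^{\mathrm{linear}}(s,a) = R_i^{\mathrm{phys}}(s,a) + \sum_m w_m\,\mathbb{E}\big[R_{j,m}^{(k-1)}(s,a_j)\big]$$

Roughly, a positive social weight corresponds to helping the other agent and a negative one to hindering it.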

Social interactions are key to learning, and they may also be key to the development of language and intelligence. While many have investigated how the simplest words, like nouns and verbs, might arise when people attempt to communicate, we investigated how humans develop complex systems of signs and symbols. Subjects were forced to communicate complex statements in first-order logic physically, by controlling a car. They spontaneously developed rich symbol systems, although these were often meta-stable and could easily collapse. This could shed light on what happened during the hundreds of thousands of years when humans were anatomically modern but not yet technologically advanced (Cheng et al. 2022).

Compositional ML #

Compositionality is at the core of much of the work that I do and pervades many of the other topics.

Current approaches to ML are not compositional. This results in models that fail spectacularly on inputs which humans would judge to be trivial. We investigated this problem from many angles and created new tasks and models along the way.

Robots often need to follow complex commands, but, since models are not compositional, they fail in surprising ways. One way to address this problem is to parse commands into a formalism like linear temporal logic (LTL), where commands can be understood more easily. But such parsers are hard to train, there is little data available, and human commands tend to be terse and context-specific. We created a new kind of LTL parser that uses a simulator in the loop to learn to convert sentences into LTL formulas (Wang et al. 2021). This was based on our earlier approach that attempted to learn syntax and semantics from vision in a more child-like way rather than from massive language-only datasets (Ross et al. 2018).
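To give a flavor of the simulator-in-the-loop signal, below is a toy checker for a small LTL fragment evaluated over a finite trace of simulator states. It illustrates the kind of check that can reward a candidate formula produced by a parser; it is not the system from the paper, and the propositions and trace are made up.

# Toy illustration (not the parser from the paper): check whether a finite
# trace of simulator states satisfies a small fragment of LTL. In a
# simulator-in-the-loop setup, a check like this can provide the training
# signal for a candidate formula produced by the parser.
def holds(formula, trace, t=0):
    """Evaluate an LTL-like formula over a finite trace of state dicts."""
    op = formula[0]
    if op == "prop":                      # atomic proposition, e.g. ("prop", "at_goal")
        return trace[t].get(formula[1], False)
    if op == "not":
        return not holds(formula[1], trace, t)
    if op == "and":
        return holds(formula[1], trace, t) and holds(formula[2], trace, t)
    if op == "eventually":                # F phi: phi holds at some future step
        return any(holds(formula[1], trace, u) for u in range(t, len(trace)))
    if op == "always":                    # G phi: phi holds at every future step
        return all(holds(formula[1], trace, u) for u in range(t, len(trace)))
    if op == "until":                     # phi U psi on a finite trace
        return any(holds(formula[2], trace, u) and
                   all(holds(formula[1], trace, v) for v in range(t, u))
                   for u in range(t, len(trace)))
    raise ValueError(f"unknown operator {op}")

# Example: "eventually reach the goal while never touching the obstacle"
formula = ("and", ("eventually", ("prop", "at_goal")),
                  ("always", ("not", ("prop", "hit_obstacle"))))
trace = [{"at_goal": False, "hit_obstacle": False},
         {"at_goal": False, "hit_obstacle": False},
         {"at_goal": True,  "hit_obstacle": False}]
print(holds(formula, trace))  # True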

Even when a robot can understand a command and relate it to its environment, an extremely challenging problem on its own (Paul et al. 2017), executing that command is still very difficult. We don’t know what makes human and animal planning so effortless while, for machines, planning remains a nearly insurmountable challenge even in the simplest domains. To move toward better planners we combined classical sampling-based planning, RRT, with deep learning, both to find new plans efficiently and to exploit prior knowledge (Kuo, Barbu & Katz 2018). We then composed many of these hybrid planners together to execute plans described in natural language (Kuo, Katz & Barbu 2020) and showed how this method can also execute LTL formulas zero-shot (Kuo, Katz & Barbu 2020). These methods perform exceedingly well on benchmarks for compositional reasoning (Kuo, Katz & Barbu 2021).
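A minimal sketch of biasing a sampling-based planner with a pluggable proposal distribution is below. The toy RRT and the uniform sampler are my own illustration; in the papers the proposals come from a deep sequential model conditioned on the environment and, later, on language.

# Minimal 2D RRT sketch (a toy illustration, not the models from the papers):
# the "sampler" argument is where a learned sequential model could propose
# promising states instead of sampling uniformly at random.
import math, random

def rrt(start, goal, sampler, collision_free, step=0.5, iters=5000, goal_tol=0.5):
    nodes = [start]
    parents = {0: None}
    for _ in range(iters):
        x_rand = sampler()                              # uniform or learned proposal
        i_near = min(range(len(nodes)),                 # nearest node in the tree
                     key=lambda i: math.dist(nodes[i], x_rand))
        x_near = nodes[i_near]
        d = math.dist(x_near, x_rand)
        if d == 0:
            continue
        # steer a fixed step toward the sample
        x_new = (x_near[0] + step * (x_rand[0] - x_near[0]) / d,
                 x_near[1] + step * (x_rand[1] - x_near[1]) / d)
        if not collision_free(x_near, x_new):
            continue
        nodes.append(x_new)
        parents[len(nodes) - 1] = i_near
        if math.dist(x_new, goal) < goal_tol:
            path, i = [], len(nodes) - 1                # walk back up the tree
            while i is not None:
                path.append(nodes[i]); i = parents[i]
            return list(reversed(path))
    return None

uniform = lambda: (random.uniform(0, 10), random.uniform(0, 10))
free = lambda a, b: True                                # no obstacles in this toy example
print(rrt((0.0, 0.0), (9.0, 9.0), uniform, free))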

In the autonomous driving domain, we developed methods which incorporate language into how robots reason about the actions of others (Kuo et al. 2022).

The neuroscience of language and vision #

The BrainTreebank is coming soon! It will be the largest neuroscience corpus paired with vision and language, with every sentence parsed in Universal Dependencies (UD).

To help with decoding experiments that have few training examples, we developed BrainBERT, a way to pretrain a large Transformer on SEEG intracranial recordings. It achieves high classification performance on new tasks with new patients using only a fraction of the training data (Wang et al. 2023).
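A toy sketch of the masked-reconstruction pretraining idea follows; the shapes, masking scheme, and architecture are illustrative assumptions rather than the paper’s configuration, and positional encodings are omitted for brevity.

# Toy sketch of masked-spectrogram pretraining in the spirit of BrainBERT.
# Shapes, masking scheme, and model size are illustrative assumptions.
import torch
import torch.nn as nn

class MaskedSpectrogramModel(nn.Module):
    def __init__(self, n_freq=40, d_model=128, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(n_freq, d_model)          # one spectrogram frame -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reconstruct = nn.Linear(d_model, n_freq)

    def forward(self, spec):                             # spec: (batch, time, n_freq)
        return self.reconstruct(self.encoder(self.embed(spec)))

model = MaskedSpectrogramModel()
spec = torch.randn(8, 100, 40)                           # placeholder SEEG spectrograms
mask = torch.rand(8, 100) < 0.15                         # mask 15% of time steps
corrupted = spec.clone()
corrupted[mask] = 0.0                                    # zero out masked frames
pred = model(corrupted)
loss = ((pred - spec) ** 2)[mask].mean()                 # reconstruct only masked frames
loss.backward()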

Which network architectures most resemble the visual areas of the mouse brain? And is mouse visual cortex arranged hierarchically like our own? We found that, yes, mouse visual cortex is arranged hierarchically, and that Transformers are a good fit, as long as you measure them correctly (Conwell et al. 2021).
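For a sense of the neural-regression style of comparison used in this line of work, a minimal sketch with random placeholder data is below; the paper’s pipeline, metrics, and cross-validation are considerably more involved.

# Minimal neural-regression sketch: fit a ridge regression from model
# activations to recorded neural responses and score held-out predictions.
# Data here is random placeholder data, not recordings or real features.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.standard_normal((500, 256))     # model activations: (images, units)
responses = rng.standard_normal((500, 64))     # neural responses: (images, neurons)

X_tr, X_te, y_tr, y_te = train_test_split(features, responses, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, y_tr)
pred = model.predict(X_te)

# per-neuron Pearson correlation between predicted and held-out responses
r = [np.corrcoef(pred[:, i], y_te[:, i])[0, 1] for i in range(y_te.shape[1])]
print("mean predictivity:", np.mean(r))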

How does information from language and vision fuse together in the brain? And what can we learn about multimodal Transformers from their relationship to the brain? We compared multimodal models to neural recordings for the first time and found several sites of multimodal integration (Subramaniam et al. 2023). This required new techniques to avoid statistical artifacts.

Equity, justice, and accessibility #

How biased are multimodal models, and how do we generalize the notion of bias to any modality? You might hope that combining vision and language would lead to fewer biases, but sadly, the result is the opposite (Ross, Katz & Barbu 2021).
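These bias measures generalize WEAT-style association tests to grounded embeddings. A compact sketch of the standard WEAT effect size that such tests build on is below, with random placeholder vectors standing in for embeddings extracted from a vision-and-language model.

# Compact sketch of the WEAT effect size. The embeddings here are random
# placeholders; a grounded version would extract them from a multimodal model.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

rng = np.random.default_rng(0)
d = 50
X = rng.standard_normal((8, d))   # target set 1, e.g. career words
Y = rng.standard_normal((8, d))   # target set 2, e.g. family words
A = rng.standard_normal((8, d))   # attribute set 1, e.g. one set of names
B = rng.standard_normal((8, d))   # attribute set 2, e.g. another set of names
print(weat_effect_size(X, Y, A, B))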

How do we help people with different disabilities? Historically, the focus has often been on replacing senses, for example, replacing limited vision with a system that tells you what objects are in front of you. We argue the opposite: we should instead enhance current senses, for example, by automatically highlighting and augmenting what is important or by hiding harmful input such as flashing for people with photosensitivity. We automatically derive visual filters that help people with different disabilities, such as photosensitivity (Barbu, Banda & Katz 2020; 2022).
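As a toy illustration of the kind of signal such filters target, the sketch below caps abrupt global luminance changes between frames. This is a simple hand-written baseline for illustration only, not the learned video-to-video model from the paper.

# Illustrative baseline only (not the learned video-to-video model): limit
# rapid global luminance changes, the kind of flashing that can harm
# photosensitive viewers.
import numpy as np

def suppress_flashes(frames, max_delta=0.1):
    """frames: (T, H, W) grayscale video with values in [0, 1]."""
    out = frames.copy()
    for t in range(1, len(out)):
        delta = out[t] - out[t - 1]
        mean_change = np.abs(delta).mean()
        if mean_change > max_delta:                              # abrupt global change
            out[t] = out[t - 1] + delta * (max_delta / mean_change)  # limit the jump
    return out

video = np.random.rand(30, 64, 64)      # placeholder clip
safe = suppress_flashes(video)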

Multimodal understanding and reasoning with language #

Video search is becoming more advanced, but today it is largely based on keywords and captions. We developed the first approach to search videos with sentences based on the content of the video itself (Barrett et al. 2015).

Ambiguity still confuses ML systems. We created the first benchmark of ambiguous language-vision scenarios: captions paired with videos where the caption’s interpretation and grounding (who did what to whom) changes based on the video. And we created the first model that performs this disambiguation task (Berzak et al. 2015).

Machines can learn to play games, but to do so, they are usually told the rules and must only learn to play well, not to play correctly. They are also given the game state symbolically rather than as visual input. Kids, on the other hand, learn to play games by looking at a board and figuring out the rules as they go. We created the first robotic system that learns to play games the way kids do (Barbu, Narayanaswamy & Siskind 2010).

We still don’t understand what vision is. Sure, we have some vision-related tasks like object recognition, segmentation, etc. But the rich perception we have as humans, which integrates with other abilities like physical reasoning, still eludes us. We took a step in this direction by building a vision system that could perceive complex structures, use the stability of a structure to infer occluded pieces, and execute linguistic commands that manipulated that structure (Narayanaswamy, Barbu & Siskind 2011; Siddharth, Barbu & Siskind 2012).

References #

Tejwani, Kuo, Shu, Stankovits, Gutfreund, Tenenbaum, Katz & Barbu (2022). Incorporating rich social interactions into MDPs. International Conference on Robotics and Automation (ICRA). 7395–7401.

Netanyahu, Shu, Katz, Barbu & Tenenbaum (2020). PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception. The AAAI Conference on Artificial Intelligence.

Kuo, Barbu & Katz (2018). Deep sequential models for sampling-based planning. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 6490–6497.

Kuo, Katz & Barbu (2020). Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas. International Conference on Intelligent Robots and Systems (IROS).

Kuo, Katz & Barbu (2021). Compositional Networks Enable Systematic Generalization for Grounded Language Understanding. Findings of the Association for Computational Linguistics: EMNLP 2021.

Yu, Siddharth, Barbu & Siskind (2015). A compositional framework for grounding language inference, generation, and acquisition in video. Journal of Artificial Intelligence Research (JAIR).

Barbu, Narayanaswamy & Siskind (2010). Learning physically-instantiated game play through visual observation. IEEE International Conference on Robotics and Automation (ICRA). 1879–1886.

Barbu, Siddharth, Michaux & Siskind (2012). Simultaneous Object Detection, Tracking, and Event Recognition. Advances in Cognitive Systems, 2. 203–220.

Ross, Katz & Barbu (2021). Measuring Social Biases in Grounded Vision and Language Embeddings. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 998–1008. https://aclanthology.org/2021.naacl-main.78.pdf

Barbu, Banda & Katz (2020). Deep video-to-video transformations for accessibility with an application to photosensitivity. Pattern Recognition Letters, 137. 99–107.

Barbu, Banda & Katz (2022). Computer method and apparatus making screens safe for those with photosensitivity. US Patent 11,381,715.

Kuo, Katz & Barbu (2020). Deep compositional robotic planners that follow natural language commands. IEEE International Conference on Robotics and Automation (ICRA). 4906–4912.

Kuo, Katz & Barbu (2021). Compositional RL Agents That Follow Language Commands in Temporal Logic. Frontiers in Robotics and AI, 8.

Kuo, Huang, Barbu, McGill, Katz, Leonard & Rosman (2022). Trajectory prediction with linguistic representations. International Conference on Robotics and Automation (ICRA). 2868–2875.

Wang, Ross, Kuo, Katz & Barbu (2021). Learning a natural-language to LTL executable semantic parser for grounded robotics. Conference on Robot Learning (CoRL). 1706–1718.

Tejwani, Kuo, Shu, Katz & Barbu (2022). Social interactions as recursive MDPs. Conference on Robot Learning (CoRL). 949–958.

Barbu, Mayo, Alverio, Luo, Wang, Gutfreund, Tenenbaum & Katz (2019). ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. Advances in Neural Information Processing Systems (NeurIPS), 32. objectnet.dev

Cheng, Kuo, Correa, Katz, Cases & Barbu (2022). Quantifying the Emergence of Symbolic Communication. Proceedings of the Annual Meeting of the Cognitive Science Society, 44(44).

Conwell, Mayo, Barbu, Buice, Alvarez & Katz (2021). Neural regression, representational similarity, model zoology & neural taskonomy at scale in rodent visual cortex. Advances in Neural Information Processing Systems (NeurIPS), 34. 5590–5607.

Wang, Subramaniam, Yaari, Kreiman, Katz, Cases & Barbu (2023). BrainBERT: Self-supervised representation learning for intracranial recordings. The Eleventh International Conference on Learning Representations (ICLR).

Subramaniam, Conwell, Wang, Kreiman, Katz, Cases & Barbu (2023). Using Multimodal DNNs to Study Vision-Language Integration in the Brain. ICLR 2023 Workshop on Multimodal Representation Learning: Perks and Pitfalls.

Mayo, Cummings, Lin, Gutfreund, Katz & Barbu (2023). How hard are computer vision datasets? Calibrating dataset difficulty to viewing time.

Tejwani, Kuo, Shu, Stankovits, Gutfreund, Tenenbaum, Katz & Barbu (2023). Zero-shot linear combinations of grounded social interactions with Linear Social MDPs. AAAI Conference on Artificial Intelligence.

Berzak, Barbu, Harari, Katz & Ullman (2015). Do You See What I Mean? Visual Resolution of Linguistic Ambiguities. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1477–1487.

Barrett, Barbu, Siddharth & Siskind (2015). Saying what you're looking for: Linguistics meets video search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10). 2069–2081.

Narayanaswamy, Barbu & Siskind (2011). A visual language model for estimating object pose and structure in a generative visual domain. IEEE International Conference on Robotics and Automation (ICRA). 4854–4860.

Siddharth, Barbu & Siskind (2012). Seeing Unseeability to See the Unseeable. Advances in Cognitive Systems, 2. 77–94.

Ross, Barbu, Berzak, Myanganbayar & Katz (2018). Grounding language acquisition by training semantic parsers using captioned videos. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2647–2656.

Paul, Barbu, Felshin, Katz & Roy (2017). Temporal grounding graphs for language understanding with accrued visual-linguistic context. Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI). 4506–4514.