0xAB
Andrei Barbu
Andrei is a research scientist at MIT working on natural language processing, computer vision, and robotics, with a touch of neuroscience.
I’m most interested in finding new AI and ML problems and in refining our understanding of existing problems that might look solved but are rarely so. Often this means formalizing pre-theoretic problems that we only vaguely understand, for example, creating benchmarks for social interactions so that we can make progress on social robotics. Of course, this also leads to exciting new algorithms; I tend to focus on zero-shot learning through compositionality. Other times, it means reevaluating our progress, for example, by creating a new object recognition benchmark to show that there is a wide gulf between humans and machines. My main forte as a researcher is a wide cross-disciplinary background: I’ve worked on natural language processing, computer vision, robotics, machine learning, artificial intelligence, and neuroscience.
Below are a few of the topics I’ve worked on. Much more is available on my Google Scholar page.
The measurement problem in AI/ML #
We have no idea how well we are progressing toward human-level ML/AI. We don’t even have a quantitative way to ask this question. This is the source of the hype cycle in AI and of a lot of uncertainty about what ML can and cannot do.
For computer vision tasks we can now measure objectively how difficult an image is for a human to recognize (Mayo, Cummings et al., 2023); we call this measure MVT (the minimum viewing time required to recognize the image). We discovered that current datasets are too simple and mostly lack images that humans find hard to recognize. Image difficulty has many potential applications, from discovering what the brain is doing for hard images, to creating better datasets, to helping industry understand performance before deployment. See more on objectnet.dev.
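As a concrete illustration of how MVT scores can summarize a dataset, the sketch below buckets images by the minimum presentation time at which subjects recognized them. The image names, times, and bucket boundaries are stand-ins of mine, not data from the paper.

```python
from collections import Counter

# Hypothetical per-image MVT measurements: the minimum presentation time
# (in ms) at which human subjects reliably recognized each image.
mvt_ms = {"img_001": 17, "img_002": 50, "img_003": 250, "img_004": 10000}

def difficulty_profile(mvt, buckets=(17, 50, 150, 250, 10000)):
    """Count images per MVT bucket; a hard dataset has most of its mass
    in the slow buckets, an easy one in the fast buckets."""
    profile = Counter()
    for t in mvt.values():
        profile[min(b for b in buckets if t <= b)] += 1
    return dict(sorted(profile.items()))

print(difficulty_profile(mvt_ms))  # {17: 1, 50: 1, 250: 1, 10000: 1}
```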
Image captioning looks like a mostly solved problem from the point of view of current metrics; machines generally outperform humans. Our work shows that this isn’t so: humans strongly prefer human captions and judge them to be of far higher quality. We put forward a new metric, HUMANr, that addresses these problems and enables future research on captioning. This paper is coming soon!
Object recognition datasets often allow machines to cheat in ways that humans do not. Objects are correlated with their backgrounds and are often posed to look nice for online photos. This helps machines tremendously and overstates their performance. We developed ObjectNet, a new benchmark, to highlight this problem, and it revealed a massive human-machine performance gap in object recognition, far larger than anyone expected (Barbu, Mayo et al., 2019). See objectnet.dev for more.
Social interactions #
Throughout machine learning, social interactions are largely overlooked. We don’t know what social actions are, how to interpret them, and how robots can engage socially. We took a first step by creating the first social simulator (Netanyahu, Shu et al., 2020).
We then created a new class of models, Social MDPs, that perform zero-shot social interactions (Tejwani, Kuo et al., 2022). Psychologists had long proposed that social interactions might have some core skills which are then reused. We formalized several of these hypothesized core skills to enable robots to interact socially (Tejwani, Kuo et al., 2022). Psychologists had also proposed that these core social interactions might combine together to express more complex behaviors. We created the first computational mechanism for combining social interactions, showing that a rich algebra of interactions exists (Tejwani, Kuo et al., 2023).
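To give a flavor of the Social MDP idea, here is a toy sketch: an agent’s reward combines its own physical reward with a recursive estimate of another agent’s reward, scaled by a social coefficient β. The fixed recursion depth and the assumption that the other agent reciprocates with the same β are simplifications of mine, not the papers’ formulation.

```python
def social_reward(state, agent, other, beta, level, physical_reward):
    """Toy level-k social reward: my own payoff plus beta times my estimate
    of the other agent's reward, who reasons about me one level lower.
    beta > 0 is helping, beta < 0 is hindering, beta = 0 is indifference."""
    r = physical_reward(state, agent)
    if level == 0 or beta == 0.0:
        return r
    # Simplification: assume the other agent reciprocates with the same beta.
    return r + beta * social_reward(state, other, agent, beta, level - 1,
                                    physical_reward)

# Toy example: each agent's physical reward is just its entry in the state.
physical = lambda state, agent: state[agent]
state = {"robot": 1.0, "human": 0.5}
print(social_reward(state, "robot", "human", beta=0.8, level=2,
                    physical_reward=physical))  # 1.0 + 0.8*(0.5 + 0.8*1.0)
```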
Social interactions are key to learning, and they may also be key to the development of language and intelligence. While many have investigated how the simplest words, like nouns and verbs, might arise when people attempt to communicate, we investigated how humans develop complex systems of signs and symbols. Subjects were forced to communicate complex statements in first-order logic (FOL) physically, by controlling a car. They spontaneously developed rich symbol systems, although these were often meta-stable and could easily collapse. This could shed light on what happened during the hundreds of thousands of years when humans were anatomically modern but not yet technologically advanced (Cheng, Kuo et al., 2022).
Compositional ML #
Compositionality is at the core of much of the work that I do and pervades many of the other topics.
Current approaches to ML are not compositional. This results in models that fail spectacularly on inputs which humans would judge to be trivial. We investigated this problem from many angles and created new tasks and models along the way.
Robots often need to follow complex commands but, since models are not compositional, they fail in surprising ways. One way to address this problem is to parse commands into a formalism like LTL (linear temporal logic), where commands can be understood more easily. But such parsers are hard to train: there is little data available, and human commands tend to be terse and context-specific. We created a new kind of LTL parser that uses a simulator in the loop to learn to convert sentences into LTL formulas (Wang, Ross et al., 2021). This was based on our earlier approach that attempted to learn syntax and semantics from vision in a more child-like way rather than from massive language-only datasets (Ross, Barbu et al., 2018).
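To make the simulator-in-the-loop idea concrete, here is a minimal checker for a small LTL fragment over finite simulator traces; an execution-guided parser can use checks like this to score candidate formulas. The tuple syntax and proposition names are my own illustration, not the parser’s actual interface.

```python
def holds(formula, trace, t=0):
    """formula: nested tuples, e.g. ('eventually', ('prop', 'at_goal')).
    trace: list of sets of true propositions, one set per time step."""
    op = formula[0]
    if op == 'prop':
        return formula[1] in trace[t]
    if op == 'not':
        return not holds(formula[1], trace, t)
    if op == 'and':
        return holds(formula[1], trace, t) and holds(formula[2], trace, t)
    if op == 'eventually':   # F phi: true at some step from t onward
        return any(holds(formula[1], trace, u) for u in range(t, len(trace)))
    if op == 'always':       # G phi: true at every step from t onward
        return all(holds(formula[1], trace, u) for u in range(t, len(trace)))
    raise ValueError(f"unknown operator {op}")

trace = [{'moving'}, {'moving'}, {'at_goal'}]
print(holds(('eventually', ('prop', 'at_goal')), trace))  # True
print(holds(('always', ('prop', 'moving')), trace))       # False
```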
Even when a robot can understand a command and relate it to its environment, an extremely challenging problem in its own right (Paul, Barbu et al., 2017), executing that command is still very difficult. We don’t know what makes human and animal planning so effortless while, for machines, planning appears to be a nearly insurmountable challenge even in the simplest domains. To move toward better planners, we combined two methods, classical sampling-based planning with RRT and deep learning, to both find new plans efficiently and exploit prior knowledge (Kuo, Barbu & Katz, 2018).
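The flavor of this hybrid is sketched below: a bare-bones RRT in the unit square whose sample distribution is a stand-in for a learned proposal. In the paper the proposal comes from a deep sequential model; here it is a hand-written mixture, so treat this only as an illustration of where learning plugs into sampling-based planning.

```python
import random

def plan_rrt(start, goal, proposal, steps=2000, step_size=0.05, tol=0.05):
    tree = {start: None}                    # node -> parent
    for _ in range(steps):
        sample = proposal(goal)             # a learned model would go here
        near = min(tree, key=lambda n: dist(n, sample))
        new = extend(near, sample, step_size)
        tree[new] = near
        if dist(new, goal) < tol:
            return path_to(new, tree)
    return None

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def extend(a, b, s):                        # move from a toward b by step s
    d = max(dist(a, b), 1e-9)
    return (a[0] + s * (b[0] - a[0]) / d, a[1] + s * (b[1] - a[1]) / d)

def path_to(n, tree):
    path = []
    while n is not None:
        path.append(n)
        n = tree[n]
    return path[::-1]

def proposal(goal):  # stand-in for a learned proposal distribution
    return goal if random.random() < 0.2 else (random.random(), random.random())

path = plan_rrt((0.0, 0.0), (1.0, 1.0), proposal)
print("no path" if path is None else f"path with {len(path)} nodes")
```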
We then composed many of these hybrid planners together to execute plans described in natural language (Kuo, Katz & Barbu, 2020) and showed how this method can also execute plans given in LTL zero-shot (Kuo, Katz & Barbu, 2020). These methods perform exceedingly well on benchmarks for compositional reasoning (Kuo, Katz & Barbu, 2021).
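A toy sketch of the compositional principle behind these planners: compile a command or formula into an executable scorer by pairing one module per symbol and combining them following the parse. In the papers each module is a trained network composed inside an RL agent; the min/max combinators and the module dictionary here are simplifications of mine.

```python
def compile_formula(formula, modules):
    """Recursively build a scorer from a parsed formula. Leaves look up a
    per-predicate module; connectives combine sub-scorers compositionally."""
    op = formula[0]
    if op == "and":
        l, r = (compile_formula(f, modules) for f in formula[1:])
        return lambda state: min(l(state), r(state))  # soft conjunction
    if op == "or":
        l, r = (compile_formula(f, modules) for f in formula[1:])
        return lambda state: max(l(state), r(state))  # soft disjunction
    return modules[op]                                # leaf predicate module

# Toy modules scoring how well the state satisfies each predicate, in [0, 1].
modules = {"near_red": lambda s: s["near_red"], "holding": lambda s: s["holding"]}
scorer = compile_formula(("and", ("near_red",), ("holding",)), modules)
print(scorer({"near_red": 0.9, "holding": 0.4}))  # 0.4
```

Because the scorer mirrors the formula’s structure, a new formula built from known predicates yields a new executable model with no retraining, which is the zero-shot behavior described above.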
In the autonomous driving domain, we developed methods which incorporate language into how robots reason about the actions of others (Kuo, Huang et al., 2022).
The neuroscience of language and vision #
The BrainTreebank is coming soon! It will be the largest neuroscience corpus paired with vision and language, with every sentence parsed in UD (Universal Dependencies).
To help with decoding experiments that have few training examples, we developed BrainBERT, a way to pretrain a large Transformer on SEEG (stereoelectroencephalographic) neural recordings. It achieves high classification performance on new tasks with new patients using only a fraction of the training data (Wang, Subramaniam et al., 2023).
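For intuition, here is a minimal sketch of the style of self-supervised pretraining BrainBERT uses: mask parts of a neural spectrogram and train a Transformer to reconstruct them. The model sizes, masking scheme, and loss below are stand-in assumptions, not the paper’s exact recipe.

```python
import torch
import torch.nn as nn

class MaskedSpectrogramModel(nn.Module):
    def __init__(self, n_freq_bins=40, d_model=128, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(n_freq_bins, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decode = nn.Linear(d_model, n_freq_bins)

    def forward(self, spec):  # spec: (batch, time, freq)
        return self.decode(self.encoder(self.embed(spec)))

def pretrain_step(model, opt, spec, mask_prob=0.15):
    """One self-supervised step: mask random time frames, reconstruct them."""
    mask = torch.rand(spec.shape[:2]) < mask_prob  # (batch, time)
    corrupted = spec.clone()
    corrupted[mask] = 0.0                          # zero out masked frames
    pred = model(corrupted)
    loss = ((pred[mask] - spec[mask]) ** 2).mean() # loss only on masked frames
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = MaskedSpectrogramModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
spec = torch.randn(8, 200, 40)  # stand-in for SEEG spectrograms
print(pretrain_step(model, opt, spec))
```

The pretrained encoder’s representations can then be fine-tuned or probed for downstream decoding tasks with far fewer labeled examples.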
Which network architectures most resemble the visual areas of the mouse brain? And is mouse visual cortex arranged hierarchically like our own? We found that yes, mouse visual cortex is arranged hierarchically, and that Transformers are a good fit, as long as you measure them correctly (Conwell, Mayo et al., 2021).
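The kind of analysis behind such claims can be sketched as a neural encoding model: regress a network’s features onto recorded responses and score predictions on held-out stimuli. The data below is random and purely illustrative of the procedure, not the paper’s pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 256))   # one row of DNN features per stimulus
responses = rng.normal(size=(500, 100))  # recorded responses, 100 neurons

X_tr, X_te, Y_tr, Y_te = train_test_split(features, responses, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, Y_tr)
pred = model.predict(X_te)

# Score each neuron by the correlation between predicted and actual responses.
scores = [np.corrcoef(pred[:, i], Y_te[:, i])[0, 1] for i in range(Y_te.shape[1])]
print(f"median held-out correlation: {np.median(scores):.3f}")
```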
How does information from language and vision fuse together in the brain? And what can we learn about multimodal Transformers from their relationship to the brain? We compared multimodal models to neural recordings for the first time and found several sites of multimodal integration (Subramaniam, Conwell et al., 2023). This required new techniques to avoid statistical artifacts.
Equity, justice, and accessibility #
How biased are multimodal models, and how do we generalize the notion of bias to any modality? You might hope that combining vision and language would lead to fewer biases, but sadly, the result is the opposite (Ross, Katz & Barbu, 2021).
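This work builds on WEAT-style association tests, extending them to grounded embeddings. Below is a sketch of the standard WEAT effect size over stand-in vectors; the word lists and embeddings are placeholders, and the grounded variants in the paper additionally pair words with images.

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """Effect size: how much more targets X associate with attributes A vs B,
    compared to targets Y. X, Y, A, B are lists of embedding vectors."""
    def s(w):  # differential association of one target word
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
    sx, sy = [s(x) for x in X], [s(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

rng = np.random.default_rng(0)
emb = lambda: rng.normal(size=16)
X = [emb() for _ in range(8)]  # e.g. career words
Y = [emb() for _ in range(8)]  # e.g. family words
A = [emb() for _ in range(8)]  # e.g. one group of names
B = [emb() for _ in range(8)]  # e.g. another group of names
print(weat_effect_size(X, Y, A, B))  # near 0 for random vectors
```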
How do we help people with different disabilities? Historically, the focus has often been on replacing senses, for example, replacing limited vision with a system that tells you what objects are in view. We argue the opposite: we should instead enhance existing senses, for example, by automatically highlighting and augmenting what is important, or by hiding harmful input such as flashing for people with photosensitivity. We automatically derive visual filters that help people with different disabilities, such as photosensitivity (Barbu, Banda & Katz, 2020; Barbu, Banda & Katz, 2022).
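The paper learns a deep video-to-video transformation; as a much simpler illustration of the filtering idea, the sketch below caps how fast brightness can change between frames, which suppresses flashing. This limiter is my toy stand-in, not the paper’s model.

```python
import numpy as np

def limit_flashing(frames, max_step=10.0):
    """frames: (T, H, W) grayscale video as floats. Returns a video whose
    per-pixel brightness changes by at most max_step per frame."""
    out = [frames[0]]
    for f in frames[1:]:
        delta = np.clip(f - out[-1], -max_step, max_step)
        out.append(out[-1] + delta)
    return np.stack(out)

video = np.random.uniform(0, 255, size=(30, 64, 64))  # stand-in flashing video
safe = limit_flashing(video)
print(np.abs(np.diff(safe, axis=0)).max())  # bounded by max_step
```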
Multimodal understanding and reasoning with language #
Video search is becoming more advanced, but today it is largely based on keywords and captions. We developed the first approach to searching videos with sentences based on the content of the video itself (Barrett, Barbu et al., 2015).
Ambiguity is still something that confuses ML systems. We created the first benchmark of ambiguous language-vision scenarios: captions paired with videos where the caption’s interpretation and grounding (who did what to whom) changes based on the video. We also created the first model that performs this disambiguation task (Berzak, Barbu et al., 2015).
Machines can learn to play games, but to do so they are usually told the rules and must learn to play well, not to play correctly. They are also given the game state symbolically rather than as visual input. Kids, on the other hand, learn to play games by looking at a board and figuring out the rules as they go. We created the first robotic system that learns to play games the way kids do (Barbu, Narayanaswamy & Siskind, 2010).
We still don’t understand what vision is. Sure, we have some vision-related tasks like object recognition, segmentation, etc. But the rich perception that we have as humans, which integrates with other abilities like physical reasoning, still eludes us. We took a step in this direction by building a vision system that could perceive complex structures, use the stability of a structure to infer occluded pieces, and execute linguistic commands that manipulated that structure (Narayanaswamy, Barbu & Siskind, 2011; Siddharth, Barbu & Siskind, 2012).
References #
- Barbu, A., Banda, D. & Katz, B. (2020). Deep video-to-video transformations for accessibility with an application to photosensitivity. Pattern Recognition Letters, 137, 99–107.
- Barbu, A., Banda, D. & Katz, B. (2022). Computer method and apparatus making screens safe for those with photosensitivity. US Patent 11,381,715.
- Barbu, A., Mayo, D., Alverio, J., Luo, W., Wang, C., Gutfreund, D., Tenenbaum, J. & Katz, B. (2019). ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. Advances in Neural Information Processing Systems (NeurIPS), 32. objectnet.dev
- Barbu, A., Narayanaswamy, S. & Siskind, J. (2010). Learning physically-instantiated game play through visual observation. IEEE International Conference on Robotics and Automation (ICRA), 1879–1886.
- Barbu, A., Siddharth, N., Michaux, A. & Siskind, J. (2012). Simultaneous Object Detection, Tracking, and Event Recognition. Advances in Cognitive Systems, 2, 203–220.
- Barrett, D., Barbu, A., Siddharth, N. & Siskind, J. (2015). Saying what you’re looking for: Linguistics meets video search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10), 2069–2081.
- Berzak, Y., Barbu, A., Harari, D., Katz, B. & Ullman, S. (2015). Do You See What I Mean? Visual Resolution of Linguistic Ambiguities. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1477–1487.
- Cheng, E., Kuo, Y., Correa, J., Katz, B., Cases, I. & Barbu, A. (2022). Quantifying the Emergence of Symbolic Communication. Proceedings of the Annual Meeting of the Cognitive Science Society, 44(44).
- Conwell, C., Mayo, D., Barbu, A., Buice, M., Alvarez, G. & Katz, B. (2021). Neural regression, representational similarity, model zoology & neural taskonomy at scale in rodent visual cortex. Advances in Neural Information Processing Systems (NeurIPS), 34, 5590–5607.
- Kuo, Y., Barbu, A. & Katz, B. (2018). Deep sequential models for sampling-based planning. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 6490–6497.
- Kuo, Y., Huang, X., Barbu, A., McGill, S., Katz, B., Leonard, J. & Rosman, G. (2022). Trajectory prediction with linguistic representations. IEEE International Conference on Robotics and Automation (ICRA), 2868–2875.
- Kuo, Y., Katz, B. & Barbu, A. (2020). Deep compositional robotic planners that follow natural language commands. IEEE International Conference on Robotics and Automation (ICRA), 4906–4912.
- Kuo, Y., Katz, B. & Barbu, A. (2020). Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
- Kuo, Y., Katz, B. & Barbu, A. (2021). Compositional Networks Enable Systematic Generalization for Grounded Language Understanding. Findings of the Association for Computational Linguistics: EMNLP 2021.
- Kuo, Y., Katz, B. & Barbu, A. (2021). Compositional RL Agents That Follow Language Commands in Temporal Logic. Frontiers in Robotics and AI, 8.
- Mayo, D., Cummings, J., Lin, X., Gutfreund, D., Katz, B. & Barbu, A. (2023). How hard are computer vision datasets? Calibrating dataset difficulty to viewing time.
- Narayanaswamy, S., Barbu, A. & Siskind, J. (2011). A visual language model for estimating object pose and structure in a generative visual domain. IEEE International Conference on Robotics and Automation (ICRA), 4854–4860.
- Netanyahu, A., Shu, T., Katz, B., Barbu, A. & Tenenbaum, J. (2020). PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception. AAAI Conference on Artificial Intelligence.
- Paul, R., Barbu, A., Felshin, S., Katz, B. & Roy, N. (2017). Temporal grounding graphs for language understanding with accrued visual-linguistic context. Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 4506–4514.
- Ross, C., Barbu, A., Berzak, Y., Myanganbayar, B. & Katz, B. (2018). Grounding language acquisition by training semantic parsers using captioned videos. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2647–2656.
- Ross, C., Katz, B. & Barbu, A. (2021). Measuring Social Biases in Grounded Vision and Language Embeddings. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 998–1008. https://aclanthology.org/2021.naacl-main.78.pdf
- Siddharth, N., Barbu, A. & Siskind, J. (2012). Seeing Unseeability to See the Unseeable. Advances in Cognitive Systems, 2, 77–94.
- Subramaniam, V., Conwell, C., Wang, C., Kreiman, G., Katz, B., Cases, I. & Barbu, A. (2023). Using Multimodal DNNs to Study Vision-Language Integration in the Brain. ICLR 2023 Workshop on Multimodal Representation Learning: Perks and Pitfalls.
- Tejwani, R., Kuo, Y., Shu, T., Katz, B. & Barbu, A. (2022). Social interactions as recursive MDPs. Conference on Robot Learning (CoRL), 949–958.
- Tejwani, R., Kuo, Y., Shu, T., Stankovits, B., Gutfreund, D., Tenenbaum, J., Katz, B. & Barbu, A. (2022). Incorporating rich social interactions into MDPs. IEEE International Conference on Robotics and Automation (ICRA), 7395–7401.
- Tejwani, R., Kuo, Y., Shu, T., Stankovits, B., Gutfreund, D., Tenenbaum, J., Katz, B. & Barbu, A. (2023). Zero-shot linear combinations of grounded social interactions with Linear Social MDPs. AAAI Conference on Artificial Intelligence.
- Wang, C., Ross, C., Kuo, Y., Katz, B. & Barbu, A. (2021). Learning a natural-language to LTL executable semantic parser for grounded robotics. Conference on Robot Learning (CoRL), 1706–1718.
- Wang, C., Subramaniam, V., Yaari, A., Kreiman, G., Katz, B., Cases, I. & Barbu, A. (2023). BrainBERT: Self-supervised representation learning for intracranial recordings. International Conference on Learning Representations (ICLR).
- Yu, H., Siddharth, N., Barbu, A. & Siskind, J. (2015). A compositional framework for grounding language inference, generation, and acquisition in video. Journal of Artificial Intelligence Research (JAIR).