Scene graph

Papers published on a yearly basis

Papers

Proceedings Article • 10.1109/CVPR.2017.330

Scene Graph Generation by Iterative Message Passing

Danfei Xu¹, Yuke Zhu¹, Christopher Choy¹, Li Fei-Fei¹

Stanford University¹

1 Jul 2017

TL;DR: In this article, the problem of graph generation is formulated as message passing between the primal node graph and its dual edge graph, which can take advantage of contextual cues to make better predictions on objects and their relationships.

Abstract: Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image. We propose a novel end-to-end model that generates such a structured scene representation from an input image. Our key insight is that the graph generation problem can be formulated as message passing between the primal node graph and its dual edge graph. Our joint inference model can take advantage of contextual cues to make better predictions on objects and their relationships. The experiments show that our model significantly outperforms previous methods on the Visual Genome dataset as well as on support relation inference in the NYU Depth V2 dataset.

1,491 citations
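
The abstract above frames graph generation as message passing between a node graph (objects) and its dual edge graph (relationships). The following is a minimal sketch of that idea only, not the paper's model: the function names, the fixed mixing weight `alpha`, and the plain averaging of messages are all assumptions made for illustration, and nothing here is learned.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def message_passing(node_states, edge_states, steps=2, alpha=0.5):
    """Hypothetical synchronous node/edge message passing.

    node_states: list of vectors, one per object.
    edge_states: dict mapping (subject_idx, object_idx) to a vector.
    alpha: assumed fixed weight given to the incoming message.
    """
    nodes = [list(v) for v in node_states]
    edges = {k: list(v) for k, v in edge_states.items()}
    for _ in range(steps):
        # Edge -> node: each node averages the states of its incident edges.
        new_nodes = []
        for i, h in enumerate(nodes):
            incident = [e for (s, o), e in edges.items() if i in (s, o)]
            msg = mean_vec(incident) if incident else h
            new_nodes.append([(1 - alpha) * a + alpha * m for a, m in zip(h, msg)])
        # Node -> edge: each edge averages its two endpoint node states
        # (taken from before this step's node update).
        new_edges = {}
        for (s, o), h in edges.items():
            msg = mean_vec([nodes[s], nodes[o]])
            new_edges[(s, o)] = [(1 - alpha) * a + alpha * m for a, m in zip(h, msg)]
        nodes, edges = new_nodes, new_edges
    return nodes, edges
```

Each step lets an object's state absorb context from its relationships and vice versa, which is the sense in which contextual cues can inform both kinds of prediction.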

Proceedings Article • 10.1109/CVPR.2015.7298990

Image retrieval using scene graphs

Justin Johnson¹, Ranjay Krishna¹, Michael Stark², Li-Jia Li³, David A. Shamma³, Michael S. Bernstein¹, Li Fei-Fei¹

Stanford University¹, Max Planck Society², Yahoo!³

7 Jun 2015

TL;DR: This paper designs a conditional random field model that reasons about possible groundings of scene graphs to test images. The method outperforms retrieval methods that use only objects or low-level image features, and the full model improves object localization compared to baseline methods.

Abstract: This paper develops a novel framework for semantic image retrieval based on the notion of a scene graph. Our scene graphs represent objects (“man”, “boat”), attributes of objects (“boat is white”) and relationships between objects (“man standing on boat”). We use these scene graphs as queries to retrieve semantically related images. To this end, we design a conditional random field model that reasons about possible groundings of scene graphs to test images. The likelihoods of these groundings are used as ranking scores for retrieval. We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval. In particular, we evaluate retrieval using full scene graphs and small scene subgraphs, and show that our method outperforms retrieval methods that use only objects or low-level image features. In addition, we show that our full model can be used to improve object localization compared to baseline methods.

1,431 citations
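
The abstract above describes a scene graph as objects, attributes of objects, and relationships between objects. A minimal sketch of such a structure follows, using the abstract's own "man" / "boat" example; the class and method names are hypothetical and are not taken from the paper or its dataset format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container: objects plus attribute and relationship edges."""
    objects: List[str] = field(default_factory=list)
    # (object index, attribute), e.g. (1, "white") for "boat is white"
    attributes: List[Tuple[int, str]] = field(default_factory=list)
    # (subject index, predicate, object index)
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str) -> int:
        """Append an object and return its index."""
        self.objects.append(name)
        return len(self.objects) - 1

    def triples(self) -> List[Tuple[str, str, str]]:
        """Relationships as readable (subject, predicate, object) names."""
        return [(self.objects[s], p, self.objects[o])
                for s, p, o in self.relationships]


def build_example() -> SceneGraph:
    """Encode "boat is white" and "man standing on boat"."""
    graph = SceneGraph()
    man = graph.add_object("man")
    boat = graph.add_object("boat")
    graph.attributes.append((boat, "white"))
    graph.relationships.append((man, "standing on", boat))
    return graph
```

Storing indices rather than names lets a graph hold two distinct objects with the same label, which matters once each object must be grounded to a different image region.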

Book Chapter • 10.1007/978-3-030-01246-5_41

Graph R-CNN for Scene Graph Generation

Jianwei Yang¹, Jiasen Lu¹, Stefan Lee¹, Dhruv Batra¹,², Devi Parikh¹,²

Georgia Institute of Technology¹, Facebook²

8 Sep 2018

TL;DR: A novel scene graph generation model called Graph R-CNN, that is both effective and efficient at detecting objects and their relations in images, is proposed and a new evaluation metric is introduced that is more holistic and realistic than existing metrics.

Abstract: We propose a novel scene graph generation model called Graph R-CNN, that is both effective and efficient at detecting objects and their relations in images. Our model contains a Relation Proposal Network (RePN) that efficiently deals with the quadratic number of potential relations between objects in an image. We also propose an attentional Graph Convolutional Network (aGCN) that effectively captures contextual information between objects and relations. Finally, we introduce a new evaluation metric that is more holistic and realistic than existing metrics. We report state-of-the-art performance on scene graph generation as evaluated using both existing and our proposed metrics.

923 citations
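
The abstract above mentions the quadratic number of potential relations between objects. As a rough illustration, n detected objects give n(n-1) ordered (subject, object) pairs, so some form of pruning is useful. The sketch below enumerates the pairs and keeps the top k under a caller-supplied score; both function names and the scoring function are assumptions for illustration, and this is not the Relation Proposal Network itself.

```python
from itertools import permutations


def candidate_pairs(num_objects):
    """All ordered (subject, object) index pairs, excluding self-pairs."""
    return list(permutations(range(num_objects), 2))


def prune_pairs(pairs, score_fn, k):
    """Keep the k highest-scoring pairs under a hypothetical score_fn."""
    return sorted(pairs, key=score_fn, reverse=True)[:k]
```

For example, 10 objects yield 90 candidate pairs and 100 objects yield 9,900, which is why scoring every pair with a heavy relation classifier becomes expensive.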

Proceedings Article • 10.1109/CVPR.2019.01094

Auto-Encoding Scene Graphs for Image Captioning

Xu Yang¹, Kaihua Tang¹, Hanwang Zhang¹, Jianfei Cai¹

Nanyang Technological University¹

1 Jun 2019

TL;DR: Yang et al. propose the Scene Graph Auto-Encoder (SGAE), which incorporates a language inductive bias into the encoder-decoder image captioning framework to produce more human-like captions.

Abstract: We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions. Intuitively, we humans use the inductive bias to compose collocations and contextual inference in discourse. For example, when we see the relation "person on bike", it is natural to replace "on" with "ride" and infer "person riding bike on a road" even if the "road" is not evident. Therefore, exploiting such bias as a language prior is expected to make conventional encoder-decoder models less likely to overfit to the dataset bias and more focused on reasoning. Specifically, we use the scene graph, a directed graph (G) where an object node is connected by adjective nodes and relationship nodes, to represent the complex structural layout of both image (I) and sentence (S). In the textual domain, we use SGAE to learn a dictionary (D) that helps to reconstruct sentences in the S -> G -> D -> S pipeline, where D encodes the desired language prior; in the vision-language domain, we use the shared D to guide the encoder-decoder in the I -> G -> D -> S pipeline. Thanks to the scene graph representation and shared dictionary, the inductive bias is transferred across domains in principle. We validate the effectiveness of SGAE on the challenging MS-COCO image captioning benchmark, e.g., our SGAE-based single model achieves a new state-of-the-art 127.8 CIDEr-D on the Karpathy split, and a competitive 125.5 CIDEr-D (c40) on the official server even compared to other ensemble models. Code has been made available at: https://github.com/yangxuntu/SGAE.

739 citations

Proceedings Article • 10.1109/ICCV.2017.142

Scene Graph Generation from Objects, Phrases and Region Captions

Yikang Li¹, Wanli Ouyang², Bolei Zhou³, Kun Wang¹, Xiaogang Wang¹

The Chinese University of Hong Kong¹, University of Sydney², Massachusetts Institute of Technology³

1 Oct 2017

TL;DR: Li et al. propose a Multi-level Scene Description Network (MSDN) that solves three vision tasks (object detection, scene graph generation, and region captioning) jointly in an end-to-end manner, where object, phrase, and caption regions are aligned with a dynamic graph based on their spatial and semantic connections.

Abstract: Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationships predicted, while region captioning gives a language description of the objects, their attributes, relations and other context information. In this work, to leverage the mutual connections across semantic levels, we propose a novel neural network model, termed Multi-level Scene Description Network (denoted as MSDN), to solve the three vision tasks jointly in an end-to-end manner. Object, phrase, and caption regions are first aligned with a dynamic graph based on their spatial and semantic connections. Then a feature refining structure is used to pass messages across the three levels of semantic tasks through the graph. We benchmark the learned model on three tasks, and show that joint learning across the three tasks with our proposed method can bring mutual improvements over previous models. In particular, on the scene graph generation task, our proposed method outperforms the state-of-the-art method by a margin of more than 3%. Code has been made publicly available.

672 citations

Year	Papers
2025	11
2024	15
2023	100
2022	147
2021	126
2020	114
