NeurIPS’22 paper: Iterative process provides flexibility and better results for computer vision
Part 1 of a series featuring some of the CS department’s accepted papers for NeurIPS 2022 (held Nov. 28 - Dec. 9)
Dr. Leonid Sigal and his co-author, PhD student Siddhesh Khandelwal, have had their paper accepted to NeurIPS 2022, the premier conference in machine learning and AI.
In their paper, Iterative Scene Graph Generation, the researchers advance and refine previous work on scene graph estimation.
Scene graphs are graph-based representations of scenes, where nodes correspond to detected objects and their locations, and edges denote relations and interactions between the objects. These representations are highly useful, providing a high-level understanding of the depicted scene.
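As a minimal sketch of the idea (illustrative only, not the authors’ implementation), a scene graph can be represented as a set of labelled nodes with bounding boxes and a set of labelled edges between them; all labels and boxes below are made up for the example:

```python
# Minimal, hypothetical scene graph data structure.
# Nodes hold object labels and bounding boxes; edges hold
# relation labels keyed by (subject index, object index).

from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    # node index -> (object label, bounding box as (x, y, w, h))
    nodes: dict = field(default_factory=dict)
    # (subject index, object index) -> relation label
    edges: dict = field(default_factory=dict)

    def add_object(self, idx, label, box):
        self.nodes[idx] = (label, box)

    def add_relation(self, subj, obj, relation):
        self.edges[(subj, obj)] = relation

# Example: "person standing on surfboard" in a beach scene
sg = SceneGraph()
sg.add_object(0, "person", (120, 40, 60, 150))
sg.add_object(1, "surfboard", (100, 180, 110, 30))
sg.add_relation(0, 1, "standing on")
```

Querying `sg.edges[(0, 1)]` then recovers the relation "standing on", which is the kind of high-level scene fact a scene graph makes available.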
“What we’ve been trying to do in Computer Vision for a few years now is to develop more flexible models for scene graph estimation from images or videos,” Dr. Sigal said. “In previous work, people would identify objects first, then their relationship to each other. But that approach may not always be optimal, especially when you’re dealing with smaller objects that are hard to identify by themselves.”
Through their research, which uses a series of estimation refinements, Khandelwal and Sigal discovered that objects and the interactions between them can be identified together, in a mutually beneficial manner. As a result, accuracy improves and computation time is reduced.
In the series of beach scene images below, one can see how iteration improves the estimations. The initial prediction has several nodes wrong, coloured red. In the next iteration, after refinement, fewer nodes are wrong. Another refinement provides even greater accuracy.
“A programmer can then choose the level of refinement desired, which allows more flexibility for one’s purposes,” said Sigal. He explained that this enables trading computation power and time for performance and accuracy. The choice a programmer makes depends on the particular requirements for granularity and accuracy.
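The trade-off described above can be sketched with a toy refinement loop (purely illustrative; the real model refines joint object and relation predictions with learned modules, not this arithmetic): each pass nudges an estimate closer to the right answer, and the caller chooses how many passes to run.

```python
# Toy sketch of the compute-vs-accuracy trade-off in iterative
# refinement. Hypothetical example, not the paper's method: each
# step moves a scalar estimate halfway toward a target value.

def refine(estimate, target, step=0.5):
    """One refinement pass: move estimate toward target."""
    return estimate + step * (target - estimate)

def iterative_estimate(initial, target, num_iterations):
    estimate = initial
    for _ in range(num_iterations):
        estimate = refine(estimate, target)
    return estimate

# More iterations cost more compute but get closer to the target.
coarse = iterative_estimate(0.0, 1.0, num_iterations=1)  # 0.5
finer = iterative_estimate(0.0, 1.0, num_iterations=3)   # 0.875
```

A programmer needing a quick, rough scene summary might stop after one pass, while one needing fine-grained accuracy would run more.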
Autonomous vehicles are one real-world application where accurate scene representations are essential, and incoming data raises many questions: “How close is that other vehicle to this car?” “Where is the streetlight in relation to this vehicle?” “Is that person about to cross at the crosswalk?” Scene graphs can help answer these questions at a granular level.
Sigal explained how their paper accomplishes three key things that led to its acceptance by NeurIPS:
- It provides a novel idea for creating a more flexible model for estimating scene graphs.
- It beats state-of-the-art results.
- It provides other researchers with a comprehensive lay of the land, including detailed analyses of the problem, the impact of trading off performance and time, and performance on more versus less frequent interactions (allowing researchers to address biases in the data).
“Essentially, this approach of Iterative Scene Graph Generation can be applied anywhere you need a more granular understanding of the entire scene and how objects relate to each other,” Sigal said.
Khandelwal will attend the conference in New Orleans to present a poster on the paper.
Learn more about Dr. Sigal, and the research happening in computer vision at UBC.
In total, the department has 13 papers by nine professors accepted at the NeurIPS conference. Read more about the accepted papers and their authors.