1st Workshop on

Neural Volumetric Video

Dynamic view synthesis
Volumetric video
4D scene representation
AR/VR glasses

NVV @ CVPR2024

Motivation

Dynamic view synthesis aims to let users watch dynamic 3D scenes from arbitrary viewpoints for immersive experiences. It is an important technique for many AR/VR applications, such as volumetric filmmaking, virtual telepresence, and video games. Recent methods have made impressive progress on novel view synthesis of static scenes. For instance, NeRF and its follow-up works can efficiently produce high-quality 3D scene representations that support real-time rendering. However, extending this success from static scenes to dynamic scenes is not trivial. A dynamic scene can be seen as a large number of static scenes along the temporal dimension. Therefore, to enable high-fidelity and interactive AR/VR applications, dynamic scenes demand much greater model capacity and reconstruction speed than static scenes, while requiring much smaller model storage per frame for transmission.

Overview

This workshop aims to bring together researchers in dynamic view synthesis to discuss the latest progress and the important technical challenges of this task. More concretely, we will discuss the following questions:

  • Assuming that the input observations are sufficient, e.g., videos recorded by a dense camera array, what are the right 4D scene representations that can faithfully capture complex dynamic scenes while being storage-efficient for easy transmission?
  • For immersive telepresence, will image-based blending be the ultimate solution? What are the limitations of this technique?
  • How can the view synthesis range be expanded when the input is a monocular video?
  • What form should a camera product take so that common users can produce their own volumetric videos of daily life?

More broadly, given the rapid advancements in AR/VR glasses, we hope that this workshop will facilitate discussions on how dynamic view synthesis techniques can be optimized to enhance content capture and display on these devices.

Schedule

01:30 pm - 01:40 pm Opening remarks
01:40 pm - 02:15 pm Invited talk: Yaser Sheikh
02:15 pm - 02:50 pm Invited talk: Deva Ramanan
02:50 pm - 03:25 pm Invited talk: Lingjie Liu
03:25 pm - 04:30 pm Poster session / Coffee break
04:30 pm - 05:05 pm Invited talk: Steve Seitz
05:05 pm - 05:40 pm Invited talk: Michael Zollhoefer
05:40 pm - 05:50 pm Concluding remarks

Tentative Speakers

Yaser Sheikh

Yaser Sheikh. Final confirmation. Relevance: Yaser Sheikh is an Associate Professor at the Robotics Institute (on leave), Carnegie Mellon University, with appointments in the Mechanical Engineering Department. He founded and directs Facebook Reality Lab in Pittsburgh focused on pursuing “metric telepresence”: remote interactions in AR/VR that are indistinguishable from reality. His research is focused on machine perception and rendering of social behavior, spanning sub-disciplines in computer vision, computer graphics, and machine learning.

Deva Ramanan

Deva Ramanan. Final confirmation. Relevance: Deva Ramanan is a Professor in the Robotics Institute at Carnegie Mellon University and the director of the CMU Argo AI Center for Autonomous Vehicle Research. His research interests span computer vision and machine learning, with a focus on visual recognition. He was named a National Academy of Sciences Kavli Fellow in 2013. He has produced many impressive works on dynamic scene reconstruction, such as Dynamic 3D Gaussians and BANMo.

Lingjie Liu

Lingjie Liu. Final confirmation. Relevance: Lingjie Liu is the Aravind K. Joshi Assistant Professor in the Department of Computer and Information Science at the University of Pennsylvania, where she leads the Penn Computer Graphics Lab. Her research interests are at the interface of Computer Graphics, Computer Vision, and AI, with a focus on Neural Scene Representations, Neural Rendering, and 3D Reconstruction.

Michael Zollhoefer

Michael Zollhoefer. Final confirmation. Relevance: Michael Zollhoefer is a Director at Reality Labs Research (RL-R) in Pittsburgh leading a group of six research and engineering teams. His group is focused on building the technology that is required to develop a Codec Telepresence system that is indistinguishable from reality. Achieving this goal requires building first-of-its-kind multi-view capture systems, complex pilot captures for data collection, as well as cutting-edge research on neural representations for avatars, audio, and spaces.

Steve Seitz

Steve Seitz. Tentative confirmation. Relevance: Steve Seitz is a professor at UW, where his research focuses on computer vision, computer graphics, and related topics. He co-directs the UW Reality Lab and is affiliated with GRAIL. He is also a Director at Google on the Project Starline team, and previously worked on VR, Photos, Computational Photography, and Maps.

Organizers

Sida Peng

Sida Peng is an Assistant Professor at the School of Software Technology, Zhejiang University. He received his Ph.D. degree from the College of Computer Science and Technology at Zhejiang University in 2023, and obtained his bachelor's degree in Information Engineering from Zhejiang University in 2018. His research interests include 3D reconstruction, rendering, and 3D generation. He received the 2020 CCF-CV Excellent Young Researcher Award and was selected as the 2022 Apple Scholar in AI/ML.

Yiyi Liao

Yiyi Liao is an assistant professor at Zhejiang University. She received her Ph.D. in Control Science and Engineering from Zhejiang University. Her research interest lies in 3D computer vision, including 3D reconstruction, scene understanding, and 3D-aware generative models. She co-organized the ICCV 2021 Workshop on Differentiable 3D Vision and Graphics.

Xiaowei Zhou

Xiaowei Zhou is a tenured Associate Professor of Computer Science at Zhejiang University, China. He obtained his PhD degree from The Hong Kong University of Science and Technology, after which he was a postdoctoral researcher at the GRASP Lab, University of Pennsylvania. His research interests include 3D reconstruction, understanding and synthesis of objects, humans and scenes, with applications in VR/AR and robotics. He is on the editorial board of IJCV, served as an area chair for CVPR’21 and ICCV’21, and co-organized the series of Geometry Meets Deep Learning Workshops (GMDL).

Andreas Geiger

Andreas Geiger is a Professor of computer science heading the Autonomous Vision Group (AVG). His group is part of the University of Tübingen and the MPI for Intelligent Systems located in Tübingen, Germany, at the heart of CyberValley. His research group is developing machine learning models for computer vision, natural language and robotics with applications in self-driving, VR/AR and scientific document analysis. His work has been recognized with several prizes, including the Longuet-Higgins Prize, the Mark Everingham Prize, the IEEE PAMI Young Investigator Award, and the Heinz Maier Leibnitz Prize. He is the Program Chair of CVPR 2023. He co-organized several Autonomous Driving Workshops at ECCV 2018, CVPR 2021.