SeLFVi: Self-supervised Light Field Video Reconstruction from Stereo Video

Prasan Shedligeri

Florian Schiffers

Sushobhan Ghosh

Oliver Cossairt

Kaushik Mitra

[Paper]

[GitHub]

Abstract

Light-field imaging is appealing to the mobile devices market because of its capability for intuitive post-capture processing. Acquiring LF data with high angular, spatial and temporal resolution poses significant challenges, especially with space constraints preventing bulky optics. At the same time, stereo video capture, now available on many consumer devices, can be interpreted as a sparse LF-capture. We explore the application of small baseline stereo videos for reconstructing high fidelity LF videos. We propose a self-supervised learning-based algorithm for LF video reconstruction from stereo video. The self-supervised LF video reconstruction is guided via the geometric information from the individual stereo pairs and the temporal information from the video sequence. LF estimation is further regularized by a low-rank constraint based on layered LF displays. The proposed self-supervised algorithm offers advantages such as post-training fine-tuning on test sequences and variable angular view interpolation and extrapolation. Quantitatively, the reconstructed LF videos show higher fidelity than previously proposed unsupervised approaches. We demonstrate our results via LF videos generated from publicly available stereo videos acquired from commercially available stereoscopic cameras. Finally, we demonstrate that our reconstructed LF videos allow applications such as post-capture focus control and region-of-interest (RoI) based focus tracking for videos.

Talk

[Slides]

Key takeaways

Commercial light-field cameras like Lytro acquire light-field videos at only 3 fps due to the large data bandwidth requirement.

Stereo cameras, now available on many mobile devices, capture stereo videos. These stereo videos can be treated as a sparse sampling of light-field views.

Reconstruction of light-field videos from stereo videos is ill-posed and requires strong regularization. We use a low-rank light-field representation based on the tensor-display model as our regularizer.

Photometric, geometric, and temporal consistency constraints can be extracted from the input stereo video sequence. We define self-supervised loss functions based on these consistency constraints to reconstruct light-field videos from stereo videos alone.
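
To make the three consistency constraints above more concrete, here is a minimal NumPy sketch of how such self-supervised losses could be written. It is an illustration under stated assumptions, not the loss formulation from the paper: the function names, the nearest-neighbour warp, the L1 penalty, and the sign convention of the disparity and flow fields are all hypothetical choices made for brevity.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two arrays."""
    return float(np.mean(np.abs(a - b)))

def backward_warp(image, flow):
    """Sample `image` at (y + flow[..., 0], x + flow[..., 1]).

    Assumption: nearest-neighbour lookup with border clamping; a trainable
    pipeline would normally use differentiable bilinear sampling instead.
    """
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, W - 1)
    return image[src_y, src_x]

def photometric_loss(lf, left, right, left_idx, right_idx):
    """The LF views at the two stereo positions should match the input frames."""
    return l1(lf[left_idx], left) + l1(lf[right_idx], right)

def geometric_loss(lf, offsets, ref_idx, disparity):
    """Each view should agree with the reference view warped by a
    disparity-scaled shift. `offsets` holds (du, dv) per view, measured
    relative to the reference view (hypothetical convention)."""
    ref = lf[ref_idx]
    total = 0.0
    for view, (du, dv) in zip(lf, offsets):
        flow = np.stack([dv * disparity, du * disparity], axis=-1)
        total += l1(view, backward_warp(ref, flow))
    return total / len(lf)

def temporal_loss(lf_curr, lf_prev, flow):
    """Each current view should agree with the previous frame's view
    warped by the optical flow between the two frames."""
    return float(np.mean([l1(c, backward_warp(p, flow))
                          for c, p in zip(lf_curr, lf_prev)]))
```

Here `lf` is assumed to be an array of shape (views, H, W) holding grayscale views, `disparity` an (H, W) map, and `flow` an (H, W, 2) field. In practice the three terms would be combined with weights chosen by validation.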

Algorithm

Overall flow of the proposed self-supervised algorithm for LF video reconstruction from stereo video. The LF frames are generated from the input stereo pair via an intermediate low-rank tensor-display (TD) based representation. Learning is guided by self-supervised cost functions involving the stereo pair, disparity maps, and optical flow maps.
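
As a rough illustration of the tensor-display idea mentioned above, the sketch below renders light-field views from a small stack of layers. In a layered (tensor) display, each view is approximately a product of the layers, with each layer shifted in proportion to its depth and the view's angular offset, averaged over a few rank-1 frames. The function names, the integer circular shifts via `np.roll`, and the array layout are assumptions made for this example; they are not taken from the released code.

```python
import numpy as np

def render_view(layers, depths, u, v):
    """Render one view from a rank-R, K-layer stack.

    layers: array of shape (R, K, H, W) with non-negative layer values
    depths: K integers, pixels of shift per unit angular offset (assumed)
    u, v:   integer angular coordinates of the requested view
    """
    R, K, H, W = layers.shape
    view = np.zeros((H, W))
    for r in range(R):
        prod = np.ones((H, W))
        for k in range(K):
            # Shift layer k according to its depth and the view offset.
            # Assumption: integer, circular shifts keep the sketch short.
            shifted = np.roll(layers[r, k],
                              shift=(depths[k] * v, depths[k] * u),
                              axis=(0, 1))
            prod *= shifted
        view += prod
    # Average over the R rank-1 components.
    return view / R

def render_light_field(layers, depths, angular_coords):
    """Stack the rendered views for a list of (u, v) angular coordinates."""
    return np.stack([render_view(layers, depths, u, v)
                     for (u, v) in angular_coords])
```

Because the angular coordinates are only an argument to the renderer, the same layer stack can in principle be queried at view positions between or beyond the two stereo cameras, which is consistent with the view interpolation and extrapolation described in the abstract.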

Paper and Supplementary Material

Prasan Shedligeri et al.

(Available as Pre-print)

[Supplementary Material]

[Code]

Related Publications

Prasan Shedligeri, Anupama S & Kaushik Mitra. (2021) A Unified Framework for Compressive Video Recovery from Coded Exposure Techniques. Accepted at IEEE/CVF Winter Conference on Applications of Computer Vision, doi to be assigned [Preprint] [Slides] [Supplementary] [Code] [Webpage]

Prasan Shedligeri, Anupama S & Kaushik Mitra. (2021) CodedRecon: Video reconstruction for coded exposure imaging techniques. Accepted at Elsevier Journal of Software Impacts, https://doi.org/10.1016/j.simpa.2021.100064 [Paper] [Code]

Acknowledgements

The authors would like to thank Matta Gopi Raju for collecting some of the data used in this and related publications.
This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.