Hello! My name is Wonbong — but you can also call me Won.
I completed my PhD defence at University College London and currently work as a Postdoctoral Researcher at Meta, on the Monetization + GenAI team in the London office.
During my PhD, supervised by Prof. Lourdes Agapito, I focused on learning to understand and reconstruct 3D scenes from images.
My PhD work (CodeNeRF, NViST) attempts to generate radiance fields from a single input image (single-image to NeRF).
I also explored how additional camera and scene priors could improve the accuracy of 3D point map generation, working with amazing mentors at Naver Labs Europe (Pow3R).
At Meta, I contribute to the Kaleido project, which scales generative neural rendering using sequence-to-sequence diffusion models.
I continue to work in this direction: 3D-consistent generative novel view synthesis and solving 3D problems using diffusion models.
Before moving into 3D Computer Vision and Machine Learning, I had the privilege of working for the Korean government for six years.
I enjoy cycling around the city and having a cup of coffee.
Feel free to drop me an email if you’d like to chat!
Kaleido: Scaling Sequence-to-Sequence Generative Neural Rendering
Shikun Liu, Kam-Woh Ng, Wonbong Jang, Jiadong Guo, Junlin Han, Haozhe Liu, Yiannis Douratsos, Juan C. Perez, Zijian Zhou, Chi Phung, Tao Xiang and Juan-Manuel Perez-Rua
arXiv, 2025
Project Page / arXiv
Kaleido applies the idea that "3D perception is not a geometric problem, but a form of visual common sense" to the novel view synthesis problem. It generates beautiful renderings and matches the quality of per-scene optimization models with far fewer input images.
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud
CVPR, 2025
Project Webpage / Paper / arXiv / Poster PDF / Code
DUSt3R generates 3D pointmaps from regular images without requiring camera poses. In practice, however, significant effort is often put into camera calibration or deploying additional sensors to acquire point clouds. We present Pow3R, a single network capable of processing any subset of this auxiliary information. By incorporating priors, our method achieves more accurate and precise 3D reconstructions, multi-view depth estimation, and camera pose predictions. This approach opens new possibilities, such as processing images at native resolution and performing depth completion. Additionally, Pow3R generates pointmaps in two distinct coordinate systems, enabling the model to compute camera poses more quickly and accurately.
NViST turns in-the-wild single images into implicit 3D functions in a single pass using Transformers. It extends CodeNeRF to multiple real-world scenes with a feed-forward Transformer model.