Computer vision, broadly speaking, is a research field that aims to enable computers to process and interpret visual data (namely images and video) as sighted humans can. It is one of the most exciting areas of research in computing science and among the fastest-growing technologies in today’s industry. This course provides an introduction to the fundamental principles and applications of computer vision, including image formation, sampling and filtering, color analysis, single- and multi-image geometry, feature detection and matching, stereo imaging, motion estimation, segmentation, image classification, and object detection. We’ll study basic methods and their application to a variety of visual tasks.
In this course, we will look at how Deep Learning is used for 3D Computer Vision. Specifically, we first cover the basics of Deep Learning, then epipolar geometry, the math governing the geometric relationship between two cameras and a 3D point that both observe. We will then integrate Deep Learning with epipolar geometry through learned local features and Deep Learning on point clouds. We will also look at how neural fields can be used in this context, including NeRF and the more recent Gaussian Splatting. Finally, we will see how recent generative models, such as diffusion-based models, fit into the 3D Computer Vision landscape.