Assignment 5: Stereo and Optical Flow
(This assignment is adapted from, and heavily inspired by, assignments given out by Stefan Roth, with the code re-written and adapted by the TAs.)
Due: At the end of the day (11:59pm), Monday, November 25, 2024.
The purpose of this assignment is to introduce you to the basics of stereo and optical flow. Specifically, you will be asked to implement the classic approaches to these problems that we covered in class. The zip file containing the starter code and data is hw5.zip. The unzipped directory contains skeleton Python programs for both parts of the assignment, main_stereo.py and main_flow.py (which you can run by typing
python main_stereo.py
python main_flow.py
at the command prompt). Note that you only need to modify the problem1.py and problem2.py files. You are also welcome to use a Jupyter Notebook and/or Colab for the assignment; you will just need to copy and paste the functions and files.
Part 1: Stereo
You will perform stereo matching by estimating a disparity map between two fronto-parallel images. As described in the lecture slides, we will use the window-based stereo matching method. Given the two rectified images, we estimate the disparity of each pixel along the horizontal scan-line by comparing the cost between two window patches.
As a cost function, we will use a weighted sum of two cost functions, SSD (Sum of Squared Differences) and NC (Normalized Correlation):
$f_{cost}(x,y,d) = \frac{1}{m^2} \left[ SSD(x,y,d) + \alpha NC(x,y,d) \right]$
where
$SSD(x,y,d) = \sum_{(x', y') \in W(x,y)} \left[ I_L(x', y') - I_R(x'-d, y') \right]^2$
$NC(x,y,d) = \frac{\mathbf{w}_L(x,y)}{||\mathbf{w}_L(x,y)||} \cdot \frac{\mathbf{w}_R(x-d,y)}{||\mathbf{w}_R(x-d,y)||} $
where $\mathbf{w}_L(x,y)$ and $\mathbf{w}_R(x,y)$ are the $m \times m$ image patches from the left and right image, respectively, reshaped into vectors of size $m^2 \times 1$, and $\alpha$ is the weighting factor. The details of each cost function should be familiar to you.
- Implement the two cost functions, SSD (Sum of Squared Differences) and NC (Normalized Correlation). The input to each function is a pair of $m \times m$ image patches from the left and right image, respectively, and the output is the scalar value of the calculated cost. (Illustrative sketches of the Part 1 functions are given after this list.)
- To keep the window size the same for pixels near the image boundary, boundary handling needs to be done properly by padding the images. Implement function pad_image that takes an image as input and outputs a padded image, given the input padding width. An additional parameter here is the name of the padding scheme, which can take one of three values: "symmetric", "reflect", or "constant". In the case of "constant", assume zero padding.
- Implement function compute_disparity that calculates the per-pixel disparity map between the two input images after padding (i.e., padded_img_l and padded_img_r), given the maximum disparity range (max_disp), the window size (window_size), and the weighting factor alpha ($\alpha$). To calculate the cost, call the cost calculation function (cost_function) inside compute_disparity.
- Implement function compute_epe that calculates the average end-point error (EPE) between the ground truth dgt and the estimated disparity d, where the average end-point error is defined as $AEPE(d_{gt}, d) = \frac{1}{N} \sum_{(x,y)} \left| d_{gt}(x,y) - d(x,y) \right|$, with $N$ the number of pixels.
- Experiment with different settings of $\alpha$. Try the values {-0.06, -0.01, 0.04, 0.1} and return the value of alpha (from this set) with the minimum EPE in function optimal_alpha.
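To make the cost definitions concrete, here is a minimal sketch of the two cost functions and their weighted combination, assuming grayscale float patches; the helper names cost_ssd and cost_nc are only illustrative, and the exact names and signatures expected by problem1.py may differ.

import numpy as np

def cost_ssd(patch_l, patch_r):
    # Sum of squared differences between the two m x m patches.
    diff = patch_l.astype(np.float64) - patch_r.astype(np.float64)
    return float(np.sum(diff ** 2))

def cost_nc(patch_l, patch_r):
    # Normalized correlation: dot product of the unit-normalised, vectorised patches.
    w_l = patch_l.astype(np.float64).reshape(-1)
    w_r = patch_r.astype(np.float64).reshape(-1)
    denom = np.linalg.norm(w_l) * np.linalg.norm(w_r)
    if denom == 0:
        return 0.0  # assumption: treat all-zero patches as uncorrelated
    return float(np.dot(w_l, w_r) / denom)

def cost_function(patch_l, patch_r, alpha):
    # Weighted combination f_cost = (SSD + alpha * NC) / m^2 from the handout.
    m = patch_l.shape[0]
    return (cost_ssd(patch_l, patch_r) + alpha * cost_nc(patch_l, patch_r)) / (m ** 2)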
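A possible sketch of pad_image, assuming np.pad is allowed: its "symmetric", "reflect" and "constant" modes match the scheme names used above. The exact signature expected by the starter code may differ.

import numpy as np

def pad_image(img, pad_width, padding_mode='symmetric'):
    # Pad the image by pad_width pixels on every side using the requested scheme.
    assert padding_mode in ('symmetric', 'reflect', 'constant')
    if padding_mode == 'constant':
        return np.pad(img, pad_width, mode='constant', constant_values=0)  # zero padding
    return np.pad(img, pad_width, mode=padding_mode)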
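A brute-force sketch of compute_disparity that reuses cost_function from the sketch above; it assumes both images were padded by window_size // 2 pixels and searches disparities 0..max_disp along the scan-line. It is written for clarity, not speed.

import numpy as np

def compute_disparity(padded_img_l, padded_img_r, max_disp, window_size, alpha):
    # Window-based matching: for every pixel keep the disparity with the lowest cost.
    pad = window_size // 2
    height = padded_img_l.shape[0] - 2 * pad
    width = padded_img_l.shape[1] - 2 * pad
    disparity = np.zeros((height, width))

    for y in range(height):
        for x in range(width):
            patch_l = padded_img_l[y:y + window_size, x:x + window_size]
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):  # stay inside the padded right image
                patch_r = padded_img_r[y:y + window_size, x - d:x - d + window_size]
                c = cost_function(patch_l, patch_r, alpha)
                if c < best_cost:
                    best_cost, best_d = c, d
            disparity[y, x] = best_d
    return disparity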
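Finally, compute_epe follows directly from the AEPE definition; optimal_alpha is sketched under the assumption that it receives a dict mapping each tested alpha to its EPE, which may not match the starter code's interface.

import numpy as np

def compute_epe(d_gt, d_est):
    # Average end-point error: mean absolute disparity difference over all pixels.
    return float(np.mean(np.abs(d_gt.astype(np.float64) - d_est.astype(np.float64))))

def optimal_alpha(epe_per_alpha):
    # epe_per_alpha: dict mapping each tested alpha to its EPE (assumed interface).
    return min(epe_per_alpha, key=epe_per_alpha.get)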
Part 2: Optical Flow
In this task, we will estimate optical flow using the Lucas-Kanade method. Recall from the lecture the following system of linear equations we are trying to solve:
$\sum_{R} \left( u \cdot I_x + v \cdot I_y + I_t \right) = 0$
where $(u, v)$ is the unknown flow vector for the centre pixel in region $R$; $I_x$, $I_y$ and $I_t$ are the image derivatives with respect to $x$, $y$ and $t$; and the summation is over an image patch $R$ of a pre-defined size. This implies that $I_x$, $I_y$ and $I_t$ should be read as scalar values per pixel in $R$. Our basic implementation of Lucas-Kanade will contain the following steps.
- Implement function compute_derivatives that computes the image derivatives $I_x$, $I_y$ and $I_t$ as defined in the lecture. Note that the returned values should have the same size as the image. (Illustrative sketches of the Part 2 functions are given after this list.)
- Implement function compute_motion that computes $(u, v)$ for each pixel, given $I_x$, $I_y$ and $I_t$ as input. Please ignore the aggregate and sigma parameters for now; we will come back to these a bit later.
- Implement function warp that warps the given (first) image with the provided flow estimates. Hint: You may find function griddata from package scipy.interpolate useful.
- We will need to verify that our flow estimates indeed minimise the objective. The cost function minimised by Lucas-Kanade is a sum of L2 distances between pixels of the warped first image and the second image. Implement this cost function in compute_cost. Specifically, subtract the warped image 1 from image 2 pixel by pixel, then compute the L2 norm of the difference; finally, sum up all the differences and divide by the number of pixels. The resulting value can be interpreted as the average difference between pixels.
- Note that there are two additional arguments (aggregate and sigma) to the function compute_motion that we have not used yet. The summation of pixel values in the patch $R$ can also be weighted, so that pixels farther from the centre of the patch contribute less to the sum than pixels closer to the centre. This intuitive idea suggests Gaussian weighting: the weights in the patch correspond to the weights of a Gaussian kernel. Modify your implementation of compute_motion to accept aggregate='gaussian' and sigma, and sum the patch values in this fashion. The argument sigma is the sigma of the Gaussian kernel; you can use the default sigma=2 for this assignment. You are welcome to reuse code from Assignment 1 to construct the Gaussian weights.
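The sketches below are illustrative implementations of the Part 2 functions, assuming grayscale float images. For compute_derivatives, central-difference kernels and a simple frame difference are assumed; if the lecture defines different derivative filters, substitute them.

import numpy as np
from scipy.signal import convolve2d

def compute_derivatives(im1, im2):
    # Spatial derivatives of the first frame via central differences, temporal
    # derivative as the frame difference; all outputs have the same size as the image.
    kernel_x = np.array([[0.5, 0.0, -0.5]])
    kernel_y = kernel_x.T
    Ix = convolve2d(im1, kernel_x, mode='same', boundary='symm')
    Iy = convolve2d(im1, kernel_y, mode='same', boundary='symm')
    It = im2.astype(np.float64) - im1.astype(np.float64)
    return Ix, Iy, It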
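For compute_motion, each pixel's flow comes from the 2x2 normal equations obtained by least-squares over the patch R. This sketch uses a hypothetical default patch_size of 15 and ignores aggregate and sigma (uniform weighting), matching the basic version of the task; border pixels closer than half a patch to the edge are simply left at zero.

import numpy as np

def compute_motion(Ix, Iy, It, patch_size=15, aggregate='const', sigma=2):
    # Solve, for every pixel, the normal equations
    #   [sum Ix*Ix  sum Ix*Iy] [u]     [sum Ix*It]
    #   [sum Ix*Iy  sum Iy*Iy] [v] = - [sum Iy*It]
    # where the sums run over the patch R around the pixel.
    h, w = Ix.shape
    half = patch_size // 2
    u = np.zeros((h, w))
    v = np.zeros((h, w))
    for y in range(half, h - half):
        for x in range(half, w - half):
            wx = Ix[y - half:y + half + 1, x - half:x + half + 1]
            wy = Iy[y - half:y + half + 1, x - half:x + half + 1]
            wt = It[y - half:y + half + 1, x - half:x + half + 1]
            A = np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                          [np.sum(wx * wy), np.sum(wy * wy)]])
            b = -np.array([np.sum(wx * wt), np.sum(wy * wt)])
            flow, *_ = np.linalg.lstsq(A, b, rcond=None)
            u[y, x], v[y, x] = flow
    return u, v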
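One way to use the griddata hint is forward warping: every source pixel is moved by its flow vector, and the resulting scattered samples are interpolated back onto the regular pixel grid. Whether the assignment expects forward or backward warping depends on the lecture's convention, so treat this sketch as an assumption.

import numpy as np
from scipy.interpolate import griddata

def warp(im, u, v):
    # Move pixel (x, y) of the first image to (x + u, y + v) and resample the
    # scattered values on the regular grid (linear interpolation, zeros elsewhere).
    h, w = im.shape
    yy, xx = np.mgrid[0:h, 0:w]
    points = np.stack([(xx + u).ravel(), (yy + v).ravel()], axis=-1)
    values = im.ravel().astype(np.float64)
    return griddata(points, values, (xx, yy), method='linear', fill_value=0.0)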
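compute_cost follows the description above literally; for single-channel images the per-pixel L2 norm reduces to the absolute difference, so adjust this if the starter code expects squared differences instead.

import numpy as np

def compute_cost(im1_warped, im2):
    # Average per-pixel difference between the warped first image and the second image.
    diff = im2.astype(np.float64) - im1_warped.astype(np.float64)
    return float(np.sum(np.abs(diff)) / diff.size)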
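For the last step, the modification to compute_motion only changes how the patch sums are formed. A sketch of the Gaussian weights, assuming a separable kernel built from sigma, is shown below; when aggregate='gaussian', each product inside compute_motion (e.g. Ix*Ix) is multiplied element-wise by these weights before summing, while aggregate='const' keeps uniform weights.

import numpy as np

def gaussian_weights(patch_size, sigma=2.0):
    # 2D Gaussian kernel, normalised to sum to one, used to weight the patch sums.
    half = patch_size // 2
    ax = np.arange(-half, half + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()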
Deliverables
Hand in your functions (i.e., the *.py files) or a Jupyter Notebook, together with a PDF file showing your results and answering the questions. These must have sufficient comments for others to easily understand the code. Both the Python files and the PDF should be submitted through Canvas.