The purpose of this assignment is to introduce you to the basics of stereo and optical flow. Specifically, you will be asked to implement classic approaches to these problems that we covered in class. The zip file containing the starter code and data is hw5.zip. The unzipped directory contains skeleton Python programs for both parts of the assignment, main_stereo.py and main_flow.py (which you can run by typing python main_stereo.py and python main_flow.py at the command prompt). Note that you only need to modify the problem1.py and problem2.py files. You are also welcome to use a Jupyter Notebook and/or Colab for the assignment; you will just need to copy and paste the functions and files.
You will perform stereo matching by estimating a disparity map between two front-parallel images. As described in the lecture slides, we will try the window-based stereo matching method. Given the two rectified images, we estimate the disparity of each pixel along the horizontal scan-line by comparing the cost between two window patches.
As a cost function, we will use a weighted sum of two cost functions, SSD (Sum of Squared Differences) and NC (Normalized Correlation):
\[f_{cost}(x,y,d) = \frac{1}{m^2} \left[ \text{SSD}(x,y,d) \right] + \alpha \text{NC}(x,y,d)\]with
\[\text{SSD}(x,y,d) = \sum_{(x', y') \in w_L(x,y)} \left[ I_L(x', y') - I_R(x'-d, y') \right]^2\] \[\text{NC}(x,y,d) = \frac{(\mathbf{w}_L(x,y)-\bar{\mathbf{w}}_L(x,y))}{||\mathbf{w}_L(x,y)-\bar{\mathbf{w}}_L(x,y)||} \cdot \frac{(\mathbf{w}_R(x-d,y)-\bar{\mathbf{w}}_R(x-d,y))}{||\mathbf{w}_R(x-d,y)-\bar{\mathbf{w}}_R(x-d,y)||}\]where \(w_L\) is a \(m \times m\)-sized image patch from the left image (\(I_L\)); \(w_R\) is a \(m \times m\)-sized image patch from the right image (\(I_R\)); \(\mathbf{w}_L\) is the reshaped vector of \(w_L\) with the size of \(m^2 \times 1\); \(\mathbf{w}_R\) is the reshaped vector of \(w_R\) with the size of \(m^2 \times 1\); and \(\alpha\) is the weighting factor. \(\bar{\mathbf{w}}_L\) (\(\mathbf{w}_L\) bar) and \(\bar{\mathbf{w}}_R\) (\(\mathbf{w}_R\) bar) are the mean of \(\mathbf{w}_L\) and \(\mathbf{w}_R\) respectively.
1. (6 points)
Implement the two cost functions, cost_ssd and cost_nc. Your first task is to implement the two cost functions, SSD (Sum of Squared Differences) and NC (Normalized Correlation). Each function takes as input two \(m \times m\) image patches, one from the left image and one from the right image, and outputs the scalar value of the calculated cost.
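A minimal NumPy sketch of these two functions is shown below; for reference, it also includes a combined cost in the form of the weighted sum above, though the exact name and signature of that helper (cost_function) in the skeleton may differ.

import numpy as np

def cost_ssd(patch_l, patch_r):
    # Sum of squared differences between the two m x m patches.
    return float(np.sum((patch_l.astype(np.float64) - patch_r.astype(np.float64)) ** 2))

def cost_nc(patch_l, patch_r):
    # Normalized correlation: dot product of the mean-subtracted, unit-norm patch vectors.
    wl = patch_l.astype(np.float64).ravel()
    wr = patch_r.astype(np.float64).ravel()
    wl -= wl.mean()
    wr -= wr.mean()
    denom = np.linalg.norm(wl) * np.linalg.norm(wr)
    if denom == 0:
        return 0.0  # constant patches: correlation is undefined, return 0 by convention
    return float(np.dot(wl, wr) / denom)

def cost_function(patch_l, patch_r, alpha):
    # Combined cost (1/m^2) * SSD + alpha * NC, following the formula above.
    m = patch_l.shape[0]
    return cost_ssd(patch_l, patch_r) / (m ** 2) + alpha * cost_nc(patch_l, patch_r)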
2. (3 points)
To have the same window size for pixels near the image boundary, boundary handling needs to be done properly by padding the images. Implement the function pad_image that takes an image as input and outputs a padded image, given the padding width. An additional parameter is the name of the padding scheme, which can take one of three values: “symmetric”, “reflect”, or “constant”. In the case of “constant”, assume zero padding.
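All three schemes are supported directly by np.pad, so the function can be a thin wrapper; a sketch, with parameter names that are assumptions and may differ from the skeleton:

import numpy as np

def pad_image(img, pad_width, padding_mode='symmetric'):
    # padding_mode is one of 'symmetric', 'reflect', or 'constant' (zero padding).
    if padding_mode == 'constant':
        return np.pad(img, pad_width, mode='constant', constant_values=0)
    return np.pad(img, pad_width, mode=padding_mode)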
3. (9 points)
Implement the function compute_disparity that calculates the per-pixel disparity map between two input images after padding (i.e., padded_img_l and padded_img_r), given the maximum disparity range (max_disp), the window size (window_size), and the weighting factor alpha (\(\alpha\)). To calculate the cost, call the cost calculation function (cost_function) inside compute_disparity. Note that the disparity map should have the same size as the input images without padding.
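One straightforward (if slow) way to structure this is a loop over pixels and candidate disparities, keeping the disparity with the lowest cost. A sketch, assuming the cost_function signature used above:

import numpy as np

def compute_disparity(padded_img_l, padded_img_r, max_disp, window_size, alpha):
    pad = window_size // 2
    height = padded_img_l.shape[0] - 2 * pad   # size of the original, unpadded image
    width = padded_img_l.shape[1] - 2 * pad
    disparity = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            patch_l = padded_img_l[y:y + window_size, x:x + window_size]
            best_cost, best_d = np.inf, 0
            # Only consider shifts that keep the right patch inside the padded image.
            for d in range(min(max_disp, x) + 1):
                patch_r = padded_img_r[y:y + window_size, x - d:x - d + window_size]
                c = cost_function(patch_l, patch_r, alpha)
                if c < best_cost:
                    best_cost, best_d = c, d
            disparity[y, x] = best_d
    return disparity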
4. (3 points)
Implement the function compute_epe that calculates the average end-point error (EPE) between the ground-truth disparity \(d_{gt}\) and the estimated disparity \(d\), where the end-point error is defined as follows, with \(N\) the number of pixels:
\[\text{EPE} = \frac{1}{N} \sum_{(x,y)} \left| d(x,y) - d_{gt}(x,y) \right|\]
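With this definition, the function reduces to a mean absolute difference:

import numpy as np

def compute_epe(d_gt, d_est):
    # Average end-point error: mean absolute disparity difference over all N pixels.
    return float(np.mean(np.abs(d_gt.astype(np.float64) - d_est.astype(np.float64))))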
5. (3 points)
Experiment with different settings of \(\alpha\). Try the values \(\{-0.06, -0.01, 0.04, 0.1\}\) and return the value of \(\alpha\) (from this set) with the minimum EPE in the function optimal_alpha. In your PDF report, you need to record the EPE for each \(\alpha\) and report the \(\alpha\) with the minimum EPE. For the \(\alpha\) with the minimum EPE, visualize the corresponding disparity map.
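The experiment itself can be a short driver loop; the inputs and settings below (padded_img_l, padded_img_r, d_gt, max_disp, window_size) are placeholders and should come from main_stereo.py:

# Hypothetical driver loop; the exact settings are defined in main_stereo.py.
alphas = [-0.06, -0.01, 0.04, 0.1]
epes = {}
for alpha in alphas:
    d_est = compute_disparity(padded_img_l, padded_img_r, max_disp, window_size, alpha)
    epes[alpha] = compute_epe(d_gt, d_est)
    print(f"alpha = {alpha:+.2f}  EPE = {epes[alpha]:.4f}")
best_alpha = min(epes, key=epes.get)  # the value optimal_alpha should return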
In this task, we will estimate optical flow using the Lucas-Kanade method. Recall from the lecture the following system of linear equations we are trying to solve:
\[\sum_R \left( u \cdot I_x + v \cdot I_y + I_t \right) = 0\]where \((u, v)\) is the unknown flow vector for the centre pixel in region \(R\); \(I_x\), \(I_y\) and \(I_t\) are the image derivatives with respect to \(x\), \(y\) and \(t\); and the summation is over an image patch \(R\) of a pre-defined size. This implies that \(I_x\), \(I_y\) and \(I_t\) should be read as scalar values per pixel in \(R\). Our basic implementation of Lucas-Kanade will contain the following steps.
1. (3 points)
Implement the function compute_derivatives that computes the image derivatives \(I_x\), \(I_y\) and \(I_t\) as defined in the lecture. Note that the returned values should have the same size as the image. You should use symmetric padding when handling the image boundary.
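A sketch using simple central-difference filters; the exact derivative filters are an assumption here and should match whatever was given in the lecture:

import numpy as np
from scipy.signal import convolve2d

def compute_derivatives(im1, im2):
    # Central-difference spatial derivatives of im1 and the temporal derivative between frames.
    kx = np.array([[-0.5, 0.0, 0.5]])
    ky = kx.T
    Ix = convolve2d(im1, kx, mode='same', boundary='symm')   # same size, symmetric padding
    Iy = convolve2d(im1, ky, mode='same', boundary='symm')
    It = im2.astype(np.float64) - im1.astype(np.float64)
    return Ix, Iy, It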
2. (10 points)
Implement the function compute_motion that computes \((u, v)\) for each pixel, given \(I_x\), \(I_y\) and \(I_t\) as input. Please ignore the aggregate and sigma parameters for now; we will come back to these a bit later. Add the visualized motion for the given “things” and “street” examples to your PDF report.
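Per pixel, the least-squares solution of the constraint above amounts to solving the \(2 \times 2\) normal equations
\[\begin{bmatrix} \sum_R I_x^2 & \sum_R I_x I_y \\ \sum_R I_x I_y & \sum_R I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum_R I_x I_t \\ \sum_R I_y I_t \end{bmatrix}\]A direct, unweighted sketch follows; the patch size and default argument values are assumptions and should follow the skeleton:

import numpy as np

def compute_motion(Ix, Iy, It, patch_size=15, aggregate='const', sigma=2.2):
    half = patch_size // 2
    h, w = Ix.shape
    u = np.zeros((h, w))
    v = np.zeros((h, w))
    # Pad the derivative images so every pixel has a full patch around it.
    Ixp = np.pad(Ix, half, mode='symmetric')
    Iyp = np.pad(Iy, half, mode='symmetric')
    Itp = np.pad(It, half, mode='symmetric')
    for y in range(h):
        for x in range(w):
            ix = Ixp[y:y + patch_size, x:x + patch_size].ravel()
            iy = Iyp[y:y + patch_size, x:x + patch_size].ravel()
            it = Itp[y:y + patch_size, x:x + patch_size].ravel()
            A = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                          [np.sum(ix * iy), np.sum(iy * iy)]])
            b = -np.array([np.sum(ix * it), np.sum(iy * it)])
            u[y, x], v[y, x] = np.linalg.solve(A, b)  # assumes A is invertible (textured patch)
    return u, v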
3. (4 points)
Implement the function warp that warps the given (first) image with the provided flow estimates. Hint: you may find the function griddata from the package scipy.interpolate useful.
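A sketch using griddata: resample the image at the flow-displaced coordinates. The sign convention of the displacement is an assumption here and should be adjusted to match the definition used in the lecture:

import numpy as np
from scipy.interpolate import griddata

def warp(im, u, v):
    # Resample im at coordinates displaced by the flow (flip the sign of u, v
    # if the warp is defined the other way round).
    h, w = im.shape
    ys, xs = np.mgrid[0:h, 0:w]
    known_points = np.stack([ys.ravel(), xs.ravel()], axis=-1)            # where im is defined
    query_points = np.stack([(ys + v).ravel(), (xs + u).ravel()], axis=-1)
    warped = griddata(known_points, im.ravel().astype(np.float64),
                      query_points, method='linear', fill_value=0.0)
    return warped.reshape(h, w)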
4. (3 points)
We will need to verify that our flow estimates indeed minimise the objective. The objective/cost function minimised by Lucas-Kanade is the averaged SSD (Sum of Squared Differences) between the pixels of the warped first image and the second image. Implement this cost function in compute_cost. Specifically, first calculate the SSD between the warped image 1 and image 2, and then divide the SSD value by the number of pixels. The result can be interpreted as the average difference between pixels. Report the costs for the given “things” and “street” examples in your PDF.
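Once the warped image is available, this reduces to a few lines:

import numpy as np

def compute_cost(im_warped, im2):
    # Averaged SSD: sum of squared pixel differences divided by the number of pixels.
    diff = im_warped.astype(np.float64) - im2.astype(np.float64)
    return float(np.sum(diff ** 2) / diff.size)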
5. (4 points)
Note that there are two additional arguments (aggregate and sigma) to the function compute_motion that we have not used. The summation of pixel values in patch \(R\) can also be weighted, that is, pixels that are farther away from the centre pixel of the patch contribute less to the sum than pixels closer to the centre. This intuitive idea suggests Gaussian weighting: the weights in the patch correspond to the weights of a Gaussian kernel. Modify your implementation of compute_motion to accept aggregate='gaussian' and sigma, and to sum the patch values in this fashion. The argument sigma is the standard deviation of the Gaussian kernel; you can use the default sigma=2.2 for this assignment. You are welcome to use code from Assignment 1 to construct the Gaussian weights. Add the visualized motion for the given “things” and “street” examples to your PDF report, along with the corresponding costs.
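A sketch of the Gaussian weighting: build a kernel of the patch size and multiply every term of the per-patch sums inside compute_motion by the corresponding weight. The helper name and its exact use below are assumptions:

import numpy as np

def gaussian_weights(patch_size, sigma=2.2):
    # 2-D Gaussian weights over the patch, peaked at the centre pixel (assumes odd patch_size).
    half = patch_size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

# Inside compute_motion, when aggregate == 'gaussian':
#   wgt = gaussian_weights(patch_size, sigma).ravel()
#   A = np.array([[np.sum(wgt * ix * ix), np.sum(wgt * ix * iy)],
#                 [np.sum(wgt * ix * iy), np.sum(wgt * iy * iy)]])
#   b = -np.array([np.sum(wgt * ix * it), np.sum(wgt * iy * it)])
# With aggregate == 'const' the weights are all equal, recovering the plain sums.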
Hand in your functions (i.e., the *.py files) or a Jupyter Notebook, together with a PDF file showing your results and answering the questions. These must have sufficient comments for others to easily understand the code. Both the Python files and the PDF should be submitted through Gradescope.