The purpose of this assignment is to introduce you to the basics of stereo and optical flow. Specifically, you will be asked to implement classic approaches to these problems that we covered in class. The zip file containing the starter code and data is hw5.zip. The unzipped directory contains skeleton Python programs for both parts of the assignment, main_stereo.py and main_flow.py (which you can run by typing python main_stereo.py and python main_flow.py at the command prompt). Note that you only need to modify the problem1.py and problem2.py files. You are also welcome to use a Jupyter Notebook and/or Colab for the assignment; you will just need to copy and paste the functions and files.
You will perform stereo matching by estimating a disparity map between two front-parallel images. As described in the lecture slides, we will try the window-based stereo matching method. Given the two rectified images, we estimate the disparity of each pixel along the horizontal scan-line by comparing the cost between two window patches.
As a cost function, we will use a weighted sum of two cost functions, SSD (Sum of Squared Differences) and NC (Normalized Correlation):
\[f_{cost}(x,y,d) = \frac{1}{m^2} \left[ \text{SSD}(x,y,d) \right] + \alpha \text{NC}(x,y,d)\]with
\[\text{SSD}(x,y,d) = \sum_{(x', y') \in w_L(x,y)} \left[ I_L(x', y') - I_R(x'-d, y') \right]^2\] \[\text{NC}(x,y,d) = \frac{(\mathbf{w}_L(x,y)-\bar{\mathbf{w}}_L(x,y))}{||\mathbf{w}_L(x,y)-\bar{\mathbf{w}}_L(x,y)||} \cdot \frac{(\mathbf{w}_R(x-d,y)-\bar{\mathbf{w}}_R(x-d,y))}{||\mathbf{w}_R(x-d,y)-\bar{\mathbf{w}}_R(x-d,y)||}\]where \(w_L\) is a \(m \times m\)-sized image patch from the left image (\(I_L\)); \(w_R\) is a \(m \times m\)-sized image patch from the right image (\(I_R\)); \(\mathbf{w}_L\) is the reshaped vector of \(w_L\) with the size of \(m^2 \times 1\); \(\mathbf{w}_R\) is the reshaped vector of \(w_R\) with the size of \(m^2 \times 1\); and \(\alpha\) is the weighting factor. \(\bar{\mathbf{w}}_L\) (\(\mathbf{w}_L\) bar) and \(\bar{\mathbf{w}}_R\) (\(\mathbf{w}_R\) bar) are the mean of \(\mathbf{w}_L\) and \(\mathbf{w}_R\) respectively.
1. (6 points)
Implement the two cost functions, cost_ssd and cost_nc. Your first task is to implement the two cost functions, SSD (Sum of Squared Differences) and NC (Normalized Correlation). Each function takes as input two \(m \times m\) image patches, one from the left image and one from the right image, and outputs the scalar value of the calculated cost.
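A minimal NumPy sketch of these two functions is shown below; for reference, it also includes a combined cost in the form of the weighted sum above, though the exact name and signature of that helper (cost_function) in the skeleton may differ.

import numpy as np

def cost_ssd(patch_l, patch_r):
    # Sum of squared differences between the two m x m patches.
    return float(np.sum((patch_l.astype(np.float64) - patch_r.astype(np.float64)) ** 2))

def cost_nc(patch_l, patch_r):
    # Normalized correlation: dot product of the mean-subtracted, unit-norm patch vectors.
    wl = patch_l.astype(np.float64).ravel()
    wr = patch_r.astype(np.float64).ravel()
    wl -= wl.mean()
    wr -= wr.mean()
    denom = np.linalg.norm(wl) * np.linalg.norm(wr)
    if denom == 0:
        return 0.0  # constant patches: correlation is undefined, return 0 by convention
    return float(np.dot(wl, wr) / denom)

def cost_function(patch_l, patch_r, alpha):
    # Combined cost (1/m^2) * SSD + alpha * NC, following the formula above.
    m = patch_l.shape[0]
    return cost_ssd(patch_l, patch_r) / (m ** 2) + alpha * cost_nc(patch_l, patch_r)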
2. (3 points)
To have the same window size for pixels near the image boundary, boundary handling needs to be done properly by padding the images. Implement the function pad_image that takes an image as input and outputs a padded image, given the padding width. An additional parameter is the name of the padding scheme, which can take one of three values: “symmetric”, “reflect”, or “constant”. In the case of “constant”, assume zero padding.
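All three schemes are supported directly by np.pad, so the function can be a thin wrapper; a sketch, with parameter names that are assumptions and may differ from the skeleton:

import numpy as np

def pad_image(img, pad_width, padding_mode='symmetric'):
    # padding_mode is one of 'symmetric', 'reflect', or 'constant' (zero padding).
    if padding_mode == 'constant':
        return np.pad(img, pad_width, mode='constant', constant_values=0)
    return np.pad(img, pad_width, mode=padding_mode)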
3. (9 points)
Implement the function compute_disparity that calculates the per-pixel disparity map between two input images after padding (i.e., padded_img_l and padded_img_r), given the maximum disparity range (max_disp), the window size (window_size), and the weighting factor alpha (\(\alpha\)). To calculate the cost, call the cost calculation function (cost_function) inside compute_disparity. Note that the disparity map should have the same size as the input images without padding.
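One straightforward (if slow) way to structure this is a loop over pixels and candidate disparities, keeping the disparity with the lowest cost. A sketch, assuming the cost_function signature used above:

import numpy as np

def compute_disparity(padded_img_l, padded_img_r, max_disp, window_size, alpha):
    pad = window_size // 2
    height = padded_img_l.shape[0] - 2 * pad   # size of the original, unpadded image
    width = padded_img_l.shape[1] - 2 * pad
    disparity = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            patch_l = padded_img_l[y:y + window_size, x:x + window_size]
            best_cost, best_d = np.inf, 0
            # Only consider shifts that keep the right patch inside the padded image.
            for d in range(min(max_disp, x) + 1):
                patch_r = padded_img_r[y:y + window_size, x - d:x - d + window_size]
                c = cost_function(patch_l, patch_r, alpha)
                if c < best_cost:
                    best_cost, best_d = c, d
            disparity[y, x] = best_d
    return disparity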
4. (3 points)
Implement the function compute_epe that calculates the average end-point error (EPE) between the ground-truth disparity \(d_{gt}\) and the estimated disparity \(d\), where the end-point error is defined as follows, with \(N\) the number of pixels:
\[\text{EPE} = \frac{1}{N} \sum_{(x,y)} \left| d(x,y) - d_{gt}(x,y) \right|\]
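With this definition, the function reduces to a mean absolute difference:

import numpy as np

def compute_epe(d_gt, d_est):
    # Average end-point error: mean absolute disparity difference over all N pixels.
    return float(np.mean(np.abs(d_gt.astype(np.float64) - d_est.astype(np.float64))))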
5. (3 points)
Experiment with different settings of \(\alpha\). Try the values \(\{-0.06, -0.01, 0.04, 0.1\}\) and return the value of \(\alpha\) (from this set) with the minimum EPE in the function optimal_alpha. In your PDF report, you need to record the EPE for each \(\alpha\) and report the \(\alpha\) with the minimum EPE. For the \(\alpha\) with the minimum EPE, visualize the corresponding disparity map.
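The experiment itself can be a short driver loop; the inputs and settings below (padded_img_l, padded_img_r, d_gt, max_disp, window_size) are placeholders and should come from main_stereo.py:

# Hypothetical driver loop; the exact settings are defined in main_stereo.py.
alphas = [-0.06, -0.01, 0.04, 0.1]
epes = {}
for alpha in alphas:
    d_est = compute_disparity(padded_img_l, padded_img_r, max_disp, window_size, alpha)
    epes[alpha] = compute_epe(d_gt, d_est)
    print(f"alpha = {alpha:+.2f}  EPE = {epes[alpha]:.4f}")
best_alpha = min(epes, key=epes.get)  # the value optimal_alpha should return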
In this task, we will estimate optical flow using the Lucas-Kanade method. Recall from the lecture the following system of linear equations we are trying to solve:
\[\sum_R \left( u \cdot I_x + v \cdot I_y + I_t \right) = 0\]where \((u, v)\) is the unknown flow vector for the centre pixel in region \(R\); \(I_x\), \(I_y\) and \(I_t\) are the image derivatives with respect to \(x\), \(y\) and \(t\); and the summation is over an image patch \(R\) of a pre-defined size. This implies that \(I_x\), \(I_y\) and \(I_t\) should be read as scalar values per pixel in \(R\). Our basic implementation of Lucas-Kanade will contain the following steps.
1. (3 points)
Implement the function compute_derivatives that computes the image derivatives \(I_x\), \(I_y\) and \(I_t\) as defined in the lecture. Note that the returned values should have the same size as the image. You should use symmetric padding when handling the image boundary.
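A sketch using simple central-difference filters; the exact derivative filters are an assumption here and should match whatever was given in the lecture:

import numpy as np
from scipy.signal import convolve2d

def compute_derivatives(im1, im2):
    # Central-difference spatial derivatives of im1 and the temporal derivative between frames.
    kx = np.array([[-0.5, 0.0, 0.5]])
    ky = kx.T
    Ix = convolve2d(im1, kx, mode='same', boundary='symm')   # same size, symmetric padding
    Iy = convolve2d(im1, ky, mode='same', boundary='symm')
    It = im2.astype(np.float64) - im1.astype(np.float64)
    return Ix, Iy, It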
2. (10 points)
Implement the function compute_motion that computes \((u, v)\) for each pixel, given \(I_x\), \(I_y\) and \(I_t\) as input. Please ignore the aggregate and sigma parameters for now; we will come back to these a bit later. Add the visualized motion for the given “things” and “street” examples to your PDF report.
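Per pixel, the least-squares solution of the constraint above amounts to solving the \(2 \times 2\) normal equations
\[\begin{bmatrix} \sum_R I_x^2 & \sum_R I_x I_y \\ \sum_R I_x I_y & \sum_R I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum_R I_x I_t \\ \sum_R I_y I_t \end{bmatrix}\]A direct, unweighted sketch follows; the patch size and default argument values are assumptions and should follow the skeleton:

import numpy as np

def compute_motion(Ix, Iy, It, patch_size=15, aggregate='const', sigma=2.2):
    half = patch_size // 2
    h, w = Ix.shape
    u = np.zeros((h, w))
    v = np.zeros((h, w))
    # Pad the derivative images so every pixel has a full patch around it.
    Ixp = np.pad(Ix, half, mode='symmetric')
    Iyp = np.pad(Iy, half, mode='symmetric')
    Itp = np.pad(It, half, mode='symmetric')
    for y in range(h):
        for x in range(w):
            ix = Ixp[y:y + patch_size, x:x + patch_size].ravel()
            iy = Iyp[y:y + patch_size, x:x + patch_size].ravel()
            it = Itp[y:y + patch_size, x:x + patch_size].ravel()
            A = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                          [np.sum(ix * iy), np.sum(iy * iy)]])
            b = -np.array([np.sum(ix * it), np.sum(iy * it)])
            u[y, x], v[y, x] = np.linalg.solve(A, b)  # assumes A is invertible (textured patch)
    return u, v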
3. (4 points)
Implement the function warp that warps the given (first) image with the provided flow estimates. Hint: you may find the function griddata from the package scipy.interpolate useful.
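A sketch using griddata: resample the image at the flow-displaced coordinates. The sign convention of the displacement is an assumption here and should be adjusted to match the definition used in the lecture:

import numpy as np
from scipy.interpolate import griddata

def warp(im, u, v):
    # Resample im at coordinates displaced by the flow (flip the sign of u, v
    # if the warp is defined the other way round).
    h, w = im.shape
    ys, xs = np.mgrid[0:h, 0:w]
    known_points = np.stack([ys.ravel(), xs.ravel()], axis=-1)            # where im is defined
    query_points = np.stack([(ys + v).ravel(), (xs + u).ravel()], axis=-1)
    warped = griddata(known_points, im.ravel().astype(np.float64),
                      query_points, method='linear', fill_value=0.0)
    return warped.reshape(h, w)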
4. (3 points)
We will need to verify that our flow estimates indeed minimise the objective. The objective/cost function minimised by Lucas-Kanade is the averaged SSD (Sum of Squared Differences) between the pixels of the warped first image and the second image. Implement this cost function in compute_cost. Specifically, first calculate the SSD between the warped image 1 and image 2, and then divide the SSD value by the number of pixels. The result can be interpreted as the average difference between pixels. Report the costs for the given “things” and “street” examples in your PDF.
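Once the warped image is available, this reduces to a few lines:

import numpy as np

def compute_cost(im_warped, im2):
    # Averaged SSD: sum of squared pixel differences divided by the number of pixels.
    diff = im_warped.astype(np.float64) - im2.astype(np.float64)
    return float(np.sum(diff ** 2) / diff.size)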
5. (4 points)
Note that there are two additional arguments (aggregate and sigma) to the function compute_motion that we have not used. The summation of pixel values in patch \(R\) can also be weighted, that is, pixels that are farther away from the centre pixel of the patch contribute less to the sum than pixels closer to the centre. This intuitive idea suggests Gaussian weighting: the weights in the patch correspond to the weights of a Gaussian kernel. Modify your implementation of compute_motion to accept aggregate='gaussian' and sigma, and to sum the patch values in this fashion. The argument sigma is the standard deviation of the Gaussian kernel; you can use the default sigma=2.2 for this assignment. You are welcome to use code from Assignment 1 to construct the Gaussian weights. Add the visualized motion for the given “things” and “street” examples to your PDF report, along with the corresponding costs.
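A sketch of the Gaussian weighting: build a kernel of the patch size and multiply every term of the per-patch sums inside compute_motion by the corresponding weight. The helper name and its exact use below are assumptions:

import numpy as np

def gaussian_weights(patch_size, sigma=2.2):
    # 2-D Gaussian weights over the patch, peaked at the centre pixel (assumes odd patch_size).
    half = patch_size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

# Inside compute_motion, when aggregate == 'gaussian':
#   wgt = gaussian_weights(patch_size, sigma).ravel()
#   A = np.array([[np.sum(wgt * ix * ix), np.sum(wgt * ix * iy)],
#                 [np.sum(wgt * ix * iy), np.sum(wgt * iy * iy)]])
#   b = -np.array([np.sum(wgt * ix * it), np.sum(wgt * iy * it)])
# With aggregate == 'const' the weights are all equal, recovering the plain sums.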
Hand in your functions (i.e., the *.py files) or a Jupyter Notebook, together with a PDF file showing your results and answering the questions. These must have sufficient comments for others to easily understand the code. Both the Python files and the PDF should be submitted through Gradescope.