CSE 252A Computer Vision I Fall 2018 - Assignment 1

Instructor: David Kriegman

Assignment Published On: Tuesday, October 9, 2018

Due On: Tuesday, October 23, 2018 11:59 pm

Name: Jingpei Lu

Student ID: A91033661

Instructions

  • Review the academic integrity and collaboration policies on the course website.
  • This assignment must be completed individually.
  • This assignment contains theoretical and programming exercises. If you plan to submit handwritten answers for the theoretical exercises, please make sure your writing is readable and merge them, in order, into the final pdf you create from this notebook. You can also fill in the answers within the notebook itself by creating a markdown cell.
  • Programming aspects of this assignment must be completed using Python in this notebook.
  • If you want to modify the skeleton code, you can do so. It is provided only as a framework for your solution.
  • You may use python packages for basic linear algebra (you can use numpy or scipy for basic operations), but you may not use packages that directly solve the problem.
  • If you are unsure about using a specific package or function, then ask the instructor and teaching assistants for clarification.
  • You must submit this notebook exported as a pdf. You must also submit this notebook as .ipynb file.
  • You must submit both files (.pdf and .ipynb) on Gradescope. You must mark each problem on Gradescope in the pdf.
  • Late policy - 10% per day late penalty after due date up to 3 days.

Problem 1: Perspective Projection [5 pts]

Consider a perspective projection where a point $P = [x\ \ y\ \ z]^T$ is projected onto an image plane $\Pi'$ represented by $k = f' > 0$ as shown in the following figure. The first, second, and third coordinate axes are denoted by $i$, $j$, and $k$, respectively.

Consider the projection of two rays in the world coordinate system $$ Q_1 = [7\ \ {-3}\ \ 1] + t\,[8\ \ 2\ \ 4] $$ $$ Q_2 = [2\ \ {-5}\ \ 9] + t\,[8\ \ 2\ \ 4] $$ where $-\infty \leq t \leq -1$.

Calculate the coordinates of the endpoints of the projection of the rays onto the image plane. Identify the vanishing point based on the coordinates.
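
A minimal numeric sketch of the endpoint arithmetic, assuming $f' = 1$ so that a point $[x\ \ y\ \ z]^T$ projects to $(f'x/z,\ f'y/z)$ (this only checks the numbers; it is not a substitute for the written derivation):

import numpy as np

def project(P, f=1.0):
    # perspective projection onto the image plane k = f (f' = 1 is assumed here)
    return f * P[:2] / P[2]

d = np.array([8, 2, 4])
for name, origin in [("Q1", np.array([7, -3, 1])), ("Q2", np.array([2, -5, 9]))]:
    print(name, "at t = -1 projects to", project(origin - d))

# as t -> -infinity the origin term becomes negligible, and both rays
# project toward the common vanishing point f' * (d_x/d_z, d_y/d_z)
print("vanishing point:", project(d.astype(float)))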

Problem 2: Thin Lens Equation [5 pts]

An illuminated arrow forms a real inverted image of itself at a distance of $w = 60\text{ cm}$, measured along the optical axis of a convex thin lens, as shown above. The image is half the size of the object.

  1. How far from the object must the lens be placed? What is the focal length of the lens? (A numeric check for all three parts is sketched after this list.)
  2. At what distance from the center of the lens should the arrow be placed so that the height of the image is the same?
  3. What would be the type and location of image formed if the arrow is placed at a distance of 5 cm along the optical axis from the optical center?
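
A numeric check for all three parts, under the assumption that $w = 60$ cm is the object-to-image distance, using the thin lens equation $1/f = 1/d_o + 1/d_i$ (a sketch only, not the written derivation):

# part 1: d_o + d_i = 60 and d_i = d_o/2 (real image, half the object size)
d_o = 40.0                       # from d_o + d_o/2 = 60
d_i = 60.0 - d_o                 # 20 cm
f = d_o * d_i / (d_o + d_i)      # thin lens: f = d_o*d_i/(d_o + d_i) ~ 13.33 cm
print("d_o =", d_o, "d_i =", d_i, "f =", f)

# part 2: equal heights means |m| = d_i/d_o = 1, so d_o = d_i = 2f
print("d_o for unit magnification:", 2 * f)

# part 3: d_o = 5 cm < f, so the image is virtual and upright;
# 1/d_i = 1/f - 1/d_o < 0 places it on the object side of the lens
d_o3 = 5.0
print("d_i for d_o = 5 cm:", 1.0 / (1.0 / f - 1.0 / d_o3))   # -8 cm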

Problem 3: Affine Projection [3 pts]

Show that the image of a pair of parallel lines in 3D space is a pair of parallel lines under an affine camera model.
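
One way to outline the argument (a sketch, not necessarily the intended proof): an affine camera maps a world point $X$ to $MX + b$ for some $2 \times 3$ matrix $M$ and offset $b$, so for two parallel lines with common direction $D$, $$ L_i(t) = A_i + t\,D \quad \Rightarrow \quad \ell_i(t) = (M A_i + b) + t\,MD, \qquad i = 1, 2 $$ Both image lines share the direction vector $MD$ and are therefore parallel (degenerating to points if $MD = 0$).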

Problem 4: Image Formation and Rigid Body Transformations [10 points]

In this problem we will practice rigid body transformations and image formation through the projective and affine camera models. The goal will be to photograph the following four points: $^AP_1 = [-1\ \ {-0.5}\ \ 2]^T$, $^AP_2 = [1\ \ {-0.5}\ \ 2]^T$, $^AP_3 = [1\ \ 0.5\ \ 2]^T$, $^AP_4 = [-1\ \ 0.5\ \ 2]^T$.

To do this we will need two matrices. Recall, first, the following formula for rigid body transformation $$ {}^BP = {}^B_AR\,{}^AP + {}^BO_A $$ Where $^BP$ is the point coordinate in the target ($B$) coordinate system. $^AP$ is the point coordinate in the source ($A$) coordinate system. $^B_AR$ is the rotation matrix from $A$ to $B$, and $^BO_A$ is the origin of the coordinate system $A$ expressed in $B$ coordinates.

The rotation and translation can be combined into a single 4 $\times$ 4 extrinsic parameter matrix, $P_e$, so that $^BP = P_e \cdot {}^AP$ (with the points expressed in homogeneous coordinates).
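
As a small illustrative helper (a sketch; the camera functions below assemble $P_e$ inline), the block form of $P_e$ is just $^B_AR$ and $^BO_A$ stacked over a $[0\ \ 0\ \ 0\ \ 1]$ row:

import numpy as np

def make_extrinsic(R, O):
    # P_e = [[R, O], [0, 0, 0, 1]], so that ^BP = P_e @ ^AP in homogeneous coordinates
    P_e = np.eye(4)
    P_e[:3, :3] = R
    P_e[:3, 3] = np.ravel(O)
    return P_e

# identity rotation with A's origin at [0, 0, 1] in B (the camera 2 setting below)
print(make_extrinsic(np.eye(3), [0, 0, 1]))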

Once transformed, the points can be photographed using the intrinsic camera matrix, $P_i$, which is a 3 $\times$ 4 matrix.

Once these are found, the image of a point, $^AP$, can be calculated as $P_i \cdot P_e \cdot {}^AP$.

We will consider four different settings of focal length, viewing angles and camera positions below. For each of these calculate:

a) Extrinsic transformation matrix,

b) Intrinsic camera matrix under the perspective camera assumption.

c) Intrinsic camera matrix under the affine camera assumption. In particular, around what point do you do the Taylor series expansion?

d) Calculate the image of the four vertices and plot it using the supplied functions.

Your output should look something like the following image (your output values might not match; this is just an example).

  1. [No rigid body transformation]. Focal length = 1. The optical axis of the camera is aligned with the z-axis.
  2. [Translation]. $^BO_A = [\text{0 0 1}]^T$. Focal length = 1. The optical axis of the camera is aligned with the z-axis.
  3. [Translation and Rotation]. Focal length = 1. $^B_AR$ encodes a rotation of 30 degrees around the z-axis followed by 60 degrees around the y-axis. $^BO_A = [\text{0 0 1}]^T$.
  4. [Translation and Rotation, long distance]. Focal length = 5. $^B_AR$ encodes a rotation of 30 degrees around the z-axis followed by 60 degrees around the y-axis. $^BO_A = [\text{0 0 13}]^T$.

You can refer to Richard Szeliski's book, starting at page 36, for image formation and the extrinsic matrix.

Intrinsic matrix calculation for perspective and affine camera models was covered in class and can be found in the lecture 3 slides: http://cseweb.ucsd.edu/classes/fa18/cse252A-a/lec3.pdf

We will not use a full intrinsic camera matrix (e.g. one that maps centimeters to pixels and defines the coordinates of the center of the image), but only parameterize this with $f$, the focal length. In other words: the only parameter in the intrinsic camera matrix under the perspective assumption is $f$, and the only ones under the affine assumption are $f, x_0, y_0, z_0$, where $(x_0, y_0, z_0)$ is the center of the Taylor series expansion.
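
For reference, the first-order Taylor expansion behind the affine intrinsic matrix (a sketch; the expansion point $(x_0, y_0, z_0)$ is whatever reference point you choose): $$ u = \frac{fx}{z} \approx \frac{fx_0}{z_0} + \frac{f}{z_0}(x - x_0) - \frac{fx_0}{z_0^2}(z - z_0) = \frac{f}{z_0}x - \frac{fx_0}{z_0^2}z + \frac{fx_0}{z_0} $$ and analogously for $v$ with $y_0$ in place of $x_0$. These coefficients are exactly the rows of the $3 \times 4$ affine intrinsic matrix built by get_affin_int in the code below.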

Note that the axes are the same for each row, to facilitate comparison between the two camera models. Also include:

  1. The actual points around which you did the Taylor series expansion for the affine camera models.
  2. How did you arrive at these points?
  3. How do the projective and affine camera models differ? Why is this difference smaller for the last image compared to the second to last?

Answer:

1. The actual points around which I take the Taylor series expansion for the four cameras are [0,0,2], [0,0,3], [1.73205081,0,2], and [1.73205081,0,14], respectively.

2. To arrive at these points, I first take the reference point [0,0,2], which is the center of the four given points. Then I compute the 4×4 extrinsic parameter matrix for each camera model, and the new reference point is obtained by multiplying the extrinsic parameter matrix with the old reference point in homogeneous coordinates.

3. The intrinsic matrices of the projective and affine camera models are different: the affine transformation preserves parallelism, while the projective transformation does not. The difference is smaller for the last image than for the second to last because the last camera has a larger $z_0$ and a focal length of 5, which makes the affine intrinsic matrix a closer approximation to the projective one. A large $z_0$ also means the object is far from the optical center, and for distant objects the perspective effects of orientation and translation are much weaker.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import math


# convert points from euclidean to homogeneous
def to_homog(points):
    ones = [1]*points.shape[1]
    return np.row_stack((points, ones))
    

# convert points from homogeneous to euclidean
def from_homog(points_homog):
    points_homog = points_homog.astype(float)
    for i in range(points_homog.shape[1]):
        points_homog[:,i] = points_homog[:,i]/points_homog[-1,i]
    return points_homog[:-1,:]
    

# project 3D euclidean points to 2D euclidean
def project_points(P_int, P_ext, pts):
    
    pts = to_homog(pts)
    out = from_homog(np.dot(np.dot(P_int, P_ext), pts))
    
    return out
    


# Change the three matrices for the four cases as described in the problem
# in the four camera functions given below. Make sure that we can see the formula
# (if one exists) being used to fill in the matrices. Feel free to document with
# comments anything you feel the need to explain.

def get_affin_int(f,ori):
    # helper to compute the affine intrinsic matrix: first-order Taylor expansion
    # of (f*x/z, f*y/z) around the reference point ori = (x0, y0, z0)
    matrix = np.array([[f/ori[2][0]              , 0                     ,
                      -f*ori[0][0]/(ori[2][0]**2), f*ori[0][0]/ori[2][0]],
                     [0                          , f/ori[2][0]           ,
                      -f*ori[1][0]/(ori[2][0]**2), f*ori[1][0]/ori[2][0]],
                     [0                          , 0                     ,
                      0                          , 1                    ]])
    return matrix

def camera1():
    # write your code here
    
    f = 1
    P_ext = np.eye(4,4)
    P_int_proj = np.eye(3,4)
    ori = np.reshape(np.array([0,0,2]),(3,1)) #take [0,0,2] as reference point
    P_int_affine = get_affin_int(f,ori)
    
    return P_int_proj, P_int_affine, P_ext

def camera2():
    # write your code here
    f = 1
    P_ext = np.eye(4,4)
    P_ext[:,3] += np.array([0,0,1,0])
    ori = np.reshape(np.array([0,0,2]),(3,1))
    ori = from_homog(np.dot(P_ext, to_homog(ori)))
    P_int_proj = np.eye(3,4)
    P_int_affine = get_affin_int(f,ori)
    
    return P_int_proj, P_int_affine, P_ext

def camera3():
    # write your code here
    f = 1
    rotation_z = np.array([[np.cos(np.pi/6), -np.sin(np.pi/6), 0, 0],
                 [np.sin(np.pi/6), np.cos(np.pi/6) , 0, 0],
                 [0              , 0               , 1, 0],
                 [0              , 0               , 0, 1]]) #rotation matrix about z axis
    rotation_y = np.array([[np.cos(np.pi/3) , 0, np.sin(np.pi/3), 0],
                 [0               , 1, 0              , 0],
                 [-np.sin(np.pi/3), 0, np.cos(np.pi/3), 0],
                 [0               , 0, 0              , 1]]) #rotation matrix about y axis
    P_ext = np.dot(rotation_y,rotation_z)
    P_ext[:,3] += np.array([0,0,1,0]) 
    ori = np.reshape(np.array([0,0,2]),(3,1))
    ori = from_homog(np.dot(P_ext, to_homog(ori)))
    P_int_proj = np.eye(3,4)
    P_int_affine = get_affin_int(f,ori)
    return P_int_proj, P_int_affine, P_ext

def camera4():    
    # write your code here
    
    f = 5
    rotation_z = np.array([[np.cos(np.pi/6), -np.sin(np.pi/6), 0, 0],
                 [np.sin(np.pi/6), np.cos(np.pi/6) , 0, 0],
                 [0              , 0               , 1, 0],
                 [0              , 0               , 0, 1]])
    rotation_y = np.array([[np.cos(np.pi/3) , 0, np.sin(np.pi/3), 0],
                 [0               , 1, 0              , 0],
                 [-np.sin(np.pi/3), 0, np.cos(np.pi/3), 0],
                 [0               , 0, 0              , 1]])
    P_ext = np.dot(rotation_y,rotation_z)
    P_ext[:,3] += np.array([0,0,13,0]) 
    ori = np.reshape(np.array([0,0,2]),(3,1))
    ori = from_homog(np.dot(P_ext, to_homog(ori)))
    P_int_proj = np.eye(3,4)
    P_int_proj[2,:] = [0,0,1/f,0] # third row makes the divisor z/f, so u = f*x/z
    P_int_affine = get_affin_int(f,ori)
    return P_int_proj, P_int_affine, P_ext


# Use the following code to display your outputs
# You are free to change the axis parameters to better 
# display your quadrilateral but do not remove any annotations

def plot_points(points, title='', style='.-r', axis=[]):
    inds = list(range(points.shape[1]))+[0]
    plt.plot(points[0,inds], points[1,inds],style)
    
    for i in range(len(points[0,inds])):
        plt.annotate(str("{0:.3f}".format(points[0,inds][i]))+",  "+str("{0:.3f}".format(points[1,inds][i])),(points[0,inds][i], points[1,inds][i]))
    
    if title:
        plt.title(title)
    if axis:
        plt.axis(axis)
        
    plt.tight_layout()
        
def main():
    point1 = np.array([[-1,-.5,2]]).T
    point2 = np.array([[1,-.5,2]]).T
    point3 = np.array([[1,.5,2]]).T
    point4 = np.array([[-1,.5,2]]).T
    points = np.hstack((point1,point2,point3,point4))
    
    for i, camera in enumerate([camera1, camera2, camera3, camera4]):
        P_int_proj, P_int_affine, P_ext = camera()
        plt.subplot(1, 2, 1)
        plot_points(project_points(P_int_proj, P_ext, points), title='Camera %d Projective'%(i+1), axis=[-.6,3,-.6,1])
        plt.subplot(1, 2, 2)
        plot_points(project_points(P_int_affine, P_ext, points), title='Camera %d Affine'%(i+1), axis=[-.6,3,-.6,1])
        plt.show()

main()

Problem 5: Homography [12 pts]

You may use eig or svd routines in python for this part of the assignment.

Consider a vision application in which components of the scene are replaced by components from another image scene.

In this problem, we will implement partial functionality of a smartphone camera scanning application (example: CamScanner) which, in case you've never used one before, takes pictures of documents and transforms them by warping and aligning them to give an image similar to one that would have been obtained using a scanner.

The transformation can be visualized by imagining the use of two cameras forming an image of a scene with a document. The scene would be the document you're trying to scan placed on a table, and one of the cameras would be your smartphone camera, forming the image that you'll be uploading and using in this assignment. There can also be an ideally placed camera, oriented in the world in such a way that the image it forms of the scene has the document perfectly aligned. While it is unlikely you can hold your phone still enough to get such an image, we can use a homography to transform the image you take into the image that the ideally placed camera would have taken.

This digital replacement is accomplished by a set of corresponding points for the document in both the source (your picture) and target (the ideal) images. The task then consists of mapping the points from the source to their respective points in the target image. In the most general case, there would be no constraints on the scene geometry, making the problem quite hard to solve. If, however, the scene can be approximated by a plane in 3D, a solution can be formulated much more easily even without the knowledge of camera calibration parameters.

To solve this section of the homework, you will begin by understanding the transformation that maps one image onto another in the planar scene case. Then you will write a program that implements this transformation and use it to warp some document into a well aligned document (See the given example to understand what we mean by well aligned).

To begin with, we consider the projection of planes in images. Imagine two cameras $C_1$ and $C_2$ looking at a plane $\pi$ in the world. Consider a point $P$ on the plane $\pi$ and its projections $p = [u_1\ \ v_1\ \ 1]^T$ in image 1 and $q = [u_2\ \ v_2\ \ 1]^T$ in image 2.

There exists a unique, up to scale, 3 $\times$ 3 matrix $H$ such that, for any point $P$: $$q \approx Hp$$ Here $\approx$ denotes equality in homogeneous coordinates, meaning that the left and right hand sides are proportional. Note that $H$ only depends on the plane and the projection matrices of the two cameras.

The interesting thing about this result is that by using $H$ we can compute the image of $P$ that would be seen in the camera with center $C_2$ from the image of the point in the camera with center at $C_1$, without knowing the three dimensional location. Such an $H$ is a projective transformation of the plane, called a homography.
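
As a hint for computeH (one standard setup, following the DLT note linked in the skeleton code): writing $q \approx Hp$ with $H = [h_{ij}]$ and eliminating the unknown scale gives $$ u_2 = \frac{h_{11}u_1 + h_{12}v_1 + h_{13}}{h_{31}u_1 + h_{32}v_1 + h_{33}}, \qquad v_2 = \frac{h_{21}u_1 + h_{22}v_1 + h_{23}}{h_{31}u_1 + h_{32}v_1 + h_{33}} $$ Cross-multiplying yields two linear equations per correspondence in the nine entries of $H$. Stacking $n \geq 4$ correspondences gives a homogeneous system $Ah = 0$, solved up to scale by the right singular vector of $A$ with the smallest singular value.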

In this problem, complete the code for computeH and warp functions that can be used in the skeletal code that follows.

There are three warp functions to implement in this assignment, example outputs of which are shown below. In warp1, you will create a homography from points in your image to the target image (mapping source points to target points). In warp2, the inverse of this process will be done. In warp3, you will create a homography between a given image and your image, replacing your document with the given image.

[Example outputs of warp1 (1), warp2 (2), and warp3 (3) appear here.]

  1. In the context of this problem, the source image refers to the image of a document you take that needs to be replaced into the target.

  2. The target image can start out as an empty matrix that you fill out using your code.
  3. You will have to implement the computeH function that computes a homography. It takes in the point correspondences between the source image and target image in homogeneous coordinates and returns a 3 $\times$ 3 homography matrix.
  4. You will also have to implement the three warp functions in the skeleton code given and plot the resultant image pairs. For plotting, make sure that the target image is not smaller than the source image.

Note: We have provided test code to check if your implementation for computeH is correct. All the code to plot the results needed is also provided along with the code to read in the images and other data required for this problem. Please try not to modify that code.

You may find the following functions helpful: numpy.linalg.svd, numpy.meshgrid (a usage hint for meshgrid is sketched below).
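
As a hint on how numpy.meshgrid can help (a sketch assuming an inverse-warp formulation like warp2 below; the helper name all_homog_coords is my own): it builds every target pixel coordinate at once, so the homography can be applied to all pixels with a single matrix product instead of a double loop.

import numpy as np

def all_homog_coords(height, width):
    # every (x, y) pixel of the target, stacked as a 3 x N homogeneous array
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    return np.vstack((xs.ravel(), ys.ravel(), np.ones(height * width)))

# applying a homography H to every target pixel at once would then look like:
# source_coords = from_homog(np.matmul(H, all_homog_coords(h, w)))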

In [3]:
import numpy as np
from scipy.misc import imread, imresize
from scipy.io import loadmat
import matplotlib.pyplot as plt

# load image to be used - resize to make sure it's not too large
# You can use the given image as well
# A large image will make testing your code take longer; once you're satisfied with your result,
# you can, if you wish, make the image larger (as large as your computer's memory allows)
source_image = imresize(imread("photo.jpg"),.1)[:,:,:3]/255.

# display images
plt.imshow(source_image)

# Align the polygon such that the corners align with the document in your picture
# This polygon doesn't need to overlap with the edges perfectly; an approximation is fine
# The order of points is clockwise, starting from bottom left.
x_coords = [11,50,195,195] 
y_coords = [115,10,19,150]

# plot_points from the previous problem is used to draw over your image
# Note that your coordinates will change once you resize your image again
source_points = np.vstack((x_coords, y_coords))
plot_points(source_points)

plt.show()
In [4]:
def computeH(source_points, target_points):
    # returns the 3x3 homography matrix such that:
    # np.matmul(H, source_points) ~ target_points
    # where source_points and target_points are expected to be in homogeneous
    
    # Please refer the note on DLT algorithm given at: 
    # https://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf
    
    # make sure points are 3D homogeneous
    assert source_points.shape[0]==3 and target_points.shape[0]==3
    n = source_points.shape[1]
    A = np.zeros((2*n,9))
    for i in range(n):
        x1 = source_points[0][i]
        y1 = source_points[1][i]
        x2d = target_points[0][i] / target_points[2][i]
        y2d = target_points[1][i] / target_points[2][i]
        A[2*i,:] = [-x1, -y1, -1, 0, 0, 0, x2d*x1, x2d*y1, x2d]
        A[2*i+1,:] = [0, 0, 0, -x1, -y1, -1, y2d*x1, y2d*y1, y2d]
    
    U, S, V = np.linalg.svd(A, full_matrices=True)
    
    return V[-1].reshape(3,3)
    

#######################################################
# test code. Do not modify
#######################################################
def test_computeH():
    source_points = np.array([[0,0.5],[1,0.5],[1,1.5],[0,1.5]]).T
    target_points = np.array([[0,0],[1,0],[2,1],[-1,1]]).T
    H = computeH(to_homog(source_points), to_homog(target_points))
    mapped_points = from_homog(np.matmul(H,to_homog(source_points)))

    plot_points(source_points,style='.-k')
    plot_points(target_points,style='*-b')
    plot_points(mapped_points,style='.:r')
    plt.show()
    print('The red and blue quadrilaterals should overlap if ComputeH is implemented correctly.')
test_computeH()
The red and blue quadrilaterals should overlap if ComputeH is implemented correctly.
In [5]:
def warp(source_img, source_points, target_size):
    # Create a target image and select target points to create a homography from source image to target image,
    # in other words map all source points to target points and then create
    # a warped version of the image based on the homography by filling in the target image.
    # Make sure the new image (of size target_size) has the same number of color channels as source image
    assert target_size[2]==source_img.shape[2]
    target_image = np.ones(target_size)
    target_points = np.array([[0,0],[target_size[1],0],
                              [target_size[1],target_size[0]],
                              [0,target_size[0]]]).T
    H = computeH(to_homog(source_points), to_homog(target_points))
    
    for y in range(source_img.shape[0]):
        for x in range(source_img.shape[1]):
            mapped_points = from_homog(np.matmul(H,to_homog(np.array([[x],[y]]))))
            mapped_points = mapped_points.astype(int)
            if ((mapped_points[0] in range(target_image.shape[1])) and
            (mapped_points[1] in range(target_image.shape[0]))):
                target_image[mapped_points[1],mapped_points[0],:]= source_img[y,x,:]
    return target_image
    

# Use the code below to plot your result
result = warp(source_image, source_points, (200,140,3))
plt.subplot(1, 2, 1)
plt.imshow(source_image)
plt.subplot(1, 2, 2)
plt.imshow(result)
plt.show()

The output of warp1 of your code probably has some striations or noise. The larger you make your target image, the less it will resemble the document in the source image. Why is this happening?

Answer:

This is because we have to round the mapped coordinates to integers, which reduces the precision of the mapping. As a result, some pixels in the target image are never written to by any source pixel, which leaves the holes and striations.

To fix this, implement warp2, by creating an inverse homography matrix and fill in the target image.

In [6]:
def warp2(source_img, source_points, target_size):
    # Create a target image and select target points to create a homography from target image to source image,
    # in other words map each target point to a source point, and then create a warped version
    # of the image based on the homography by filling in the target image.
    # Make sure the new image (of size target_size) has the same number of color channels as source image
    target_image = np.ones(target_size)
    target_points = np.array([[0,0],[target_size[1],0],
                              [target_size[1],target_size[0]],
                              [0,target_size[0]]]).T
    H = computeH(to_homog(target_points), to_homog(source_points))
    
    for y in range(target_image.shape[0]):
        for x in range(target_image.shape[1]):
            mapped_points = from_homog(np.matmul(H,to_homog(np.array([[x],[y]]))))
            mapped_points = mapped_points.astype(int)
            if ((mapped_points[0] in range(source_img.shape[1])) and
            (mapped_points[1] in range(source_img.shape[0]))):
                target_image[y,x,:]= source_img[mapped_points[1],mapped_points[0],:]
    return target_image



# Use the code below to plot your result
size_factor = 1
result = warp2(source_image, source_points, (600*size_factor,420*size_factor,3))
plt.subplot(1, 2, 1)
plt.imshow(source_image)
plt.subplot(1, 2, 2)
plt.imshow(result)
plt.show()

Try playing around with the size of your target image in warp1 versus warp2. Additionally, you can implement nearest-pixel or bilinear interpolation and see if that makes a difference in your output; one possible bilinear helper is sketched below.
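
For reference, here is one possible bilinear sampling helper that could replace the nearest-integer lookup in warp2 (a sketch; the name bilinear_sample is my own, and clamping at the border is just one reasonable choice):

import numpy as np

def bilinear_sample(img, x, y):
    # blend the four pixels surrounding the real-valued location (x, y)
    h, w = img.shape[:2]
    x0 = int(np.clip(np.floor(x), 0, w - 1))
    y0 = int(np.clip(np.floor(y), 0, h - 1))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bot = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bot

In warp2, instead of casting mapped_points to int and indexing, one would call bilinear_sample(source_img, float(mapped_points[0]), float(mapped_points[1])) with the floating-point coordinates.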

In warp3, you'll be replacing the document in your image with a provided image. Read in "ucsd_logo.png" as the source image, keeping your document as the target.

In [7]:
# Load the given UCSD logo image
source_image2 = imread('ucsd_logo.png')[:,:,:3]/255.

def warp3(target_image, target_points, source_image):
    source_size = source_image.shape
    source_points = np.array([[0,0],[source_size[1],0],
                              [source_size[1],source_size[0]],
                              [0,source_size[0]]]).T
    H = computeH(to_homog(source_points), to_homog(target_points))
    
    for x in range(source_size[1]):
        for y in range(source_size[0]):
            mapped_points = from_homog(np.matmul(H,to_homog(np.array([[x],[y]]))))
            mapped_points = mapped_points.astype(int)
            target_image[mapped_points[1],mapped_points[0],:] = source_image[y,x,:]
            
    return target_image

# Use the code below to plot your result
result = warp3(source_image, source_points, source_image2)
plt.subplot(1, 2, 1)
plt.imshow(source_image2)
plt.subplot(1, 2, 2)
plt.imshow(result)
plt.show()