Skip to content

3D

Camera

References: - cameras · PyTorch3D - Pinhole Camera - Kornia

Camera Coordinate Systems

Camera Coordinate System: A reference frame that changes with camera position and orientation. In the camera coordinate system, the origin is at the camera position, with the Z-axis typically aligned with the camera's line of sight, and X and Y axes aligned with the camera's horizontal and vertical axes respectively.

World Coordinate System: A fixed, global reference frame used to describe object positions in a scene. In this coordinate system, each object's position is defined relative to a fixed point (world origin).

Intrinsic Matrix

Camera Intrinsic Matrix with Example in Python | by Neeraj Krishna | Towards Data Science

  • fx and fy: Focal lengths in pixels along the image plane's x and y axes. These reflect the lens magnification of the scene. Ideally, for square pixels, fx and fy should be identical, but they may differ slightly due to lens distortion and manufacturing tolerances.
  • cx and cy: Principal point coordinates, representing the image coordinate system origin's position on the image plane. Typically assumed to be at the image center but may be offset due to imprecise lens manufacturing and assembly.

The camera intrinsic matrix K is typically represented as:

\[ K=\begin{bmatrix}f_x&0&c_x\\0&f_y&c_y\\0&0&1\end{bmatrix} \]

Extrinsic Matrix

Camera Extrinsic Matrix with Example in Python | by Neeraj Krishna | Towards Data Science

Describes the camera's position and orientation in global space (world coordinate system):

\[ E_t=\begin{bmatrix}R&\mathbf{t}\end{bmatrix} \]
  • Rotation: The 3x3 rotation matrix component enables object rotation around the origin. Rotations can be single-axis (around X, Y, or Z) or any combination thereof.
  • Translation: The Tx, Ty, Tz elements enable object movement along each direction in 3D space.

In practice, a 4x4 homogeneous transformation matrix is often used to handle both rotation and translation, simplifying calculations through a single matrix multiplication.

Homogeneous Coordinate Transformation Matrix

\[ \mathbf{M}_{\mathrm{c2w}}=\begin{bmatrix}R&T\\0&1\end{bmatrix} \]
  • \(R\) (Rotation Matrix): Describes how the camera coordinate system's basis vectors (forward, up, and right directions) are rotated relative to the world coordinate system.
  • \(T\) (Translation Vector): Represents the camera coordinate system's origin (camera's optical center) position in the world coordinate system.

This matrix transforms points from camera coordinates to world coordinates through rotation by \(R\) followed by translation by \(T\).

For transforming \(P_w\) to \(P_c\), we use \(\mathbf{M}_{\mathrm{w2c}}\), which can be obtained through inverse transformation: \(\mathbf{M}_{\mathrm{w2c}}=\mathbf{M}_{\mathrm{c2w}}^{-1}\)

Trajectory files (traj.txt) typically contain 4x4 transformation matrices per line:

R11 R12 R13 Tx
R21 R22 R23 Ty
R31 R32 R33 Tz
 0   0   0  1

Coordinate Transformation During Imaging

assets/Pasted image 20241117134255.png

The projection of 3D world coordinates to 2D image plane involves:

  1. World to Camera Coordinates: Using homogeneous extrinsic matrix \(\mathbf{M}_\mathrm{w2c}=\begin{bmatrix}R&T\\0&1\end{bmatrix}\), transform \(P_w=(X_w,Y_w,Z_w,1)^T\) to \(\mathbf{P}_c=(X_c,Y_c,Z_c)^T\):
\[ \mathbf{P}_c=\mathbf{M}_{\mathrm{w2c}}\cdot\mathbf{P}_w \]
  1. Camera to Image Plane: Using intrinsic matrix \(K\), project \(\mathbf{P}_c=(X_c,Y_c,Z_c)^T\) to image plane pixel \(P_i=(u,v)\):
\[ \mathbf{P}_i=\mathbf{K}\cdot\begin{bmatrix}X_c\\Y_c\\Z_c\end{bmatrix}/Z_c \]