\( \newcommand{\vecIII}[3]{\left[\begin{array}{c} #1\\#2\\#3 \end{array}\right]} \newcommand{\vecIV}[4]{\left[\begin{array}{c} #1\\#2\\#3\\#4 \end{array}\right]} \newcommand{\Choose}[2]{ { { #1 }\choose{ #2 } } } \newcommand{\vecII}[2]{\left[\begin{array}{c} #1\\#2 \end{array}\right]} \newcommand{\matIIxII}[4]{\left[ \begin{array}{cc} #1 & #2 \\ #3 & #4 \end{array}\right]} \newcommand{\matIIIxIII}[9]{\left[ \begin{array}{ccc} #1 & #2 & #3 \\ #4 & #5 & #6 \\ #7 & #8 & #9 \end{array}\right]} \)

Camera API

Reading: Chapter 2 of Jos Dirksen's book, pages 57-64.

Additional Reading. If the following is confusing or you just want to hear a different explanation, feel free to consult the following:

  • The Red Book, chapter 3
  • Angel, sections 1.5-1.6 and 5.2-5.5
  • Hearn and Baker, sections 7-3 and 7-10

There are two aspects to the camera API: the placement of the camera, and the shape of the camera. To understand the latter, we have to understand how cameras work.

The Pinhole Camera

cutaway side view of pinhole camera

This is a cutaway side view of a pinhole camera (in Latin, the "camera obscura," which means "dark room"). The z axis is imaginary, but the box on the left part of the picture is a real box: solid on all six sides. There is a tiny pinhole in the front of the box, where the circle is on the picture. The origin is placed at the pinhole, the y axis goes straight up through the front of the box, and the z axis points out into the scene. The depth of the box is d, and the height of the box is h. We also might care about the width of the box, but that's not visible on this picture, since we've eliminated that dimension.

cutaway side view of pinhole camera and a tree

Rays of light from the outside go through the hole and land on the image plane, which is the entire back of the box. For example, a ray of light from the top of the tree lands on the back of the box as shown. Because the hole is so tiny, only light from along a single ray can land on any spot on the back of the camera. So, in this example, only light from the top of the tree can land at that spot. This is why pinhole camera pictures are so sharp. Theoretically, they can provide infinite clarity, though in practice other issues arise (diffraction of light rays and the lack of enough photons). However, in computer graphics, we only need the theoretical camera.

Pinhole cameras are simple and work really well. They're standard equipment for viewing solar eclipses. We only use lenses so that we can gather more light. A perfect pinhole camera only really allows one ray of light from each place in the scene, and that would make for a really dark image, or require very sensitive film (or retinas). Since the need to gather enough light is not important in computer graphics, the OpenGL model is of a pinhole camera.

Points to notice:

  • The projection of an image (such as the tree) can be computed using similar triangles.
  • The effects of perspective are manifest, such as things getting smaller with distance.
  • Parallel lines seem to converge at a "vanishing point."
  • One disadvantage of a pinhole camera is that the image is upside down on the film. (Your SLR camera has "through the lens" viewing, but uses additional lenses to flip the image right side up.) We'll see how to address this soon.

Computing Projection by similar triangles

Suppose we want to compute the projection of the top of the tree. We know the coordinates of the top of the tree; let them be (X,Y,Z). We want to know the coordinates of the projection; let them be (x,y,z). We can do this by similar triangles, using the two yellow triangles in the following figure:

the similar triangles used to compute the projection of the tree

The height and base of the big triangle are just Y and Z. The height and base of the little triangle are y and z. The base of the little triangle is known, because that's determined by the shape and size of our pinhole camera. In the figure, it's notated as "d." By similar triangles, we know: \[ y/d = Y/Z \\ y = d*Y/Z \\ y = Y/(Z/d) \]

Everything on the right hand side is known, so we can compute the projection of any point simply by knowing the depth of our pinhole camera and the location of the point.

By the way, we do this last algebraic step, dividing by $Z/d$, because of the way we will use it in our projection matrix.

Projecting the x coordinate works the same way.
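To make the similar-triangles calculation concrete, here is a minimal sketch in JavaScript. The function name projectBySimilarTriangles and the sample tree coordinates are made up for illustration, and the sketch ignores the sign issues that the synthetic camera addresses.

```javascript
// Project a scene point onto the image plane of a pinhole camera of depth d,
// using the similar-triangles result y = Y/(Z/d), and likewise for x.
function projectBySimilarTriangles(point, d) {
    var scale = point.Z / d;   // the Z/d divisor from the derivation
    return {
        x: point.X / scale,
        y: point.Y / scale,
        z: d                   // every projected point lies on the image plane
    };
}

// Hypothetical tree top: 10 units high, 20 units away; camera depth is 2.
var treeTop = { X: 3, Y: 10, Z: 20 };
var image = projectBySimilarTriangles(treeTop, 2);
console.log(image);  // x = 0.3, y = 1, z = 2
```

Doubling Z halves both x and y, which is the "things get smaller with distance" effect noted earlier.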

The Synthetic Camera

There's a hassle with signs in the pinhole camera, such as whether the z value for the tree is negative. Also, the image ends up being upside down. In CG, since we're really only interested in the mathematics of projection, we use a synthetic camera, in which the image plane is placed on the same side of the origin as the scene.

the synthetic camera has the image plane between the focal point and the scene

  • In CG, we can put the image plane in front of the focal point. This means the image is right side up.
  • Mathematically, we'll make the origin be the focal point, with the camera pointing down the negative z axis.
  • The image plane is the top of a frustum (a truncated pyramid). See the demo below.
  • The frustum is also our view volume. Anything outside the view volume is "clipped" away.
  • Note that this also means that the CG system can't see infinitely far. That's because it needs to calculate relative depth, and it can't make infinitely fine distinctions.
  • The projection is computed using a projection matrix.
  • Note that there is also an option of doing parallel projection rather than perspective projection. Parallel projection is useful in architectural drawings and the like. We will rarely use it.

Demo: Frustum

The view volume of the synthetic (perspective) camera is a frustum: a truncated rectangular pyramid. Please follow this link to this demo:

Demo: Camera

It's easier to understand the power of the camera parameters when we can compare the geometry of the frustum with the resulting rendering of the scene. In the following demo, we see a scene with a teddy bear and a camera, and the rendered result. Please follow the link to:

Perspective Matrices and Perspective Division

For the same reason that we want to make all our affine transformations into matrix multiplications, we want to make projection into a matrix multiplication.

There are two kinds of projection available in OpenGL and Three.js.

  • Orthographic (Parallel). With this kind of projection, the view volume is a rectangular box, and we project just by squeezing out one dimension. As we mentioned earlier, this kind of projection is useful for architectural drawings and situations where there really isn't any perspective. If we align the direction of projection (DOP) with the Z axis, this kind of projection amounts to setting the z coordinate of all points to zero. Here's the orthographic projection matrix: \[ \left[ \begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] \vecIV{x}{y}{z}{1} \]

    Notice how it zeros out the z coordinate of the vertex. It's easier to come up with the matrix for that than to write it in HTML!

  • Perspective. With this kind of projection, if we set up a frame where the origin is the center of projection (COP) and the image plane is parallel to the z=0 plane (the xy plane), we can compute the projection of each point using the similar triangles calculation, dividing each Y and X by Z/d.
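As a minimal sketch of the orthographic case described above, the following JavaScript applies the orthographic matrix to a sample vertex. The helper applyMatrix and the sample point are assumptions made for illustration; they are not part of OpenGL or Three.js.

```javascript
// Multiply a 4x4 matrix (an array of rows) by a 4-element column vector.
function applyMatrix(m, v) {
    return m.map(function (row) {
        return row[0]*v[0] + row[1]*v[1] + row[2]*v[2] + row[3]*v[3];
    });
}

var orthographic = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],   // this row zeros out z
    [0, 0, 0, 1]
];

var projected = applyMatrix(orthographic, [3, 4, -7, 1]);
console.log(projected);  // [ 3, 4, 0, 1 ]
```

The x and y coordinates pass through unchanged, so the size of the image does not depend on depth.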

The matrix for perspective projection isn't obvious. It involves using homogeneous coordinates and leaving part of the calculation undone.

The part of the calculation that is left undone is called perspective division. The idea is that the homogeneous coordinate (X,Y,Z,W) is the same as (X/W, Y/W, Z/W, 1): that is, we divide through by W. If $W=1$, this is a null operation, so it doesn't change our ordinary vertices. If, however, W has the value $Z/d$, this perspective division accomplishes what we did earlier with our similar triangles, namely: \[ y = Y/(Z/d) \]

Therefore the perspective matrix is a matrix that sets $W=Z/d$ and leaves the other coordinates unchanged. Since the last row of the matrix computes W, all we need to do is put $1/d$ in the Z column of the W row. The perspective matrix, then, is just the following matrix: \[ \left[ \begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{array} \right] \]

Let's consider how that matrix transforms an arbitrary point: \[ \vecIV{x}{y}{z}{z/d} = \left[ \begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{array} \right] \vecIV{x}{y}{z}{1} \]

To transform the result into a vector with \( w=1 \), we do the perspective division step, namely we divide all the components by \( z/d \), yielding: \[ \vecIV{x/(z/d)}{y/(z/d)}{z/(z/d)}{1} = \frac{1}{z/d} \vecIV{x}{y}{z}{z/d} \]

This is exactly the point we want, namely the projection of the original point onto an image plane at a distance of \( d \).
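The two steps, matrix multiplication followed by perspective division, can be sketched in JavaScript as follows. The helper names (perspectiveMatrix, applyMatrix, perspectiveDivide) and the sample point are assumptions for illustration, not part of any library.

```javascript
// The perspective matrix from the text: the identity, except that the
// W row is (0, 0, 1/d, 0), so that W becomes Z/d.
function perspectiveMatrix(d) {
    return [
        [1, 0, 0,   0],
        [0, 1, 0,   0],
        [0, 0, 1,   0],
        [0, 0, 1/d, 0]
    ];
}

// Multiply a 4x4 matrix (an array of rows) by a 4-element column vector.
function applyMatrix(m, v) {
    return m.map(function (row) {
        return row[0]*v[0] + row[1]*v[1] + row[2]*v[2] + row[3]*v[3];
    });
}

// Perspective division: divide every component by W.
function perspectiveDivide(v) {
    return v.map(function (c) { return c / v[3]; });
}

var d = 2;
var homogeneous = applyMatrix(perspectiveMatrix(d), [3, 10, 20, 1]);
// homogeneous is [3, 10, 20, 10], because W = Z/d = 20/2
var projectedPoint = perspectiveDivide(homogeneous);
console.log(projectedPoint);  // [ 0.3, 1, 2, 1 ]
```

Note that the z component of the result equals d: every projected point lands on the image plane.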

Here's a PDF document that gives the math for projection

The OpenGL API for Camera Shape

Now that we know the basic theory, how do we get this to work in OpenGL? In OpenGL, to specify the camera shape (which determines the projection matrix), we have a choice between

  • Directly specifying the frustum using
            glFrustum(left,right,bottom,top, near,far);
  • Indirectly specifying the frustum using
            gluPerspective(fovy,aspect,near,far);

The latter is more intuitive, so Three.js only offers the latter kind of API, though with some changes. Let's look at both, though.

glFrustum(left,right,bottom,top, near,far);

Directly specifying the frustum using glFrustum is slightly more powerful than gluPerspective, but it can be unintuitive and therefore difficult. The first five arguments all describe the rectangle that is the image plane, which is where the view volume will be projected. The first pair of arguments are the X coordinates of the left and right edges of the image plane; the second pair are the Y coordinates of the bottom and top edges of the image plane. The near parameter is the distance from the origin to the image plane. Finally, the far parameter is the distance to the other side of the frustum, which is parallel to the image plane. The X and Y pairs are typically signed (one negative, one positive), while the Z pair must both be positive: they are distances measured from the origin. Furthermore:

0 < near < far

Here's a visualization of the frustum with these distances labeled:

frustum from positive Z axis frustum from positive X axis
left and right are measured along the X axis, top and bottom along the Y axis. Near and far are measured as positive distances along the negative z axis. Equivalently, there is a left-handed coordinate system in which they are the z coordinates.
Note that in the usual case,
left < 0 < right
bottom < 0 < top
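For the curious, here is a sketch of what glFrustum does with these six numbers. The matrix is the one given in the OpenGL reference page for glFrustum; the function names frustumMatrix and toNormalizedCoords and the sample values are made up for this illustration. The check at the end shows that a corner of the image plane lands at the edge of the normalized cube after perspective division.

```javascript
// The matrix documented for glFrustum(left,right,bottom,top,near,far),
// written as an array of rows.
function frustumMatrix(left, right, bottom, top, near, far) {
    if (!(0 < near && near < far)) {
        throw new Error("glFrustum requires 0 < near < far");
    }
    return [
        [2*near/(right-left), 0, (right+left)/(right-left), 0],
        [0, 2*near/(top-bottom), (top+bottom)/(top-bottom), 0],
        [0, 0, -(far+near)/(far-near), -2*far*near/(far-near)],
        [0, 0, -1, 0]
    ];
}

// Apply the matrix to the point (x,y,z,1), then do the perspective division.
function toNormalizedCoords(m, x, y, z) {
    var v = [x, y, z, 1];
    var out = m.map(function (row) {
        return row[0]*v[0] + row[1]*v[1] + row[2]*v[2] + row[3]*v[3];
    });
    return [out[0]/out[3], out[1]/out[3], out[2]/out[3]];
}

var m = frustumMatrix(-2, 2, -1, 1, 1, 10);
var nearCorner = toNormalizedCoords(m, 2, 1, -1);   // top-right corner of the image plane
var farCenter  = toNormalizedCoords(m, 0, 0, -10);  // center of the far plane
console.log(nearCorner);  // approximately [1, 1, -1]
console.log(farCenter);   // approximately [0, 0, 1]
```

Notice that the points are given with negative z, while near and far are passed as positive distances, matching the convention described above.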


Alternatively, we can specify the frustum indirectly using

gluPerspective(fovy,aspect,near,far);

This builds a frustum that is symmetrical around the Z axis. (So "left" and "right" are negatives of each other, such as -5 and +5, and similarly for "top" and "bottom".) The first argument is the "field of view": the angle at the apex of the pyramid in the YZ plane. The second is the aspect ratio (width/height) of the top of the frustum. The last two arguments are the same as for glFrustum. It looks like this:

frustum with field-of-view-y and aspect ratio (width/height) labeled
"Field of View Y" is the angle formed by the top and bottom of the frustum in the YZ plane. The aspect ratio of the top of the frustum is width/height.

The Meaning of FOVY

The first argument of gluPerspective() is called "fovy," which stands for "field of view, y." This is the angle between the upper and lower sides of the frustum.

Picture a pyramid: not a normal square pyramid, but one where each slice through it is a rectangle. Make the rectangle extreme (much longer than it is wide), just to make this visualization easier. Now, think about the apex of the pyramid and the angle between opposite walls. There's a narrow angle between one pair of walls (the ones along the long sides of the rectangles) and a wider angle between the other pair. Thus, in the general case, there isn't just one angle for a frustum.

Since the frustum is aligned with the Z axis, we can refer to the angle in the YZ plane and the angle in the XZ plane. OpenGL arbitrarily chose to control the angle of the frustum using the angle in the YZ plane, hence the "field of view y." This is just the angle between the upper and lower (not left and right) sides of the frustum.

The FOVY is fairly intuitive with practice. Here are some things to keep in mind. (Try sketching to see what's going on.)

  1. The greater the FOVY, the larger the frustum and therefore the view volume. (all else being equal)
  2. The larger the view volume, the smaller the projection of anything in it. Therefore, things will appear small on the screen (all else being equal).
  3. The closer the focal point (the center of projection or COP) is to an object, the greater the angle at which lines converge, so you get more noticeable perspective effects.
  4. The perspective effect on an object depends on the relative depth of its front and back with respect to the COP. For example if the back of a cube is twice as far from the COP as the front, the image of the back will be half the size of the image of the front, so the perspective effect will be pronounced.
  5. The farther the COP, the more parallel the projection lines, so you see less of a perspective effect.
  6. The closer the image plane to the object, the larger its projection will be, and therefore the larger its image.
  7. The part of the human visual field that we attend to is often quoted as roughly 40 to 50 degrees. A 50mm lens on a 35mm camera has a similar FOV (about 47 degrees on the diagonal), which is why photographers often choose that lens for natural scenes.
  8. TW's camera setup uses a FOV of 90, and puts the camera very close to the scene. The result is that the image is big, but we get extreme perspective.
  9. A telephoto (long zoom) lens can also make the image big, but "flattens" the scene much more. A telephoto lens has a small FOVY.
  10. A wide-angle lens captures a lot of the scene, but makes everything small (so it'll fit). A wide-angle lens has a big FOVY.
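Several of the points above can be checked numerically. The following is a rough sketch, assuming an object centered on the view axis; the function name imageHeightFraction is made up for this example.

```javascript
// Fraction of the image height covered by an object of the given height,
// centered on the view axis at the given distance from the COP.
function imageHeightFraction(objectHeight, distance, fovyDegrees) {
    var halfAngle = (fovyDegrees / 2) * Math.PI / 180;
    var visibleHeight = 2 * distance * Math.tan(halfAngle);  // frustum height at that distance
    return objectHeight / visibleHeight;
}

console.log(imageHeightFraction(1, 5, 90));   // wide FOVY: about 0.1
console.log(imageHeightFraction(1, 5, 30));   // narrow FOVY: about 0.37
console.log(imageHeightFraction(1, 10, 30));  // same FOVY, twice as far: about 0.19
```

A larger FOVY makes the same object cover less of the image (points 1 and 2), and moving the object twice as far away halves its image.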

The following screenshot shows how much difference camera distance and shape can make. The two cubes are identical; what's different is the FOVY and the distance of the COP from the scene.

Here's a screen shot of the demo:

two shots of a cube

  • In the shot on the left, FOVY=90, the camera is at z=2, and the image plane (the top of the frustum) is at z=1, so the camera is 1 unit from the top of the frustum. The camera is 1.5 units from the near face of the cube and 2.5 units from the far face, so the projection of the far face is 1.5/2.5 = 3/5ths the size of the projection of the near face.
  • In the shot on the right, the camera has been pulled back to be 3 units from the top of the frustum, which has been left at z=1. Because the top of the frustum is the same size (its half-height is still 1, since tan(45°)=1 at a distance of 1), FOVY = 2·atan(1/3), or about 37 degrees. The top of the frustum is also the same distance from the cube. However, the picture looks quite different.

The following are two diagrams from the side illustrating what is going on. The magenta cube is the wire cube, and it is centered around the origin, just as it is in the code. The z axis (blue arrow) goes horizontally to the left. The y axis (green arrow) goes up. The frustum is in red.

diagram of the 90 degree frustum; diagram of the 37 degree frustum
In the first picture, the frustum has a FOVY=90. In the second picture, the frustum has a FOVY=37; the top of the frustum has been kept the same size by pulling back the camera to compensate.
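The numbers in this example can be reproduced with a little trigonometry. This sketch assumes that the top of the frustum has a half-height of 1 (implied by FOVY=90 at a distance of 1) and that the cube is a unit cube centered at the origin; the function names are made up for illustration.

```javascript
// FOVY (in degrees) of a frustum whose top has the given half-height and
// sits at the given distance from the camera.
function fovyFromImagePlane(halfHeight, distance) {
    return 2 * Math.atan(halfHeight / distance) * 180 / Math.PI;
}

// Ratio of the projected size of the far face to that of the near face.
function farToNearRatio(nearFaceDistance, farFaceDistance) {
    return nearFaceDistance / farFaceDistance;
}

// First shot: camera 1 unit from the top of the frustum.
console.log(fovyFromImagePlane(1, 1));   // 90, up to rounding
console.log(farToNearRatio(1.5, 2.5));   // 0.6, that is, 3/5
// Second shot: camera 3 units from the top of the frustum.
console.log(fovyFromImagePlane(1, 3));   // about 36.87
console.log(farToNearRatio(3.5, 4.5));   // about 0.78
```

In the second shot the far face is about 78% the size of the near face rather than 60%, which is why the perspective effect is so much weaker.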

Perspective Cameras in Three.js

In Three.js, we can set up a perspective camera like this:

  var camera = new THREE.PerspectiveCamera(fov,aspect_ratio,near,far);

We'll think of this API as setting up the camera shape (the geometry of the frustum). Below, we look inside the perspective camera to see exactly how that frustum is set up. For now, we'll think of it as a black box. Next, let's look at how to locate and point the camera.

The Three.js API for Camera Location

As we saw above, the perspective and orthographic projections work when the direction of projection is aligned with the Z axis. This is actually the same as the initial coordinate system in OpenGL. But what if we don't want our scene/camera set up that way?

In OpenGL, there is a method called

    gluLookAt(eyex,eyey,eyez, atx,aty,atz, upx,upy,upz);
  • The "eye" point is the location of the focal point (also known as the "center of projection" or COP), as a point in space. Yet another standard term for this is the VRP: view reference point.
  • The "at" point is the location of some point in the direction we want the camera to face. It doesn't even have to be in the view volume. It's only used to figure out where the camera is pointed. A standard term for the direction the camera is pointed is the VPN: view plane normal. (Here's our first encounter with a vector in OpenGL.) This point is actually a very convenient concept, because it makes it easy to aim our camera at some location that should project to the center of the picture. For example, if we want to take a picture of a person, the at point might be the tip of their nose, or between their eyes.
  • The "up" vector says which direction, projected onto the image plane, is the same as the vertical direction on the monitor (parallel with the left edge of the canvas). Note that it is a vector, not a point. A standard term for this is "VUP."

The camera is positioned at a point called the eye, and it is facing a point called at. That is, it's looking at (atx,aty,atz). Finally, the vector (upx,upy,upz) orients the camera (for example, landscape versus portrait).
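To see how these three arguments pin down the camera's orientation, here is a sketch of the usual construction of the camera's axes from eye, at, and up. The function names are made up for illustration, and the sketch assumes that up is not parallel to the viewing direction (otherwise the cross product is zero).

```javascript
function subtract(a, b) { return [a[0]-b[0], a[1]-b[1], a[2]-b[2]]; }

function cross(a, b) {
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]];
}

function normalize(v) {
    var len = Math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    return [v[0]/len, v[1]/len, v[2]/len];
}

// Derive the camera's three axes from the eye point, at point, and up vector.
function cameraAxes(eye, at, up) {
    var forward = normalize(subtract(at, eye));   // direction the camera faces
    var right   = normalize(cross(forward, up));  // perpendicular to forward and up
    var trueUp  = cross(right, forward);          // up, made perpendicular to forward
    return { forward: forward, right: right, up: trueUp };
}

// Eye on the positive z axis, looking at the origin, with y as up.
var axes = cameraAxes([0, 0, 5], [0, 0, 0], [0, 1, 0]);
console.log(axes.forward);  // the camera faces down the negative z axis
console.log(axes.right);    // the positive x axis
console.log(axes.up);       // the positive y axis
```

Only the direction from eye to at matters, which is why the at point doesn't have to be in the view volume.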

Three.js uses this same idea, but in a more general way. A Camera is just a subclass of Object3D, so it can be positioned using the position attribute (or the translate methods), and it can be rotated as well. (I have yet to need to scale a camera.)

However, there are some other useful methods that Three.js defines for us. All instances of Object3D() have an up attribute that is a vector indicating which way is up for that object. That's very convenient to have for a camera, as you've seen with the TeddyBear demo.

Secondly, there's a useful method called lookAt, which points an object to face a particular point, the argument. That method also uses the up vector to orient the object appropriately. You should use that method last, after setting the location and up vector.

Thus, a good way to set up a camera is:

function setupCamera(scene) {
    // camera shape
    var fov    = cameraParams.fov || 75;  // in degrees
    var aspect = cameraParams.aspectRatio || 400/300;  // canvas width/height
    var near   = cameraParams.near ||  5;  // measured from eye
    var far    = cameraParams.far  || 30;  // measured from eye
    var camera = new THREE.PerspectiveCamera(fov,aspect,near,far);
    // camera location: set the position and up vector first, then call lookAt last
    camera.position.set(eyex,eyey,eyez);
    camera.up.set(upx,upy,upz);
    camera.lookAt(new THREE.Vector3(atx,aty,atz));
    return camera;
}


There's a bit more work we have to do to create a canvas and to have Three.js render our scene on that canvas using our camera. We'll look at two ways to do this: using TW and without TW. It's your choice whether to use TW when using your own camera.


We will always need a renderer object, such as a THREE.WebGLRenderer. This object has a method, called render, that takes a scene and a camera and renders the scene using the camera. Any time you adjust your scene or your camera, you'll want to re-invoke this method. If you have globals holding your camera and your scene, you might just define a simple wrapper function to do the rendering:

function render() {
        renderer.render( scene, camera );
}

Creating a renderer object causes an HTML canvas object to be created (with a default size of 300x150, which is quite small). However, the canvas is not added to the document; you have to do that yourself (or get TW to do it for you).

First, because the default canvas is so small, we'll use CSS to set the size of the canvas. Here, I'll use 800 x 500, and so I'll use 800/500 for the aspect ratio of my camera (the top of the frustum). You can also consider using a canvas of size 100% x 100%, covering the whole browser. If you do that, use canvasElt.clientWidth/canvasElt.clientHeight as the camera's aspect ratio, where canvasElt is a variable defined below.

canvas {
    display: block;
    width: 800px;
    height: 500px;
    margin: 10px auto;
}

Without TW

Let's put all these ideas together, without using TW. Next, we'll see how TW simplifies this a little. Here's the JavaScript:

var scene = new THREE.Scene();
var renderer = new THREE.WebGLRenderer();
var canvasElt = renderer.domElement;
renderer.setClearColor( 0xdddddd, 1);

You can see the whole code in action in Simple Camera Without TW

Using TW

TW's mainInit function takes a renderer and a scene as its arguments, extracts the HTML canvas element from the renderer, and adds it to the document. It also sets its size and clears it to light gray.

Here is the JavaScript code:

var scene = new THREE.Scene();
var renderer = new THREE.WebGLRenderer();

You can see the whole code in action in Simple Camera With TW


The OpenGL API is almost the same as another API for camera positioning that is common in the Computer Graphics field, namely:

  • VRP: View Reference Point, the location of the eye
  • VPN: View Plane Normal, the orientation of the view plane
  • VUP: View Up, the rotation of the view plane

The only difference is that this API specifies a direction to look in (VPN) rather than specifying a point to look at.

Other Terminology

The following terms are commonly used for the kinds of motions for cameras and such.

  • pan: rotating a fixed camera around a vertical axis
  • tilt: rotating a fixed camera around a horizontal axis
  • zoom: adjusting the lens to zoom in or out. (This adjusts the frustum.)
  • roll: rotating a camera or a ship around a longitudinal axis
  • pitch: same as tilt, but for ships and airplanes
  • yaw: same as pan, but for ships and airplanes
  • strafe: moving a camera along a horizontal axis; this terminology is used in video games I believe.

Inside Perspective Camera

In case you're curious, and you should be, here's the code in Three.js that creates a projection matrix for a THREE.PerspectiveCamera:

function makePerspective( fov, aspect, near, far ) {
  var ymax = near * Math.tan( THREE.Math.degToRad( fov * 0.5 ) );
  var ymin = - ymax;
  var xmin = ymin * aspect;
  var xmax = ymax * aspect;

  return this.makeFrustum( xmin, xmax, ymin, ymax, near, far );
}

Note that xmin is the same thing as left, and so forth, so the final line is just like glFrustum.

The makePerspective method is actually a method of THREE.Matrix4. The math for making the perspective matrix from the frustum values is in that source code as well. That takes us beyond the scope of our course, though. Of course, you're welcome to look anyway.
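To see the numbers that makePerspective passes along, here is a standalone sketch of the same calculation that does not need Three.js. The function name perspectiveToFrustum is made up for illustration; the sample values use the 800 x 500 canvas from earlier.

```javascript
// Convert (fov, aspect, near) into the left/right/bottom/top values that
// describe the top of the frustum, following the calculation shown above.
function perspectiveToFrustum(fovDegrees, aspect, near) {
    var ymax = near * Math.tan((fovDegrees * 0.5) * Math.PI / 180);
    return {
        left:   -ymax * aspect,
        right:   ymax * aspect,
        bottom: -ymax,
        top:     ymax
    };
}

var bounds = perspectiveToFrustum(90, 800/500, 1);
console.log(bounds);  // top is about 1, right is about 1.6
```

With a 90 degree field of view, the top of the frustum is as far above the axis as the image plane is from the eye, and the width follows from the aspect ratio.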

Forced Perspective

As long as we're setting up a camera, let's look at how to play tricks with it. In particular, we'll look at how forced perspective works. If we have time, we can look at the Lord of the Rings DVD that explains how they did some of this. Very cool.
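One way to see the idea behind forced perspective is to reuse the similar-triangles formula from earlier: the projected height depends only on the ratio of height to distance. The sketch below uses made-up heights and distances, and the function name projectedHeight is an assumption for illustration.

```javascript
// Projected height of an object, using y = Y/(Z/d) from the pinhole camera.
function projectedHeight(height, distance, d) {
    return height / (distance / d);
}

var d = 1;  // distance from the COP to the image plane

// A 1.8-unit-tall actor standing 3 units from the camera...
var farActor = projectedHeight(1.8, 3, d);
// ...projects to the same height as a 1.2-unit-tall actor standing 2 units away.
var nearEquivalent = projectedHeight(1.2, 2, d);
console.log(farActor, nearEquivalent);  // both about 0.6
```

So if the scene is arranged to hide the difference in distance, the farther actor appears to be a shorter person standing next to the nearer one.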