Some thought questions about affine transformations.
In the past, some students have wondered why we're learning the matrix stuff and how that connects with the OpenGL stuff. It's a legitimate question. The matrices don't seem to be necessary for using the OpenGL functions. Here are some reasons:
Additional Reading. If the following is confusing or you just want to hear a different explanation, feel free to consult the following:
There are two aspects to the camera API: the placement of the camera, and the shape of the camera. To understand the latter, we have to understand how cameras work.
This is a cutaway side view of a pinhole camera (in Latin, the "camera obscura," which means "dark room"). The z axis is imaginary, but the box on the left part of the picture is a real box: solid on all six sides. There is a tiny pinhole in the front of the box, where the circle is on the picture. The origin is placed at the pinhole, the y axis goes straight up through the front of the box, and the z axis points out into the scene. The depth of the box is d, and the height of the box is h. We also might care about the width of the box, but that's not visible on this picture, since we've eliminated that dimension.
Rays of light from the outside go through the hole and land on the image plane, which is the entire back of the box. For example, a ray of light from the top of the tree lands on the back of the box as shown. Because the hole is so tiny, only light from along a single ray can land on any spot on the back of the camera. So, in this example, only light from the top of the tree can land at that spot. This is why pinhole camera pictures are so sharp. Theoretically, they can provide infinite clarity, though in practice other issues arise (diffraction of light rays and the lack of enough photons). However, in computer graphics, we only need the theoretical camera.
Pinhole cameras are simple and work really well. They're standard equipment for viewing solar eclipses. We only use lenses so that we can gather more light. A perfect pinhole camera only really allows one ray of light from each place in the scene, and that would make for a really dark image, or require very sensitive film (or retinas). Since the need to gather enough light is not important in computer graphics, the OpenGL model is of a pinhole camera.
Points to notice:
Suppose we want to compute the projection of the top of the tree. We know the coordinates of the top of the tree; let them be (X,Y,Z). We want to know the coordinates of the projection; let them be (x,y,z). We can do this by similar triangles, using the two yellow triangles in the following figure:
The height and base of the big triangle are just Y and Z. The height and base of the little triangle are y and z. The base of the little triangle is known, because that's determined by the shape and size of our pinhole camera. In the figure, it's notated as "d." By similar triangles, we know:
y/d = Y/Z
y = d*Y/Z
y = Y/(Z/d)
Everything on the right hand side is known, so we can compute the projection of any point simply by knowing the depth of our pinhole camera and the location of the point.
As for the x coordinate, a similar argument holds.
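As a quick numerical check of the similar-triangles result, here is a minimal Python sketch. The function name `project_point` and the sample numbers are made up for illustration; they are not part of OpenGL or of the course code.

```python
def project_point(X, Y, Z, d):
    """Project the scene point (X, Y, Z) onto an image plane at distance d.

    Uses the similar-triangles result y = Y/(Z/d), and likewise for x.
    Assumes Z is nonzero (the point is not in the plane of the pinhole).
    """
    scale = Z / d  # how many times farther away the point is than the image plane
    return (X / scale, Y / scale, d)


# Hypothetical numbers: a tree top 10 units up and 20 units away,
# seen by a camera whose depth d is 2.
x, y, z = project_point(0.0, 10.0, 20.0, 2.0)
print(x, y, z)
```

With these numbers the point is ten times farther away than the image plane, so its projected height is one tenth of its real height.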
There's a hassle with signs in the pinhole camera, such as whether the z value for the tree is negative. Also, the image ends up being upside down. In CG, since we're really only interested in the mathematics of projection, we use a synthetic camera, in which the image plane is placed on the same side of the origin as the scene.
For the same reason that we want to make all our affine transformations into matrix multiplications, we want to make projection into a matrix multiplication.
There are two kinds of projection available in OpenGL.
| 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 |
Notice how it zeros out the z coordinate of the vertex. It's easier to come up with the matrix for that than to write it in HTML!
The matrix for perspective projection isn't obvious. It involves using homogeneous coordinates and leaving part of the calculation undone.
The part of the calculation that is left undone is called perspective division. The idea is that the homogeneous coordinate (x,y,z,w) is the same as (x/w, y/w, z/w, 1): that is, we divide through by w. If w=1, this is a null operation, so it doesn't change our ordinary vertices. If, however, w has the value z/d, this perspective division accomplishes what we did earlier in our similar triangles, namely:
y = Y/(Z/d)
Therefore, the perspective matrix is a matrix that sets w=Z/d and leaves the other coordinates unchanged. Since the last row of the matrix computes w, all we need to do is put 1/d in the z column of the w row. The perspective matrix, then, is just
| 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1/d | 0 |
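To see the matrix and the perspective division working together, here is a small pure-Python sketch. The helper names (`mat_vec`, `perspective_matrix`, `perspective_divide`) and the sample vertex are assumptions made for this illustration. This is the simplified matrix from these notes, not the matrix OpenGL actually builds.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (a list of rows) by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def perspective_matrix(d):
    """The simplified perspective matrix: identity, except that w becomes z/d."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0 / d, 0.0],
    ]


def perspective_divide(v):
    """Finish the calculation: (x, y, z, w) -> (x/w, y/w, z/w, 1)."""
    x, y, z, w = v
    return [x / w, y / w, z / w, 1.0]


d = 2.0
vertex = [0.0, 10.0, 20.0, 1.0]                # (X, Y, Z) with w = 1
clip = mat_vec(perspective_matrix(d), vertex)  # x, y, z unchanged; w is now Z/d
projected = perspective_divide(clip)           # the part that was "left undone"
print(clip)
print(projected)
```

After the division, y equals Y/(Z/d) and z equals d, which matches the similar-triangles result: every projected point lands on the image plane.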
Here's a PDF document that gives the math for projection
Now that we know the basic theory, how do we get this to work in OpenGL? In OpenGL, to specify the camera shape, we have a choice:
glFrustum(left,right,bottom,top, near,far);
This is slightly more powerful than gluPerspective, but it can be unintuitive and therefore difficult. The first pair of arguments gives the left and right edges of the top of the frustum (its near face) along the X axis; the second pair gives the bottom and top edges along the Y axis; and the last pair gives the distances to the top and bottom of the frustum (its near and far faces) along the Z axis. The X and Y pairs are typically signed (one negative, one positive), while the Z pair must both be positive, because they are distances measured from the origin.
Left and right are measured along the X axis, and top and bottom along the Y axis. Near and far are measured as positive distances along the negative z axis. Equivalently, there is a left-handed coordinate system in which they are the z coordinates.
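For the curious, the matrix that glFrustum multiplies onto the current matrix is given in the OpenGL reference page. The sketch below rebuilds it in plain Python and checks that a corner of the near face lands on a corner of the canonical view volume. The helper names `frustum_matrix` and `to_ndc` and the sample bounds are assumptions for illustration; no OpenGL calls are made.

```python
def frustum_matrix(left, right, bottom, top, near, far):
    """4x4 matrix (a list of rows) in the form documented for glFrustum."""
    return [
        [2.0 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2.0 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_ndc(m, point):
    """Apply m to (x, y, z, 1), then do the perspective division."""
    v = list(point) + [1.0]
    x, y, z, w = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return (x / w, y / w, z / w)


# Hypothetical symmetric frustum: near face from -1 to 1 in x and y.
m = frustum_matrix(-1.0, 1.0, -1.0, 1.0, 1.0, 10.0)

# The upper-right corner of the near face is at z = -near, because the
# camera looks down the negative z axis.
near_corner = to_ndc(m, (1.0, 1.0, -1.0))
# The matching corner of the far face is scaled up by far/near.
far_corner = to_ndc(m, (10.0, 10.0, -10.0))
print(near_corner)
print(far_corner)
```

Up to floating-point rounding, the near corner maps to (1, 1, -1) and the far corner to (1, 1, 1). Note the -1 in the bottom row: it plays the same role as the 1/d entry in the simplified perspective matrix, copying (negated) z into w for the later division.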
gluPerspective(fovy,aspect,near,far);
This builds a frustum that is symmetrical around the Z axis. (So "left" and "right" are negatives of each other, such as -5 and +5, and similarly for "top" and "bottom".) The first argument is the "field of view y": the angle at the apex of the pyramid in the YZ plane. The second is the aspect ratio of the top of the frustum. The last two arguments are the same as for glFrustum. It looks like this:
"Field of View Y" is the angle formed by the top and bottom of the frustum in the YZ plane. The aspect ratio of the top of the frustum is width/height.
The first argument of gluPerspective() is called "fovy," which stands for "field of view, y." This is the angle between the upper and lower sides of the frustum.
Picture a pyramid, not an ordinary square one, but one where each slice through it is a rectangle. Make the rectangle extreme (much longer than it is wide), just to make this visualization easier. Now, think about the apex of the pyramid and the angle between opposite walls. There's a narrow angle between one pair of walls (the ones along the long sides of the rectangles) and a wider angle between the other pair. Thus, in the general case, there isn't just one angle for a frustum.
Since the frustum is aligned with the Z axis, we can refer to the angle in the YZ plane and the angle in the XZ plane. OpenGL arbitrarily chose to control the angle of the frustum using the angle in the YZ plane, hence the "field of view y." This is just the angle between the upper and lower (not left and right) sides of the frustum.
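The connection between the two calls is plain trigonometry: half of the FOVY lies above the Z axis, so the top edge of the near face is at near*tan(fovy/2), and the right edge is that times the aspect ratio. Here is a small sketch of that conversion. The function name `perspective_to_frustum` and the sample values are assumptions for illustration.

```python
import math


def perspective_to_frustum(fovy_degrees, aspect, near):
    """Return (left, right, bottom, top) of the symmetric frustum's near face.

    fovy is the full angle between the upper and lower sides, so only half
    of it lies above the Z axis: top = near * tan(fovy / 2).
    """
    top = near * math.tan(math.radians(fovy_degrees) / 2.0)
    right = top * aspect  # aspect = width / height
    return (-right, right, -top, top)


# Hypothetical camera: 90 degree FOVY, twice as wide as tall, near = 1.
left, right, bottom, top = perspective_to_frustum(90.0, 2.0, 1.0)
print(left, right, bottom, top)
```

Passing these four values, plus the same near and far, to glFrustum should describe the same camera shape as the corresponding gluPerspective call.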
The FOVY is fairly intuitive with practice. Here are some things to keep in mind. (Try sketching to see what's going on.)
The demos/camera/Perspective.py demo shows how much difference camera distance and shape can make. The two cubes are identical; what's different is the FOVY and the distance of the COP (center of projection) from the scene. You're encouraged to look at the code to see how the two cameras are set up. In particular, look at the code for leftDisplay() and rightDisplay(), which set up the left and right scenes in the demo.
Here's a screen shot of the demo:
The following are two diagrams from the side illustrating what is going on. The magenta cube is the wire cube, and it is centered around the origin, just as it is in the code. The z axis (blue arrow) goes horizontally to the left. The y axis (green arrow) goes up. The frustum is in red.
| In this picture, the frustum has a FOVY=90. | In this picture, the frustum has a FOVY=37. The top of the frustum has been kept the same size by pulling back the camera to compensate. |
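How far back does the camera have to go when the FOVY narrows? The sketch below works it out, assuming we want a slice of the frustum of a fixed height to fill the view exactly. The function name `distance_for_fovy` and the height of 2 units are hypothetical; the demo code may compute its camera distances differently.

```python
import math


def distance_for_fovy(visible_height, fovy_degrees):
    """Distance from the center of projection at which a symmetric frustum
    with the given FOVY is exactly visible_height tall."""
    half_angle = math.radians(fovy_degrees) / 2.0
    return (visible_height / 2.0) / math.tan(half_angle)


height = 2.0  # hypothetical height of the slice of the scene that should fill the view
wide = distance_for_fovy(height, 90.0)
narrow = distance_for_fovy(height, 37.0)
print(wide, narrow, narrow / wide)
```

Going from a FOVY of 90 to a FOVY of 37 means pulling the camera back to roughly three times the distance to keep the same slice of the scene in view.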
In this course, we will typically use gluPerspective, but later we'll have occasion to use glFrustum for a special effect.
To modify the projection matrix, we have to do the following:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(fovy,aspect,near,far);
Last time, we talked about the CTM. This is an almost entirely accurate concept, but it's not exactly implemented as a single matrix in OpenGL. Instead, it's implemented as two: the projection matrix and the modelview matrix. The first is about the shape of the camera, the second is about the transformations involved with placing objects in the scene and placing the camera.
To modify the shape of the camera, we switch to operating on the projection matrix. Then, since what we almost always want to do is to replace the current matrix with a different one, we load the identity matrix and then multiply by the desired projection matrix. That explains the sequence of calls at the end of the last section. To switch back to modifying the modelview matrix, we just do:
glMatrixMode(GL_MODELVIEW);
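To make the "two matrices acting as one CTM" idea concrete, here is a pure-Python sketch. It uses a translation as a stand-in modelview matrix and the simplified perspective matrix from these notes as a stand-in projection matrix; it ignores OpenGL's sign conventions. All helper names and numbers are assumptions for illustration.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Stand-in modelview matrix: move everything by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def simple_perspective(d):
    """Stand-in projection matrix: the simplified w = z/d matrix."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / d, 0.0]]


modelview = translation(0.0, 0.0, 20.0)  # place the object 20 units from the camera
projection = simple_perspective(2.0)     # shape of the camera
ctm = mat_mul(projection, modelview)     # conceptually, the single CTM

vertex = [0.0, 10.0, 0.0, 1.0]
two_steps = mat_vec(projection, mat_vec(modelview, vertex))
one_step = mat_vec(ctm, vertex)
print(two_steps)
print(one_step)
```

Both routes give the same homogeneous coordinates, which is why thinking of a single CTM is almost entirely accurate even though the two matrices are stored and modified separately.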
As we saw above, the perspective and orthographic projections work when the camera's view axis is aligned with the Z axis. This is actually the same as the initial coordinate system in OpenGL. But what if we don't want our scene/camera set up that way? We can either move the camera or move the scene: these are the same thing!
Therefore, to specify camera location, we have another choice:
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(eyex,eyey,eyez, atx,aty,atz, upx,upy,upz);
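Moving the camera really is the same as moving the scene: gluLookAt just builds a matrix that moves the whole scene so that the eye ends up at the origin, looking down the negative z axis. The sketch below rebuilds that matrix in plain Python, following the construction given in the gluLookAt reference page. The helper names and the sample eye position are assumptions for illustration.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def look_at(eye, at, up):
    """View matrix (a list of rows) in the form documented for gluLookAt."""
    f = normalize([at[i] - eye[i] for i in range(3)])  # forward direction
    s = normalize(cross(f, up))                        # camera's right
    u = cross(s, f)                                    # corrected up
    return [
        [s[0], s[1], s[2], -dot(s, eye)],
        [u[0], u[1], u[2], -dot(u, eye)],
        [-f[0], -f[1], -f[2], dot(f, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, point):
    """Apply a 4x4 matrix to a 3D point (w = 1) and return x, y, z."""
    v = list(point) + [1.0]
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)][:3]


# Hypothetical camera: 5 units out on the +z axis, looking at the origin.
view = look_at(eye=(0.0, 0.0, 5.0), at=(0.0, 0.0, 0.0), up=(0.0, 1.0, 0.0))
origin_in_eye_space = transform(view, (0.0, 0.0, 0.0))
eye_in_eye_space = transform(view, (0.0, 0.0, 5.0))
print(origin_in_eye_space)
print(eye_in_eye_space)
```

The scene's origin ends up 5 units down the negative z axis, and the eye itself ends up at the origin, which is exactly the setup the projection matrices expect.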
The OpenGL API is almost the same as another API for camera positioning that is standard in the field, namely:
The following terms are commonly used for the kinds of motions for cameras and such.
The demos/camera/BarnViews.py demo shows three different views of the barn, using the typical API for camera setup and eschewing twCamera(). The key here is to look at the code to see how the barn views are set up. Use the following keyboard callbacks:
Nate Robins (not me) has built a tutor for playing with these camera APIs. It is in a subdirectory of tw, namely:
~cs307/public_html/tw/tutors
and the program name is "projection." We'll play with this some in class, but you're encouraged to play with it outside class as well.
If we have time, we can talk about how the twCamera setup works. There are two modes:
You can dump out a lot of information about the TW camera setup by doing the following:
twSetMessages(TW_ALL_MESSAGES);
Note that you may not use the twCamera() setup functions in your next assignment, but that's the only time this is forbidden. You may always use the TW functions when they are helpful and convenient.
Note also that I've never tried mixing twCamera with the plain OpenGL camera API, so if you decide to try it, let us know what you learn!
As long as we're setting up a camera, let's look at how to play tricks with it. In particular, we'll look at how forced perspective works. If we have time, we can look at the Lord of the Rings DVD that explains how they did some of this. Very cool.
Written by Scott D. Anderson
scott.anderson@acm.org

This work is licensed under a Creative Commons
License.