Software Engineering Tools: gdb and make

Debugging
Writing code is by far the easiest part of programming. Even solving compile-time errors is comparatively easy (not that it isn't frustrating). Solving a problem can be very hard, though most problems we have to solve are not so bad: we can normally see a solution (or most of a solution) pretty quickly.
Debugging is very hard.
Abstraction and modularity are vital to problem solving, and equally vital for debugging. But debugging is still hard, partly because errors have a way of poking through abstraction barriers (you're not supposed to worry about how a function works, but when it fails, its details are suddenly thrust upon you).
Staring at your code and thinking hard will help you find really trivial bugs (sometimes you'll have a breakthrough moment), but it is more than likely a way to spend a lot of time getting frustrated and making no progress.
This might be a good time to let you in on a secret. Normally, when you write a new program or system, you design, build, debug, and test your code, and then you throw it away and start over. Good programs of any size or complexity, just like good poems, novels or research papers, do not come about on a first effort. The first attempt helps you learn where the right abstraction boundaries are and where the performance problems are. With this knowledge, you can then build a good solution.
Your debugging tools are:
- Good clear program design. If you can't read and understand the code, you are doomed.
- Sitting and thinking: OK if you bound the time, i.e., tell yourself that if you don't figure it out in 20 minutes, you'll move on to something else.
- Print statements. A great way to see what your code is actually doing. The downside is that you have to put them in and take them out again (or use conditional execution/compilation, as in the sketch after this list), which itself adds complexity.
- Talking it out. This is an oft-overlooked technique that we should use a lot more. Some call this the confessional approach. In industry it is called a code walkthrough or, if there is some degree of procedural formality involving a large team, a structured code walkthrough.
You don't always need another person, and the other person (if present) doesn't always have to pay attention to you. Just describing the problem often reveals things you hadn't noticed before. When the other person does pay attention, they can offer a fresh look unencumbered by the assumptions you've been making. Never underestimate the value of a fresh set of eyes!
- A debugger — a program that lets you run your program in a controlled environment so that you can observe what it is actually doing. You must learn to use a debugger, because it allows you to poke around in a program while it's running, and it doesn't involve changing the program (which risks introducing new bugs). It is also a great way to increase your understanding of what is happening in the machine. The problem is that learning to use a debugger always seems like more work than fixing an individual bug, and this may be true. But once you've learned, it will speed up many future programming bouts. And the really good news is that you can get huge benefits by just learning to use a few features of a modern debugger.
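As a sketch of the conditional-compilation variant of print-statement debugging (the TRACE macro and the sum function here are made up for illustration):

#include <stdio.h>

/* Compile with -DDEBUG to turn tracing on; without it, the
 * preprocessor removes every TRACE call from the program. */
#ifdef DEBUG
#define TRACE(msg, val) fprintf(stderr, "%s = %d\n", (msg), (val))
#else
#define TRACE(msg, val)
#endif

int sum(int *a, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += a[i];
        TRACE("running total", total); /* vanishes in a non-DEBUG build */
    }
    return total;
}

int main(void) {
    int a[] = { 1, 2, 3 };
    printf("%d\n", sum(a, 3));
    return 0;
}

The print statements still clutter the source, but at least turning them on and off no longer requires editing every call site.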
gdb: The GNU Debugger
A debugger is a controlled environment for exploring a running
program. You can also explore a program that has already
crashed — which can be useful if a program runs a long
time before encountering an error.
There are quite a few choices, including some with fancy GUIs, but we will use the GNU debugger, gdb. gdb is run from a terminal command line and is pretty low-tech, which means it will work even when the thing you're debugging is the window system. Linux systems usually come with a GUI wrapper for debuggers (usually gdb) called ddd, which I encourage you to play with.
To get started, you'll need some buggy code. You can supply your own, but let's start with some of mine. Here are buggy versions of the linked list code from the Dynamic Memory Management lecture as well as a linked list client program:
linked_list.h
linked_list.c
test_list.c
Download these files and we'll debug them using gdb.
Compile the code using the -g option (which puts symbol table information in the executable image so the debugger can get at it), and run it on something. (You'll have to compile the two C files and then link them together.) What happens?
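One way to do this with gcc (the commands below assume the file names given above):

% gcc -g -c linked_list.c
% gcc -g -c test_list.c
% gcc -g -o test_list linked_list.o test_list.o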
Type gdb test_list, and we can start to explore.
We can run the program with any arguments using the run command. Type run and see what happens. Then try run test_list.c.
If the program crashes, you can use the where command to see where you were. You can always use the help command to find out more.
Use the break command to tell gdb to stop at a function or a particular line number. Breakpoints can be given relative to a file if you have a multi-file program. You can use the list command to look at source code.
Let's choose a place where we'd like the program to stop: perhaps at the first line of main(). When the program stops at the breakpoint, we can print values, even run code! Then we can step through the program one or more lines at a time. (There is also a next command that will avoid stepping into functions.)
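Putting these commands together, a session might look something like this (the variable name list is just for illustration; print whatever your program actually declares):

(gdb) break main
(gdb) run test_list.c
(gdb) step
(gdb) print list
(gdb) next
(gdb) where
(gdb) quit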
Celebrate when you find each bug!
NOTE:
To debug a program that has crashed, you'll need access to a core file, also called a core dump. A core dump is a file that has the contents of your program's memory when the OS detected an error (usually a pointer gone wild). Our system prevents core files from being produced by default, because they are very large and forgetting to delete them wastes a lot of space. To allow the system to create core dumps, you can type:

% ulimit -c unlimited

which tells the shell that you don't want a maximum core file size. (This size is 0 when you log in on our machines.) To tell gdb to load a particular core file, you type something like this:

(gdb) core-file core.23033
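You can also name the executable and the core file together when you start gdb:

% gdb test_list core.23033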
System Building
The principles of abstraction and modularity lead us to break a problem (and therefore the corresponding program) up into smaller pieces. For example, we define functions (methods, procedures, whatever your language calls them) to encapsulate solutions to subproblems that can be combined into a solution to a larger problem. (One way to build modular abstractions is to base them on common idioms/patterns in your code.)

Similarly, we may break a program up into separate modules, each of which implements a set of related abstractions (types, data structures, and functions). In most systems, this notion is bound up with the idea of the source file: the basic unit of editing and printing. In C, files and modules are exactly the same: there is no language-level notion (e.g., class, unit, module, cluster) at all.
This same divide and conquer approach applies to larger systems as well. For example, a project might involve building a searchable file system (Google for the desktop). Such a system will involve at least two processes: indexing and searching. Searching will involve a single server program with many components. Indexing will involve a process that coordinates a host of smaller mission-specific programs (each able to index files of a given type).
Breaking problems up this way has many advantages: Decomposing a problem into smaller pieces makes the whole problem easier to solve. It also makes the solution clearer to us and to others. Another benefit is that different people or groups can work on different pieces, which speeds up development. Finally, well-modularized code is also easier to maintain because bug fixes or feature additions can be localized to one (or a few) components. Widespread changes are not only harder, they risk the addition of more bugs!
Nothing is free, however. All these advantages come with a price: managing all the pieces we've created becomes very complicated very quickly. The problems fall into two broad, related categories: building the program or system and version control.
When everything is in one file, the build process is very simple: run the compiler on the file. As you saw on a recent homework, having just 2 or 3 files in the mix makes life a lot more complicated. It is common to spend hours debugging something that is not actually broken in the code because some source file was not recompiled — and the failure is showing up in another module. In fact, the author of the Unix make facility (the first widespread build tool) started the project in direct response to two episodes like this. In industry, it is quite common to have a whole group of programmers who work as a build team that is separate from the actual development team. (Industry also separates project testing into a quality assurance (QA) team.)
Version management is related: how do we ensure that we have the most recent version of all the files, or at least that a build is working with a consistent set of program components? The problem really comes to the fore when more than one person is writing code: without help, the programmers will soon be spending more time coordinating their updates (and fixing inconsistencies introduced by multiple developers working concurrently) than actually writing code. In the Unix community, CVS is the standard version management tool, though the Linux kernel community uses a tool called git. We will not be discussing this issue today.
make: Automating Builds
The make program's purpose is to keep track of file dependencies and figure out the minimum number of commands to execute (e.g., compiles) to generate some target. For example:
- To make a program, you need to make all of its component .o files. Once you have them, you link them together (usually using gcc).
- To make a particular .o file, you need the corresponding C source code and its include files. You then use the compiler to produce the .o file.
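For the test_list program from the debugging exercise, the dependency graph looks something like this (assuming both C files include linked_list.h):

test_list       depends on  test_list.o, linked_list.o
test_list.o     depends on  test_list.c, linked_list.h
linked_list.o   depends on  linked_list.c, linked_list.h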
The make program takes such rules (and other information that we'll see below) and effectively constructs the transitive closure of the resulting dependency graph. Once this is done, make can build the entire system.
The genius of make is that it uses dependency information and the file modification times to figure out exactly what needs to be compiled. For example, you might want to make a program, say test_strncat, that depends on two .o files, each of which depends on one C source file and one header file. If only one of the C source files has changed since the last build (that is, the source file is newer than its descendants in the dependency graph), then that file and anything that depends on it must be rebuilt.
This information is encoded in something called a make file, which is traditionally named Makefile or makefile. A make file contains variable definitions and dependency rules. Variable definitions look like this:

CFLAGS = -Wall -g
PROGRAMS = foo baz
You can refer to the values of these variables in either of two ways: $(PROGRAMS) or ${PROGRAMS}, and references are replaced like macro definitions (think #define in a .h file). By convention, variables are given uppercase names.
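For example, given the definitions above, a (hypothetical) rule line

all: $(PROGRAMS)

is expanded to

all: foo baz

before make interprets it.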
Some variables are predefined and others have developed conventional uses. For example, CC holds the name of the C compiler. If you want to use another C compiler, you can set CC to your choice, and all the standard build rules will use the new value. CFLAGS is a set of flags to be used in every compilation.
Dependency rules have a target name, a colon, and a list of files the target depends on, all on one line. After this line there are zero or more actions (shell commands), each preceded by a tab character. This insistence on the tab character is one of the most famous bone-headed decisions in Unix history.
foo: foo.o bar.o
	gcc $(CFLAGS) -o foo foo.o bar.o
says that foo requires foo.o and bar.o. Once you have these required targets built, you make a foo by calling gcc with the appropriate arguments.
If you just type make and there is a file named Makefile or makefile in the current directory, then make will build the first target specified in the file. Traditionally, therefore, the first target is the entire system (its name is usually all) or the item users typically want to build. The target named install usually builds the system and then installs the result on the current machine, e.g., by moving the program files to places like /usr/local/bin. Often anyone can build and run a system in a private directory, but installation requires administrator privileges. Another commonly used target name is clean, which usually has no dependencies. make clean should remove all temporary files (like .o files) so that a fresh build from scratch can take place. Here is a common entry:
clean:
	rm -f *.o
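Putting the pieces together, a complete make file for the test_strncat example above might look something like this (the file names my_strncat.c and my_strncat.h are illustrative; remember that every action line must start with a tab):

CC = gcc
CFLAGS = -Wall -g

# The first target is what plain "make" builds.
test_strncat: test_strncat.o my_strncat.o
	$(CC) $(CFLAGS) -o test_strncat test_strncat.o my_strncat.o

# Each .o file depends on its C source and the shared header.
test_strncat.o: test_strncat.c my_strncat.h
	$(CC) $(CFLAGS) -c test_strncat.c

my_strncat.o: my_strncat.c my_strncat.h
	$(CC) $(CFLAGS) -c my_strncat.c

clean:
	rm -f *.o test_strncat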
Build your own make file for test_list. Use some variables, too. You can also define a make file for your current assignment if you like.
Modified: 5 March 2008