Systems Programming

Standard I/O

When running C under Unix, there are two distinct levels of I/O. Application programs should, for portability reasons, use the standard I/O package, which provides portable support for most input and output tasks along with buffering. There are times when the standard I/O package falls short, typically when interacting with specialized I/O devices, and there are times when the standard I/O library is not available at all, e.g., when writing code for the operating system kernel. In such cases, the programmer uses low-level I/O.

Today we will discuss the standard I/O package. To use the standard I/O library, your program must contain the line

#include <stdio.h>
which incorporates macro definitions (getc() is a macro on some systems), constant definitions (like EOF), type definitions (FILE) and function prototypes for everything in the standard I/O library. In the old days, it was sometimes necessary to specify the standard library for the linker. gcc will automatically include the standard I/O library, as well as other standard libraries, when it links your program.

The FILE Abstraction

The standard I/O package is built around the fundamental abstraction of a byte/character stream, called a FILE, which allows the contents of a file to be read or written character by character. Programs manipulate character streams using a file pointer. For example, a program that had a data output file called data_out would contain this declaration:
FILE *data_out;

Before a file can be read or written, it must first be opened, so that the operating system can set up the necessary structures to make it function as a readable and/or writable character stream. When a program is done with a file, it is closed.

There are three predefined character streams provided to every C program, and a huge variety of programs can get by with just these: stdin (the standard input, normally connected to the keyboard), stdout (the standard output, normally connected to the terminal screen), and stderr (the standard error stream, also normally connected to the terminal, intended for error messages).

Programs that read only from their standard input and write only to their standard output are called filters and are very common. cat and wc are good examples.

Character I/O

You have already seen one function that writes to standard output: printf(). There are quite a few other functions (in fact, many are macros) that read from standard input and write to standard output. Here is a program (a minimal version, using tolower() from ctype.h) that converts data from its standard input to all lower case:
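
#include <stdio.h>
#include <ctype.h>

/* Copy standard input to standard output, converting
   upper case letters to lower case. */
int main()
{
        int c;                  /* note: int, not char */

        while ((c = getchar()) != EOF)
                putchar(tolower(c));

        return 0;
}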

getchar() reads a single character from the program's standard input and returns the special integer value EOF (defined in stdio.h) when the end of file is reached. If you look at the manual page for getchar(), you'll see it does not return a char but an int. Its result is therefore assigned to the integer variable c in the sample code above. Why does getchar() not return a char, and what could go wrong if we assign its result to a variable of type char?

putchar() writes a single character to standard output, and returns that character (as an integer). It returns EOF if there are any errors.

In fact, there is a host of character I/O functions/facilities in the standard I/O package:
getchar(), getc(FILE *stream), fgetc(FILE *stream)
ungetc(int c, FILE *stream)
putchar(int c), putc(int c, FILE *stream), fputc(int c, FILE *stream)

Look at the manual pages for these calls. You'll notice that there is duplication of functionality (and some items may be macros). You will often see identical or similar facilities with different names in C. This is the result of modern C implementations incorporating facilities from multiple widely used predecessors, and of standardization efforts that created new facilities while older versions remained to support legacy code. Sometimes the manual pages will tell you when newer, standards-based versions are preferred (or at least when use of some facility is discouraged, e.g., gets()).
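
For example, a routine that skips white space has to read one character too many to discover where the white space ends; ungetc() lets it push that character back so the next read will see it. Here is a small sketch (skip_blanks() is an illustrative name, not a library function):

#include <stdio.h>

/* Skip over blanks and tabs on the given stream, pushing back
   the first other character so later reads will see it. */
void skip_blanks(FILE *stream)
{
        int c;

        while ((c = getc(stream)) == ' ' || c == '\t')
                ;
        if (c != EOF)
                ungetc(c, stream);
}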

Line I/O

Similarly, there are several line I/O services:
gets(char *s), fgets(char *s,int size, FILE *stream)
puts(char *s), fputs(char *s, FILE *stream)
These functions read (or write, respectively) a line of data terminated by a newline character ('\n') or by end of file. fgets() will stop after reading size - 1 characters if it hasn't yet seen a newline or end of file. Never use gets(). It exists only for legacy code and is dangerous: it does no size checking, which means that a sufficiently long input line will overflow the buffer, the most common point of attack for malicious programs.
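
By contrast, fgets() with a fixed-size buffer cannot overflow it. Here is a minimal sketch of a filter that copies its input a line at a time (the buffer size is arbitrary):

#include <stdio.h>

int main()
{
        char line[256];

        /* fgets() reads at most sizeof(line) - 1 characters and
           null-terminates what it read, so the buffer cannot overflow. */
        while (fgets(line, sizeof(line), stdin) != NULL)
                fputs(line, stdout);

        return 0;
}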

Formatted I/O

Formatted I/O is very useful when reading or writing text files (which includes terminal I/O). You may not have seen this in Java, though it is available via the PrintWriter class. The idea is that a format string specifies how data should be written (or how input is assumed to be formatted).

int printf(char *format, ...)
int fprintf(FILE *fp, char *format, ...)

Return the number of characters written, or a negative value on error.

int scanf(char *format, ...)
int fscanf(FILE *fp, char *format, ...)

Return the number of items read and converted, or EOF in the event of an error or an end of file before any conversions. Great care must be taken with reading string data of unbounded size, which can be exploited by malicious or broken code. You will very often see these functions used without any error checking, which can cause problems when the input does not conform to your expectations.
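
Here is a sketch of careful use: the field width bounds the %s conversion so it cannot overflow its buffer, and the return value is checked on every call (the input format, a name followed by an integer score, is made up for illustration):

#include <stdio.h>

int main()
{
        char name[64];
        int score;
        int nread;

        /* %63s reads at most 63 characters (plus the terminating '\0'),
           so name cannot overflow; scanf() returns the number of items
           successfully converted. */
        while ((nread = scanf("%63s %d", name, &score)) == 2)
                printf("%s scored %d\n", name, score);

        if (nread != EOF)
                fprintf(stderr, "malformed input\n");

        return 0;
}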

The Power of Redirection and Pipes

Though not really a standard I/O library issue, the use of the standard pre-opened files stdin, stdout, and stderr, coupled with the Unix principle that "everything is a file", is a source of great power and flexibility. It allows for highly modular program development, as small programs can be easily mixed and matched to make larger, more complex programs.

Every file can be read or written as a stream of bytes, including the terminal. That alone makes simple filters very easy to write and greatly simplifies more complicated programs. The programmer's life is made easier.

The greatest advantage comes when you realize that, because the programmer doesn't know the source or destination of I/O, the caller of the program can change these at will. The Unix shell supports this with I/O redirection and pipes.

Consider a program that reads from the terminal. cat with no arguments copies its input to its output (in the transcript below, the user types the first of each pair of lines and cat echoes it back):

% cat
hi
hi
there
there
<CNTL-D>
%

The symbol < tells the shell that the standard input for the command should come from the specified file rather than from the terminal.

% cat < hello.c 
/* This is the traditional first program in C. */

#include <stdio.h>

int main() 
{
        printf("Hello, world!\n");
} 
%
The standard output of any command can likewise be redirected using >, and you can redirect both at the same time:
% cat < hello.c > new.c
%
copies hello.c to the file new.c. CAUTION: Redirecting a program's output will create the file if it doesn't exist, and will destroy anything already in the file if it does. To append the output of a command to a file, use >>.

If a file is a file, and we can redirect the input and output of a program to arbitrary files, then can we wire the output of one command directly to the input of another? Yes! That's what pipes are for. The shell uses | to do this.

% cat < hello.c | wc
    8    21    129
%
pipes the output of the cat command to the input of wc, which is a program that counts the lines, words, and characters in its input.

This flexibility supports one of the great advantages of program development in Unix environments. It is possible, even encouraged, to break a problem into small pieces, write smaller, simpler programs to do each piece, and then string the programs together to do more complex things. Using these tools, each program can be written by different programmer(s) and even in different languages.

And this style of development also forces one to think very carefully about the interfaces between the components. The more robust the interface, the more modular the code will be. Since the interface is based on a file model, it prevents each program module from being too dependent on internal representations of other modules. Also, because the data are constantly being sent to a file, it encourages data formats that are human readable (or at least readable with simple tools) — something that can be very useful for debugging.

Opening and Closing Files

FILE *fopen(char *fname, char *mode)
FILE *freopen(char *fname, char *mode, FILE *stream)
int fclose(FILE *fp)
int feof(FILE *fp)

#include <unistd.h>
int access(char *pathname, int mode);

fopen() returns a pointer to a FILE structure representing the file opened for read, write, or append access. (The mode string may include a +. See the manual pages for more information.) NULL is returned on error (sets errno). Opening a file creates associated buffers.

freopen() flushes stream and closes the file descriptor associated with the stream. (We'll talk about file descriptors in a future lecture.) If fname is NULL, then the file formerly associated with stream is reopened with the specified mode. If fname is not NULL, then the specified file is opened and associated with stream. The return value is stream, unless there is an error, in which case NULL is returned and errno is set.

fclose() writes out any remaining buffered data and deallocates resources associated with the file. It returns 0 on success and EOF on failure (sets errno).

feof() returns non-zero if the file stream's end-of-file indicator has been set.

access() is used to test whether the calling process has the permissions specified in mode for the file named by pathname (checked against the real UID and GID of the process). (access() is really a Unix system call rather than part of the standard I/O library, but I'm including it here anyway.)
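
Putting these together, here is a sketch that copies one file to another with explicit error checking (the file names are placeholders):

#include <stdio.h>
#include <stdlib.h>

int main()
{
        FILE *in, *out;
        int c;

        in = fopen("data_in", "r");             /* placeholder names */
        if (in == NULL) {
                perror("data_in");
                exit(1);
        }
        out = fopen("data_out", "w");
        if (out == NULL) {
                perror("data_out");
                exit(1);
        }

        while ((c = getc(in)) != EOF)
                putc(c, out);

        fclose(in);
        if (fclose(out) == EOF) {               /* flushes buffered output */
                perror("data_out");
                exit(1);
        }
        return 0;
}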

Binary I/O

The operations above are intended for text files. Of course, one can do binary I/O using the character facilities above, treating each byte as a separate, uninterpreted unit. However, it is sometimes desirable to write out and read in more complex data structures. For example, a database may not want to suffer the performance penalties of parsing text (with fscanf()) when it gets index entries from a file. On the other hand, reading the input byte by byte can be very error prone. So, C allows one to read and write structs and other data structures directly using:

size_t fread(void *ptr, size_t size, size_t nmemb,
             FILE *stream);

size_t fwrite(const void *ptr, size_t size, size_t nmemb,
              FILE *stream);

ptr is the location of the data structure to be read or written, size is the size of a data element, nmemb is the number of elements (each size bytes long) to read or write. Each function returns the number of elements (not bytes) successfully read or written.
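
For example, a table of index entries might be saved and restored with one call each. The struct below is made up for illustration; note that files written this way are generally not portable across machines with different word sizes or byte orders.

#include <stdio.h>

/* A made-up index entry: a key and the offset of its record. */
struct entry {
        long key;
        long offset;
};

/* Write the whole table in one call; returns the number of
   entries actually written. */
size_t save_table(struct entry *table, size_t n, FILE *fp)
{
        return fwrite(table, sizeof(struct entry), n, fp);
}

/* Read back up to n entries; returns the number actually read. */
size_t load_table(struct entry *table, size_t n, FILE *fp)
{
        return fread(table, sizeof(struct entry), n, fp);
}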

Random Access

All the above I/O operations assume that you want to read or write a file from start to finish. But there are plenty of applications where you don't want to do this, most especially with large databases. You might use a hash function or a smaller index structure to determine that a data value of interest is at a particular location in a file. Rather than make you read all the data from the start of the file until you get to the 400 millionth byte, Unix-based systems provide a means of going directly to that location in the file and reading or writing there.
int fseek(FILE *stream, long offset, int whence);
long ftell(FILE *stream);
void rewind(FILE *stream);

fseek() sets the file position of the given stream to the specified offset from the beginning, the current position, or the end of the file, depending on whether whence is SEEK_SET, SEEK_CUR, or SEEK_END.

ftell() returns the current file position.

rewind() sets the file position to the beginning of file.
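
For example, one common trick is to find the size of a file by seeking to the end and asking for the current position (a minimal sketch; production code would also check ftell() for errors):

#include <stdio.h>

/* Return the size of an open file in bytes, or -1 if the seek fails. */
long file_size(FILE *fp)
{
        long size;

        if (fseek(fp, 0L, SEEK_END) != 0)
                return -1;
        size = ftell(fp);
        rewind(fp);             /* put the position back at the start */
        return size;
}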


Author: Mark A. Sheldon
Modified: 07 February 2008