- Standard I/O, which provides, as the name implies, abstractions for normal file I/O. Input and output operations using the standard I/O library will, generally, be portable to other operating systems.
- System-level (low-level) I/O, which provides mechanisms for more direct interaction with the operating system calls that do input and output.
Today we will discuss the standard I/O package. To use the standard I/O library, your program must contain the line
#include <stdio.h>
which incorporates macro definitions (getc() is a macro on some systems), constant definitions (like EOF), type definitions (FILE), and function prototypes for everything in the standard I/O library. In the old days, it was sometimes necessary to specify the standard library for the linker. gcc will automatically include the standard I/O library, as well as other standard libraries, when it links your program.
The FILE Abstraction
The standard I/O package is built around the fundamental abstraction of a byte/character stream, called a FILE, which allows the contents of a file to be read or written character by character. Programs manipulate character streams using a file pointer. For example, a program that had a data output file called data_out would contain this declaration:
FILE *data_out;
Before a file can be read or written, it must first be opened, so that the operating system can set up the necessary structures to make it function as a readable and/or writable character stream. When a program is done with a file, it is closed.
There are three predefined character streams provided to every C program, and a huge variety of programs can get by with just these:
- stdin is the standard input and is usually associated with the terminal of the person who executed the program.
- stdout is the standard output and is usually associated with the terminal of the person who executed the program.
- stderr is the standard error output and usually goes to the same place as the standard output.
cat and wc are good examples of programs that need nothing more.
Character I/O
You have already seen one function that writes to standard output: printf(). There are quite a few other functions (in fact, many are macros) that read from standard input and write to standard output.
Here is a program that converts data from its standard input to all lower case:
getchar() reads a single character from the program's standard input and returns the special integer value EOF (defined in stdio.h) when the end of file is reached or a read error occurs. If you look at the manual page for getchar(), you'll see it does not return a char but an int. Its result is therefore assigned to the integer variable c in the sample code above. Why does getchar() not return a char, and what could go wrong if we assign its result to a variable of type char?
putchar() writes a single character to standard output, and returns that character (as an integer). It returns EOF if there are any errors.
In fact, there is a host of character I/O functions/facilities in the standard I/O package:
- getchar(), getc(FILE *stream), fgetc(FILE *stream)
- ungetc(int c, FILE *stream)
- putchar(int c), putc(int c, FILE *stream), fputc(int c, FILE *stream)
Look at the manual pages for these calls. You'll notice that there is duplication of functionality (and some items may be macros). You will often see identical or similar facilities with different names in C. This is the result of modern C implementations incorporating facilities from multiple widely used implementations, and of standardization efforts that created new facilities while older versions remained to support legacy code. Sometimes, the manual pages will tell you when newer, standards-based versions are preferred (or at least when use of some facility is discouraged, e.g., gets()).
Line I/O
Similarly, there are several line I/O services:
- gets(char *s), fgets(char *s, int size, FILE *stream)
- puts(char *s), fputs(char *s, FILE *stream)
The input functions read until a newline ('\n') or EOF. fgets() will stop reading after size - 1 characters if neither of the other conditions occurs first, leaving room for the terminating '\0'.
Never use gets(). It exists for legacy code (it was removed from the language in C11), and is dangerous. It does no size checking, which means that a sufficiently long input line will overflow the buffer, a classic point of attack for malicious programs.
Formatted I/O
Formatted I/O is very useful when reading or writing text files (which includes terminal I/O). You may not have seen this in Java, though it is available via the PrintWriter class.
The idea is that a format string specifies how data should be written (or how input is assumed to be formatted).
int printf(char *format, ...)
int fprintf(FILE *fp, char *format, ...)
Return the number of characters successfully written, or a negative value on error.
int scanf(char *format, ...)
int fscanf(FILE *fp, char *format, ...)
Return the number of items read and converted, or EOF in the event of an error or an end of file before any conversions. Great care must be taken when reading string data of unbounded size, because an unbounded %s conversion can overflow its buffer and be exploited by malicious or broken input. You will very often see these functions used without any error checking, which can cause problems when the input does not conform to your expectations.
The Power of Redirection and Pipes
Though not really a standard I/O library issue, the use of the standard pre-opened files stdin, stdout, and stderr, coupled with the Unix principle that 'everything is a file', is a source of great power and flexibility. It allows for highly modular program development, as small programs can be easily mixed and matched to make larger, more complex programs.
Every file can be read or written as a stream of bytes, including the terminal. That alone makes simple filters very easy to write and greatly simplifies more complicated programs. The programmer's life is made easier.
The greatest advantage comes when you realize that, because the programmer doesn't know the source or destination of I/O, the caller of the program can change these at will. The Unix shell supports this with I/O redirection and pipes.
Consider a program that reads from the terminal. cat with no arguments copies its input to its output, so each line typed is echoed back:
% cat
hi
hi
there
there
<CNTL-D>
%
The symbol < tells the shell that the standard input for the command should come from the specified file rather than from the terminal.
% cat < hello.c
/* This is the traditional first program in C. */
#include <stdio.h>
int main() {
    printf("Hello, world!\n");
}
%
The standard output of any command can likewise be redirected using >, and you can redirect both at the same time:
% cat < hello.c > new.c
%
copies hello.c to the file new.c.
CAUTION: Redirecting a program's output will create the file if it doesn't exist, and will destroy anything already in the file if it does. To append the output of a command to a file, use >>.
If a file is a file, and we can redirect the input and output of a program to arbitrary files, then can we wire the output of one command directly to the input of another? Yes! That's what pipes are for. The shell uses | to do this.
% cat < hello.c | wc
8 21 129
%
pipes the output of the cat command to the input of wc, which is a program that counts the lines, words, and characters in its input.
This flexibility supports one of the great advantages of program development in Unix environments. It is possible, even encouraged, to break a problem into small pieces, write smaller, simpler programs to do each piece, and then string the programs together to do more complex things. Using these tools, each program can be written by different programmer(s) and even in different languages.
And this style of development also forces one to think very carefully about the interfaces between the components. The more robust the interface, the more modular the code will be. Since the interface is based on a file model, it prevents each program module from being too dependent on internal representations of other modules. Also, because the data are constantly being sent to a file, it encourages data formats that are human readable (or at least readable with simple tools) — something that can be very useful for debugging.
Opening and Closing Files
FILE *fopen(char *fname, char *mode)
FILE *freopen(char *fname, char *mode, FILE *stream)
int fclose(FILE *fp)
int feof(FILE *fp)
#include <unistd.h>
int access(char *pathname, int mode);
fopen() returns a pointer to a FILE structure representing the file opened for read ("r"), write ("w"), or append ("a") access. (The mode string may include a +. See the manual pages for more information.) NULL is returned on error (sets errno). Opening a file creates associated buffers.
freopen() flushes stream and closes the file descriptor associated with the stream. (We'll talk about file descriptors in a future lecture.) If fname is NULL, then the file formerly associated with stream is reopened with the specified mode. If fname is not NULL, then the specified file is opened and associated with stream. The return value is stream, unless there is an error, in which case NULL is returned and errno is set.
fclose() writes out any remaining buffered data and deallocates resources associated with the file. It returns 0 on success and EOF on failure (sets errno).
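feof() returns non-zero if the file stream's end-of-file indicator has been set. The indicator is set only after a read has run into the end of the file, so feof() is meant to be checked after a read fails, not before attempting one.
access() is used to test whether the current process has the permissions required in mode (measured against the real UID and GID of the process). It returns 0 if access is allowed and -1 otherwise. (access() is really a Unix function rather than part of the standard I/O library; I'm including it here anyway.)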
Binary I/O
The operations above are intended for text files. Of course, one can do binary I/O using the character facilities above treating each byte as a separate uninterpreted unit. However, it is sometimes desirable to write out and read in more complex data structures. For example, a database may not want to suffer the performance penalties of parsing text (with fscanf()) when it gets index entries from a file. On the other hand, reading in the input byte-by-byte can be very error prone. So, C allows one to read and write structs and other data structures directly using:
size_t fread(void *ptr, size_t size, size_t nmemb,
FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb,
FILE *stream);
ptr is the location of the data structure to be read or written, size is the size of a data element, and nmemb is the number of elements (each size bytes long) to read or write. Each function returns the number of elements (not bytes) successfully read or written.
Random Access
All the above I/O operations assume that you want to read or write a file from start to finish. But there are plenty of applications where you don't want to do this, most especially with large databases. You might use a hash function or a smaller index structure to determine that a data value of interest is at a particular location of a file. Rather than read all the data from the start of the file until you get to the 400 millionth byte, Unix-based systems provide a means of going directly to that location in a file and reading or writing.
int fseek(FILE *stream, long offset, int whence);
long ftell(FILE *stream);
void rewind(FILE *stream);
fseek() sets the file position of the given stream to the specified offset from the beginning, the current position, or the end, depending on the value of whence (SEEK_SET, SEEK_CUR, or SEEK_END, respectively). It returns 0 on success and -1 on error.
ftell() returns the current file position, or -1 on error.
rewind() sets the file position to the beginning of the file.
Modified: 07 February 2008