The Standard ML Basis Library

Introduction

The I/O subsystem provides standard functions for reading and writing files and devices. In particular, the subsystem provides:

buffered reading and writing;
arbitrary lookahead, using an underlying "lazy streams" mechanism;
dynamic redirection of input or output;
a uniform interface to text and binary data;
layering of stream translations, through an underlying "reader/writer" interface;
unbuffered input/output, through the reader/writer interface or even through the buffered stream interface;
primitives sufficient to construct facilities for random-access reading and writing of the same file.

In addition, the subsystem allows for efficient implementation, minimizing system calls and memory-to-memory copying.

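The I/O system has three layers of interface. From top to bottom, they are:

Imperative I/O: Buffered, conventional (side-effecting) input and output with redirection facility.
Stream I/O: Buffered "lazy functional stream" input; buffered conventional output.
Primitive I/O: Uniform interface for unbuffered reading and writing at the "system call" level, though not necessarily via actual system calls.

Operations are provided to move between the levels. For example, TextIO.getInstream and TextIO.mkInstream convert between imperative and stream-level input streams, and TextIO.StreamIO.getReader and TextIO.StreamIO.mkInstream convert between stream-level input streams and primitive readers; getOutstream, mkOutstream and getWriter play the corresponding roles for output.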
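As a small illustration of moving between the imperative and stream levels, the sketch below reads a fixed-size prefix through the functional stream and then hands the remaining stream back to the imperative stream. It assumes an implementation of the 2004 Basis (TextIO.openString, TextIO.getInstream, TextIO.setInstream, TextIO.StreamIO.inputN); the helper name splitAfter is hypothetical.

```sml
(* Hypothetical helper: take the first n characters at the stream level,
   then resume reading the rest at the imperative level. *)
fun splitAfter (n : int) (text : string) : string * string =
  let
    val ins = TextIO.openString text                   (* imperative stream *)
    val s = TextIO.getInstream ins                     (* drop to stream I/O *)
    val (prefix, s') = TextIO.StreamIO.inputN (s, n)   (* fewer than n only at end of stream *)
    val () = TextIO.setInstream (ins, s')              (* go back up, positioned after the prefix *)
    val rest = TextIO.inputAll ins
  in
    TextIO.closeIn ins;
    (prefix, rest)
  end
```

For instance, splitAfter 5 "hello world" should yield ("hello", " world").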

All conforming implementations must provide two instances of the I/O stack: one for binary data (BinIO) and one for text (TextIO). The latter provides a few additional operations to better support text-oriented I/O. The library also defines optional functors for building new I/O stacks.
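Two of the text-only operations are TextIO.openString and TextIO.inputLine. The following is a minimal sketch, assuming the 2004 Basis signature in which inputLine returns a string option (some older implementations returned a plain string); the name countLines is hypothetical.

```sml
(* Count the lines in a string using text-only TextIO operations. *)
fun countLines (text : string) : int =
  let
    val ins = TextIO.openString text
    fun loop acc =
      case TextIO.inputLine ins of
          NONE => acc                (* end of stream *)
        | SOME _ => loop (acc + 1)   (* one more line, including an unterminated last line *)
  in
    loop 0 before TextIO.closeIn ins
  end
```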

Stream state

Input streams can be viewed as being in one of three states: active, truncated or closed. When initially created, the stream is active. Getting access to the underlying primitive reader (getReader) causes the stream to be truncated. Closing a stream causes the stream to be closed. A closed stream is also truncated.
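The sketch below drops from stream I/O to primitive I/O with getReader, which truncates the stream it is applied to, and then drains the reader directly. It assumes a conforming implementation; TextPrimIO.augmentReader is used so that a blocking readVec is available even if the underlying reader only supplies other read functions. The name drainViaReader is hypothetical.

```sml
(* Take the reader away from a stream and read the rest of the data from it. *)
fun drainViaReader (text : string) : string =
  let
    val s = TextIO.getInstream (TextIO.openString text)
    (* buffered: elements already read into s's buffer, if any; s is now truncated *)
    val (rd, buffered) = TextIO.StreamIO.getReader s
    val TextPrimIO.RD {readVec, chunkSize, close, ...} = TextPrimIO.augmentReader rd
    val read =
      case readVec of
          SOME f => f
        | NONE => raise Fail "reader offers no blocking read"
    (* readVec returns an empty vector only at end of stream *)
    fun loop acc =
      let val chunk = read chunkSize
      in if chunk = "" then String.concat (rev acc) else loop (chunk :: acc)
      end
  in
    (buffered ^ loop []) before close ()
  end
```

Whatever the split between buffered data and data still in the reader, the concatenation should reproduce the original text.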

Each input stream f can be viewed as a sequence of "available" elements (the buffer or sequence of buffers) and a mechanism (the reader) for obtaining more. After an operation (v, f') = input(f), it is guaranteed that v is a prefix of the available elements. In a truncated input stream, there is no mechanism for obtaining more, so the "available" elements comprise the entire stream. Reading from a truncated input stream will never block; after all buffered elements are read, input operations always return empty vectors.
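Because stream-level input does not consume its argument, arbitrary lookahead amounts to reading from a stream value and discarding the returned continuation. A minimal sketch, assuming the 2004 STREAM_IO signature (inputN and inputAll both return a vector and a new stream); peekThenReadAll is a hypothetical name.

```sml
(* Look ahead n elements, then read everything from the same stream value. *)
fun peekThenReadAll (n : int, text : string) : string * string =
  let
    val s = TextIO.getInstream (TextIO.openString text)
    val (peeked, _) = TextIO.StreamIO.inputN (s, n)   (* continuation ignored *)
    val (all, _) = TextIO.StreamIO.inputAll s         (* s itself has not advanced *)
  in
    (peeked, all)
  end
```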

Output streams can be viewed as being in one of three states: active, terminated or closed. When initially created, the stream is active. Getting access to the underlying primitive writer (getWriter) causes the stream to be terminated. Closing a stream causes the stream to be closed. A closed stream is also terminated. In a terminated output stream, there is no mechanism for outputting more, so any output operation will raise the IO.Io exception.
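Since a closed stream is also terminated, closing is an easy way to observe the rule above. The sketch below assumes a writable temporary file name from OS.FileSys.tmpName and removes the file afterwards; outputAfterCloseRaises is a hypothetical name.

```sml
(* Returns true if writing to an already closed (hence terminated) output stream raises IO.Io. *)
fun outputAfterCloseRaises () : bool =
  let
    val name = OS.FileSys.tmpName ()
    val out = TextIO.openOut name
    val () = TextIO.output (out, "first line\n")
    val () = TextIO.closeOut out        (* flushes, then closes the stream *)
    val raised =
      (TextIO.output (out, "too late\n"); false)
      handle IO.Io _ => true
  in
    OS.FileSys.remove name;
    raised
  end
```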

It is possible that a stream's underlying reader/writer, or its operating system file descriptor, could be closed while the stream is still active. When this condition is detected, the stream should raise the IO.Io exception with cause set to IO.ClosedStream.
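A handler can recognize this condition by matching on the cause field of IO.Io. The helper below is a hypothetical sketch; it only classifies an exception value and performs no I/O itself.

```sml
(* Hypothetical helper: describe an exception raised by an I/O operation. *)
fun describeIoError (IO.Io {name, function, cause = IO.ClosedStream}) =
      function ^ " on " ^ name ^ ": stream closed"
  | describeIoError (IO.Io {name, function, ...}) =
      function ^ " on " ^ name ^ ": other I/O failure"
  | describeIoError _ = "not an I/O error"
```

A typical use would be a clause such as handle e as IO.Io _ => print (describeIoError e ^ "\n").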

End-of-stream

On Unix, and perhaps in other operating systems, there is no notion of "end-of-stream." Instead, by convention, a read system call that returns zero bytes is interpreted to mean end of stream. However, the next read on that stream could return more bytes. This situation would arise if, for example,

the user types Ctrl-D on an interactive tty stream, and then types more characters;
input reaches the end of a disk file, but then some other process appends more bytes to the file.

Consequently, the following is not guaranteed to be true:

let val z = TextIO.StreamIO.endOfStream f
    val (a,f') = TextIO.StreamIO.input f
    val x = TextIO.StreamIO.endOfStream f'
 in x=z   (* not necessarily true! *)
end

whereas the following is guaranteed to be true:

let val z = TextIO.StreamIO.endOfStream f
    val (a,f') = TextIO.StreamIO.input f
    val x = TextIO.StreamIO.endOfStream f (* note, no prime! *)
 in x=z   (* guaranteed true! *)
end

Thus, the notion of "end-of-stream" for an input stream corresponds to a condition on the stream, rather than a place in the stream. For untruncated input streams, when an input operation returns an empty vector (or endOfStream returns true), this means that we are currently at the end of the stream. If further data are appended to the underlying file or stream, the next input operation will deliver new elements. Thus, a file may have more than one "end-of-stream." If the end-of-stream condition holds, an input will return the empty vector, but the end-of-stream condition may become false as a result of this input operation.

Note that, after all buffered input is read from a truncated input stream, the input stream remains in a permanent end-of-stream condition.
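The usual way to read up to the current end-of-stream condition is to call input until it returns an empty vector. The sketch below (readToEnd is a hypothetical name) also returns the stream obtained from that final empty read: for a string stream it stays at end-of-stream, while for a file that another process extends, a later call on the returned stream might deliver new elements.

```sml
(* Read until StreamIO.input returns an empty vector; return the text and
   the stream that follows the empty read. *)
fun readToEnd (s : TextIO.StreamIO.instream) : string * TextIO.StreamIO.instream =
  let
    fun loop (st, acc) =
      let val (v, st') = TextIO.StreamIO.input st
      in
        if v = "" then (String.concat (rev acc), st')
        else loop (st', v :: acc)
      end
  in
    loop (s, [])
  end
```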

Imperative I/O

The semantics of the imperative I/O level can be given by defining imperative streams as references to the underlying stream I/O stream types and delegating I/O operations to that level. In addition, input at the imperative I/O level rebinds the reference to the new "lazy stream." For example, part of a structure matching IMPERATIVE_IO might look like:

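structure ImperativeIO : IMPERATIVE_IO = struct
  structure StreamIO : STREAM_IO = ...
  (* An imperative stream is a mutable cell holding the current lazy stream. *)
  datatype instream = INS of StreamIO.instream ref
  datatype outstream = OUTS of StreamIO.outstream ref
  (* Read from the current stream, then rebind the cell to the rest of it. *)
  fun input (INS(i as ref ins)) = let
        val (v, ins') = StreamIO.input ins
        in
          i := ins';
          v
        end
  (* Stream-level output is already side-effecting, so just delegate. *)
  fun output (OUTS(ref outs), v) = StreamIO.output (outs, v)
   ...
end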
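To see the same technique run end to end, here is a cut-down, hypothetical structure MiniImperativeIO built over TextIO.StreamIO. It is not a complete IMPERATIVE_IO implementation and is not how any particular library does it; it only shows reference rebinding, and redirection via setInstream.

```sml
(* A hypothetical, partial imperative layer over TextIO.StreamIO. *)
structure MiniImperativeIO =
struct
  structure S = TextIO.StreamIO
  datatype instream = INS of S.instream ref
  datatype outstream = OUTS of S.outstream ref

  fun mkInstream s = INS (ref s)
  fun getInstream (INS (ref s)) = s
  (* Redirection: point the imperative stream at another lazy stream. *)
  fun setInstream (INS r, s) = r := s

  (* Each read rebinds the reference to the rest of the lazy stream. *)
  fun input (INS (r as ref s)) =
        let val (v, s') = S.input s in r := s'; v end
  fun input1 (INS (r as ref s)) =
        case S.input1 s of
            NONE => NONE
          | SOME (c, s') => (r := s'; SOME c)

  fun mkOutstream s = OUTS (ref s)
  fun output (OUTS (ref s), v) = S.output (s, v)
end
```

Saving the result of getInstream and later passing it to setInstream rewinds the imperative stream, because the saved functional stream still holds the elements read after it.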

Translation

Text streams (TextIO) contain lines of text and control characters. Text lines are terminated with #"\n" characters.

In some environments, the external representation of a text file is different from its internal representation: for example, in MS-DOS, text files on disk contain CR-LF ("\r\n"), and in memory contain only LF ("\n") at the end of each line. Thus, on input, the CR-LF or CR terminators are translated to a single #"\n" character. The inverse translation is done on output. More substantial translation will be done on systems that support, for example, escape-coded Unicode text files.
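The line-terminator mapping described above can be written as two pure string functions. This only illustrates the mapping; it is not how any TextIO implementation performs the translation, and the names crlfToLf and lfToCrlf are hypothetical.

```sml
(* Input direction: CR-LF or a lone CR becomes a single #"\n". *)
fun crlfToLf (s : string) : string =
  let
    fun go (#"\r" :: #"\n" :: rest) = #"\n" :: go rest
      | go (#"\r" :: rest) = #"\n" :: go rest
      | go (c :: rest) = c :: go rest
      | go [] = []
  in
    implode (go (explode s))
  end

(* Output direction: each #"\n" is written as CR-LF. *)
val lfToCrlf : string -> string =
  String.translate (fn #"\n" => "\r\n" | c => String.str c)
```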

Binary streams (BinIO) match the external files byte for byte.
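A quick way to observe this is to write bytes that include CR and LF through BinIO and read them back unchanged. The sketch assumes a writable temporary file from OS.FileSys.tmpName, which it removes afterwards; binaryRoundTrip is a hypothetical name.

```sml
(* Write raw bytes with BinIO, read them back, and compare. *)
fun binaryRoundTrip (bytes : Word8Vector.vector) : bool =
  let
    val name = OS.FileSys.tmpName ()
    val out = BinIO.openOut name
    val () = BinIO.output (out, bytes)
    val () = BinIO.closeOut out
    val ins = BinIO.openIn name
    val back = BinIO.inputAll ins
  in
    BinIO.closeIn ins;
    OS.FileSys.remove name;
    back = bytes        (* no line-terminator translation at the binary level *)
  end
```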

Closing files on program exit

All streams created by open functions in TextIO and BinIO will be closed (and the outstreams among them flushed) when the SML program exits. The outstreams TextIO.stdOut and TextIO.stdErr will be flushed, but not closed, on program exit.
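Programs that open many files, or that want output to reach disk before exit, may prefer to close streams explicitly. A minimal bracket-style sketch follows; withOutFile is a hypothetical helper, not part of the Basis, and it closes the stream even when the supplied function raises.

```sml
(* Open an output file, run f on it, and close it even if f raises. *)
fun withOutFile (name : string) (f : TextIO.outstream -> 'a) : 'a =
  let
    val out = TextIO.openOut name
    val result = f out handle e => (TextIO.closeOut out; raise e)
  in
    TextIO.closeOut out;
    result
  end
```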
