By now, web-style client/server interactions are very familiar, as are
mechanisms that involve running a program, such as ssh
over a network connection. Such applications immediately make one
think about structured network operations. But wouldn't it be nice to
leverage one of the most familiar tools in every programmer's
repertoire and abstract away all the network messiness? This is the
essential goal of remote procedure call (RPC).
The essential idea is quite simple. In a program, when we want to separate out some computation that we can use from another part of the program, we define a procedure (or function, or method) and then we call it whenever we need it:
The arguments (and return address) and result are typically passed on the stack. Since the caller and callee are running in the same process, they share memory (including the stack), and it all works quite easily.
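As a minimal sketch of the local case (written in Python, with a hypothetical function name `append_total`), the callee below modifies a list that belongs to the caller. This only works because both run in the same process and share memory.

```python
def append_total(values):
    """Append the sum of `values` to the very list the caller passed in."""
    values.append(sum(values))
    return len(values)


readings = [3, 4, 5]
count = append_total(readings)  # an ordinary local procedure call

# The caller sees the callee's change because both share one address space.
print(readings, count)
```

If `append_total` ran on another computer, it would receive only a copy of the list, so the caller would not see the change unless the new contents were explicitly sent back.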
What if we could just arrange for the procedure to be on another computer?
The goal is to:
- Make use of a well-understood programming model.
- Make network machinery transparent to the programmer.
Alas, it is not so simple. To do this, we have to solve problems in two areas: networking (e.g., how does a process on one computer send a request to an entity on another computer) and data representation.
Networking Model
This is not a networking course, so we won't follow this problem all the way down to the electrical signals on the wires. We will assume that there is a way to establish contact with another computer. In particular, there are two network protocols that are of interest: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). At the moment, we don't care which one is used, as long as we can send a request from one machine to another and get back a response. We also take for granted that we can look up a server by name or IP address.
However, we do need to know how to send a request not just to some computer, but to a particular program on that computer. For this, we have a notion of ports. If a server is like an apartment building that houses lots of programs, a port is like the mailbox for a particular resident. A server program registers with its host by requesting a port, and then waits for requests to come in. A client sends its request to a port on the server.
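The request/response pattern over a port can be sketched with Python's standard `socket` module. This is only an illustration under some assumptions: both ends run on one machine (127.0.0.1), TCP is used, the helper name `serve_once` is hypothetical, and the messages are short enough to arrive in a single `recv`.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, read a request, and send back a response."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"reply to " + request)


# Server side: ask the host for a port (0 means "any free port") and wait.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_once, args=(listener,))
server.start()

# Client side: send a request to that port and wait for the response.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    response = client.recv(1024)

server.join()
listener.close()
print(response)
```

Passing port 0 lets the operating system pick a free port, which is why the client has to learn the port number from somewhere before it can connect.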
RPC Call Binding
Typically, one does not export a single function for use by remote clients. The usual case is that you are exporting some service (abstraction), and there is a set of procedures that clients will need to use the service. To use a procedure, a client needs to bind to the correct remote procedure.
A service, also called a program, is assigned a number. The service assigns each procedure it exports a procedure number. The service then registers a dispatch routine with a port, and when a request for a particular procedure number arrives at the port, the dispatch routine calls the corresponding server function.
In addition to the service and procedure numbers, there is also a version number. This allows a server to support, say, NFS versions 2 and 3 on different ports, and a client that supports one version can ask for the appropriate program.
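A dispatch routine of this kind can be sketched as a table lookup. Everything named here is an assumption made for illustration: the program number, the version, the procedure numbers, and the two server functions are invented, and a real Sun RPC dispatch routine is C code rather than Python.

```python
# Hypothetical service: the numbers and procedure names are invented.
CALC_PROG = 0x20000001
CALC_VERS = 1


def proc_add(args):
    return args[0] + args[1]


def proc_negate(args):
    return -args[0]


# One table per (program, version): procedure number -> server function.
SERVICES = {
    (CALC_PROG, CALC_VERS): {1: proc_add, 2: proc_negate},
}


def dispatch(program, version, procedure, args):
    """Route an incoming request to the matching server function."""
    procedures = SERVICES.get((program, version))
    if procedures is None:
        raise LookupError("program/version not available")
    if procedure not in procedures:
        raise LookupError("procedure not available")
    return procedures[procedure](args)


print(dispatch(CALC_PROG, CALC_VERS, 1, (2, 3)))
print(dispatch(CALC_PROG, CALC_VERS, 2, (7,)))
```

Keying the table on both program and version is what lets one server offer several versions of the same service side by side.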
While some services have a standard port number assigned (e.g., httpd
usually listens on port 80, and sendmail usually listens on port 25),
there is a way to ask a server what port is associated with a
particular service number. The portmapper service listens on port 111
and, given a service and version number, will respond with a port a
client can use to get that service. You can see what services (aka
programs) are assigned to which ports by using the rpcinfo shell
command:
    % /usr/sbin/rpcinfo -p cs
       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100024    1   udp    651  status
        100024    1   tcp    654  status
        100011    1   udp    949  rquotad
        100011    2   udp    949  rquotad
        100011    1   tcp    952  rquotad
        100011    2   tcp    952  rquotad
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100003    4   udp   2049  nfs
        100003    2   tcp   2049  nfs
        100003    3   tcp   2049  nfs
        100003    4   tcp   2049  nfs
        100021    1   udp  32768  nlockmgr
        100021    3   udp  32768  nlockmgr
        100021    4   udp  32768  nlockmgr
        100021    1   tcp  32778  nlockmgr
        100021    3   tcp  32778  nlockmgr
        100021    4   tcp  32778  nlockmgr
        100005    1   udp    988  mountd
        100005    1   tcp    991  mountd
        100005    2   udp    988  mountd
        100005    2   tcp    991  mountd
        100005    3   udp    988  mountd
        100005    3   tcp    991  mountd
        100001    3   udp   1012  rstatd
        100001    2   udp   1012  rstatd
        100001    1   udp   1012  rstatd
    %
Note that some services are available for either the tcp or udp
protocol. Also notice that there may be multiple instances of a server
listening on a port for requests (to improve performance when lots of
clients are making requests).
To bind to a particular remote procedure then, a client contacts the
portmapper on port 111 (using its protocol of choice) and looks up a
port by service and version number. The client then uses the
procedure number to make a remote procedure call on the given port.
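The binding steps can be sketched with a toy, in-memory stand-in for the portmapper's table. The entries are copied from the rpcinfo listing, but the functions `lookup_port` and `bind` are hypothetical, the choice of procedure number 1 is arbitrary, and no real portmapper protocol messages are exchanged.

```python
# Toy stand-in for the portmapper: (program, version, protocol) -> port.
PORT_MAP = {
    (100003, 3, "udp"): 2049,  # nfs
    (100003, 3, "tcp"): 2049,  # nfs
    (100005, 3, "udp"): 988,   # mountd
    (100005, 3, "tcp"): 991,   # mountd
}


def lookup_port(program, version, protocol):
    """Return the registered port, or 0 if nothing is registered."""
    return PORT_MAP.get((program, version, protocol), 0)


def bind(program, version, protocol, procedure):
    """Collect everything a client needs to address one remote procedure."""
    port = lookup_port(program, version, protocol)
    if port == 0:
        raise LookupError("service not registered")
    return {
        "port": port,
        "program": program,
        "version": version,
        "procedure": procedure,
    }


# Look up mountd version 3 over tcp, then address procedure 1 on that port.
binding = bind(100005, 3, "tcp", 1)
print(binding["port"])
```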
Data Representation
Once we can communicate information from one computer to another, we run into another problem: Different computing platforms represent data differently. We can send raw bytes, but we cannot count on their consistent interpretation. Even the lowly integer can't be reliably transmitted. First, we have to specify how big the integer is. If we standardize on, say, 4 bytes, then we still have the problem of byte order: Does the first byte represent the high-order byte of the integer or the low-order byte? (You may recall from your computer architecture course the notion of little-endian versus big-endian representations. Little-endian machines put the low-order byte of a multi-byte integer at the lower memory address; big-endian machines are the other way around.) It only gets worse for structured data like arrays, linked lists, etc.
It is not practical to standardize all hardware and software run-time systems on a single set of representation choices. Rather, we can specify a transmission standard, or a network representation, and require all platforms to convert to the network representation when they put data on the network and convert back when they receive data.
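The byte-order problem is easy to see with Python's standard `struct` module. In this small sketch the value `0x0A0B0C0D` is an arbitrary choice, and the `!` format prefix selects network byte order, which is big-endian.

```python
import struct
import sys

value = 0x0A0B0C0D

big = struct.pack(">I", value)     # high-order byte first
little = struct.pack("<I", value)  # low-order byte first
wire = struct.pack("!I", value)    # network byte order (big-endian)

print(big.hex(), little.hex(), wire.hex())
print(sys.byteorder)  # the native order of the machine running this

# A receiver that assumes the wrong order reads a different number.
misread = struct.unpack("<I", big)[0]
print(hex(value), hex(misread))
```

The same four bytes decode to two different integers depending on which order the receiver assumes, which is why both sides must agree on one network representation.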
For the RPC standard created by Sun Microsystems (which is used for the Sun Network File System, or NFS), the standard network representation is called XDR (for eXternal Data Representation). To use Sun RPC, a program uses an XDR library to convert between local and network representations.
So, in addition to the binding process above, a client must marshal the data for the RPC argument, i.e., it must use XDR library calls to convert the data to network representation. Then it does the remote procedure call. The server receives the request, and the procedure on the server converts the data to its local format, performs the required computation, converts the result to network representation, and sends the converted result back to the client. The client converts the result from network to local representation and continues on its way.
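The marshal, call, and unmarshal sequence can be sketched by hand-encoding two XDR types with Python's `struct` module: a 4-byte big-endian integer, and a string sent as a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The helper names and the `server_repeat` procedure are hypothetical, and a real program would call the XDR library rather than these helpers.

```python
import struct


def pack_int(n):
    return struct.pack(">i", n)


def pack_string(s):
    data = s.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def unpack_int(buf, offset):
    (n,) = struct.unpack_from(">i", buf, offset)
    return n, offset + 4


def unpack_string(buf, offset):
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    padded = length + (4 - length % 4) % 4
    return buf[start:start + length].decode("ascii"), start + padded


def server_repeat(request):
    """Server: unmarshal the arguments, compute, marshal the result."""
    text, offset = unpack_string(request, 0)
    count, offset = unpack_int(request, offset)
    return pack_string(text * count)


# Client: marshal the arguments, "send" them, then unmarshal the result.
request = pack_string("abc") + pack_int(3)
reply = server_repeat(request)
result, _ = unpack_string(reply, 0)
print(request.hex())
print(result)
```

Here the function call `server_repeat(request)` stands in for the network hop; only bytes cross that boundary in either direction.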
That is a lot of work! But notice that much of the work is fairly stereotyped. That suggests some automation.
Automatically-Generated RPC Infrastructure
Since most of the messy work involved in performing remote procedure calls is standard, the necessary code can actually be generated automatically. Obviously, the code that performs the actual client and server computation cannot be generated automatically, but the marshalling and unmarshalling of data and network binding can be.
The rpcgen program takes an RPC service protocol specification in a
file (whose name ends in .x). The specification defines the
service/program, the procedures the service supports, and the argument
and return types of the procedures. From that information, rpcgen
produces four files:
- An include (.h) file that can be used by both the client and the server code.
- A client stub that the client will use to handle all the RPC messiness unrelated to its computation.
- A server stub that the server will use to handle all the RPC messiness unrelated to its computation.
- A file containing XDR-related functions used by both the client and server.
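To give a feel for the division of labor between hand-written code and stubs, here is a rough Python sketch. It is not what rpcgen emits (rpcgen generates C), and the names `add`, `server_stub`, and `make_add_stub` are hypothetical. A function call stands in for the network so the example runs in one process.

```python
import struct


# --- server side ---
def add(a, b):
    """The real computation, written by hand."""
    return a + b


def server_stub(request):
    """Stub-style glue: unmarshal arguments, call, marshal the result."""
    a, b = struct.unpack(">ii", request)
    return struct.pack(">i", add(a, b))


# --- client side ---
def make_add_stub(transport):
    """Build a client stub; `transport` sends bytes and returns the reply."""
    def add_stub(a, b):
        reply = transport(struct.pack(">ii", a, b))
        (result,) = struct.unpack(">i", reply)
        return result
    return add_stub


# An in-process transport stands in for the network in this sketch.
remote_add = make_add_stub(server_stub)
print(remote_add(2, 40))
```

From the caller's point of view, `remote_add(2, 40)` looks like any local call, which is the transparency RPC is after.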
Modified: 12 April 2007