The non-programming questions are short-answer: Answer the question and explain your answer in full sentences (not just "yes" or "no" or "1024"). At most a couple a paragraphs should suffice. Brevity and concision will be important criterial for evaluating your response.
Create a directory called midterm1
in your
cs249
directory. All the code for your programming task
will go in there. For the short-answer questions, you may either
turn in hardcopy (by slipping your stapled solutions under my door)
or softcopy by putting your answers into text files in your
midterm1
directory. If you are submitting softcopy
text, put the answer to each question in a file named
short-answer-n
, where n
is
the question number.
When you are done, make a tarball and submit the result using the
drop
program.
You may not discuss the exam with your classmates at all, not even in high-level English. This is different from a regular homework. Bring all questions specifically about the exam to me. That way, I can ensure that everyone has the same information.
You can use any of the course notes and, of course, the relevant
man
pages for the exam.
Question 1 [10pts]
What is the generic pointer type in C and why do we need one? I.e., why can't we just choose something,(char *)
for
example, and use that as a generic pointer type?
Answer
- [2pts] C's generic pointer type is
(void *)
. - [4pts] The C standard guarantees that that any pointer can be
converted to a
(void *)
and back again without loss of data. The generic pointer must, therefore, be at least as large as the largest pointer type. While programmers often assume that all pointers are the same size, and on some computers this is true, it need not always be true. Intel processors have more than one pointer size, for example: a long pointer contains a segment and an offset while a short pointer just contains an offset. C implementations for such systems may encode certain pointers using the short format for efficiency. Converting a pointer to(void *)
forces the compiler to use the long version. - [4pts] The main motivation for having a generic pointer type is
abstraction. If a value has the generic pointer type, then all
you know is that it is a pointer to something, but you don't know
what. This means that you can't dereference the pointer. So if
you convert a pointer to
(void *)
, then this presents an obstacle to those who would attempt to poke through your abstraction boundary and manipulate your data elements directly. Of course, C will allow someone to cast a(void *)
to whatever they want, and if they happen to know your representation, they can subvert your abstractions. But they can't argue it was an accident!
Question 2 [10pts]
What, if any, limitations does the Unix file system model (embedded in the Virtual File System (VFS) specification) place on:- The length of file names.
- The length of file extensions (e.g.,
.txt
,.texinfo
, etc.). - The number of files in a directory.
Answer
There are no a priori restriction on any of these items in the abstract VFS model.- [2pts] The only place a file name is stored is as the contents of a directory, which is itself a file. The only real limit is thus the portion of the file system devoted to holding data — you could have a file name that takes up all the your disk space (or up to your quota if quotas are enforced)!
- [2pts] As far as the kernel is concerned, file name extensions
don't exist: they are just part of the file name and are merely
a user and application convention. A file name
doesn't have to have an extension at all (like
Makefile
). Or the whole file name could be after a.
, and so the theoretical limit is one character less than the maximum file name (because the.
is included). - [2pts] A directory is an ordinary file, so the maximum number of directory entries is however many will fit in all the data blocks in the file system. You could also argue that the maximum number of files in a directory is the maximum number of files you can have in the system, which is the same as number of inodes. However, it is possible to have multiple hard links to the same file, so you could have more directory entries than there are files in the system (but they wouldn't all be distinct).
You were not responsible for the following information, but you will
not be surprised to know that, while the general model doesn't have
any restrictions, practical implementations do. In particular, some
system calls will only accept path names of a limited size (due to
fixed sized internal buffers). For example, open()
may
return an error indication of “name too long.” You can use
the system calls pathconf()
and fpathconf()
to find out various system limitations, including the maximum path
length for a given file system.
Some file systems may, for their own internal reasons, impose
restrictions. The ext2
file system on Linux insists that
file names be no more than 255 characters long.
Question 3 [15pts]
You are working for a company that is extending Linux in various ways. Your boss, N. C. Kyoor, reminds you that current versions of Unix/Linux associate a file with, and specify permissions for, the file's owner, a single group, and the rest of the world. N. C. proposes a new scheme under which a file is associated with an explicit list of users. For example, you could create a file and then say that you and your partner may read and write the file, the other members of this class (each listed by user name) may read the file, and every one else is blocked from any access.What operating system/file system data structure(s) would have to change to support this security model, and how would they have to change? (You do not have to write structure declarations, just an English description.) Would this be a small change to Linux?
Answer
In Unix, we would try to get this behavior by creating new accounts and groups. In the example above, you and your partner could create a team account. The rest of the class could be in a new group. This is unsatisfying on two levels: First, creating accounts and groups is a heavy-weight task that requiresroot
permission. Second,
it doesn't give you enough control. There can be only one group
associated with the file, and everyone else must fall into the
other
categorty. It also becomes very awkward when you
and your partner create files owned by one of you rather than by the
team account.
[6pts] Unfortunately, this is a rather fundamental change to the Unix
security model. The notions of user
, group
,
and other
are deeply embedded in Unix. Not only would
ls
have to change, but the kernel code associated with
validating access permissions would need to change drastically.
[6pts] The data structures of interest are inodes and the
stat
structure. Rather than 9 mode bits (OK, a few more
for regular file, setuid, etc.), you would need to add a way to store
the various by-user permissions. On way to do this would be to
include a linked list of userID × mode_t
.
Or, you could create a list of permission groups in which each group
has a list of users.
[2pts] In either case, you are disrupting an imporant assumption in the design of the Unix file system model: VFS relies on inodes having a fixed size. This change affects several parts of the kernel code/algorithms, data structures, and would greatly complicate the layout of the data on disk. I wouldn't promise a quick turnaround on this project!
The overal quality of the explanation was was worth an additional 6 points.
Question 4 [10 points]
In the object-oriented representations we saw in class (in the VFS model and in the counter class example), the operations/methods were part of a separate structure referenced by the structure that represented an object. For example, a file object is astruct file
that contains a field f_op
,
which is a pointer to a table of operations (a struct
file_operations
), which in turn has a field
called write
. Likewise, a counter instance was a
structure that contained an ops
field that pointed to a
structure of pointers to functions that implemented the counter
methods.
Why? Why not represent a counter as a structure with
a decrement
field, or a file as a structure with
a write
field? What are the advantages and
disadvantages of the strategy shown?
There are two motivations for this approach:
- The main advantage is that it saves space. There is exactly
one table of operations for each kind of file system no matter how
many file objects are created. All the file instances have only
one field for the operations, and all instances point to the same
operation table. Similarly, each counter instance has one field to
point to the operation table, and all instances share the same
operation table. This allows each instance to be smaller: one
pointer field rather than a pointer for each operation.
The space savings over directly embedding a pointer to each operation in each object is:
(num_operations − 1) × num_objects
This can represent a considerable amount of storage, especially when the number of objects created is large (as inFILE
objects). Given that the Linux kernel runs in physical, rather than virtual memory, having the space be proportional to the number of file system types rather than the number of file objects can mean the difference between robust and useless (especially on a file server).Similarly, in Java, the overhead of storing a method dispatch table is paid only once for each class rather than once per object.
- The second consideration is time. It takes little time to initialize a single pointer, and num_operations times more time to initialize a full table of operations.
Question 5 [45pts]
Programming a Growable String Buffer
In this task, you will implement a growable string buffer. You have noticed how troublesome programming with fixed sized items and manual memory management can be. A growable string buffer is a data object that a client program can continue to append new data to without worrying about all the messy details of memory management. Java has aStringBuffer
class for this sort of thing, and you
will implement a basic version of it in C.
A string buffer has 3 elements of state:
- the contents, which contains a sequence of characters (possibly including embedded NUL characters);
- the size, which is the number of characters currently used in the buffer; and
- the capacity, which is the maximum number of characters the buffer can hold at the present time.
string_buffer.h
. Read this
file now or have it open while you read the following description.
A client program will create a new string buffer (using
mk_strbuff()
), which will have a size of 0 and a
capacity of 1 character. The client can then append new characters
to the end of the buffer as necessary (using
strbuff_append()
). Your implementation will append the
new character to the existing contents if there is sufficient room.
If there is not room, you will double the capacity of the string
buffer and then do the append. To append a string, you will just
append the characters one at a time.
Clients may also extract or modify characters in the buffer by
index. And, of course, when the client is done with a string buffer,
it may free its space by calling strbuff_destroy()
. A
client may reuse a string buffer by tuncating it to a size of 0.
The API requires the use of a couple of symbols defined by
errno.h
, so don't forget to include it.
You may find the memcpy()
function useful.
Requirements:
- Put your string buffer implementation into a file called
string_buffer.c
. - Your code must not crash and you must not abort the
program inside your implementation (i.e., you must not
call
exit()
instring_buffer.c
). The API defines the implementation's behavior in all error cases (and if it doesn't, then you must find and document a way to avoid crashing, and preferably notify the client of the problem). - Create a test program with source code in
test_string_buffer.c
, which includesstring_buffer.h
and any other header files that are necessary. - It should be possible for a client to have more than one string buffer active at a time with no interactions between the two.
- Once the client program creates a new string buffer and gets a pointer to it, it must never need to change the pointer for that string buffer. In other words, all the allocation and deallocation you do inside your implemenation are not visible to the client, who has a pointer to a single data value that represents the buffer.
- Your test cases should test every function in the given string buffer API. (Testing each function individually is called unit testing.) You should also check all feasible error modes. You may or may not be able so simulate running out of memory, but all the bounds checking, for example, should be tested.
Answer
Structures that can grow dynamically are extremely useful, and this sort of problem comes up over and over again. As the problem observed, the Java language includes pre-defined classes for growable string buffers and vectors that use the technique of repeated doubling and copying. In C, it is an idiom that one sees fairly often. (I urge you to look at the JavaStringBuffer
API if you
haven't already.)
The beauty of this approach is that you can load data of unknown size
into a structure in linear time, and then have constant time access to
the data elements. (In Algorithms, you will see how you can prove
that the cost of all the copying is amortized in such a way that the
asymptotic running time remains linear.) Another advantage in a
language like C without automatic storage management (garbage
collection) is that it isolates the storage management associated with
the structure in one place, thus improving modularity.
The point breakdown is as follows: 5 points for the data type declaration and 4 points for each function (there are 10 functions). 1 point out of the 4 for each function comes from the testing.
Here is my solution:
This solution is based more directly on the JavaStringBuffer
class than your code was required to be: It
extends the API with a function
called strbuff_ensure_capacity()
that, if the string
buffer's current capacity is not adequate, expands the string buffer's
capacity to the larger of the requested capacity and twice the current
capacity (plus two). This rather complex specification ensures that
the rate of growth is large enough to cover the copying overhead.
Using strbuff_ensure_capacity()
, one can take a different
approach to strbuff_append_str()
. Rather than have a
loop that appends each character of the string in turn, one can make
sure the string buffer has the space for the string, and then just
copy its contents in.
Things that were important to look out for:
- When allocating the string buffer structure, you also had to
allocate a 1-character content field (and, naturally, check that
both
malloc()
calls succeeded). - In
strbuff_to_string()
, you had tomalloc()
the result and copy the data in. (See Question 5 for more about this copying.) - You had to check the capacity and double the size of the content
storage if necessary in
strbuff_append()
. - In
strbuff_truncate()
, you did not have to free the contents or alter the contents in any way. You just had to set the size of the string buffer to the appropriate value. - In
strbuff_destroy()
, you had to free both the contents and the string buffer structure itself (and order is important). - Error checking is a must!
string_buffer.c
into a string buffer, a 3K file. I
have tested the implementation up to 9MB.
In this case, a lot of the testing code involves ensuring that the code matches the specification in error conditions. We also want to cover the other specified behaviors as well as any other programming situations we can anticipate.
A truly robust test suite would have a way to stress the code under
memory limitations. This was not required, nor is it illustrated
here. One way to do it would be to use the limit
command
of the shell (or the equivalent run-time functionality) to reduce the
size of the heap available to your program. In
fact, test_string_buffer
could take a heap size
argument.
(One known bug in the testing code is that if the file you give it to read into the testing buffer cannot be read (due to permissions problems or because it doesn't exist), then some tests will fail even if the string_buffer code is correct.)
Question 6 [10pts]
Consider yourstrbuff_to_string()
implementation. Briefly discuss the advantages and disadvantages of
each of strcpy()
, memcpy()
, and
memmove()
for getting the data into the output string.
Answer
This question got at some interesting issues. The specification said that your string buffer implementation should be able to contain embedded NUL characters ('\0'
). (This means that you
don't have to check for '\0'
in
strbuff_append()
.) But NUL characters are the
conventional string terminator in C. So, if your client will treat
the result of strbuf_to_string()
as a conventional
string, they'll only see up to the first NUL character. But you
really have to give back all the characters currently in the buffer,
because the caller can find out the size of the buffer and may want to
access the result beyond the first NUL character. That is, they may
be using your code to handle characters including '\0'
.
[2.5 pts] strcpy()
and strncpy()
are thus not good
choices for copying the data into the storage for the
result. They will copy only up to the first NUL.
strcpy()
does no length checking, so you you would have
to ensure that your buffer contents always ends with a NUL character
to avoid a wild trip through lots of inappropriate memory.
[2.5 pts] memcpy()
handles embedded NULs just fine, and it does
length checking. The only consideration is that it is not guaranteed
to work if the source and target memory areas overlap. In our case,
you are required to allocate new storage for the result string (to
avoid the interactions outlawed by the API). Thus, the memory is
guaranteed not to overlap.
[2.5 pts] memmove()
handles embedded NULs and overlapping memory.
So, in a belt and suspenders way, it would be the safest choice. But
you do pay some efficiency cost for its flexibility, and we know
that our source and target memory won't overlap.
Overall quality of the answer was an additional 2.5 points.
Modified: 23 May 2008