Very often in a website, there is repeated material on each page. For example, each of the lecture pages in this course has a navbar at the top that includes the links to the major components of the website.
You certainly could copy/paste the repeated items to each page, but that makes maintenance very difficult: if you decide to change the common content, you have to edit every page, and you have to make sure you don't miss any and that you edit them all in the same way. For example, adding a new section to your website that is accessible from the navbar would require changing every single page! For a small website, this is bad, but for a large website, it is completely unreasonable.
A solution is to break your page into pieces, each stored in a different file, and have the server cobble them together whenever it responds to a request for your web page. This technology is called Server Side Includes or SSI. Exactly how to use it depends on the server software. On the CS web server, we use Apache to serve web pages. For more detail, you can read the Apache tutorial on Server Side Includes.
Suppose we have three pages, A, B, and C, all of which should share a common header and footer, each of which is in a separate file. These shared pieces will be included in the main pages. Thus, we have five files, none of which is a complete page:
A.html
B.html
C.html
header.part
footer.part
In the list above, there are two kinds of files:
.html
This kind of file includes another file as part of itself. The .html files are the container files, so they have all the usual infrastructure: they start with a doctype declaration and have tags like html, head, and so forth.
.part
This kind of file is included in another file. It's just a fragment of a web page, such as just the navbar or just the footer. It doesn't have a doctype declaration, so it is not a valid HTML document on its own.
In the source code of the HTML files A, B, and C, we put a special marker, called a directive, that the server will look for and replace with the contents of the included files. The directive looks like an HTML comment, so that if the SSI doesn't happen for some reason, the directive won't otherwise mess up your code. Let's look at what page A's source code looks like:
<!--#include virtual="header.part" -->
<p>Here is some content that is unique to <strong>page A</strong> of the website.
<!--#include virtual="footer.part" -->
The text in quotes after the word virtual is a relative URL for the file to be included at that point in the main file.
Note that if you View Source on ssi/A.html, you won't see these directives. Instead, you'll see the included content. The server obeys the directives and constructs the page from the parts and sends it out, and the browser has no way of knowing what the original file was, or even that SSI was involved at all. SSI is completely invisible.
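To make the "server replaces the directive" idea concrete, here is a minimal Python sketch of the substitution. It is only an illustration, not how Apache is actually implemented: the function name expand_includes, the files dictionary (a stand-in for the server's file system), and the sample header and footer contents are all made up for this example.

```python
import re

# Matches directives such as <!--#include virtual="header.part" -->
# (no space is allowed between "<!--" and "#include").
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

def expand_includes(page_source, files):
    """Return page_source with each include directive replaced by file text.

    `files` is a hypothetical stand-in for the server's file system:
    a dict mapping a relative path to that file's contents.
    """
    return INCLUDE.sub(lambda match: files[match.group(1)], page_source)

# Made-up contents for the two shared pieces.
files = {
    "header.part": "<nav>Home | Lectures</nav>",
    "footer.part": "<footer>Contact us</footer>",
}

page_a = (
    '<!--#include virtual="header.part" -->\n'
    "<p>Content unique to page A.\n"
    '<!--#include virtual="footer.part" -->'
)

# What the browser receives: the parts are pasted in and no directive remains.
served = expand_includes(page_a, files)
print(served)
```

The string in served is all the browser ever gets, which is why View Source shows the included content rather than the directives.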
For the server to know that a file contains directives and needs this extra work, we must make the file executable. Files on the server can be readable, writable, executable, or any combination of the three. These are known as file permissions and can be set through the SFTP interface (CyberDuck, Fetch, WinSCP, FileZilla, etc.).
We will always have to turn on the execute bit for SSI directives to work. Figure 1 shows what the folder for our example looks like. All HTML files have an extra x value in their list of permissions. We will show in lab how you can do that for your files.
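If you have a command line on the server rather than an SFTP client, the same permission change can be scripted. The following Python sketch is just one possible way to do it. The helper name add_execute_bit is made up, and turning on the x bit for owner, group, and others alike is our assumption; check what your own server setup expects.

```python
import os
import stat
import tempfile

def add_execute_bit(path):
    """Turn on the execute permission for owner, group, and others.

    Returns the file's permission bits after the change.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)

# Demonstrate on a throwaway file standing in for A.html.
with tempfile.TemporaryDirectory() as folder:
    page = os.path.join(folder, "A.html")
    with open(page, "w") as f:
        f.write("<!doctype html>")
    os.chmod(page, 0o644)  # start from rw-r--r--
    new_mode = add_execute_bit(page)
    # Show the permissions in the same style as a directory listing.
    print(stat.filemode(os.stat(page).st_mode))
```

On a Unix-like system the printed permission string now contains x values, like the HTML files in the folder listing.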
Note that it's the web server that does the work, so SSI will not work when viewing files on your local desktop.
Here is a copy of A.html as it appears on the server. If you view the source, you can see what the code looks like before the directives are processed.
One common error is to put a space between the opening <!-- of the comment and #include:
<!-- #include ... -->
Unfortunately, this is treated as an ordinary comment, not an SSI directive, so be sure you do it exactly like this:
<!--#include virtual="relative/path/to/file.part" -->
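Because a mis-spaced directive fails silently (it is just a comment), it can help to search for it mechanically. Here is a small Python sketch that scans some page source for that mistake. The function name find_spaced_directives and the sample text are hypothetical, and the check only looks for this one error.

```python
import re

# A comment with whitespace between "<!--" and "#include" is the common mistake.
SPACED = re.compile(r'<!--\s+#include\b[^>]*-->')

def find_spaced_directives(source):
    """Return (line_number, text) pairs for directives with a stray space."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in SPACED.finditer(line):
            problems.append((number, match.group(0)))
    return problems

sample = (
    '<!--#include virtual="header.part" -->\n'
    "<p>Some content.</p>\n"
    '<!-- #include virtual="footer.part" -->\n'
)

# Only the third line is reported; the first directive is written correctly.
for number, text in find_spaced_directives(sample):
    print(f"line {number}: {text}")
```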
Second, if you make a mistake in your SSI syntax or in the relative URL, you'll see something like the following in your main file when you view it on the web:
[an error occurred while processing this directive]
For an example of this, see the file C-broken.html, in which we intentionally put in a typo, changing the line
<!--#include virtual="header.part" -->
to
<!--#include virtual="jeader.part" -->
so that you can see such an error in action.
Finally, if you make a change to your .part file, you may not see any change when viewing the .html file in your browser. This is very confusing and frustrating: didn't you change that web page?! This happens because the server told the browser that the .html file hadn't changed (which is true), and so the browser saved time by just showing you the copy in the cache, instead of re-downloading the .html file. On many browsers, you can tell the browser to ignore the cache and really reload the file by holding down the shift key while reloading the file (on a Mac: Shift + Command + R).
You also need to make sure that your .part file is accessible (readable by all) from the web.
Just as with URLs, you should use relative paths in the virtual="path" part of the SSI directive, so that they will still work when you move your website to another server. (The server should be one that supports SSI.)
However, the issue of relative paths can be a little tricky. Suppose that the file harry.html uses SSI to include a file items/firebolt.part. Furthermore, the firebolt.part file in the items subdirectory has some code referring to Harry's friend ron.html. Is the URL for ron.html relative to the HTML file or the PART file?
The answer is that it's relative to the HTML file. This is because the relative linking is done by the browser, which is completely unaware of the use of SSI, since all the SSI magic happens on the server, before the browser even gets to see the page. SSI is like dynamic copy/paste, so the code in a part file is just the same as it would be in the HTML file.
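You can check this reasoning with Python's standard urljoin function, which resolves a relative URL against a base URL much as a browser does. The example.com addresses below are made up; only the directory layout matters.

```python
from urllib.parse import urljoin

# Hypothetical URLs for the example files.
page_url = "http://example.com/site/harry.html"
part_url = "http://example.com/site/items/firebolt.part"

link = "ron.html"  # the relative URL written inside firebolt.part

# What the browser does: resolve the link against the page it requested.
print(urljoin(page_url, link))  # http://example.com/site/ron.html

# What would happen if links were resolved against the part file (they are not).
print(urljoin(part_url, link))  # http://example.com/site/items/ron.html
```

So a link written inside items/firebolt.part should be spelled as if it had been typed directly into harry.html.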
Overall, SSI is one of the right approaches to avoid redundancy in websites (there are others we won't cover). When it is done correctly, a visitor to your site will never know that you are using SSI; content that should be consistent from page to page (banner, nav bar, header, footer, and so forth) simply is consistent.
However, SSI is often a pain for the people developing the website. The reason is that SSI is only done by the server. That means that if you have downloaded an HTML page from the server and are editing it locally on your desktop, and you view the local copy in the browser, all the SSI content will be missing. You won't see that banner, nav bar, and all that. If you view the source, you'll just see the comment. The layout of your page may be messed up because that content is missing.
So, what's the best thing to do? Our recommendation is the following. (For concreteness, assume you are editing fred.html and that page includes nav.part.)
1. Before editing fred.html, download both fred.html and nav.part.
2. Copy/paste the contents of nav.part into the correct place in fred.html.
3. Try not to edit the code that came from nav.part. If you discover that that code needs editing, you have to realize that you are essentially simultaneously editing all the pages in your site. So, do the following:
   a. Edit the code in fred.html and test that it works.
   b. Copy the edited code back into the nav.part file.
   c. Upload the nav.part file to the server, and
   d. tell your teammates that you have changed nav.part.
   e. They should copy the new nav.part file to their own working copy.
4. When you are done editing fred.html, delete the copy of nav.part before uploading fred.html to the server. Otherwise, you'll end up with two copies of the content on that page!
Yes, this is a bit tedious, but hopefully you will rarely discover that the code from nav.part needs to be edited, so you'll only have the extra steps of copy/pasting and subsequently deleting.
Once you've implemented SSI, an alternative to working on your files locally is to edit your files directly on the server. This way, when you're making edits, you can view your files live on the server, and the SSI will work. No copy/pasting necessary.
To do this in CyberDuck, connect to the server, locate the file you want to edit, right-click it, and choose "Edit With ... Atom."
The file will appear in Atom, but you'll be editing the version on the server. (Well, actually a local copy, but CyberDuck ensures that anytime you save the file, the local copy is immediately copied to the server.) You'll view the changed file using the usual http://cs.wellesley.edu/~username/ URL, not a file:/// URL.
Since SSI is done by the server, using it depends on the server software. SSI is supported by Apache and lighttpd, but on IIS, which is Microsoft's web server software, it is an optional feature that may not be enabled. Apache is one of the most widely used web servers, but your client's web hosting site may not use it, or may not have SSI turned on.
Should this occur, you could try to replace the SSI code with JavaScript, but that may not work for all users (for example, those who browse with JavaScript disabled). An alternative is to simply copy/paste the .part code into the .html file, but that loses all of the advantages of using SSI. A last alternative is to use a more powerful and portable server-side scripting language, such as PHP, but that's outside the scope of this course.
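If you do fall back on copy/paste, the pasting can at least be automated on your own computer before you upload. The Python sketch below is one possible approach, not something this course requires. It writes expanded copies of the .html files into a separate output folder, so your originals keep their directives and the .part files stay the single place to edit. The function name build_site and the folder names are made up. The sketch assumes every include path is relative to the folder of the including .html file and that the output folder is not inside the source folder.

```python
import re
import tempfile
from pathlib import Path

INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

def build_site(source_dir, output_dir):
    """Write a copy of every .html file with its include directives expanded.

    Include paths are resolved relative to the .html file's own folder;
    paths starting with "/" are not handled in this sketch.
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    for page in source_dir.rglob("*.html"):
        def read_part(match, folder=page.parent):
            return (folder / match.group(1)).read_text(encoding="utf-8")
        text = INCLUDE.sub(read_part, page.read_text(encoding="utf-8"))
        target = output_dir / page.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")

# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    src, out = Path(tmp) / "site", Path(tmp) / "build"
    src.mkdir()
    (src / "nav.part").write_text("<nav>links</nav>", encoding="utf-8")
    (src / "fred.html").write_text(
        '<!--#include virtual="nav.part" -->\n<p>Fred</p>', encoding="utf-8"
    )
    build_site(src, out)
    built = (out / "fred.html").read_text(encoding="utf-8")
    print(built)
```

You would upload the files from the output folder and keep editing the ones in the source folder, so a change to nav.part only needs a rebuild rather than hand edits to every page.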
To do SSI, you must:
1. Create the .part file (the one you'll be including in other files).
2. Add the #include directive to the .html files.
3. View the .html file on the server. You may need to use shift+reload in your browser to ignore the cache when refreshing the page.