URLs

This is a companion to the HTML reading. Both URLs and HTML are necessary to fully understand the first version of Ottergram, but it's helpful to separate the two concepts.

URLs

One of the most important differences between books and web pages is hyperlinks. Sure, books can have references to other books (or even other sections of the same book), but you can't click on them and go there. The way we specify where to go is by a URL.

URLs are used within websites and web applications to connect pieces together, including

  • supporting files (CSS, JavaScript),
  • images, and
  • links to other web pages (hyperlinks)

With URLs, the referring file is usually an HTML page. The referenced file could be a supporting file (a CSS or JavaScript file), an image file (the src attribute of an img tag could name a JPEG, GIF, or PNG file), or another HTML file (particularly for hyperlinks, via the href attribute of the a tag). We'll use this "referring"/"referenced" terminology when necessary below.

There are two main kinds of URLs:

  • Absolute URLs work from anywhere
  • Relative URLs work when starting from the current page

Absolute URLs can be pasted into a browser and work correctly (go to the right place), and they can be embedded on any webpage on any server and they work correctly. That's what I meant by "work from anywhere".

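Some examples of absolute URLs:

  • https://www.wellesley.edu/cs/curriculum
  • https://www.nytimes.com/2022/09/05/science/hail-weather-climate.html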

An absolute URL includes all the following information:

  • protocol (usually https://)
  • domain (such as www.wellesley.edu or www.nytimes.com). This is the name of the server.
  • path (such as /cs/curriculum or /2022/09/05/science/hail-weather-climate.html)
  • file (curriculum or hail-weather-climate.html)
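As a rough illustration, Python's standard urllib.parse module can split an absolute URL into these same parts. The URL below is assembled from the New York Times domain and path given above; the variable names are only for this sketch.

```python
from posixpath import basename
from urllib.parse import urlsplit

# URL assembled from the example domain and path listed above.
url = "https://www.nytimes.com/2022/09/05/science/hail-weather-climate.html"

parts = urlsplit(url)
protocol = parts.scheme      # the part before "://"
domain = parts.netloc        # the name of the server
path = parts.path            # everything after the server name
filename = basename(path)    # the last component of the path

print(protocol, domain, path, filename, sep="\n")
```

Note that the file is simply the last piece of the path, which is why the two overlap in the list above.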

Examples of Relative URLs (from the CS fun page):

  • cirque1.jpg
  • cookieparty.jpg
  • frisbee.jpg

Those URLs are all simple filenames, which means that each file has to be in the same folder as the HTML page (and on the same server).

What if the referenced file isn't in the same folder as the referring file? There are some simple rules that we'll learn below. But situations where both files are in the same folder are very common, so it's nice that this rule is so simple.
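The same-folder rule can be checked with urljoin from Python's urllib.parse module, which resolves a relative URL against the URL of the referring page much as a browser does. The address of the CS fun page used here is a made-up placeholder on example.org, not its real location.

```python
from urllib.parse import urljoin

# Hypothetical address of the referring HTML page (an assumption for this sketch).
referring_page = "https://example.org/cs/fun/index.html"

# A bare filename replaces the last component of the referring page's path,
# so the result is in the same folder on the same server.
for name in ["cirque1.jpg", "cookieparty.jpg", "frisbee.jpg"]:
    print(urljoin(referring_page, name))

resolved = urljoin(referring_page, "cirque1.jpg")
```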

URL Comparison

Again, there are two kinds of URLs: relative and absolute:

  • absolute URLs start with http or https (see note 1) and specify the same destination regardless of starting location
  • relative URLs start with a name (or ..) and specify a destination as a series of steps from the starting location

If the referring and referenced files are on different servers, you must use an absolute URL. If they are on the same server, you can use either an absolute or a relative URL.
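One way to tell the two kinds apart mechanically is to look for a protocol and a server name. The sketch below does this with urlsplit; the function name is_absolute is invented for this example. A path that starts with a slash (the variant in note 1) has no protocol, so this check reports it as not absolute.

```python
from urllib.parse import urlsplit

def is_absolute(url: str) -> bool:
    """Return True when the URL names its own protocol and server."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

examples = [
    "https://www.wellesley.edu/cs/curriculum",
    "cirque1.jpg",
    "../bands/stones.html",
    "F2/B",
]
kinds = {url: ("absolute" if is_absolute(url) else "relative") for url in examples}
for url, kind in kinds.items():
    print(f"{kind:9} {url}")
```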

Pros and Cons

To illustrate this, imagine a tiny website consisting of one file, A, and another file, B. The A file might be an HTML file and the B file might be one picture displayed on the page. We plan to add more pictures, so to keep them organized, they are all in a folder F2. The whole website is in a folder F1 which contains A and the sub-folder F2. Here's what it might look like:

Figure: directory structure for a tiny website. The site is contained in a folder F1, which has two items: a file A and a folder F2 that contains a second file B.

Because A will display B, it must use either an absolute or a relative pathname to load B. This pathname is the red arrow in the picture above. Here are the two choices, absolute first and relative second:

<img src="/F0/F1/F2/B" alt="B picture">
<img src="F2/B" alt="B picture">

Suppose we later decide to archive this webpage by moving (like mv) the A file into a new folder, old. The absolute link would still work, because we didn't move B and the folder it was in. The relative link would fail, because the relationship has changed. The following picture illustrates that:

Figure: directory structure after moving A into a folder old.

However, that would be an odd and unlikely way to archive the webpage. Instead, we'd probably move the entire F1 folder into old, like this:

Figure: directory structure after moving the whole site (folder F1) into a folder old.

Now, it's the relative link that still works, so the whole website still works, even after being moved.
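The two moves can be replayed with urljoin to see which link survives each one. This is only a sketch under some assumptions: the server name example.org is made up, and old is assumed to sit inside F0 next to F1, which the text does not state.

```python
from urllib.parse import urljoin

SERVER = "https://example.org"   # hypothetical server
ABSOLUTE = "/F0/F1/F2/B"         # src of the first img tag above
RELATIVE = "F2/B"                # src of the second img tag above

def targets(page_path: str) -> tuple[str, str]:
    """Resolve both links as a browser would, starting from the page at page_path."""
    page_url = SERVER + page_path
    return urljoin(page_url, ABSOLUTE), urljoin(page_url, RELATIVE)

original = targets("/F0/F1/A")              # before any move
only_a_moved = targets("/F0/old/A")         # A moved; B stays at /F0/F1/F2/B
whole_site_moved = targets("/F0/old/F1/A")  # F1 moved; B is now at /F0/old/F1/F2/B

for label, (absolute, relative) in [
    ("original", original),
    ("only A moved", only_a_moved),
    ("whole site moved", whole_site_moved),
]:
    print(f"{label}: absolute -> {absolute}, relative -> {relative}")
```

When only A moves, the relative link points into old/F2, which does not exist. When the whole site moves, the absolute link still points at the old location of B, while the relative link follows B to its new home.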

In general, the advantage of an absolute URL is that it continues to work even if the referring file (page) moves, as long as the referenced file does not. If we think of a hyperlink as an arrow, the referring page or "source" is the tail of the arrow, the starting point. It's the page that has the hyperlink on it. The head of the arrow is the "destination" or the page we end up at when we click the link.

Relative URLs have the advantage that if the starting file and ending file are moved together to a different place, but continue to share the same relationship (for example, they are in the same folder), then the relative URL will continue to work after they are moved, while an absolute URL to a file that has moved will necessarily break.

That might sound odd, but it's actually really common. Suppose that a related set of pages is in one folder (possibly with subfolders) and we decide to move the whole folder to another place, on this server or even another server. If we do that, any absolute URLs from one page to another within that folder will break. But relative URLs will continue to work, because the relationship between the source and destination of the hyperlink is preserved.

(When I'm modifying web pages but I want to keep the original, I'm constantly copying an entire directory tree and then modifying the copy. As long as all the relationships are maintained, the relative URLs all work!)

Relationship Rules

Here are the rules for relative URLs, based on the relationship between the two files (referring/source and referenced/destination).

  1. a bare name, like fred.html, is a file or folder in the same folder as the starting point.
  2. a slash means to go down into a folder. So stuff/fred.html means that stuff is a folder in the current folder (by rule 1) and fred.html is inside stuff
  3. a .. means to climb out of a folder and go to the parent folder. So ../fred.html means that fred.html is in the folder above the starting point.

These rules can be combined to yield long relative URLs like ../../africa/botswana.html, which is a file in the africa folder, where africa sits two levels above the folder that the referring file is in.
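The three rules amount to a small algorithm: start in the referring file's folder, then take one step per piece of the relative URL. Here is a minimal hand-written sketch of that idea. The function name resolve and the folder names in the usage lines are invented for illustration, and it does no error handling (for example, climbing above the top folder).

```python
def resolve(referring_file: str, relative_url: str) -> str:
    """Apply the three rules to find the path of the referenced file.

    Both arguments are slash-separated paths; referring_file is the path of
    the page that contains the link.
    """
    # Start in the folder that contains the referring file.
    folders = referring_file.split("/")[:-1]
    *steps, name = relative_url.split("/")
    for step in steps:
        if step == "..":
            folders.pop()          # rule 3: climb out to the parent folder
        else:
            folders.append(step)   # rule 2: go down into the named folder
    # Rule 1: the final bare name is a file in the folder we ended up in.
    return "/".join(folders + [name])

print(resolve("site/pages/index.html", "fred.html"))
print(resolve("site/pages/index.html", "stuff/fred.html"))
print(resolve("site/pages/index.html", "../fred.html"))
print(resolve("world/europe/france/paris.html", "../../africa/botswana.html"))
```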

Here's what a site might look like, with the leaves (barry.html, robin.html, beatles.html, stones.html) being particular HTML files and the upper nodes being folders.

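Figure: example directory tree. A favorites folder contains two folders, otters (holding barry.html and robin.html) and bands (holding beatles.html and stones.html).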

With that directory structure, the following are correct relative URLs, starting from the given referring file.

  • robin.html starting from the barry.html file.
  • beatles.html starting from the stones.html file.
  • ../bands/stones.html starting from the robin.html file
  • ../otters/barry.html starting from the stones.html file

The ones with .. work because one .. goes up one level, and then you are just telling the browser to go to a sibling (bands or otters). The rest of the relative URL navigates down to the particular file.

If we use relative URLs like these, and we move the entire favorites folder to another place, all the relative URLs will still work.
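Going the other direction, the relative URL from one file to another can be computed rather than worked out by hand. The sketch below uses relpath from Python's posixpath module, and assumes the layout the examples imply: a favorites folder holding otters and bands. The helper name relative_url and the archive folder are invented for this example.

```python
from posixpath import dirname, relpath

def relative_url(referring: str, referenced: str) -> str:
    """Compute the relative URL that leads from one file to another."""
    return relpath(referenced, start=dirname(referring))

def site_links(root: str) -> list[str]:
    """Recompute the four example links for a site stored under root."""
    otters, bands = f"{root}/otters", f"{root}/bands"
    return [
        relative_url(f"{otters}/barry.html", f"{otters}/robin.html"),
        relative_url(f"{bands}/stones.html", f"{bands}/beatles.html"),
        relative_url(f"{otters}/robin.html", f"{bands}/stones.html"),
        relative_url(f"{bands}/stones.html", f"{otters}/barry.html"),
    ]

links = site_links("/favorites")
moved_links = site_links("/archive/favorites")  # the whole folder, moved
print(links)
print(links == moved_links)
```

The links come out the same before and after the move, which is the point: they depend only on how the files sit relative to each other.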

If you'd like, you can learn more about URLs

Absolute versus Relative URLs

  • if the referenced file is on another server, the referring file must use an absolute URL; a relative URL can't refer to a file on another server.
  • if the two files are on the same server, you can use either an absolute or a relative URL. How to choose?
    • use a relative URL if the two pages are part of the "same site" and would be moved together (maintaining their relationship) if the site were to move.
    • use an absolute URL if the two pages are part of different sites and would probably not be moved together.

For example, pages within the cs204 site all use relative URLs for hyperlinks, but if one of those pages were to reference, say, the cs304 site, I would use an absolute URL.
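Only the first of these rules is mechanical: comparing server names tells you when an absolute URL is required. Whether two pages on the same server would move together is a judgment call that code can't make. A small sketch of the server check follows; the function name and the example.edu addresses are hypothetical.

```python
from urllib.parse import urlsplit

def must_be_absolute(referring_url: str, referenced_url: str) -> bool:
    """Return True when the two files are on different servers."""
    referring_server = urlsplit(referring_url).netloc.lower()
    referenced_server = urlsplit(referenced_url).netloc.lower()
    return referring_server != referenced_server

# Hypothetical addresses, used only to exercise the check.
page = "https://cs.example.edu/cs204/readings/urls.html"
same_server = must_be_absolute(page, "https://cs.example.edu/cs304/index.html")
other_server = must_be_absolute(page, "https://www.nytimes.com/2022/09/05/science/hail-weather-climate.html")
print(same_server, other_server)
```

For the same-server case the function returns False, meaning either kind of URL would work and the "moved together" question decides.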

Here's a metaphor that might help.

  • Imagine that our classroom and my office, both in the Science Center, are part of the same site (building)
  • To help you find my office, I could give:
    • GPS coordinates.
      • This is like an absolute URL -- the same from anyplace.
    • directions relative to our classroom (exit the classroom, take a left, go down the hall...).
      • This is like a relative URL
  • Both of these work fine, until we decide to move the entire Science Center to a new, warmer? location.
    • the GPS coordinates fail
    • the directions continue to work
  • Conclusion: when things will move together, maintaining their relationship, relative URLs are better.

Summary

We learned about URLs

  • Absolute URLs: completely specify the location of a particular file on the global internet
  • Relative URLs, and the rules for relative paths:
    • file or folder name when it's in the same folder
    • a slash when it's in a subfolder
    • a ../ when it's in the parent folder
  • Relative URLs are better when files will move together, maintaining the same relative location, as when a website is moved to another location.
  • Relative URLs are the way to go for parts of a website, while absolute URLs are necessary for links to external resources.

  1. There's a variant kind of URL that starts with a slash rather than https and it means an absolute path on the current server. This is exactly like the absolute paths we learned about in the Unix reading. However, we won't be using this kind of URL in CS 204.