URLs¶
This is a companion to the HTML reading. Both URLs and HTML are necessary to fully understand the first version of Ottergram, but it's helpful to separate the two concepts.
One of the most important differences between books and web pages is hyperlinks. Sure, books can have references to other books (or even other sections of the same book), but you can't click on them and go there. The way we specify where to go is by a URL.
URLs are used within websites and web applications to connect pieces together, including
- supporting files (CSS, JavaScript),
- images, and
- links to other web pages (hyperlinks)
With URLs, the referring file is usually an HTML page. The referenced file could be a supporting file (CSS files, JavaScript files), an image file (the `src` attribute of an `img` tag could be a JPEG, GIF, or PNG file) or another HTML file (particularly for hyperlinks, the `href` attribute of the `a` tag). We'll use this "referring"/"referenced" terminology when necessary below.
There are two main kinds of URLs:
- Absolute URLs work from anywhere
- Relative URLs work when starting from the current page
Absolute URLs can be pasted into a browser and go to the right place, and they can be embedded in any web page on any server and still work correctly. That's what I meant by "work from anywhere".
Some examples of absolute URLs:
- https://www.wellesley.edu/cs/curriculum
- https://www.nytimes.com/2022/09/05/science/hail-weather-climate.html/
- https://stackoverflow.com/
An absolute URL includes all the following information:
- protocol (usually `https://`)
- domain (such as `www.wellesley.edu` or `www.nytimes.com`). This is the name of the server.
- path (such as `/cs/curriculum` or `/2022/09/05/science/hail-weather-climate.html`)
- file (`curriculum` or `hail-weather-climate.html`)
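If you'd like to see those pieces pulled apart, here is a minimal sketch using Python's standard `urllib.parse` module. Python isn't part of this reading, so treat it as an illustration only; the variable names are my own.

```python
from posixpath import basename
from urllib.parse import urlsplit

# One of the absolute URLs listed above.
url = "https://www.wellesley.edu/cs/curriculum"

parts = urlsplit(url)
protocol = parts.scheme            # the part before "://"
domain = parts.netloc              # the name of the server
path = parts.path                  # everything after the domain
file_name = basename(parts.path)   # the last step of the path

print(protocol, domain, path, file_name)
```

Note that the file is just the last piece of the path, which is why it shows up in both places in the list above.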
Examples of relative URLs (from the CS fun page):
- `cirque1.jpg`
- `cookieparty.jpg`
- `frisbee.jpg`
Those URLs are all simple filenames, which means that each file has to be in the same folder as the HTML page (and on the same server).
What if the referenced file isn't in the same folder as the referring file? There are some simple rules that we'll learn below. But situations where both files are in the same folder are very common, so it's nice that this rule is so simple.
URL Comparison¶
Again, there are two kinds of URLs: relative and absolute:
- absolute URLs start with `http` or `https` [1] and specify the same destination regardless of starting location
- relative URLs start with a name (or `..`) and specify a destination as a series of steps from the starting location
If the referring and referenced files are on different servers, you must use an absolute URL. If they are on the same server, you can use either an absolute or a relative URL.
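The comparison above can be sketched in a few lines of Python. The helper names `is_absolute` and `on_different_servers` are hypothetical (my own, for illustration), and the sketch assumes that "absolute" means "starts with `http` or `https`", as in this reading.

```python
from urllib.parse import urlsplit


def is_absolute(url: str) -> bool:
    """True if the URL starts with the http or https protocol."""
    return urlsplit(url).scheme in ("http", "https")


def on_different_servers(referring: str, referenced: str) -> bool:
    """True if two absolute URLs name different servers (domains)."""
    return urlsplit(referring).netloc != urlsplit(referenced).netloc


print(is_absolute("https://stackoverflow.com/"))  # an absolute URL
print(is_absolute("cirque1.jpg"))                 # a relative URL
print(on_different_servers(
    "https://www.wellesley.edu/cs/curriculum",
    "https://stackoverflow.com/",
))
```

When `on_different_servers` is true, a relative URL can't do the job and the link has to be absolute.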
Pros and Cons¶
To illustrate this, imagine a tiny website consisting of one file, `A`, and another file, `B`. The `A` file might be an HTML file and the `B` file might be one picture displayed on the page. We plan to add more pictures, so to keep them organized, they are all in a folder `F2`. The whole website is in a folder `F1` which contains `A` and the sub-folder `F2`. Here's what it might look like:
Because `A` will display `B`, it must use either a relative or absolute pathname to load and display `B`. This pathname is the red arrow in the picture above. The first `img` tag below uses the absolute pathname and the second uses the relative one:
<img src="/F0/F1/F2/B" alt="B picture">
<img src="F2/B" alt="B picture">
Suppose we later decide to archive this webpage by moving (like `mv`) the `A` file into a new folder, `old`. The absolute link would still work, because we didn't move `B` and the folder it was in. The relative link would fail, because the relationship has changed. The following picture illustrates that:
However, that would be an odd and unlikely way to archive the webpage. Instead, we'd probably move the entire `F1` folder into `old`, like this:
Now, it's the relative link that still works, so the whole website still works, even after being moved.
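Both archiving scenarios can be checked with a rough sketch that resolves the two links the way a browser would, using Python's `urljoin`. Several things here are assumptions of mine: the server `https://example.com` is made up, and I've guessed that `old` sits inside `F1` in the first scenario and inside `F0` in the second.

```python
from urllib.parse import urljoin

HOST = "https://example.com"  # hypothetical server, for illustration only

ABSOLUTE = "/F0/F1/F2/B"  # the link in the first img tag
RELATIVE = "F2/B"         # the link in the second img tag


def resolve(referring_path: str, link: str) -> str:
    """Where a link leads when followed from the referring file."""
    return urljoin(HOST + referring_path, link)


# Scenario 1: only A moves (assumed new location: /F0/F1/old/A).
# B stays at /F0/F1/F2/B.
b_location = HOST + "/F0/F1/F2/B"
abs_ok_1 = resolve("/F0/F1/old/A", ABSOLUTE) == b_location
rel_ok_1 = resolve("/F0/F1/old/A", RELATIVE) == b_location

# Scenario 2: all of F1 moves (assumed new location: /F0/old/F1).
# B is now at /F0/old/F1/F2/B.
b_location = HOST + "/F0/old/F1/F2/B"
abs_ok_2 = resolve("/F0/old/F1/A", ABSOLUTE) == b_location
rel_ok_2 = resolve("/F0/old/F1/A", RELATIVE) == b_location

print("only A moved:  absolute works?", abs_ok_1, " relative works?", rel_ok_1)
print("all F1 moved:  absolute works?", abs_ok_2, " relative works?", rel_ok_2)
```

In the first scenario only the absolute link still finds `B`; in the second, only the relative one does.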
In general, the advantage of an absolute URL is that it continues to work even if the referring file (page) moves, as long as the referenced file does not. If we think of a hyperlink as an arrow, the referring page or "source" is the tail of the arrow, the starting point. It's the page that has the hyperlink on it. The head of the arrow is the "destination" or the page we end up at when we click the link.
Relative URLs have the advantage that if the starting file and ending file are moved together to a different place, but continue to share the same relationship (for example, they are in the same folder), then the relative URL will continue to work after they are moved, while an absolute URL to a file that has moved will necessarily break.
That might sound odd, but it's actually really common. Suppose that a related set of pages is in one folder (possibly with subfolders) and we decide to move the whole folder to another place, on this server or even another server. If we do that, any absolute URLs from one page to another within that folder will break. But relative URLs will continue to work, because the relationship between the source and destination of the hyperlink is preserved.
(When I'm modifying web pages but I want to keep the original, I'm constantly copying an entire directory tree and then modifying the copy. As long as all the relationships are maintained, the relative URLs all work!)
Relationship Rules¶
Here are the rules for relative URLs, based on the relationship between the two files (referring/source and referenced/destination).
- a bare name, like `fred.html`, is a file or folder in the same folder as the starting point.
- a slash means to go down into a folder. So `stuff/fred.html` means that `stuff` is a folder in the current folder (by rule 1) and `fred.html` is inside `stuff`.
- a `..` means to climb out of a folder and go to the parent folder. So `../fred.html` means that `fred.html` is in the folder above the starting point.
These rules can be combined to yield long relative URLs like `../../africa/botswana.html`, which is a file in the `africa` folder that is two folders above this one (the one that the referring file is in).
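To watch the rules in action, here is a small sketch that applies each one with Python's `urljoin`, which follows the same steps a browser does. The referring page `https://example.com/world/asia/japan/tokyo.html` is invented for this example; only its folder structure matters.

```python
from urllib.parse import urljoin

# Hypothetical referring page, sitting in the folder /world/asia/japan/.
referring = "https://example.com/world/asia/japan/tokyo.html"

# Rule 1: a bare name stays in the same folder.
same_folder = urljoin(referring, "fred.html")

# Rule 2: a slash goes down into a folder.
subfolder = urljoin(referring, "stuff/fred.html")

# Rule 3: ".." climbs out to the parent folder.
parent = urljoin(referring, "../fred.html")

# Combined: up two folders, then down into africa.
combined = urljoin(referring, "../../africa/botswana.html")

for result in (same_folder, subfolder, parent, combined):
    print(result)
```

Each `..` removes one folder from the end of the starting folder, so `../../` takes us from `/world/asia/japan/` up to `/world/` before going down into `africa`.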
Here's what a site might look like, with the leaves (`barry.html`, `robin.html`, `beatles.html`, `stones.html`) being particular HTML files and the upper nodes being folders.
With that directory structure, the following are correct relative URLs, starting from the given referring file.
- `robin.html` starting from the `barry.html` file.
- `beatles.html` starting from the `stones.html` file.
- `../bands/stones.html` starting from the `robin.html` file.
- `../otters/barry.html` starting from the `stones.html` file.
The ones with `..` work because one `..` goes up one level, and then you are just telling the browser to go to a sibling (`bands` or `otters`). The rest of the relative URL navigates down to the particular file.
If we use relative URLs like these, and we move the entire `favorites` folder to another place, all the relative URLs will still work.
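Going the other direction, we can compute the relative URL from the locations of the two files. This is only a sketch: the helper `relative_url` is hypothetical, the folder layout (`favorites` containing `otters` and `bands`) is my reading of the examples above, and the `/archive/2020` location is made up.

```python
import posixpath


def relative_url(referring: str, referenced: str) -> str:
    """Relative URL that leads from the referring file to the referenced file."""
    start_folder = posixpath.dirname(referring)
    return posixpath.relpath(referenced, start=start_folder)


# Assumed layout: favorites/otters/{barry,robin}.html
#                 favorites/bands/{beatles,stones}.html
sibling = relative_url("/favorites/otters/barry.html",
                       "/favorites/otters/robin.html")
cousin = relative_url("/favorites/otters/robin.html",
                      "/favorites/bands/stones.html")

# Move the whole favorites folder somewhere else (made-up location).
cousin_after_move = relative_url("/archive/2020/favorites/otters/robin.html",
                                 "/archive/2020/favorites/bands/stones.html")

print(sibling)
print(cousin)
print(cousin_after_move)
```

The relative URL comes out the same before and after the move, because the relationship between the two files hasn't changed.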
If you'd like, you can learn more about URLs
Absolute versus Relative URLs¶
- if the referenced file is on another server, the referring file must use an absolute URL; a relative URL can't refer to a file on another server.
- if the two files are on the same server, you can use either an absolute or a relative URL. How to choose?
  - use a relative URL if the two pages are part of the "same site" and would be moved together (maintaining their relationship) if the site were to move.
  - use an absolute URL if the two pages are part of different sites and would probably not be moved together.
For example, pages within the `cs204` site all use relative URLs for hyperlinks, but if one of those pages were to reference, say, the `cs304` site, I would use an absolute URL.
Here's a metaphor that might help.
- Imagine that our classroom and my office, both in the Science Center, are part of the same site (building).
- To help you find my office, I could give:
  - GPS coordinates.
    - This is like an absolute URL: the same from anyplace.
  - directions relative to our classroom (exit the classroom, take a left, go down the hall...).
    - This is like a relative URL.
- Both of these work fine, until we decide to move the entire Science Center to a new (warmer?) location.
  - the GPS coordinates fail
  - the directions continue to work
- Conclusion: when things will move together, maintaining their relationship, relative URLs are better.
Summary¶
We learned about URLs:
- Absolute URLs: completely specify the location of a particular file on the global internet
- Relative URLs, and the rules for relative paths:
  - file or folder name when it's in the same folder
  - a slash when it's in a subfolder
  - a `../` when it's in the parent folder
- Relative URLs are better when files will move together, maintaining the same relative location, if a website is moved to another location.
- Relative URLs are the way to go for parts of a website, while absolute URLs are necessary for links to external resources.
[1] There's a variant kind of URL that starts with a slash rather than `https`, and it means an absolute path on the current server. This is exactly like the absolute paths we learned about in the Unix reading. However, we won't be using this kind of URL in CS 204.