HTML
This reading accompanies Chapter 2 of your book. That chapter does a nice job of introducing bits of HTML as needed. Read that first and feel free to work through their presentation. Here's an organized summary of what we learned, plus a bit more.
If you don't have Chapter 2, here's a link to a copy of a scan of the chapter. To respect the author's copyright, the link is only valid on-campus or with a password. Ask Scott if you don't have that password.
- Languages
- HTML template
- Tags
- Meaningless Tags
- Chrome Developer
- Some other useful tags:
- Here links
- The ALT Attribute
- Figures
- Comments
- Comment Syntax
- Validation of HTML Code
- Icon Declaring Validation
- The need for meaningful tags
- Semantic Tags
- Which Tag to Use?
- URLs
- URL Comparison
- Relationship Rules
- Ottergram
- Fragments
- Hyperlinks as Clickable Elements
- The End
Languages¶
We learned that web pages are written using three languages:
- HTML, which is the skeleton and organs
- CSS, the skin and clothes. We'll look at that in the next chapter.
- JavaScript, which defines the behavior. We'll get to that later.
HTML template¶
Our basic page had the following template:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Ottergram</title>
</head>
<body>
<header>
<h1>Ottergram</h1>
</header>
</body>
</html>
Tags¶
We learned the following tags. Look at W3Schools or MDN to learn more.
- head: holds meta information about the document
- meta: here, tells the browser the character set. More about this much later in the course. We'll always use utf-8.
- title: is used for window titles, bookmarks, and by search engines. More important than you'd think.
- body: holds all the content
- header: holds headers and related stuff like logos
- h1: holds the text of a major heading
- link: connects a separate file of CSS rules to an HTML file. The URL of the CSS file is given by the href attribute.
- ul: is a container for an unordered list (bullet list)
- li: is a container for a list item
- img: is replaced (a replaced element) with an image loaded from a separate file, specified using the src attribute.
- a: demarks a clickable hyperlink
Meaningless Tags¶
All the tags above have some kind of meaning associated with them: each is for some particular kind of content. However, HTML also has two meaningless tags, span and div. A span demarks some text or other inline information. (Inline content is stuff like text that fills up a line before flowing onto the next line.) A div demarks a big block or division of a document.
These tags are useful for styling and behavior (attaching JavaScript to them).
Chrome Developer¶
The chapter describes the Chrome Developer Tools, which are really useful. I have a demo of the Firefox Developer Tools in the videos, and I've created one for the Chrome Developer Tools, too. It's just
Some other useful tags:¶
- em: emphasizes some text. Typically displayed in italics.
- strong: like em, but more so. Typically displayed in bold.
- h2 to h6: headers for successively lower levels
- p: a paragraph. It can't be nested inside another paragraph or contain other block elements.
- br: a line break. Usually avoid this, because it can break layouts.
- ol: an ordered (numbered) list
Tags should be properly nested:
<foo> <bar> </bar> </foo>
not
<foo> <bar> </foo> </bar>
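Proper nesting is something a program can check with a stack of open tags: a closing tag is properly nested only if it matches the most recently opened tag. Here is a minimal sketch in Python using the standard html.parser module; the names NestingChecker and nesting_problems are hypothetical, not from the book. It is not a validator; for example, it ignores HTML's optional closing tags, such as an omitted </li>.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Records tags that are closed out of order (a sketch, not a validator)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of open tags, innermost last
        self.problems = []   # human-readable descriptions of mistakes

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing syntax such as <br/> opens and closes in one step

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()  # closes the innermost open tag: properly nested
        elif self.open_tags:
            self.problems.append(
                f"</{tag}> closed while <{self.open_tags[-1]}> was still open")
        else:
            self.problems.append(f"</{tag}> has no matching open tag")


def nesting_problems(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Anything still on the stack at the end was never closed.
    return checker.problems + [f"<{tag}> was never closed" for tag in checker.open_tags]


print(nesting_problems("<foo> <bar> </bar> </foo>"))  # → []
print(nesting_problems("<foo> <bar> </foo> </bar>"))
# → ['</foo> closed while <bar> was still open', '<foo> was never closed']
```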
Here links¶
Once, it was very popular on the web to have links like this:
It seemed so clever and intuitive, making the clickable text be the word "here." There are two big problems with this, though:
- Accessibility: Screen-reading software for the blind will often read the text of the links on a page so that the user can easily navigate to other pages. Links like those above read as "here," "here," "here," which is useless.
- Indexing: Search engines pay special attention to the link text on a page, since it is often an important clue about the content of the destination page. The links above don't show what the important words are.
So what do you do instead? Just wrap the link tags around important words:
- Here are some apple pie recipes.
- Click here for peach pie recipes.
- Yo, check out the prune pie recipes.
Accessibility is very important in this class, so keep that in mind.
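To see the problem the way a screen reader's list of links might present it, you can pull out just the link texts of a page. The Python sketch below is only an illustration: LinkTextCollector, vague_links, and the set of "vague" phrases are hypothetical, not part of any standard or tool.

```python
from html.parser import HTMLParser

# Hypothetical list of link texts that say nothing about the destination.
VAGUE_TEXTS = {"here", "click here", "this", "link", "more"}


class LinkTextCollector(HTMLParser):
    """Collects the visible text of every <a> element, in page order."""

    def __init__(self):
        super().__init__()
        self.inside_link = False
        self.pieces = []      # text fragments of the link currently being read
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.inside_link = True
            self.pieces = []

    def handle_endtag(self, tag):
        if tag == "a" and self.inside_link:
            # Join the fragments and collapse runs of whitespace.
            self.link_texts.append(" ".join("".join(self.pieces).split()))
            self.inside_link = False

    def handle_data(self, data):
        if self.inside_link:
            self.pieces.append(data)


def vague_links(html):
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    return [text for text in collector.link_texts if text.lower() in VAGUE_TEXTS]


page = ('Click <a href="peach.html">here</a> for peach pie recipes. '
        'Here are some <a href="apple.html">apple pie recipes</a>.')
print(vague_links(page))  # → ['here']
```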
The ALT Attribute¶
An IMG tag looks like this:
<img src="url/of/picture.jpeg" alt="picture of something">
You may have noticed that we added an ALT attribute to the IMG tag. It is a small piece of text that can be used in place of the image in certain circumstances. The ALT attribute is an important part of the HTML standard, and perhaps its most important use is supporting accessibility.
Unfortunately, not everyone has good enough vision to see the images that we use in our websites, but that doesn't mean they can't and don't use the Web. Instead, they (typically) have software that reads a web page to them, including links. When the software gets to an IMG tag, it reads the ALT text. If there is no ALT text, it may read the SRC attribute, hoping there's a hint there, but all too often the SRC attribute is something like "../images/DCN87372.jpg" and the visually impaired web user is left to guess.
Therefore, you should always include a brief, useful value for the ALT attribute. If your page is an image gallery, then your ALT text could be a description of the image. In general, however, the goal is not to describe the image but to convey its purpose. For example, if the image is a link whose target is made clear by the image, then the ALT text should say something like "Link to ..." so the user will know what to do with it. The sole exception is for images that are used only for formatting, such as blank pictures that fill areas or colorful bullets for bullet lists. In those cases, it's better to include an ALT attribute that is empty (alt=""), so that the user doesn't have to listen to the SRC attribute being read. Either way, the text should be useful for someone who wants to use your site but isn't sighted. It helps to turn off images and view your site to check.
Furthermore, you should avoid having critical information on your website conveyed only in images. There may be times when it is unavoidable, but to the extent that it is possible, we want our websites to be easily usable by all people, including the blind and visually impaired.
Accessibility is important in modern society. We build ramps as well as stairs, we put cutouts in curbs, and we reserve parking spaces for people with disabilities. Indeed, most federal and state government websites are legally required to be accessible, and ALT attributes are just one part of that.
In this class, we expect you to always use the ALT attribute. If you find an image or an example where we've forgotten to use one, please bring it to our attention.
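Hunting for missing ALT attributes is easy to automate. Here is a minimal Python sketch using the standard html.parser module (the names AltChecker and check_alts are hypothetical) that separates images with no ALT attribute at all from images whose ALT is deliberately empty, the two cases distinguished above. It can't judge whether the ALT text is actually useful; that still takes a human.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Sorts <img> tags by the state of their alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []     # src of images with no alt attribute at all
        self.decorative = []  # src of images with an empty alt (formatting only)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.missing.append(src)
        elif not attributes["alt"]:  # alt="" (or a bare alt with no value)
            self.decorative.append(src)


def check_alts(html):
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing, checker.decorative


missing, decorative = check_alts(
    '<img src="img/otter1.jpg" alt="Barry the Otter">'
    '<img src="../images/DCN87372.jpg">'
    '<img src="bullet.png" alt="">')
print(missing)     # → ['../images/DCN87372.jpg']
print(decorative)  # → ['bullet.png']
```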
Here is a more thorough discussion of ALT
Figures¶
Now that we know about the img tag, it's useful to know about a semantic tag that can be used with it. We can use figure to surround an img tag, paired with figcaption for the caption text:
<figure>
<img src="hermione-granger-256.jpeg" alt="Hermione Granger">
<figcaption>Hermione Granger as played by Emma Watson</figcaption>
</figure>
Note that images can be used without figure; a figure is typically used the way it would be in a book, where the text refers to the figure to provide additional information. Also, using figcaption doesn't remove the obligation to provide alt text. Still, this can be a useful tag to know about.
Comments¶
From the very first computer program, programmers have needed to leave "notes" in the code to help themselves and others understand what's going on or what the code's purpose is. These notes are called comments. Comments are part of the program text (they're not written separately, because then, well, they'd get separated), but they are ignored by the computer. Comments shouldn't repeat what someone can discover by reading the code; instead, they should cover the background context of the code, or its goal.
Because it's important to get in the habit of putting comments in your HTML code, we will require comments in this course. At this point, you won't have a lot to say, and that's fine. You will start by labeling each file with its name, your name, the date, and any sources you consulted (such as the source code of other web pages). Think of this as signing your work. Later, when you're designing a website with many coordinated pages, you can use comments on a page to talk about how it fits into the overall plan.
Comment Syntax¶
The HTML comment syntax is a little odd-looking. Here's an example:
<!-- I can say anything I want in a comment. -->
The syntax starts with a left angle bracket <, then an exclamation point and two hyphens, then the comment (anything you want), and ends with two hyphens and a right angle bracket >.
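An HTML parser recognizes this syntax and sets the comment aside, which is why it never shows up on the displayed page. As a rough illustration (not a description of how any particular browser is built), Python's standard html.parser module reports comments through a separate handle_comment callback, apart from the page's text; the class name CommentSeparator is hypothetical.

```python
from html.parser import HTMLParser


class CommentSeparator(HTMLParser):
    """Keeps the visible text and the comments of an HTML snippet apart."""

    def __init__(self):
        super().__init__()
        self.text = []      # content a reader would see
        self.comments = []  # notes meant only for people reading the source

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

    def handle_comment(self, data):
        # data is everything between <!-- and -->
        self.comments.append(data.strip())


parser = CommentSeparator()
parser.feed("<!-- I can say anything I want in a comment. -->"
            "<p>Welcome to Ottergram!</p>")
parser.close()
print(parser.text)      # → ['Welcome to Ottergram!']
print(parser.comments)  # → ['I can say anything I want in a comment.']
```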
Validation of HTML Code¶
How can you be sure you've followed every nit-picky rule that the HTML standards committee devised? (The standards committee is the World Wide Web Consortium, or W3C.)
Even if you have memorized all the rules, checking a page by hand would be tedious and error-prone: perfect for a computer! Fortunately, the W3C created an HTML validator. You can validate by supplying a URL, by uploading a file, or even by copy/pasting in some HTML. An HTML validator is an excellent tool to help you debug your HTML code.
Validation also helps with accessibility. One important aspect of accessibility is having proper HTML syntax on each page of your site. Visitors with accessibility needs may use alternative browsers and screen readers, and that software works best with syntactically correct HTML. Read the following for a longer discussion of why to validate your HTML pages.
Throughout the semester, if you need to validate a web page, you can find the HTML validator and others on the reference page.
There's a video about validation on the videos page.
Icon Declaring Validation¶
Once you get your page to validate, you can put some HTML code on your page to give it a "seal of approval," declaring that it is valid (and what standard it meets). You will see examples of this strategy in lab.
The very cool thing about this icon is that it is clickable, and clicking it will cause the validator to process your page again. Thus, you can modify your page, upload the changes, and click the icon to re-validate it, making validation very easy. In fact, we suggest that you put the icon on your page before it's valid, and use it during your debugging process.
The snippet of code is just the following, so go ahead and copy/paste it into your pages. The code doesn't use anything we don't know, so read it!
<p>
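<!-- uri=referer needs this page's full address; modern browsers send only the site's origin to other sites unless referrerpolicy allows more -->
<a href="https://validator.w3.org/check?uri=referer" referrerpolicy="no-referrer-when-downgrade">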
<img
src="http://cs.wellesley.edu/~cs204/Icons/valid-html5v2.png"
alt="Valid HTML 5"
title="Valid HTML 5"
height="31" width="88">
</a>
</p>
The need for meaningful tags¶
As we've said, HTML was designed to structure the content of a web page. That explains the existence of tags like <p>, <h1>, <ol>, etc. However, when web developers started creating pages with a lot of content, it became clear that to make better use of the available screen space, they needed a way to organize the page content into bigger chunks, whose positions on the page could then be arranged with CSS. Therefore, the tag <div> was born (short for division), which is currently the most used (and overused) tag on the web. While this seemed to solve the page layout problem, it made HTML code difficult to understand, and other computer programs (e.g., search engines) that wanted to use the organization of a page to infer the meaning of its content couldn't make sense of all the divs.
HTML5 introduced a series of new tags that have meaningful names and can be used universally to express what the content is about, beyond the existing simple tags. Additionally, to make pages more lively with different kinds of content, HTML5 added several new tags that allow content to be embedded in a page. Below is a short summary of some of these tags. Try to make use of them in your pages; they will make your code better and more readable to the programs of the future.
Semantic Tags¶
Here is a list of HTML5 tags that are known as semantic tags, because their names have specific meaning. Most of them were new in HTML5; abbr already existed in HTML 4.
Tag Name | Short Description |
---|---|
<header> | Specifies a header for a document or section. |
<footer> | Specifies a footer for a document or section. |
<section> | Defines sections in a document (e.g., chapters). |
<nav> | Defines a set of navigation links. |
<aside> | Defines content which is relevant but not central (e.g., callouts, sidebars). |
<main> | Defines the main content of a page. |
<article> | Defines independent, self-contained content (e.g., blog post, news story). |
<abbr> | Indicates an abbreviation or acronym; for example, <abbr title="World Wide Web Consortium">W3C</abbr>. |
<figure> | Indicates a figure or other graphical content. |
<figcaption> | A caption inside a figure element. |
Which Tag to Use?¶
Given all the tags listed above, along with DIV, you might feel bewildered as to which one to use. Here is a helpful HTML5 sectioning flowchart from html5doctor.com.
URLs¶
The book doesn't cover URLs at all in Chapter 2, but they are important, so we'll learn about them here.
URLs are used within websites and web applications to connect pieces together, including
- supporting files (CSS, JavaScript),
- images, and
- links to other web pages
There are two main kinds of URLs:
- Absolute URLs work from anywhere
- Relative URLs work when starting from the current page
Examples of absolute URLs:
- https://www.wellesley.edu/cs/curriculum includes protocol, domain, and path
- //www.wellesley.edu/cs/curriculum uses the default protocol (the same as the referencing page)
- /cs/curriculum uses the default protocol and domain
Examples of relative URLs:
- curriculum is a sibling of the current page
- cs/curriculum is a child of a sibling
- .. is a parent folder, but you'd never do that in practice
- ../mas is a sibling of a parent
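A browser turns each of these forms into a full URL by resolving it against the address of the page that contains the link. Python's standard urllib.parse.urljoin follows the same resolution rules, so it is a handy way to experiment. In this sketch the referencing page, https://www.wellesley.edu/cs/index.html, is a hypothetical choice.

```python
from urllib.parse import urljoin

# Hypothetical page that contains the links.
PAGE = "https://www.wellesley.edu/cs/index.html"

examples = [
    "https://www.wellesley.edu/cs/curriculum",  # protocol, domain, and path
    "//www.wellesley.edu/cs/curriculum",        # borrows the page's protocol
    "/cs/curriculum",                           # borrows protocol and domain
    "curriculum",                               # sibling of the current page
    "cs/curriculum",                            # child of a sibling
    "..",                                       # parent folder
    "../mas",                                   # sibling of a parent
]

for link in examples:
    print(f"{link:42} -> {urljoin(PAGE, link)}")
```

With this starting page, the first four all resolve to https://www.wellesley.edu/cs/curriculum, cs/curriculum resolves to https://www.wellesley.edu/cs/cs/curriculum, .. resolves to https://www.wellesley.edu/, and ../mas resolves to https://www.wellesley.edu/mas.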
URL Comparison¶
Again, there are two kinds of URLs, relative and absolute:
- absolute URLs start with a slash (or http or https) and specify the same destination regardless of starting location
- relative URLs start with a name (or ..) and specify a destination as a series of steps from the starting location
The advantage of an absolute URL is that it continues to work even if the referring page moves (as long as the page it refers to stays put). If we think of a hyperlink as an arrow, the referring page or "source" is the tail of the arrow, the starting point: it's the page that has the hyperlink on it. The head of the arrow is the "destination," the page we end up at when we click the link.
Relative URLs have the advantage that if the starting file and ending file are moved together to a different place, but continue to share the same relationship (for example, they are in the same folder), then the relative URL will continue to work after they are moved, while an absolute URL will necessarily break.
That might sound odd, but it's actually really common. Suppose that a related set of pages is in one folder and we decide to move the whole folder to another place, on this server or even another server. If we do that, any absolute URLs from one page to another within that folder break. But relative URLs will continue to work, because the relationship between the source and destination of the hyperlink is preserved.
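Here is a small Python sketch of that scenario, again using the standard urllib.parse.urljoin. The folder names and example.edu addresses are hypothetical: a pies folder holding index.html and apple.html is moved, as a unit, into a desserts folder.

```python
from urllib.parse import urljoin

# Hypothetical address of the referring page before and after moving its folder.
BEFORE = "https://example.edu/pies/index.html"
AFTER = "https://example.edu/desserts/pies/index.html"

RELATIVE_LINK = "apple.html"        # apple.html sits next to index.html
ABSOLUTE_LINK = "/pies/apple.html"  # starts with a slash, so "absolute" here


def destinations(page_url):
    """Where each link on page_url leads: (relative result, absolute result)."""
    return urljoin(page_url, RELATIVE_LINK), urljoin(page_url, ABSOLUTE_LINK)


print(destinations(BEFORE))
print(destinations(AFTER))
```

Before the move, both links lead to https://example.edu/pies/apple.html. After the move, apple.html actually lives at https://example.edu/desserts/pies/apple.html: the relative link follows it there, while the absolute link still points at the old, now-empty location.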
Relationship Rules¶
Here are the rules for relative URLs, based on the relationship between the two files (source and destination).
1. A bare name, like fred.html, is a file or folder in the same folder as the starting point.
2. A slash means to go down into a folder. So stuff/fred.html means that stuff is a folder in the current folder (by rule 1) and fred.html is inside stuff.
3. A .. means to climb out of a folder and go to the parent folder. So ../fred.html means that fred.html is in the folder above the starting point.
These rules can be combined to yield long relative URLs like ../../africa/botswana.html, which is a file in the africa folder, and that folder is in turn inside the folder two levels above this one.
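The three rules are mechanical enough to write down as code. This Python sketch (the function name follow_relative and the folder names are hypothetical) represents the starting folder as a list of folder names and applies the rules one step at a time. It assumes the last step is a file name and doesn't handle climbing above the top folder.

```python
def follow_relative(start_folder, relative_url):
    """Apply the relative-URL rules, starting in start_folder (a list of names)."""
    location = list(start_folder)              # copy, so the caller's list is unchanged
    *folders, name = relative_url.split("/")   # slashes separate the steps
    for step in folders:
        if step == "..":
            location.pop()          # rule 3: climb out to the parent folder
        else:
            location.append(step)   # rule 2: go down into the named folder
    return location + [name]        # rule 1: a bare name in the folder reached


start = ["world", "asia", "nepal"]  # hypothetical folder holding this page
print("/".join(follow_relative(start, "fred.html")))        # → world/asia/nepal/fred.html
print("/".join(follow_relative(start, "stuff/fred.html")))  # → world/asia/nepal/stuff/fred.html
print("/".join(follow_relative(start, "../../africa/botswana.html")))  # → world/africa/botswana.html
```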
If you'd like, you can learn more about URLs
Ottergram¶
At the end of the chapter, the final page's HTML code is the following:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>ottergram</title>
<link rel="stylesheet" href="stylesheets/styles.css">
</head>
<body>
<header>
<h1>ottergram</h1>
</header>
<ul>
<li>
<a href="#">
<img src="img/otter1.jpg" alt="Barry the Otter">
<span>Barry</span>
</a>
</li>
<li>
<a href="#">
<img src="img/otter2.jpg" alt="Robin the Otter">
<span>Robin</span>
</a>
</li>
<li>
<a href="#">
<img src="img/otter3.jpg" alt="Maurice the Otter">
<span>Maurice</span>
</a>
</li>
<li>
<a href="#">
<img src="img/otter4.jpg" alt="Lesley the Otter">
<span>Lesley</span>
</a>
</li>
<li>
<a href="#">
<img src="img/otter5.jpg" alt="Barbara the Otter">
<span>Barbara</span>
</a>
</li>
</ul>
</body>
</html>
You can view that initial version of ottergram in your browser.
Let's take a moment to talk about the HTML.
- The bulk of the page is a list. Most web pages will not be like that, and indeed, neither will the finished Ottergram.
- Each list item contains a hyperlink with the odd URL of #. That's an empty fragment identifier, and the book uses it for an odd reason that we'll talk about below.
- Each hyperlink is wrapped around an img and a span. The span is essentially the caption of the image. Better semantic markup might have used figure and figcaption.
Fragments¶
We discussed URLs above, but I left something out: fragments. Notice the hyperlink at the beginning of this paragraph. It goes to a particular place on this page. It's able to do so because the destination has an id attribute, and the URL specifies that id after the # character.
In general, a URL looks like this: https://machine.domain/path/to/file.html#fragment_id
The URLs in Ottergram consist solely of the #, so there's no fragment ID, and the filename is missing as well. What such a hyperlink does is go to the same page. Try it!
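Python's standard urllib.parse module can show the pieces of such a URL and what a bare # resolves to. This is only an illustration; the page address is the made-up machine.domain example from the general URL form above.

```python
from urllib.parse import urljoin, urlsplit

PAGE = "https://machine.domain/path/to/file.html"  # made-up example address

parts = urlsplit(PAGE + "#fragment_id")
print(parts.scheme)    # → https
print(parts.netloc)    # → machine.domain
print(parts.path)      # → /path/to/file.html
print(parts.fragment)  # → fragment_id

# A bare "#" keeps the current page and names no fragment,
# so (ignoring any trailing "#") it leads back to the same page.
print(urljoin(PAGE, "#").split("#")[0] == PAGE)  # → True

# A "#name" URL keeps the current page and adds the fragment id.
print(urljoin(PAGE, "#fragments"))  # → https://machine.domain/path/to/file.html#fragments
```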
Hyperlinks as Clickable Elements¶
A hyperlink to the same page we are on is not very useful. However, it is clickable. Later, in chapter 6, we're going to write some JavaScript code to hijack the click behavior and do something interesting and useful. For now, we just wanted to create a page with clickable elements. That's what these do.
The End¶
This is just the beginning of HTML. There's a lot more you could learn, but this will do for now.