HTML

This reading accompanies Chapter 2 of your book. That chapter does a nice job of introducing bits of HTML as needed. Read that first and feel free to work through their presentation. Here's an organized summary of what we learned, plus a bit more.

If you don't have Chapter 2, here's a link to a copy of a scan of the chapter. To respect the author's copyright, the link is only valid on-campus or with a password. Ask Scott if you don't have that password.

FEWD Chapter 2

Languages

We learned that web pages are written using three languages:

  • HTML, which is the skeleton and organs
  • CSS, the skin and clothes. We'll look at that in the next chapter.
  • JavaScript, which defines the behavior. We'll get to that later.

HTML template

Our basic page had the following template:

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Ottergram</title>
    </head>
    <body>
        <header>
            <h1>Ottergram</h1>
        </header>
    </body>
</html>

Tags

We learned the following tags. Look at W3Schools or MDN to learn more.

  • head holds meta information about the document
  • meta tells the browser the character set. More about this much later in the course. We'll always use utf-8.
  • title is used for window titles and bookmarks, and by search engines. More important than you'd think.
  • body holds all the content
  • header holds headers and related stuff like logos
  • h1 holds the text of a major heading
  • link connects a separate file of CSS rules to an HTML file. The URL of the CSS file is the href attribute.
  • ul is a container for an unordered list (bullet list)
  • li is a container for a list item
  • img is replaced (a replaced element) with an image loaded from a separate file, specified using the src attribute.
  • a demarks a clickable hyperlink

Meaningless Tags

All the tags above have some kind of meaning associated with them: each is meant for a particular kind of content. However, HTML also includes two meaningless tags, span and div. A span demarks some text or other inline information. (Inline content is stuff like text that fills up a line before flowing onto the next line.) A div demarks a big block or division of a document.

These tags are useful for styling and behavior (attaching JavaScript to them).
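
As a rough illustration, here is how span and div might appear in a page. The class names (sidebar, warning) are made up for this sketch; they are just hooks that CSS or JavaScript could attach to later.

```html
<!-- div: a block-level container with no meaning of its own -->
<div class="sidebar">
    <p>
        Otters are playful, but
        <!-- span: an inline container with no meaning of its own -->
        <span class="warning">please don't feed them</span>.
    </p>
</div>
```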

Chrome Developer Tools

The book described the Chrome Developer Tools. This is a really useful tool. I have a demo of the Firefox Developer tools in the videos, and I've created one for the Chrome Developer tools, too.

Some other useful tags:

  • em to emphasize some text. Typically rendered in italics.
  • strong, which is like em but more so. Typically rendered in bold.
  • h2 to h6 for different levels of headers
  • p for a paragraph. Paragraphs can't be nested or contain other block elements.
  • br for a line break. Usually avoid this because it can break layouts.
  • ol for an ordered (numbered) list
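
Here is a small sketch that uses each of these tags once. The content is invented for illustration; the address at the end is one of the few places where br is a reasonable choice, because the line breaks are part of the content.

```html
<h2>Feeding the Otters</h2>
<p>Otters eat <em>a lot</em>. <strong>Never</strong> feed them by hand.</p>
<ol>
    <li>Wash the fish.</li>
    <li>Put the fish in the bucket.</li>
    <li>Step back.</li>
</ol>
<p>
    Otter House<br>
    12 River Road
</p>
```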

Tags should be properly nested:

<foo> <bar> </bar> </foo>

not

<foo> <bar> </foo> </bar>
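
With real tags in place of the made-up foo and bar, the same rule might look like this:

```html
<!-- Properly nested: em opens and closes inside p -->
<p>Otters are <em>very</em> cute.</p>

<!-- Improperly nested: p closes before em does. A validator should flag this. -->
<p>Otters are <em>very cute.</p></em>
```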

Once, it was very popular on the web to have links like this:

  • Click here for apple pie recipes
  • Click here for peach pie recipes
  • Click here for prune pie recipes

It seemed so clever and intuitive, making the clickable text be the word "here." There are two big problems with this, though:

  • Accessibility: Screen-reading software for the blind often will read the text of the links on a page so that the user can easily navigate to other pages. Links like those above read as "here," "here," "here" — useless.
  • Indexing: Search engines pay special attention to the click text on a page, since it is often an important clue about the content of the destination page. The links above don't show what the important words are.

So what do you do instead? Just wrap the link tags around important words:
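
In markup, the difference might look like the following sketch. The URLs are placeholders, not real pages.

```html
<ul>
    <!-- Avoid: the click text is just "here" -->
    <li>Click <a href="recipes/apple-pie.html">here</a> for apple pie recipes</li>

    <!-- Better: the click text is the important words -->
    <li>Recipes for <a href="recipes/apple-pie.html">apple pie</a></li>
</ul>
```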

Accessibility is very important in this class, so keep that in mind.

The ALT Attribute

An IMG tag looks like this:

<img src="url/of/picture.jpeg" alt="picture of something">

You noticed that we added an ALT attribute to the IMG tag: a small piece of text that can be used in place of the image in certain circumstances. The ALT attribute is an important part of the HTML standard. Perhaps its most important use is to support accessibility. Not everyone has good enough vision to see the images that we use in our websites, but that doesn't mean they can't and don't use the Web. Instead, they (typically) have software that reads a web page to them, including links. When the software gets to an IMG tag, it reads the ALT text. If there is no ALT text, it may read the SRC attribute, hoping there's a hint there, but all too often the SRC attribute is something like "../images/DCN87372.jpg" and the visually impaired web user is left to guess.

Therefore, you should always include a brief, useful value for the ALT attribute. If your page is an image gallery, then your ALT text could be a description of the image. However, describing the image is not, in general, the idea. For example, if the image is a link whose target is made clear by the image, then the ALT text should say something like, "Link to ..." so the user will know what to do with it. The sole exception is for images that are used just for formatting, such as blank pictures that fill areas or colorful bullets for bullet lists. In those cases, it's better to include an ALT attribute that is empty, so that the user doesn't have to listen to the SRC attribute being read. In every other case, the text should be useful for someone who wants to use your site but can't see the images. It helps to turn off images and view your site to check.
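
The three situations described above might be marked up as in this sketch. The file names and link target are hypothetical.

```html
<!-- Gallery image: the alt text describes what the picture shows -->
<img src="img/otter1.jpg" alt="Barry the Otter floating on his back">

<!-- Image used as a link: the alt text says where the link goes -->
<a href="gallery.html">
    <img src="img/camera-icon.png" alt="Link to the photo gallery">
</a>

<!-- Image used only for formatting: the alt attribute is present but empty -->
<img src="img/blue-bullet.png" alt="">
```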

Furthermore, you should avoid having critical information on your website conveyed only in images. There may be times when it is unavoidable, but to the extent that it is possible, we want our websites to be easily usable by all people, including the blind and visually impaired.

Accessibility is important in modern society. We build ramps as well as stairs, we put cutouts in curbs, and we allocate parking spaces for the handicapped. Indeed, most federal and state government websites are legally required to be accessible, and ALT attributes are just one part of that.

In this class, we expect you to always use the ALT attribute. If you find an image or an example where we've forgotten to use one, please bring it to our attention.

Here is a more thorough discussion of ALT

Figures

Now that we know about the img tag, it's useful to know about a semantic tag that can be used with it. We can use figure to surround an img tag, paired with figcaption for the caption text:

<figure>
    <img src="hermione-granger-256.jpeg" alt="Hermione Granger">
    <figcaption>Hermione Granger as played by Emma Watson</figcaption>
</figure>

Here's what it would look like:

[Image: Hermione Granger]
Hermione Granger as played by Emma Watson

Note that images can be used without figure; a figure is often used as in a book, where the text refers to a figure to provide additional information. Also, the use of figcaption doesn't remove the obligation to provide alt text. Still, this can be a useful tag to know about.

Comments

From the very first computer program, programmers have needed to leave notes in the code to help themselves and others understand what's going on or what the code's purpose is. These notes are called comments. Comments are a part of the program text (they're not written separately, because then, well, they'd get separated), but they are ignored by the computer. Comments aren't about what someone can discover by reading the code, but should cover the background context of the code, or its goal.

Because it's important to get in the habit of putting comments in your HTML code, we will require comments in this course. At this point, you won't have a lot to say, and that's fine. You will start by labeling each file with its name, your name, the date, and any sources you consulted (such as the source code of other web pages). Think of this as signing your work. Later, when you're designing a website with many coordinated pages, you can use comments on a page to talk about how it fits into the overall plan.

Comment Syntax

The HTML comment syntax is a little odd-looking. Here's an example:

<!-- I can say anything I want in a comment.  -->

The syntax starts with a left angle bracket < then an exclamation point and two hyphens, then the comment (anything you want) and ends with two hyphens and a right angle bracket >.
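
A comment can span several lines, which is handy for the labeling described earlier. Here is one possible layout; the field names and values are placeholders to replace with your own, and the exact format is only a suggestion.

```html
<!doctype html>
<!--
    File: ottergram.html
    Author: Your Name
    Date: (the date you wrote the page)
    Sources: (pages whose source code you consulted, if any)
-->
<html>
    <head>
        <meta charset="utf-8">
        <title>Ottergram</title>
    </head>
    <body>
        <!-- Comments can also go in the body; the browser does not display them. -->
        <h1>Ottergram</h1>
    </body>
</html>
```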

Validation of HTML Code

How can you be sure you've followed every nit-picky rule that the HTML standards committee devised? (The standards committee is the World Wide Web Consortium or W3C.) Even if you have memorized all the rules, checking a page would be tedious and error-prone – perfect for a computer! Fortunately, the W3C created an HTML validator. You can validate by supplying a URL, by uploading a file, or even copy/pasting in some HTML. An HTML validator is an excellent tool to help you debug your HTML code.

Validation also helps with accessibility. One important aspect of accessibility is having the proper HTML syntax for each page in your site. Visitors with accessibility needs may use alternative browsers and screen readers, and that software is aided by syntactically correct HTML. Read the following for a longer discussion of why to validate your HTML pages.

Throughout the semester, if you need to validate a web page, you can find the HTML validator and others in the reference page.

There's a video about validation on the videos page.

Icon Declaring Validation

Once you get your page to validate, you can put some HTML code on your page to give it a seal of approval, declaring that it is valid (and what standard it meets). You will see examples of this strategy in lab.

The very cool thing about this icon is that it is clickable, and clicking it will cause the validator to process your page again. Thus, you can modify your page, upload the changes, and click the icon to re-validate it, making validation very easy. In fact, we suggest that you put the icon on your page before it's valid, and use it during your debugging process.

The snippet of code is just the following, so go ahead and copy/paste it into your pages. The code doesn't use anything we don't know, so read it!

<p>
  <a href="http://validator.w3.org/check?uri=referer">
     <img 
       src="http://cs.wellesley.edu/~cs204/Icons/valid-html5v2.png"
       alt="Valid HTML 5"
       title="Valid HTML 5"  
       height="31" width="88">
  </a> 
</p>

The need for meaningful tags

As we've said, HTML was designed to structure the content of a web page. That explains the existence of tags like <p>, <h1>, <ol>, etc. However, when web developers started creating pages with a lot of content, it became clear that they needed a way to organize the page content in bigger chunks to make better use of the available screen space. CSS could then be used to position those chunks on the page. Thus the <div> tag was born (short for division), which is now among the most used (and overused) tags on the web. While this seemed to solve the page layout problem, HTML code became difficult to understand. Other computer programs (e.g. search engines) that wanted to use the organization of a page to infer the meaning of its content couldn't make sense of all the divs.

HTML5 introduced a series of new tags that have meaningful names and can be used universally to express what the content is about, beyond the existing simple tags. Additionally, to make the pages more alive with different kinds of content, several new tags that allow content to be embedded in a page were also added. In the following, we will give a short summary of some of these tags. Try to make use of them in your pages. They will make your code better and more readable to the programs of the future.

Semantic Tags

Here is a list of new HTML5 tags that are known as semantic tags, because their names have specific meaning.

Tag Name        Short Description
<header>        Specifies a header for a document or section.
<footer>        Specifies a footer for a document or section.
<section>       Defines sections in a document (e.g. chapters).
<nav>           Defines a set of navigation links.
<aside>         Defines content which is relevant but not central (e.g. callouts, sidebars).
<main>          Defines the main content of a page.
<article>       Defines independent, self-contained content (e.g. blog post, news story).
<abbr>          Indicates an abbreviation or acronym.
<figure>        Indicates a figure or other graphical content.
<figcaption>    A caption inside a figure element.

The abbr tag is used like this:

<abbr title="United Nations">UN</abbr>

See an example in action in the paragraph below for the word W3C.
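
To see how several of these tags can fit together, here is a sketch of a page skeleton. The page content, file names, and the abbreviation are all invented for illustration, and this is only one reasonable arrangement, not the only correct one.

```html
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Otter News</title>
    </head>
    <body>
        <header>
            <h1>Otter News</h1>
            <nav>
                <ul>
                    <li><a href="index.html">Home</a></li>
                    <li><a href="about.html">About</a></li>
                </ul>
            </nav>
        </header>
        <main>
            <article>
                <h2>Barry Learns to Juggle</h2>
                <p>Reported by the <abbr title="Otter Broadcasting Company">OBC</abbr>.</p>
            </article>
            <aside>
                <p>Related: how to visit the otters.</p>
            </aside>
        </main>
        <footer>
            <p>Written for a class exercise.</p>
        </footer>
    </body>
</html>
```

Notice that there isn't a single div: every container says something about the role of its content.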

Which Tag to Use?

Given all the tags listed above, along with DIV, you might feel bewildered as to which one to use. Here is a helpful HTML5 sectioning flowchart from html5doctor.com. Click on the image to see a larger version:

[Image: HTML5 Sectioning Flowchart]

URLs

The book doesn't cover URLs at all in Chapter 2, but they are important, so we'll learn about them here.

URLs are used within websites and web applications to connect pieces together, including

  • supporting files (CSS, JavaScript),
  • images, and
  • links to other web pages
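
In HTML, each of these connections is a URL stored in an attribute. The sketch below shows one tag for each use. The link, img, and a tags are the ones we've already seen; the script tag (for JavaScript) hasn't been covered yet, and the file names are hypothetical.

```html
<!-- Supporting files: the URL is in href for CSS, in src for JavaScript -->
<link rel="stylesheet" href="stylesheets/styles.css">
<script src="scripts/main.js"></script>

<!-- An image: the URL is in the src attribute -->
<img src="img/otter1.jpg" alt="Barry the Otter">

<!-- A link to another web page: the URL is in the href attribute -->
<a href="https://www.wellesley.edu/cs/curriculum">CS curriculum</a>
```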

There are two main kinds of URLs:

  • Absolute URLs work from anywhere
  • Relative URLs work when starting from the current page

Examples of Absolute URLs:

  • https://www.wellesley.edu/cs/curriculum includes protocol, domain and path
  • //www.wellesley.edu/cs/curriculum uses default protocol (same as referencing page)
  • /cs/curriculum uses default protocol and domain

Examples of Relative URLs:

  • curriculum is a sibling of current page
  • cs/curriculum is a child of a sibling
  • .. is a parent folder, but you'd never do that in practice
  • ../mas is a sibling of a parent

Here's what a site might look like:

[Image: example directory tree]

URL Comparison

Again, there are two kinds of URLs: relative and absolute:

  • absolute URLs start with a slash (or http or https) and specify the same destination regardless of starting location
  • relative URLs start with a name (or ..) and specify a destination as a series of steps from the starting location

The advantage of an absolute URL is that it continues to work even if the referring page moves (though not if the page it refers to moves). If we think of a hyperlink as an arrow, the referring page or "source" is the tail of the arrow, the starting point. It's the page that has the hyperlink on it. The head of the arrow is the "destination" or the page we end up at when we click the link.

Relative URLs have the advantage that if the starting file and ending file are moved together to a different place, but continue to share the same relationship (for example, they are in the same folder), then the relative URL will continue to work after they are moved, while an absolute URL will necessarily break.

That might sound odd, but it's actually really common. Suppose that a related set of pages is in one folder and we decide to move the whole folder to another place, on this server or even another server. If we do that, any absolute URLs from one page to another within that folder break. But relative URLs will continue to work, because the relationship between the source and destination of the hyperlink is preserved.

Relationship Rules

Here are the rules for relative URLs, based on the relationship between the two files (source and destination).

  1. a bare name, like fred.html is a file or folder in the same folder as the starting point.
  2. a slash means to go down into a folder. So stuff/fred.html means that stuff is a folder in the current folder (by rule 1) and fred.html is inside stuff
  3. a .. means to climb out of a folder and go to the parent folder. So ../fred.html means that fred.html is in the folder above the starting point.

These rules can be combined to yield long relative URLs like ../../africa/botswana.html which is a file in the africa folder that is two folders above this one.
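
The sketch below applies each rule from a single starting page. The folder layout is invented for this example and is described in the first comment; each later comment gives the file that the link would reach.

```html
<!--
    Hypothetical folder layout:

    world/
        africa/
            botswana.html
        europe/
            fred.html
            france/
                paris.html     (the starting point: the page containing these links)
                lyon.html
                stuff/
                    fred.html
-->

<!-- Rule 1: a bare name is in the same folder: world/europe/france/lyon.html -->
<a href="lyon.html">Lyon</a>

<!-- Rule 2: a slash goes down into a folder: world/europe/france/stuff/fred.html -->
<a href="stuff/fred.html">Fred, inside stuff</a>

<!-- Rule 3: .. climbs to the parent folder: world/europe/fred.html -->
<a href="../fred.html">Fred, one folder up</a>

<!-- Combined: up two folders, then down into africa: world/africa/botswana.html -->
<a href="../../africa/botswana.html">Botswana</a>
```

If the whole world folder were moved somewhere else, all four links would keep working, because the relationships between the files would be unchanged.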

If you'd like, you can learn more about URLs

Ottergram

At the end of the chapter, the final page's HTML code is the following:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>ottergram</title>
    <link rel="stylesheet" href="stylesheets/styles.css">
  </head>
  <body>
    <header>
      <h1>ottergram</h1>
    </header>
    <ul>
      <li>
        <a href="#">
          <img src="img/otter1.jpg" alt="Barry the Otter">
          <span>Barry</span>
        </a>
      </li>
      <li>
        <a href="#">
          <img src="img/otter2.jpg" alt="Robin the Otter">
          <span>Robin</span>
        </a>
      </li>
      <li>
        <a href="#">
          <img src="img/otter3.jpg" alt="Maurice the Otter">
          <span>Maurice</span>
        </a>
      </li>
      <li>
        <a href="#">
          <img src="img/otter4.jpg" alt="Lesley the Otter">
          <span>Lesley</span>
        </a>
      </li>
      <li>
        <a href="#">
          <img src="img/otter5.jpg" alt="Barbara the Otter">
          <span>Barbara</span>
        </a>
      </li>
    </ul>
  </body>
</html>

You can view that initial version of ottergram in your browser.

Let's take a moment to talk about the HTML.

  • The bulk of the page is a list. Most web pages will not be like that, and indeed, neither will the finished Ottergram.
  • Each list item contains a hyperlink with the odd URL of #. That's an empty fragment identifier, and the book uses it for an odd reason that we'll talk about below.
  • Each hyperlink is wrapped around an img and a span. The span is essentially the caption of the image. Better semantic markup might have used figure and figcaption.
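
For comparison, here is a sketch of how one list item might look with figure and figcaption in place of the span. This is an alternative for illustration only, not what the book does; if you follow along with the book, keep the span so that later chapters still match your code.

```html
<li>
    <a href="#">
        <figure>
            <img src="img/otter1.jpg" alt="Barry the Otter">
            <figcaption>Barry</figcaption>
        </figure>
    </a>
</li>
```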

Fragments

We discussed URLs above, but I left something out: fragments. Notice the hyperlink at the beginning of this paragraph. It goes to a particular place on this page. It's able to do so because the destination has an id attribute, and the URL specifies that id after the # character.

In general, a URL looks like https://machine.domain/path/to/file.html#fragment_id

The URLs in Ottergram consist solely of the #, so there's no fragment ID and the filename is missing as well. What that hyperlink does is go to the same page. Try it!
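
The different cases might look like this in markup. The id value and the file name html.html are hypothetical.

```html
<!-- The destination: any element with an id attribute -->
<h2 id="fragments">Fragments</h2>

<!-- A link to that spot from elsewhere on the same page -->
<a href="#fragments">fragments</a>

<!-- A link to that spot from a different page: file name, then # and the id -->
<a href="html.html#fragments">the section on fragments</a>

<!-- As in Ottergram: a bare #, with no file name and no fragment id -->
<a href="#">Barry</a>
```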

A hyperlink to the same page we are on is not very useful. However, it is clickable. Later, in Chapter 6, we're going to write some JavaScript code to hijack the click behavior and do something interesting and useful. For now, we just wanted to create a page with clickable elements. That's what these do.

The End

This is just the beginning of HTML. There's a lot more you could learn, but this will do for now.