HTML Coding

(Reading: We strongly suggest you read chapters 1, 3, and 6 of Head First HTML with these notes.)

Today, we'll dig into how to use HTML to structure your web page and talk about some of the abstract concepts behind this language.

HTML Elements and Tags

An HTML document is composed of elements that begin and end with tags. For example:

<title> CS110 HTML Coding </title>

Here, <title> is the start tag, CS110 HTML Coding is the contents, and </title> is the end tag.

The following are some HTML tags that you have seen. If you forget what a tag does or are looking for a new tag, you can look up tags in an HTML Reference, like the ones you find on the CS110 documentation page.

There are several other tags that you have seen, but still need to learn more about.

We'll learn more about these today.

Most elements have an end tag that matches the start tag. In a few special cases, particularly </p> and </li>, the end tag is optional and may be omitted because the browser can determine it from context.

Some elements (such as <br>, <hr>, and <img>) consist only of a start tag and do not have corresponding end tags or contents. These are called empty elements.
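As a small illustration (the text content is made up), the fragment below contains two paragraphs whose optional </p> end tags are omitted, followed by the empty elements <br> and <hr>, which never have end tags or contents:

```html
<!-- The first </p> is omitted: the next <p> tells the browser
     that the first paragraph has ended. -->
<p>This paragraph has no end tag.
<p>Neither does this one, and it contains a line break<br>
right in the middle.
<!-- <hr> is an empty element: no contents and no end tag.
     Because it is a block element, it also ends the second paragraph. -->
<hr>
```

Writing the </p> end tags anyway is never wrong, and many people find it easier to read.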

Tags serve as instructions telling the browser how to display the contents of elements. The browser is an interpreter for HTML code; it reads the HTML code and renders its elements based on rules for each kind of tag.

Nesting

Multiple tags can be nested: one fits inside the other like measuring cups or Russian dolls. If we have two tags, fred and barney, they can be nested like this:

<fred>
   Region A
      <barney>
         Region B
      </barney>
   Region C
</fred>

The fred tag applies to all three regions, Region A, Region B, and Region C, while barney applies only to Region B, which it surrounds. Regions A and C have only the fred tag applied to them.

When nesting two tags, the inner tag must be closed before the outer tag is closed. Your browser may not enforce this; it may be forgiving of errors, but you can't be sure that every browser will be so forgiving, so always follow this syntactic rule.
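To make the rule concrete with real tags, here is a sketch using <p>, <strong>, and <em> (the sentence is just filler). The first paragraph closes the inner <em> before the outer <strong>; the second closes them in the wrong order, which many browsers will quietly patch up but which a validator will report as an error:

```html
<!-- Correct: the inner <em> closes before the outer <strong>. -->
<p><strong>This is bold, and <em>this is bold italic</em></strong>.</p>

<!-- Incorrect: </strong> appears while <em> is still open. -->
<p><strong>This is bold, and <em>this is bold italic</strong></em>.</p>
```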

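A nice way to see the structure of your document is to use Firebug, a plug-in for the Firefox browser. (Its features have since been built into Firefox's own Developer Tools, and other browsers have similar built-in tools.) We'll use these tools a lot more later in the semester, but for now, we might take a minute in lecture to show the structure of a document.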

Comments

From the very first computer program, programmers have needed to leave notes in the code to help themselves and others understand what's going on or what the code's purpose is. These notes are called comments. Comments are a part of the program text (they're not written separately, because then, well, they'd get separated), but they are ignored by the computer. Comments shouldn't just restate what someone can discover by reading the code; instead, they should explain the background context of the code, or its goal.

Because it's important to get in the habit of putting comments in your HTML code, we will require comments in this course. At this point, you won't have a lot to say, and that's fine. You will start by labeling each file with its name, your name, the date, and any sources you consulted (such as the source code of other web pages). Think of this as signing your work. Later, when you're designing a website with many coordinated pages, you can use comments on a page to talk about how it fits into the overall plan.

HTML Comment Syntax

The HTML comment syntax is a little odd-looking. Here's an example:

<!-- I can say anything I want in a comment.  -->

The syntax starts with a left angle bracket < then an exclamation point and two hyphens, then the comment (anything you want) and ends with two hyphens and a right angle bracket >. You can see other examples of comments by doing View Source on this web page.
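As a sketch of the file label described above, a comment like the following could go at the top of little.html. The name, date, and source listed are placeholders; substitute your own. A comment can span several lines, since everything between the opening <!-- and the closing --> is ignored:

```html
<!--
  File:    little.html
  Author:  Your Name
  Date:    (today's date)
  Sources: View Source of the CS110 lecture pages
-->
```

One caution: avoid writing two hyphens in a row inside a comment. The HTML 4.01 specification advises against it, and a validator may complain.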

Tag Syntax

We learned that tags always begin with a left angle bracket < and close with a right angle bracket >. (Remember that the browser doesn't care whether the tag's name is upper or lower case.) You can provide additional information within the tag to further specify what it does, using attributes.

A good example of the use of attributes is seen with the anchor tag, <a>. To use it as a hyperlink, you have to specify where the link takes us. For example, to have a link that says Yahoo! and takes us to www.yahoo.com, we say:

<a href="http://www.yahoo.com">Yahoo!</a>

The href part is the attribute. For this tag, since the attribute is almost always required, we can almost think of it as an a href tag, but that way of thinking will confuse us later, so it's best to think of it as two separate ideas. Later, we'll see other attributes for the <a> tag.

Exercise 1

Create a file named little.html with the following contents:

<html>
  <head>
    <title>My Little Page</title>     
  </head>
  <body>
  
    <p>This is the text on my little page.

  </body>
</html>

Then add an H1 header and an H2 header to the page.

Now put in a link to your favorite website. Here are some of ours. (You can use View Source to see the URL where they go.)

Remember that (1) on a Mac, you can edit a file using TextWrangler, (2) you should put your HTML code in the BODY element, (3) be sure to end your filename with .html, and (4) you can view a page locally in your browser, using File / Open File.

In the head element of each CS110 lecture page, you'll see some HTML that looks something like:

   <link rel="stylesheet"
         type="text/css"
         href="http://cs.wellesley.edu/~cs110/cs110-main-style.css">

Like the anchor tag, the <link> tag also has an href attribute, and it also links one web page with another. However, it has a different purpose. Instead of producing a clickable link, the <link> tag tells the browser that there is some additional information about this page located in a different file. The href attribute of the <link> tag tells the browser where to find the other file. The href attribute contains a URL, which we'll learn about later.

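The <link> tag contains other attributes depending on the purpose of the connection. In our case, it contains rel="stylesheet", which says that the other file is a style sheet for this page, and type="text/css", which says that the style sheet is written in CSS.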

The <link> tag can be used for a variety of purposes, but linking to style sheets is by far its most common use.

At this point, the <link> tag is still a mystery. We've told you what it's for, and something about its syntax, but not what goes in the other file. We'll learn more about the <link> tag, style sheets and CSS in later lectures.

Exercise 2

Modify little.html from the previous exercise by adding the following link element inside the head element:

<link rel="stylesheet"
      type="text/css"
      href="http://cs.wellesley.edu/~cs110/cs110-main-style.css">

How does this affect the appearance of the document?

In the modified document, how can you change the appearance of an unvisited hyperlink from blue text to text that is large and green? (Hint: use a header tag.)

Tag and Attribute Syntax

We can now generalize start tag syntax to include any number of attributes, as follows:

<tag attr1 = "value1" attr2 = "value2" ... attrN = "valueN"> 
   contents
</tag>

Browsers differ on how nit-picky they are about attributes. Many will let you get away with omitting the quotation marks when the value is a single solid word. Others will complain if you have line-breaks in your attributes. In general, it's best to comply with the strictest syntax rules, so that your site will work on the most browsers.

(Note that XHTML, a stricter, XML-based version of HTML, requires attribute values to be in quotation marks, so those who think they may someday switch to XHTML should use quotation marks. For the purposes of this course, you may use either syntax. You'll find that your instructors aren't always consistent.)
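For instance, the first two <img> tags below (the file name is hypothetical) say exactly the same thing. The first leaves the quotation marks off its single-word values, which many browsers accept; the second quotes every value, which works under the strictest rules, including XHTML's. The third shows why quoting is the safer habit:

```html
<!-- Unquoted values: accepted by many browsers because each value
     is a single word, but not allowed in XHTML. -->
<img src=weasel.jpg alt=weasel width=100>

<!-- The same tag with every value quoted: the safest form. -->
<img src="weasel.jpg" alt="weasel" width="100">

<!-- A value containing spaces must be quoted; without the quotes,
     the browser would read alt=a and treat "small" and "weasel"
     as separate (invalid) attributes. -->
<img src="weasel.jpg" alt="a small weasel" width="100">
```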

Using Images

One thing that we all want to do with our web pages is add pictures. Because the picture is stored in a separate file, we have to refer to it by its URL, just as the href attribute of the anchor (<a>) tag refers to another page.

Once you have an image file, say small_weasel.jpg, you can use it on your web page like this:

<img src = "small_weasel.jpg" alt = "a small weasel">

with the following result:

[Image: a small weasel]

Of course, this only works if the server can find the image file. The src attribute must be the URL of the image file. It can be an absolute URL or a relative URL. What did we use here?
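To compare the two forms, here is the same tag written both ways. A relative URL is resolved against the location of the page itself, so the first tag looks for small_weasel.jpg in the same folder as the page; an absolute URL spells out the whole address, including the server. The host www.example.com and the images folder are placeholders, not where this image actually lives:

```html
<!-- Relative URL: looked up in the same folder as this page. -->
<img src="small_weasel.jpg" alt="a small weasel">

<!-- Absolute URL: the full address, including the server name.
     (www.example.com is a placeholder host.) -->
<img src="http://www.example.com/images/small_weasel.jpg" alt="a small weasel">
```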

When the browser receives this page and comes to the <img> tag, it sends a separate request to the server for the image file. If the server doesn't find the file (or the file is corrupted in some way), the browser will show this:

[Broken image; alt text: a small weasel]

Depending on your browser, you may see a broken-image icon above, the alt text, or possibly nothing at all.

Exercise 3

Copy the weasel image to a file named small_weasel.jpg in the same folder as little.html. Modify little.html to display an image of the weasel. (Of course, you could replace this image by any other image you'd like!)

Add a title attribute with value "small weasel image" to the image. What does this do?

The ALT Attribute

You may have noticed that we added an ALT attribute to the IMG tag: a small piece of text that can be used in place of the image in certain circumstances. The ALT attribute is an important part of the HTML standard. Its most important use is probably to support accessibility. Unfortunately, not everyone has good enough vision to see the images that we use in our websites, but that doesn't mean they can't and don't use the Web. Instead, they typically have software (a screen reader) that reads a web page to them, including links. When the software gets to an IMG tag, it reads the ALT text. If there is no ALT text, it may read the SRC attribute, hoping there's a hint there, but all too often the SRC attribute is something like "../images/DCN87372.jpg" and the visually impaired web user is left to guess.

Therefore, you should always include a brief, useful value for the ALT attribute. If your page is an image gallery, then your ALT text could be a description of the image. However, describing the image is not, in general, the goal. For example, if the image is a link whose target is made clear by the image, then the ALT text should say something like "Link to ..." so the user will know what to do with it. The one special case is images that are used only for formatting, such as blank pictures that fill areas or colorful bullets for bullet lists. For those, it's best to include an ALT attribute with an empty value (alt=""), so that the user doesn't have to listen to the SRC attribute being read. In every case, the text should be useful for someone who wants to use your site but isn't sighted. It helps to turn off images and view your site to check.
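The three cases above might look like this in practice. All of the file names, the link target, and the wording of the ALT text are made up for illustration:

```html
<!-- Gallery image: the ALT text describes the picture. -->
<p><img src="sunset-over-lake.jpg"
        alt="Sunset over a lake, with a canoe in the foreground"></p>

<!-- Image used as a link: the ALT text says where the link goes. -->
<p><a href="index.html"><img src="home-icon.gif"
        alt="Link to the home page"></a></p>

<!-- Purely decorative image: an empty ALT value, so screen readers
     can skip it instead of reading the file name aloud. -->
<p><img src="blue-bullet.gif" alt=""> First item</p>
```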

Furthermore, you should avoid having critical information on your website conveyed only in images. There may be times when it is unavoidable, but to the extent that it is possible, we want our websites to be easily usable by all people, including the blind and visually impaired.

Accessibility is important in modern society. We build ramps as well as stairs, we put cutouts in curbs, and we allocate parking spaces for people with disabilities. Indeed, most federal and state government websites are legally required to be accessible, and ALT attributes are just one part of that.

In this class, we expect you to always use the ALT attribute. If you find an image or an example where we've forgotten to use one, please bring it to our attention.

For more information, you can read the following

Resizing and Aspect Ratio

If you want to display a bunch of pictures, the web page appears neater if the pictures align well. You can align them vertically if they all have the same width, or horizontally if they all have the same height. For example, the following three pictures are all 150 pixels high.

[Images, each 150 pixels high: a molehill, the Eiffel Tower, the Matterhorn]

Regardless of the actual dimensions of an image, the browser will squeeze it into a set size if requested. You can do this with two new attributes, namely HEIGHT and WIDTH:

<img src="..." alt="..."  height="height-goes-here"   width="width-goes-here">

Replace "height-goes-here" and "width-goes-here" with integers specified in pixels, which we'll discuss later. If both width and height are specified, both will be obeyed, but you have to be careful with that. The ratio of an image's width to its height is called its aspect ratio. For example, the original Eiffel Tower image is 160x240 pixels, taller than it is wide, so its aspect ratio is 160:240, or 2:3. If you set the height and width so that they don't have the same aspect ratio, the picture will look distorted. Here is the Eiffel Tower with the wrong aspect ratio:

[Image: the Eiffel Tower, displayed with the wrong aspect ratio]

If you use either the HEIGHT or the WIDTH attributes, but not both, the browser will usually calculate the other attribute so that the aspect ratio is preserved. Thus, the picture will have either the width or height you want, but will not be distorted. That's how we did that row of pictures above.
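Here is the arithmetic the browser does on your behalf, using the 160x240 Eiffel Tower image described above and a target height of 150 pixels (the height of the row of pictures). The file name eiffel-tower.jpeg is assumed, matching the exercise below:

```html
<!-- Original size: 160 wide x 240 high (aspect ratio 2:3).
     With only height="150", the browser computes
     width = 150 × 160 / 240 = 100 pixels. -->
<img src="eiffel-tower.jpeg" alt="the Eiffel Tower" height="150">

<!-- Equivalent, but here you must get the arithmetic right yourself;
     any width other than 100 would distort the picture. -->
<img src="eiffel-tower.jpeg" alt="the Eiffel Tower" height="150" width="100">
```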

Exercise 4

Copy the eiffel-tower image to a file named eiffel-tower.jpeg in the same folder as little.html and small_weasel.jpg. Modify little.html to display an image of the weasel next to the tower so that both images have the same height.

Deprecated Tags and Attributes

Take a minute to look at the online reference for the <center> tag. We would give a link directly to it, but we want you to learn how to navigate to find the reference material on any tag. Here's how:

In this context, deprecated means that browsers will still support the <center> tag for the foreseeable future because of the zillions of old web pages that already use it, but that <center> should not be used in new web pages.

Centering is about presentation of the content, and the modern approach is to reserve presentation issues for CSS (style sheets), which we will be covering later in this class.

You should avoid using deprecated tags in this course. We are teaching you the modern, more powerful approach, and we would like you to adopt that style. If you don't know whether a tag is deprecated, check this reference.

Beauty in Websites

Ah, but you object that your web page looks ugly without centering, font changes, colors, and so forth. That may be; we're not going to try to contradict your aesthetic sense. However, for the first week of this course, we don't want to confuse anyone by introducing style sheets right away. So, we ask you to be patient. We will get to style sheets very soon.

In the meantime, consider the fact that many visitors to your site might not gain any advantage from style sheets anyway, because they are visually impaired, or because they are using an old or alternative browser that doesn't honor style sheets. For such users, the most important thing is the content and having that content well-structured and clearly conveyed. So, until we get to style sheets, consider that you are designing your website with accessibility in mind.

Bugs and Debugging

An HTML document may contain many different kinds of errors that prevent it from rendering as you expect. Errors in code are known as bugs, and the process of finding and correcting such errors is called debugging.

Here are some common types of HTML bugs:

Exercise 5

Copy the following buggy HTML code into a file named buggy.html and debug the code until it looks like you think it's intended to look.

<html>
<head>
<title>A Buggy HTML File</title>
</head>
<body>

<h1>A Buggy HTML File</h1>

<p> This HTML file contains several <em>bugs</em> (i.e., errors).
<br>
Can you <em>debug</em> them (i.e., find and fix them)?

<p>Here's an unordered list with three items: 

<ul>

    <li>A <bold>bold</bold> item.</li>

    <li>An ordered sublist with two items: 

        <ol> 

          <p> A CAPITALIZED item 

          <li> An <a href="http://www.wellesley.edu">
                  <em>italic link</a></em>
   
    <li> A final <code>item<code>, using code font.

</ul>

<p>Here's <a href="http://cs.wellesley.edu another link</a>

</body>
</html>

Validation of HTML Code

How can you be sure you've followed every nit-picky rule that the HTML standards committee devised? (The standards committee is the World Wide Web Consortium, or W3C.) Even if you have memorized all the rules, checking a page by hand would be tedious and error-prone: perfect for a computer! Fortunately, the W3C created an HTML validator. You can validate by supplying a URL, by uploading a file, or even by copying and pasting in some HTML. Visit that page and try validating your little.html file, your debugged buggy.html file, or even the file for this lecture. An HTML validator is an excellent tool to help you debug your HTML code.

Validation also helps with accessibility. One important aspect of accessibility is having the proper HTML syntax for each page in your site. Visitors with accessibility needs may use alternative browsers and screen readers, and that software works best with syntactically correct HTML. Read the following for a longer discussion of why to validate your HTML pages.

Throughout the semester, if you need to validate a web page, you can find the HTML validator and others in the validators section of the CS110 documentation page.

If you haven't already, try validating little.html. You'll see that it doesn't validate. The reasons are slightly technical, so bear with us, but you'll see how to make your own documents valid.

Document Type

There are several different, incompatible versions of HTML, so a document that is valid HTML 3.2 can be invalid HTML 4.01, and vice versa, and neither would necessarily be valid XHTML 1.0. So, the validator needs to know what version of HTML you're using. More importantly, the browser or screen reader or other software needs to know what syntax to expect. Therefore, the first thing any web page needs to do is announce the syntax it's using. This is done with a special DOCTYPE (for document type) tag.

In this course, we're using HTML 4.01 Strict. The correct DOCTYPE for this is at the top of nearly every web page in the CS110 site. It looks like this:

      <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      "http://www.w3.org/TR/html4/strict.dtd">
    

We urge you just to copy/paste that tag to your own pages and don't worry about it further.

Charset

A browser needs to know what characters (letters, numbers, and punctuation) are in the HTML file. Are they western European characters, or Russian, Greek, Sanskrit, Korean, Japanese, Chinese, or !Kung or something else? This information is called the character set or charset for short. For our purposes, we can use something called UTF-8.

To do so, we put a <meta> tag in the <head> of our document. It looks like this:

 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Again, we urge you just to copy/paste that code and don't worry about it.
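Putting the pieces together, here is a sketch of a minimal page that should validate as HTML 4.01 Strict. The DOCTYPE comes first, before <html>, and the charset <meta> tag goes inside <head>. The title and paragraph text are placeholders:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <!-- Declare the character set before anything else in the head. -->
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Page Title Goes Here</title>
  </head>
  <body>
    <!-- In the Strict version, body text must sit inside a block
         element such as <p>. -->
    <p>Page content goes here.</p>
  </body>
</html>
```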

Exercise 6

Copy the DOCTYPE and CHARSET code into little.html and/or the debugged buggy.html and check that the result is now valid.

Icon Declaring Validation

Once you get your page to validate, the validator will present a bit of HTML code that you can copy/paste onto your page to give it a seal of approval, declaring that it is valid (and what standard it meets). You can see an example on the bottom of the CS110 pages, including this one.

The very cool thing about this icon is that it is clickable, and clicking it will cause the validator to process your page again. Thus, you can modify your page, upload the changes, and click the icon to re-validate it, making validation very easy. In fact, we suggest that you put the icon on your page before it's valid, and use it during your debugging process.

The snippet of code is just the following, so go ahead and copy/paste it into your pages. The code doesn't use anything we don't know, so read it!

<p>
  <a href="http://validator.w3.org/check?uri=referer"><img 
       src="http://www.w3.org/Icons/valid-html401"
       alt="Valid HTML 4.01 Strict"
       title="Valid HTML 4.01 Strict"  
       height="31" width="88">
  </a> 
</p>

Here is a link to a new version of the web page about fun CS activities, with the additional HTML code needed for validation. The page does not yet validate, because the <img> tag for the image of the frisbee game is not placed inside a block element such as <p>. Add a <p> tag around it, and you should see that the page now validates as HTML 4.01 Strict.
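In HTML 4.01 Strict, the <body> element may directly contain only block-level elements (such as <p>, <h1>, or <ul>, plus <script>), and <img> is an inline element, so it has to sit inside one of them. A sketch of the fix, with a made-up file name and ALT text standing in for the frisbee picture:

```html
<!-- Invalid in HTML 4.01 Strict: <img> directly inside <body>. -->
<body>
  <img src="frisbee.jpg" alt="students playing frisbee">
</body>

<!-- Valid: the same <img> wrapped in a block element. -->
<body>
  <p><img src="frisbee.jpg" alt="students playing frisbee"></p>
</body>
```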

Homework: Find a Well-Designed Website

We'd like you to start thinking about web design: how the pages of a site are laid out, the color and font choices, how visitors navigate around the site, and other issues that influence how you like a website. Some of this is pretty intuitive, and you probably already have some good ideas about this, so we're interested in sites that you think are well designed.

Please email your instructor the URL of a website that you think is well designed. If you'd like, you can say why, but that's not necessary. We'll collect these and discuss them in class.

Solutions to Exercises