Building Web Pages and using HTML Forms

In Fall 2020, many of you, but not all of you, said you knew HTML and CSS, at least a little. Some of these sections are intended to get those who don't know HTML and CSS started. They are at the bottom, since most of you won't need them. If you do, follow these links to Learning HTML and Learning CSS.

The outline for this reading is

  • HTML basics: if you know all this, you'll be fine
  • CSS basics: a few useful parts of CSS
  • FORM basics: you should know all this
  • Validity: checking that you've followed the "rules" as laid down by the World Wide Web Consortium (W3C)
  • Accessibility: an important aspect of web pages, so that they are accessible to all

Basic HTML Page

Here is a basic page with two hyperlinks to other pages. Your web applications will comprise lots of such pages and links, so everyone should feel comfortable with doing these. Do a "view source" to see the actual HTML. If there's anything you don't understand, brush up or ask. Some observations:

  • many basic structural tags: HTML, HEAD, BODY, META, LINK
  • tags like TITLE for meta information
  • page tags like H1, H2, OL, UL, LI, P
  • images via IMG
  • hyperlinks using the A tag
  • adding JavaScript with the SCRIPT tag

If you're comfortable with all of those, you're in good shape. That's not a comprehensive list (table tags would be helpful, generic containers like SPAN and DIV, etc), but it's a good start.

If not, you should read some of the material in Learning HTML.

Basic CSS

The page above doesn't look great, but any cosmetics it has are due to CSS. View the CSS file to take a look. Here's a brief list of the basic CSS skills that you should know.

  • box properties such as margin, border, padding and width
  • fonts: family, size, weight, style
  • colors and background colors
  • display:block and such

There are also techniques for applying CSS, so you should know about these kinds of selectors:

  • TAG, such as BODY or H1
  • ID, such as #iconlist
  • CLASS, such as .fruit and .veggie
  • descendant selectors, such as #iconlist LI

If you understand all those, you're in excellent shape.

If not, you should read some of the material in Learning CSS.

Basic Forms

Web applications don't just deliver information to the user. They also get information from the user, even if it's just guidelines for a search. Of course, your projects will do more than that, including accepting information from the user that will be stored in a database. The fundamental way web applications get information from the user is through forms, so it's important to be comfortable with those, too.

Here is a sample form. Again, you can do view source to see the underlying HTML. Feel free to test it out! This form just reflects your input back to you. A real web application might squirrel it away in a database.

Here are some of the things you should know:

  • the FORM tag and its attributes, like method and action
  • the input tag and its attributes, like type and name
  • the select tag and its child tag, option
  • the label tag

If you know those, you've got a good start on knowing forms.

If not, you should read some of the material in Learning Forms.

GET vs POST

The form above used the POST method. What does that mean, and what other choices are there?

When a web browser sends a request to a web server, it makes the request in one of two main ways (methods): GET and POST. (There are other methods like HEAD, DELETE, and PUT, but they are relatively rare.)

You may find it helpful to think of GET versus POST as the difference between sending information by either a postcard or a letter in an envelope.

  • Both are ways to send information. GET and POST both send information from the browser to the server.
  • The general term is a request. A request can use either GET or POST.
  • They require slightly different handling by the sender and the receiver.

Ordinary web requests, like when you click on a hyperlink or load an image using the IMG tag, are all GET requests. (Not surprisingly, since the web browser is trying to GET some data.)

Either GET or POST can be used when submitting form data (you choose which using the METHOD attribute), but there are some important differences.

  • GET: all the form data is in the URL
  • POST: all the form data is in the body of the request

Let's explore the many differences between GET and POST.

Examples

For the sake of concreteness, here's an example of a GET request with a bunch of information in the URL. We have to imagine there's a form, probably for some kind of map application, such as mapquest.com. The form asks us a bunch of information about the place we want to look up, such as:

  • location name
  • address
  • latitude
  • longitude
  • geohash

After filling out the form, we submit it, and the request goes out looking like this:

GET /maps?name=Wellesley+College&address=106+central+st+wellesley+ma+02481&latitude=42.29528&longitude=-71.30667&geohash=drt297s5f5t

Host: localhost:8080
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0
Accept: image/avif,image/webp,*/*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: http://localhost:8080/upload/
Sec-Fetch-Dest: image
Sec-Fetch-Mode: no-cors
Sec-Fetch-Site: same-origin
Sec-GPC: 1
Pragma: no-cache
Cache-Control: no-cache

You can easily imagine requests with even more information in them.

You can infer some aspects of the format, just by looking at the example:

  • information is sent as name=value pairs, with the name separated from the value by an equals sign: =
  • pairs are separated from each other with ampersands: &
  • the form information is separated from the endpoint of the URL with a question mark: ?

There are additional rules, but you don't need to know any of this, because browsers will construct these URLs correctly and we'll use software (Flask, in particular) to parse them.
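
To make those three rules concrete, here is a minimal sketch that takes the example request apart using Python's standard urllib.parse module. This is not necessarily what Flask does internally; it just illustrates the same splitting rules. The variable names are my own.

```python
from urllib.parse import urlsplit, parse_qs

# The request target from the example above (path plus query string).
target = ("/maps?name=Wellesley+College"
          "&address=106+central+st+wellesley+ma+02481"
          "&latitude=42.29528&longitude=-71.30667&geohash=drt297s5f5t")

parts = urlsplit(target)        # separates the path from the query at the "?"
form = parse_qs(parts.query)    # splits pairs at "&" and names from values at "="

print(parts.path)               # the endpoint: /maps
for name, values in form.items():
    # parse_qs gives a list per name, because a name may appear more than once
    print(name, "=", values[0])
```

Notice that the plus signs come back as spaces: the name field decodes to "Wellesley College".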

(For the sake of completeness, I will mention that there are additional headers sent with the request, saying things like the destination host, the kind of browser (user-agent), the kind of data we are requesting (e.g. text/html) and so forth. The complete request might look like the following. But just skim this; you don't need to know any of the details.)

GET /maps?name=Wellesley+College&address=106+central+st+wellesley+ma+02481&latitude=42.29528&longitude=-71.30667&geohash=drt297s5f5t
Host: latitude.to
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://duckduckgo.com/

Again, for the sake of concreteness, but not because you need to learn any of the format of a POST request, I'll give two examples of a POST request.

The first is the POST equivalent of the GET request to the maps application. You can see that it's pretty much the same, except that the verb at the beginning is POST instead of GET and the form information is after the headers, in what is known as the body of the request.

POST /maps
Host: latitude.to
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://duckduckgo.com/

name=Wellesley+College&address=106+central+st+wellesley+ma+02481&latitude=42.29528&longitude=-71.30667&geohash=drt297s5f5t

The second example is a much bigger POST, where we've uploaded a file along with the form. The file and form data are both in the body of the request. Again, you need not know any of these details, but sometimes it helps to see a concrete example.

POST /upload/ HTTP/1.0
Host: localhost:8080
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: multipart/form-data; boundary=---------------------------233979501131065611281744094786
Content-Length: 9097
Origin: http://localhost:8080
Connection: keep-alive
-----------------------------233979501131065611281744094786
Content-Disposition: form-data; name="nm"

1451
-----------------------------233979501131065611281744094786
Content-Disposition: form-data; name="pic"; filename="1451.jpg"
Content-Type: image/jpeg

d8ffe0ff1000464a4649010000010100
01000000dbff840009000a090a080b08
090b0a0b0b0b0e0b0c100b0a130d1517
1410160f1212160e0f120f14140f1412
1318141620191e1a1819212b241c1c13
321d3322372a222501300b060b0a0e0d
0c0b0e0c0c0e100d1d0e0d14220c1514
0e17081e0c17161011100b1713100b14
...

-----------------------------233979501131065611281744094786--
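
If you're curious how a body like that gets assembled, here is a rough sketch in Python. The browser does all of this for you, so this is only to demystify the format. The function name build_multipart, the boundary string, and the placeholder file bytes are all made up for illustration, and the sketch skips details a real library would handle (such as escaping quotes in names).

```python
def build_multipart(fields, files, boundary):
    """Return (content_type, body) for a multipart/form-data request.

    fields: dict of name -> text value
    files:  dict of name -> (filename, content_type, bytes)
    """
    # Inside the body, each part starts with "--" followed by the boundary.
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",                      # blank line separates part headers from data
            value.encode("utf-8"),
        ]
    for name, (filename, ctype, data) in files.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode(),
            f"Content-Type: {ctype}".encode(),
            b"",
            data,
        ]
    # The final delimiter has an extra "--" at the end.
    lines += [delimiter + b"--", b""]
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart(
    {"nm": "1451"},
    {"pic": ("1451.jpg", "image/jpeg", b"placeholder image bytes")},
    boundary="demo-boundary-12345",
)
print(content_type)
print(len(body), "bytes in the body")
```

This also explains a small puzzle in the example above: the boundary lines in the body have two more dashes than the boundary named in the Content-Type header.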

Example Take-Home

I've given three examples, but told you that you can ignore most of the details. What do I want you to remember from these examples? Just a few facts:

  • GET and POST have similar but slightly different formats
  • POST has a body, while GET does not. In our metaphor, the body of the request is like the inside of the envelope. A GET request is like a postcard: it has no "inside".
  • The form data goes in the URL of a GET request, which sits on the first line of the request, just before the headers
  • The form data goes in the body of a POST request

Those are the main points to learn.
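
Here is a small sketch of those points using Python's standard library. It builds one GET request and one POST request carrying the same form data, without sending either one, so you can see where the data ends up. The host example.com and the two-field form are placeholders, not part of the examples above.

```python
from urllib.parse import urlencode
from urllib.request import Request

form_data = {"name": "Wellesley College", "geohash": "drt297s5f5t"}
encoded = urlencode(form_data)   # name=Wellesley+College&geohash=drt297s5f5t

# GET: the encoded form data is appended to the URL; there is no body.
get_request = Request("http://example.com/maps?" + encoded, method="GET")

# POST: the URL is bare; the same encoded data travels in the body.
post_request = Request("http://example.com/maps",
                       data=encoded.encode("ascii"), method="POST")

# Nothing is sent over the network; we only inspect the request objects.
for req in (get_request, post_request):
    print(req.get_method(), req.full_url, "body:", req.data)
```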

Usage

In addition to the format differences between GET and POST, there are expected differences in how they are used.

  • GET is supposed to be used when reading information from the web server
  • POST is supposed to be used when updating information on the web server

(If the distinction between reading data versus updating data reminds you of the difference between SQL queries using the SELECT statement and modifying the state of the database using INSERT/UPDATE/DELETE, that's exactly right. We'll talk about that as part of this course.)

These previous facts have some consequences:

With GET, since all the data is in the URL, and URLs aren't infinite in length, GET is unsuitable for long things, like submitting a blog post or uploading a file.

Because GET is for reading data, browsers and other computers can cache the results, rather than bothering the web server again. See below for more on caches.

Since all the data is in the URL, a GET request that retrieves something useful (helpful search results) can be bookmarked, saved in a browser history, emailed to a friend, etc. Google maps used to do this, and it was useful for sending someone a particular map or set of directions.

If sensitive information is being submitted (e.g. SSN or credit card number), having it end up in the URL by using the GET method should probably be avoided.

Because POST is for updating information, the browser will typically prevent you (with a warning) from re-submitting a form. This is a good thing. For example, you wouldn't want to accidentally resubmit an order to Amazon.com. But you should think about this. If re-submitting is harmless, don't use POST.

Here is a screenshot of an unnecessary warning I get when retrieving a classlist, because they used POST instead of GET:

[Screenshot: the browser asking to confirm reposting of the form]

All I did was try to "refresh" the page, to see if there had been any change in registrations. Had they used GET, that would have been fine, but because they used POST, I have to confirm this. This confirmation would be good if I was doing something like ordering a book; I don't want to pay for two copies just because I refreshed a page.

W3Schools has a very nice, concise summary of GET vs POST.

More on GET versus POST

Let's go back to the metaphor of a request as a piece of snail mail (postcard or envelope).

  • Postcards and Envelopes both have the address on the outside, where it's visible. In GET and POST, the address is the URL.
  • Postcards have all the other information on the outside as well. There is no "inside". Consequently, a postcard is limited in length. You can send "having a great time, wish you were here" using a postcard, but you can't send the long story about bumping into an old friend while touring the Uffizi Gallery in Florence, Italy.
  • Envelopes have the address on the outside and as much content as you want (and can pay postage for) inside the envelope. That's where you can put your long story, along with the pictures you took.
  • Similarly, we can use GET for short requests, and POST for long requests, especially those with payloads like pictures.
  • Furthermore, if you're sending personal or private information (your SSN, your credit card number), you'd use an envelope (POST) not a postcard (GET). This isn't perfect security, but it's better than a postcard!

(For best security, we'd use HTTPS; we'll talk about that later in the course.)

One last bit of information that may help. The metaphor breaks down a bit when we talk about how the URL contains the information. In a postcard, there's a vertical line between the "address" half of the card, and the "message" part of the card. But in a GET request, all the info is in the URL.

Suppose our web app allows us to request info about a book given its title and author. Suppose the URL for this request is getbook. The form the user fills out has, say, two fields, title and author. A request might have the following information:

name     value
title    The Hobbit
author   Tolkien

If the web app uses a GET request, all this info has to end up in the URL. This requires an encoding step that your browser knows how to do, and your app knows how to reverse. The encoded url might be:

/getbook?title=The+Hobbit&author=Tolkien

You can see that the form data is after a question mark in the URL, represented as name-value pairs separated by ampersands, with equals signs between the name and the value. There are no spaces allowed in URLs (the space got converted to a plus sign), etc. We do not have to learn these encoding/decoding rules! It's sufficient just to know that they exist so that the browser can send a modest amount of not-too-complex information in a single URL. Flask will automatically decode this for us.
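
If you'd like to see the encoding step happen, here is a minimal sketch using Python's standard urllib.parse module. It approximates what the browser does on the way out and what the server-side software does on the way in; it is not Flask's actual code. The second example, with an ampersand in the value, is a made-up query to show why encoding is needed at all.

```python
from urllib.parse import urlencode, parse_qsl

form_data = {"title": "The Hobbit", "author": "Tolkien"}

# Encoding: roughly what the browser does when the form is submitted with GET.
url = "/getbook?" + urlencode(form_data)
print(url)                      # /getbook?title=The+Hobbit&author=Tolkien

# Decoding: roughly what the server-side software does when the request arrives.
query = url.split("?", 1)[1]
decoded = dict(parse_qsl(query))
print(decoded == form_data)     # True

# Characters that have a special meaning in URLs, like "&", get escaped,
# so they can't be confused with the separators between pairs.
tricky = urlencode({"title": "Beren & Luthien"})
print(tricky)                   # title=Beren+%26+Luthien
```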

Accessibility

The sample form above seems good and works fine, but it has a serious flaw: it is not accessible, which means that some users (potential customers!) will have difficulty understanding it, possibly because they are visually impaired and using a screen reader or other assistive technology. These people matter, and it's our moral/ethical responsibility (and in some cases legal responsibility) to ensure that they have equal access.

What is missing is labels, which associate an input (technically called a control) such as the name field or the SELECT menu, with a bit of text that explains it. This is done with the label tag. Here's more on label.

This is important because when the user is filling out the form and wants to know what an input is asking, the screen reader can read the associated label text.

There are two ways to use label, the simple, structural way, and the flexible, id-based way. Let's look at both:

LABEL using Structure

The simple structural way is to put the label text next to the input and wrap both with the label:

<label>zip code: <input name="zip"></label>

which renders as the text "zip code:" followed by a text input.

LABEL using ID

If we use ID instead of structure, we can put the label and the control in different places in the page, connecting them using an ID. This is more flexible, though a bit more complex. It's done by giving the control an ID, and using the for attribute of the LABEL to specify the ID of the element it labels. Here's an example where we wanted to put the form inputs in a table column, with the text labels in another column:

<table>
<tr><td><label for="state-elt">state</label></td>
        <td><input id="state-elt" name="state"/></td></tr>
<tr><td><label for="zip-elt">zip</label></td>
        <td><input id="zip-elt" name="zip"/></td></tr>
</table>

which renders as a two-row table, with the labels in the first column and the inputs in the second.

Notice that because the inputs are in different cells of a table, the label can't "wrap" (surround) the input as in the structural technique.
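
To check your understanding of the two techniques, here is a rough sketch of a script that scans some HTML and lists the controls that have no label by either technique. It uses Python's standard html.parser module. The class name LabelChecker and the unlabeled "city" input are my own inventions for the demo, and this is in no way a substitute for a real accessibility tool.

```python
from html.parser import HTMLParser

CONTROLS = {"input", "select", "textarea"}

class LabelChecker(HTMLParser):
    """Collect form controls and note which ones have an associated label."""

    def __init__(self):
        super().__init__()
        self.label_depth = 0     # greater than 0 while inside a <label> element
        self.for_ids = set()     # ids named by <label for="...">
        self.controls = []       # (name, id, wrapped_by_label) for each control

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self.label_depth += 1
            if "for" in attrs:
                self.for_ids.add(attrs["for"])
        elif tag in CONTROLS and attrs.get("type") not in ("submit", "hidden"):
            wrapped = self.label_depth > 0
            self.controls.append((attrs.get("name"), attrs.get("id"), wrapped))

    def handle_endtag(self, tag):
        if tag == "label" and self.label_depth > 0:
            self.label_depth -= 1

    def unlabeled(self):
        """Names of controls with neither a wrapping label nor a matching for."""
        return [name for name, ident, wrapped in self.controls
                if not wrapped and ident not in self.for_ids]


checker = LabelChecker()
checker.feed('''
<form>
  <label>zip code: <input name="zip"></label>
  <label for="state-elt">state</label> <input id="state-elt" name="state"/>
  <input name="city">
  <input type="submit">
</form>
''')
print(checker.unlabeled())   # ['city']
```

The zip input passes because it is wrapped (the structural technique), the state input passes because its id matches a for attribute (the ID technique), and the city input is flagged because it has neither.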

Adding these necessary improvements to the sample form yields: sample form improved.

Testing Accessibility

Accessibility is a big complex subject, and professional websites have trained developers, automated tools, and human testers to ensure accessibility. All that is outside the scope of this course, but I will introduce three important tools and require you to use them.

All HTML must be valid, which means it satisfies the structural rules set out by the WWW consortium (W3C). Valid HTML is important because screen readers and such can do a better job understanding the structure of the page if it follows the rules. Most browsers are much more forgiving, so don't assume that if it looks good in a browser that it's good.

You can validate your HTML using this website from the W3C: https://validator.nu/. That site works in three modes: you can give it a URL and it'll retrieve the page using the URL and validate it. The .htaccess barrier that we set up (see the aside about access, above) allows access for the validator, so it will be able to validate your pages. The other modes are file upload and direct input. The last mode allows you to just copy/paste your HTML from your browser to the validator, which only takes a minute and is very easy. That will be necessary when you are developing with Flask, since the ports that we are using aren't accessible outside the campus firewall.

An additional way to test the accessibility is to add a link (usually wrapped around a "badge" that the page is accessible) such that the validator checks the validity of the page that was linked from (the referrer). You can test that with the sample forms linked from this page; you'll see the badge icon at the bottom. However, we won't be able to use that technique in our Flask applications, due to the firewall issues.

All CSS (see below) must be valid, for similar reasons as the HTML.

The W3C also provides a CSS validator that works the same way as the HTML validator: https://jigsaw.w3.org/css-validator

Check the page for common accessibility issues with the WAVE, the Web Accessibility Evaluation tool, https://wave.webaim.org/. It's important to note that getting no errors from the WAVE tool doesn't mean your site is accessible — only a person can decide that — but it's a useful tool nevertheless. Not passing the WAVE test is certainly undesirable.

Like the earlier validators, WAVE can retrieve a publicly hosted page given its URL and evaluate it. It doesn't have a mode where you can copy/paste your code, but there are two browser plugins that will evaluate the page in your browser. In less than 1 minute, I installed the Chrome plug-in, viewed my page, and evaluated it. See screenshot below, showing four errors, all of which are fixed in the improved form.

[Screenshot: running the WAVE tool on the sample form shows four errors, flagged in red]

I recommend that you:

  • install the WAVE plugin to your browser (Chrome or Firefox)
  • run it on all three of the sample web pages (home.html, form.html and form-improved.html).
  • practice using the HTML and CSS validators, too.

Requirements

In this course, I require that

  • All HTML pages pass the HTML validator
  • All CSS pass the CSS validator
  • All pages have no errors in the WAVE validator

Caches

Caches are an important part of the web browser and therefore matter to the web developer. In the examples above, we often made use of external CSS files and external JavaScript files. That allows two pages to share those common files. But one of the great advantages of external style sheets and external JavaScript files is also, for a web developer, a slight bother. To understand that, you have to understand caches.

The word “cache” (pronounced “cash”) is an ordinary, but uncommon, English word (more of an SAT word for most people). However, it's used all the time by computer scientists because caches are used all the time by computers, in all kinds of ways, because caching is a general technique for speeding things up. In particular, a cache speeds things up by storing to avoid re-doing.

Specifically, your web browser will cache (keep a copy of) files you visit or reference (images, external CSS, JS libraries, etc) in a folder on your local machine. (That folder is called, of course, the cache.) If the web browser needs that file again, say on another page of your site that uses the same external style sheet, the browser doesn't have to re-download the file; it just grabs a copy from the cache. This makes the web browser faster.

So, why is the browser cache a problem for a web designer? Because if you make a change to the external style sheet, the web browser may continue to use the old cached copy, instead of getting the new improved copy from the server. This means that when you view your page, you won't see your changes — very frustrating.

The solution is to tell the web browser to ignore the cache when you re-load the page. In most web browsers, this is done by holding down the shift key when you click on the reload icon.

So, just remember:

When in doubt, use shift+reload
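
Here is a toy model of the situation in Python, in case seeing it in code helps. The class TinyCache and the pretend server are made up for illustration. A real browser cache is far more sophisticated (it pays attention to headers like Cache-Control, for one thing), but the stale-copy problem is the same.

```python
class TinyCache:
    """A toy stand-in for a browser cache: url -> previously fetched content."""

    def __init__(self, fetch_from_server):
        self.fetch_from_server = fetch_from_server
        self.stored = {}

    def get(self, url, force_reload=False):
        # force_reload plays the role of shift+reload: skip the stored copy.
        if force_reload or url not in self.stored:
            self.stored[url] = self.fetch_from_server(url)
        return self.stored[url]


# A pretend server whose style sheet we can edit.
server_files = {"/style.css": "h1 { color: blue; }"}
cache = TinyCache(lambda url: server_files[url])

first = cache.get("/style.css")                      # fetched and stored
server_files["/style.css"] = "h1 { color: green; }"  # the developer edits the file
stale = cache.get("/style.css")                      # the old copy from the cache
fresh = cache.get("/style.css", force_reload=True)   # re-fetched from the server

print(first)
print(stale)
print(fresh)
```

The second print still shows the blue rule, even though the file on the "server" has changed. Only the forced reload picks up the green one.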

Conclusion and Summary

This reading has had many facets because HTML, CSS, and Forms are so important to this course, while also being topics that many (but not all) of you know something about, so they don't deserve a lot of our time. Nevertheless, you should know:

  • the basics of using HTML to create the structure of a web page
  • the basics of CSS for styling a page
  • the basics of forms, so that you can collect information from the user
  • how to use validators to check that you've used HTML and CSS correctly
  • how to make your page, particularly your forms, reasonably accessible
  • how caches affect our debugging of web apps
  • the distinction between GET and POST

If you're feeling good about all of those, you can stop here. If you need more introduction to HTML, CSS or Forms, keep reading.

Learning HTML

HTML is a relatively simple language, compared to programming languages like Python or JavaScript, or even SQL. It is a markup language, which means it's just about structure: this is part of that and so forth.

The people in this class are, I sincerely believe, capable of teaching themselves HTML in short order. There are many online tutorials, of course, which you can find with a quick web search. Our own CS110 web site has a lot of information on writing web pages in HTML. If you're starting from scratch, I recommend reading the following pages:

  • HTML. 21 pages. This talks generally about syntax of tags and URLs.
  • Here is the MDN (Mozilla Developer Network) HTML Tutorial, with links to beginning HTML and also to forms
  • Here is the W3Schools HTML tutorial
  • Tables. This teaches you about tables in HTML, which is quite useful for formatting, um, tables. Since many of our query results are tabular, this can sometimes be useful for us.

Note that you must know HTML, not just how to build a web page with some nice software like Dreamweaver that writes the HTML for you. Dreamweaver and that ilk are terrific, but they're no good for our purposes, because our scripts will be writing the HTML for the results of queries. That means we need to understand HTML. (To be fair, we'll be writing templates, so you can work with something like Dreamweaver to create the overall HTML, but then you'll need to be able to edit the HTML to turn it into a template.)

Learning Forms

The following reading introduces FORMs to those who know some HTML but not forms.

  • Forms. 9 pages. This talks about forms, which are crucial for web applications.
  • MDN on Forms. This is the Mozilla Developer Network introduction to forms. If you master all of that, you'll know more than I do, but the first few parts should be sufficient.

The following is an alternative introduction. You might start with this and then go back to the links in the list above if you want more information:

Basic Page with a Form

Here is a page with a form, where the form just "echoes" the form data back to you, the user. Keep this "echoing" script in mind, because while it is useless in deployed systems, it can be very useful in debugging your web application, by clarifying whether the problem is in processing the form data or in getting the form data to the server.

Here is the code for just the form:

  <form action="/~cs304/php/form-echo-html.php"
        method="get">
    <!-- modern browsers will insist that
         "required" elements are non-empty -->
    <p><label>stimulus:
            <input required
                   type="text" name="stimulus">
    </label></p>
    <p><label>response:
            <input required
                   type="text" name="response">
    </label></p>
    <p><label>reason:
      <select required name="reason">
        <!-- invalid option must have an empty
             value for "required" to work -->
        <option value="">choose reason
        <option>just 'cuz
        <option>none o' your bizness
        <option>I dunno
      </select></label></p>
    <p><label>Why: <br>
            <textarea required name="why" rows="5" cols="30"></textarea>
    </label></p>
    <p><input type="submit">
  </form>

I have omitted the CSS to style the form; you can look in the source code for the form-echo.html page for that info.

Form Structure

The form tag is a container, meaning that one important aspect of its function is to enclose a set of inputs. These inputs are, essentially, name/value pairs. Thus, we can think of the form above as producing something like:

name       value
stimulus   strange
response   charm
reason     I dunno
why        It could be for lots of reasons, but maybe it's my deep and lasting regard for particle physics

In the table above, the name column is defined by the form author, and the values in the value column are typed or chosen by the user.

The form tag also specifies two other important things:

  • the action attribute specifies where the data is to go when the form is submitted. That is, this is the server-side script that will process the data.
  • the method attribute specifies how the data will be sent:
    • the GET method (the default) encodes the data and tacks it onto the URL following a question mark. The server-side script then parses this URL and decodes the form input.
    • the POST method encodes the data and supplies it in the body of the request, after the headers. A traditional server-side script can then read the data from standard input (like reading from Java's System.in).
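
To make the two cases concrete, here is a minimal sketch of what an old-fashioned (CGI-style) server-side script does to get at the form data. The function name read_form is hypothetical. It assumes the form data uses the ordinary URL encoding (not a file upload), and in this course Flask will do the equivalent work for us.

```python
import io
from urllib.parse import parse_qs

def read_form(environ, stdin):
    """Return the form data as a dict of name -> list of values.

    environ: a mapping with CGI-style variables (REQUEST_METHOD,
             QUERY_STRING, CONTENT_LENGTH)
    stdin:   a binary stream holding the request body
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the encoded data is in the body, which arrives on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        encoded = stdin.read(length).decode("ascii")
    else:
        # GET: the encoded data is the part of the URL after the question mark.
        encoded = environ.get("QUERY_STRING", "")
    return parse_qs(encoded)


# The same form data, arriving both ways. io.BytesIO stands in for standard input.
get_form = read_form(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "stimulus=strange&response=charm"},
    io.BytesIO(b""),
)
body = b"stimulus=strange&response=charm"
post_form = read_form(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))},
    io.BytesIO(body),
)
print(get_form)
print(get_form == post_form)   # True
```

Either way, the script ends up with the same name/value pairs; only the place it looks for them differs.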

A form should also have a submit button, or the user won't be able to send the data to the server. (It's possible, of course, to have other mechanisms to trigger form submission, say by processing the enter key or some JavaScript form submission, but you will usually have a submit button.)

One important thing to note about forms: a page can have as many forms as it needs, but forms don't nest (unlike some, but not all, HTML elements). The reason for that is that the collection of name/value pairs within the form element are sent to the target of the action attribute of the form element, and nested forms would make that confusing.

Learning CSS

Making your website pretty is useful as well as nice, because something that looks good and is well laid out is often easier to use. Still, the amount of CSS you could learn is certainly greater than the amount you need to learn. Here are links to the first few CSS readings in CS 110:

Some reading for later, if ever.

Apache and public_html

In Fall 2020, we'll be combining all our pages with Flask, so you can skip reading this section. But you're welcome to read this if you're curious.

So far in this course, we've been working in your ~/cs304 folder, which works fine for Flask and will continue to work well. However, if we want our web pages delivered by Apache rather than Flask, we need to put them in our ~/public_html folder. That folder, and only that folder, is read by Apache and consists of our globally accessible web pages.

You can download all these examples to the public_html of your Tempest account like this:

cd ~/public_html/ 
cp -r ~cs304/pub/downloads/forms . 

You are welcome to learn from my example, but don't use it lock, stock and barrel for your assignment. Prepare yourself for the project by creating an interesting and relevant form.