Internationalization

Internationalization is important in modern web applications because we typically want our apps to be available to and usable by everyone in the world, not just one narrow demographic.

Motivation

The web lets us deliver applications globally, but making them accessible worldwide takes a lot more than connecting network switches. It's easy to get stuck thinking only about your own culture, but we can try to do better. These efforts are called internationalization and localization, which have the wonderful abbreviations i18n (the letter i, 18 letters, and the letter n) and l10n (the letter l, 10 letters, and the letter n).
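The abbreviation rule (first letter, count of the letters in between, last letter) is simple enough to write down. Here is a minimal Python sketch; the function name `numeronym` is our own choice, not something from a library:

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + number of middle letters + last letter."""
    if len(word) <= 3:
        return word  # too short for the abbreviation to save anything
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```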

The main ideas I'd like us to discuss in class are:

  • Names: In the U.S. we're used to asking for first name, last name, and (sometimes) middle name. That doesn't work in all parts of the world. Please read this essay about Hispanic last names.
  • Character sets: Not all languages can be typed on a U.S. keyboard or represented in the ASCII character set. If you don't understand character sets, you can end up in strange situations where strings don't display properly. We'll talk about Unicode and UTF-8. How do we store these strings in our databases?
  • Collation: How to sort strings in international character sets correctly. We won't talk much about this, but it's good to be aware of the idea.
  • Dates: Are 5/6/14 and 6.5.14 the same date? Different cultures format dates differently. If you want people to show up on the right day, you'll want to do this correctly.
  • Times: What is 14:00?
  • Number Formats: Do 1,234.56 and 1.234,56 mean the same thing?
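Several of the questions above can be made concrete with Python's standard library. This is a sketch, not a localization library: it hard-codes explicit format strings instead of reading the user's locale, treats "6.5.14" as a German-style day.month.year date (an assumption), and uses a hypothetical helper `parse_number` that must be told which separators to expect. Real applications would usually rely on a locale-aware library such as ICU or Babel instead.

```python
from datetime import datetime, time
from decimal import Decimal

# Dates: the same calendar day written under two conventions.
us_date = datetime.strptime("5/6/14", "%m/%d/%y").date()  # U.S.: month/day/year
de_date = datetime.strptime("6.5.14", "%d.%m.%y").date()  # assumed German: day.month.year
print(us_date == de_date, us_date.isoformat())  # True 2014-05-06

# Read with the other convention, "5/6/14" is a different day entirely.
print(datetime.strptime("5/6/14", "%d/%m/%y").date().isoformat())  # 2014-06-05

# Times: 14:00 on a 24-hour clock, rendered for a 12-hour reader.
t = time(14, 0)
print(f"{t.hour % 12 or 12}:{t.minute:02d} {'PM' if t.hour >= 12 else 'AM'}")  # 2:00 PM


# Numbers: hypothetical helper; the caller must know the separators in use.
def parse_number(text: str, group_sep: str, decimal_sep: str) -> Decimal:
    # Drop grouping separators, then normalize the decimal separator to ".".
    return Decimal(text.replace(group_sep, "").replace(decimal_sep, "."))


us_num = parse_number("1,234.56", group_sep=",", decimal_sep=".")
de_num = parse_number("1.234,56", group_sep=".", decimal_sep=",")
print(us_num == de_num, us_num)  # True 1234.56

# Collation: the default sort compares code points, so capitals sort
# before lowercase letters and accented letters land after "z".
print(sorted(["zebra", "\u00c4pfel", "apple", "Zoo"]))
# ['Zoo', 'apple', 'zebra', 'Äpfel']
```

Note that ISO 8601 (`2014-05-06`) sidesteps the month/day ambiguity entirely, which is why it is a common choice for storing and exchanging dates even when the display format is localized.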

Character Sets

The two best articles I've run across are the following. Please read both:

  • Joel Spolsky's Unicode introduction. This is terrific: well written and even amusing. Spolsky co-founded Stack Overflow and Stack Exchange, so we should all know his name. It's about 9 pages long and well worth your time.
  • Python Unicode HOWTO, which covers encodings in Python. You don't have to read all of it; you can stop when you reach the section "Unicode Literals in Python Source Code."
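The core idea in both readings, that text (code points) and bytes are different things joined by an encoding, fits in a few lines of Python 3. This is a small illustration, not material from either article; the strings are written with `\u` escapes so the example doesn't depend on how the source file itself is encoded.

```python
# "café": \u00e9 is LATIN SMALL LETTER E WITH ACUTE.
text = "caf\u00e9"

# Encoding turns a str (a sequence of code points) into bytes.
data = text.encode("utf-8")
print(len(text), len(data))  # 4 5   (é takes two bytes in UTF-8)
print(data)                  # b'caf\xc3\xa9'

# UTF-8 is variable-length: this CJK character (U+65E5) takes three bytes.
print(len("\u65e5".encode("utf-8")))  # 3

# Decoding with the right encoding round-trips cleanly.
print(data.decode("utf-8") == text)  # True

# Decoding with the wrong encoding may not raise at all: every byte is
# valid Latin-1, so we silently get garbage ("mojibake").
print(data.decode("latin-1"))  # cafÃ©

# In the other direction, Latin-1 bytes are often not valid UTF-8.
try:
    text.encode("latin-1").decode("utf-8")
except UnicodeDecodeError as err:
    print("UnicodeDecodeError:", err.reason)
```

The silent Latin-1 case is the "strings just don't print properly" situation from the list above: nothing fails, but the stored or displayed text is wrong.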