
Why serve XHTML documents as US-ASCII?

I have converted all the documents on larve.net from ISO-8859-1 to US-ASCII, and I now serve the documents as US-ASCII:

Host: larve.net

HTTP/1.1 200 OK
Date: Thu, 10 Aug 2000 04:53:40 GMT
Content-Length: 2271
Content-Location: http://larve.net/index.html
Content-Type: text/html;charset=us-ascii
Etag: "15i41be:s38muof0"
Last-Modified: Sun, 06 Aug 2000 18:37:32 GMT
Server: Jigsaw/2.1-20000809 jre/1.2.2_006 javacomp/1.2.15
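
To check which charset a server announces, a minimal Python sketch like the following can pull the charset parameter out of a Content-Type value such as the one above. The helper name charset_of is hypothetical; it relies on the standard library's email.message.Message, whose get_content_charset() returns the parameter in lower case, or None when there is none.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None.

    charset_of is an illustrative helper, not part of any server or library.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_of("text/html;charset=us-ascii"))
```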

Why did I go through the hassle of converting the documents and serving them as US-ASCII?

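Most of my documents, as of August 2000, are written in XHTML 1.0, i.e. in XML. Without other information, an XML document is assumed to be encoded in UTF-8 or UTF-16. Therefore, if you want to use another encoding, you need to declare it in the XML declaration at the top of the document (a line such as <?xml version="1.0" encoding="ISO-8859-1"?>, which has the same syntax as a processing instruction).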

Unfortunately, IE 4.5 and 5 on Macintosh don't like that very much, so putting one at the top of an XHTML document makes the document inaccessible to a lot of Mac users. Since US-ASCII is a subset of UTF-8, every US-ASCII document is also a valid UTF-8 document, so serving the documents as US-ASCII and leaving the declaration out is legitimate.
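
The subset relationship can be checked with a short Python sketch. The helper decodes_as and the two sample byte strings are made up for illustration: bytes produced by encoding as US-ASCII are identical to the UTF-8 encoding of the same text, while an ISO-8859-1 byte above 0x7F is generally not valid UTF-8 on its own.

```python
def decodes_as(data, encoding):
    """Return True if the bytes can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample content, not taken from larve.net.
ascii_page = "<p>Hello</p>".encode("us-ascii")
latin1_page = "<p>caf\u00e9</p>".encode("iso-8859-1")  # contains byte 0xE9

# The same text encoded as US-ASCII and as UTF-8 gives identical bytes.
same_bytes = ascii_page == "<p>Hello</p>".encode("utf-8")

if __name__ == "__main__":
    print(same_bytes)
    print(decodes_as(ascii_page, "utf-8"))
    print(decodes_as(latin1_page, "utf-8"))
```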

The other reason for using US-ASCII is that you should preferably declare the smallest character set that covers your content. Most of the documents I served as ISO-8859-1 actually contained only US-ASCII characters, and I can live with US-ASCII without any problem.

I do not serve the documents as UTF-8 yet, for the reason I just gave and also because not all browsers support UTF-8 yet. Moreover, the font Netscape uses for UTF-8 under Linux is kind of ugly (I have been told that it is basically bigger because UTF-8 has more characters, which makes sense).

If you have comments about this policy, feel free to send an email to me at hugo@larve.net.

When I did the conversion, I wrote a very small program that tells me whether a file contains characters outside the US-ASCII character set. It is trivial to write, but not worth writing twice, so I am making it available here: is-us-ascii.pl.
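
For readers who would rather not use Perl, the same kind of check can be sketched in Python. This is not a port of is-us-ascii.pl, whose exact behaviour is not described here; the function names find_non_ascii and is_us_ascii are hypothetical, and the sketch simply treats any byte above 0x7F as outside US-ASCII.

```python
def find_non_ascii(data):
    """Return (line, column, byte) for every byte above 0x7F.

    Lines and columns are 1-based; columns count bytes, not characters.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits


def is_us_ascii(data):
    """Return True if every byte is in the US-ASCII range 0x00-0x7F."""
    return not find_non_ascii(data)


def report(path):
    """Print each offending byte in a file; return True if the file is clean."""
    with open(path, "rb") as f:
        hits = find_non_ascii(f.read())
    for line_no, col, byte in hits:
        print(f"{path}:{line_no}:{col}: byte 0x{byte:02X} is not US-ASCII")
    return not hits
```

Calling report("index.html") on a file (the file name here is only an example) lists the line and column of each byte that would have to be replaced, for instance by a numeric character reference, before the file can be served as US-ASCII.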

Hugo Haas
$Id: us-ascii.html 678 2000-09-26 20:23:52Z hugo $