AF8 File Not Found Errors
While doing a Google search, I came across the FAQ page (http://www.boutell.com/wusage/faq.html#wusage.af8.error), which refers to an "AF8 404 Not Found" error.
After many months of error-free operation, I too started encountering these "AF8" document-not-found errors. They confused me, because none of my documents contain the sequence "AF8". After some research and experimentation, I believe I have found the answer. For background, I run an Apache 1.3.20 server on SuSE Linux 7.3 (2.4.10 kernel), and my browser is Internet Explorer 5.5 on Windows 2000 Professional.
Some of my HTML documents do NOT contain a character-encoding meta-tag such as
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
because they were auto-generated by a program. These documents contain only the bare <html></html> structure:
<html>
<head>[...]</head>
<body>[...]</body>
</html>
Unfortunately, if the http-equiv meta-tag does NOT exist, IE 5.5 attempts to deduce the document encoding using its "Auto-Select" feature (found in the drop-down menu under "View/Encodings").
On a subset of these "bare" HTML documents, IE decides that the document is UTF-7 encoded (see http://www.landfield.com/rfcs/rfc2152.html, RFC 2152, for the definition of UTF-7). Figuring out why this happened on only a small subset of them took some detective work, but the common pattern appears to be that these documents contain a sequence of characters of the form:
+[...]-
The "+" symbol is the escape character in UTF-7: it marks the start of a run of Base64-encoded characters, and the "-" symbol marks the end of that run. The presence of these characters confuses the browser into thinking that the document is encoded in UTF-7.
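To check a page for the "+[...]-" pattern described above, a small scanner can help. The sketch below is a diagnostic heuristic under stated assumptions, not the detection logic IE 5.5 actually uses: it looks only for the explicit "+", Base64 characters, "-" form, and for each run it reports what Python's built-in "utf-7" codec makes of it (or None if the codec rejects the run). The function name find_utf7_like_runs is hypothetical.

```python
import re

# Hypothetical heuristic: "+", one or more Base64 characters, then "-".
# RFC 2152 also lets a shifted run end at any non-Base64 character; this
# scanner only looks for the explicit "+...-" form.
SHIFT_RUN = re.compile(r"\+[A-Za-z0-9+/]+-")


def find_utf7_like_runs(text):
    """Return (run, decoded) pairs for every "+...-" run in text.

    decoded is what Python's UTF-7 codec produces for the run, or None
    if the codec rejects it (for example, a partial 16-bit character).
    """
    results = []
    for match in SHIFT_RUN.finditer(text):
        run = match.group(0)
        try:
            decoded = run.encode("ascii").decode("utf-7")
        except UnicodeDecodeError:
            decoded = None
        results.append((run, decoded))
    return results


if __name__ == "__main__":
    page = "<p>Order 10+1-2 units, see some+AF8-name</p>"
    for run, decoded in find_utf7_like_runs(page):
        print(repr(run), "->", repr(decoded))
```

Note that "+-" on its own is not matched: in UTF-7 it is simply the escape for a literal "+".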
For rendering purposes the problem stays hidden, because the spurious "+[...]-" sequence usually doesn't translate into a valid UTF-7 character, so the browser displays those characters literally, as is.
The problem appears in hyperlinks that originate from such a document. Because IE 5.5 assumes that the document is in UTF-7, it translates every hyperlink in the document to UTF-7 before sending the request to the HTTP server, so some ASCII characters in an <a> tag are converted into UTF-7 form. For example, if the document contains the HTML fragment
<a href="http://server/some_link_with_underscore">Some Link With Underscore</a>
then IE 5.5 converts the HREF into the UTF-7 sequence:
http://server/some+AF8-link+AF8-with+AF8-underscore
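because UTF-7 may encode the underscore character (U+005F) as "+AF8-": the Base64 form of its 16-bit value 0x005F, wrapped in "+" and "-". The server finds no document with that name and answers "404 Not Found", which is where the "AF8" in the error comes from.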
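The "+AF8-" derivation can be reproduced with a small hand-written encoder. This is a sketch under an assumption: it Base64-encodes every character outside RFC 2152 "Set D" (plus whitespace), which matches the behaviour observed above. RFC 2152 also allows encoders to write "Set O" characters such as "_" directly, so this models what IE appears to do rather than anything UTF-7 requires. The names encode_run and utf7_encode_strict are hypothetical.

```python
import base64

# RFC 2152 "Set D": characters UTF-7 always writes directly.
SET_D = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?"
)
DIRECT = SET_D | set(" \t\r\n")


def encode_run(run):
    """Modified Base64 of run's UTF-16BE code units, wrapped in '+' and '-'."""
    b64 = base64.b64encode(run.encode("utf-16-be")).decode("ascii")
    return "+" + b64.rstrip("=") + "-"  # UTF-7 omits the '=' padding


def utf7_encode_strict(text):
    """Hypothetical strict UTF-7 encoder: only Set D and whitespace stay direct.

    Everything else, including Set O characters such as '_', is Base64-encoded;
    a literal '+' is written as "+-". The closing '-' is always emitted.
    """
    out = []
    pending = []  # characters waiting to be Base64-encoded as one run
    for ch in text:
        if ch in DIRECT or ch == "+":
            if pending:
                out.append(encode_run("".join(pending)))
                pending = []
            out.append("+-" if ch == "+" else ch)
        else:
            pending.append(ch)
    if pending:
        out.append(encode_run("".join(pending)))
    return "".join(out)


if __name__ == "__main__":
    url = "http://server/some_link_with_underscore"
    encoded = utf7_encode_strict(url)
    print(encoded)  # http://server/some+AF8-link+AF8-with+AF8-underscore
    # Python's built-in UTF-7 codec decodes it back to the original URL.
    print(encoded.encode("ascii").decode("utf-7"))
```

Running the decoder in the last line is a handy way to recover the intended file name from an "AF8" request found in the server's error log.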
The solution is to include a character set meta-tag in every document. For most English documents, this should be
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
or
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
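Since the affected pages were auto-generated, the declaration could be added by post-processing them. The sketch below assumes the pages have a <head> element, as in the bare skeleton shown earlier, and uses the HTML 4.01 Content-Type form of the declaration; the helper name add_charset_meta is hypothetical, and the regular expressions are a simple heuristic rather than a full HTML parser.

```python
import re

# HTML 4.01 form of the character-encoding declaration.
CHARSET_META = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'

HAS_CHARSET = re.compile(r"<meta[^>]*charset", re.IGNORECASE)
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_charset_meta(html, meta=CHARSET_META):
    """Insert meta right after <head> unless a charset meta is already present."""
    if HAS_CHARSET.search(html):
        return html  # the document already declares an encoding
    match = HEAD_OPEN.search(html)
    if match is None:
        raise ValueError("no <head> element to insert the declaration into")
    pos = match.end()
    return html[:pos] + meta + html[pos:]


if __name__ == "__main__":
    bare = "<html>\n<head><title>t</title></head>\n<body>x</body>\n</html>"
    print(add_charset_meta(bare))
```

The function is idempotent: running it twice on the same page leaves the second result unchanged, so it is safe to apply to a whole directory of generated files.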
"iso-8859-1" is the "Latin-1" character set, also known as "Western European", and "UTF-8" is another encoding of the Unicode character set (a sibling of UTF-7; see http://www.w3.org/International/O-charset.html for more details, and http://www.unicode.org/ for a description of Unicode). Both encodings are backward compatible with ASCII: every ASCII character is also an iso-8859-1 character, and every ASCII character is also a UTF-8 character, with the same byte value in each.
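The ASCII-compatibility claim is easy to confirm with Python's built-in codecs, and the same check shows why UTF-7 is different: it is not byte-for-byte compatible with ASCII, since a literal "+" has to be escaped.

```python
# Every ASCII code point, 0x00 through 0x7F.
ASCII = "".join(chr(i) for i in range(128))
as_ascii = ASCII.encode("ascii")

# iso-8859-1 and UTF-8 produce exactly the same bytes as ASCII ...
assert ASCII.encode("iso-8859-1") == as_ascii
assert ASCII.encode("utf-8") == as_ascii

# ... but UTF-7 does not: for example, a literal "+" becomes "+-".
assert ASCII.encode("utf-7") != as_ascii
print("+".encode("utf-7"))  # b'+-'
```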
--
BrianPark - 23 Mar 2004