IE doesn't understand HTML or HTTP

The so-called "web"-browser Internet Explorer version 7 (and probably all versions below) doesn't get HTML or HTTP. In a recent project for a client we've been building an advanced, 100% JavaScript client application that had to run in IE7 (as a minimal requirement). The application is rather complex and has to live as a component in every (darn) webpage/webapp on the client's intranet, including all legacy apps; essentially, the applications include this component by inserting a script tag in their pages. The component's most basic job is to help the user navigate the existing multitude of intranet applications, supplying these applications with appropriate input (e.g., navigate to the customer management app with a customer reference as input).

These legacy applications are of different ages and have been developed in various technologies, with varying quality and standards-awareness. These requirements bring a bunch of problems, and I won't start complaining too much here. Instead I'll just focus on one issue: character encodings! One aspect of this scenario is that the applications use different character encodings, e.g., ISO-8859-1 and UTF-8. This gives the following requirements.

Cross page (cross encoding) navigation. For example, our client app often has to navigate from a UTF-8 encoded page to an ISO-8859-1 encoded page, transferring properly encoded parameters from the first page to the second.
Localization. The client app is available in various languages; together with the requirement that it must exist in UTF-8 and ISO-8859-1 encoded pages, this means that the actual JavaScript being sent on the wire must be available in different languages and encodings, based on the browser's settings.

You'd expect these things to be pretty basic stuff for an HTTP based app; something that would be easy in a version 7 browser.
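To make the first requirement concrete, here is a small sketch of how the same parameter value turns into different bytes on the wire depending on the target encoding. The helper percentEncodeLatin1 is hypothetical and written only for this illustration (it is not a browser API), and the sample value is made up. encodeURIComponent is the standard function and always produces UTF-8. The sketch also simplifies real form encoding, which for instance writes spaces as "+".

```javascript
// Hypothetical helper: percent-encode a string as ISO-8859-1 bytes.
// Every character up to U+00FF maps to exactly one byte in ISO-8859-1.
function percentEncodeLatin1(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    var code = text.charCodeAt(i);
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved characters pass through unchanged
    } else if (code <= 0xFF) {
      out += '%' + (code < 16 ? '0' : '') + code.toString(16).toUpperCase();
    } else {
      throw new Error('Not representable in ISO-8859-1: ' + ch);
    }
  }
  return out;
}

// "\u00F8" is the letter o with stroke, written as an escape so that the
// example does not itself depend on the encoding of the source file.
var sample = 'S\u00F8ren';

var asUtf8 = encodeURIComponent(sample);    // 'S%C3%B8ren' (two bytes)
var asLatin1 = percentEncodeLatin1(sample); // 'S%F8ren'    (one byte)
```

A receiving page that expects ISO-8859-1 but gets the UTF-8 form will typically decode the two bytes as two separate characters, which is the kind of garbling the first requirement is about.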

Cross page (cross encoding) navigation

For various reasons irrelevant here, our client app navigates to an application by dynamically creating a form with method "GET", and plugging in the URL and the parameters for the receiving application, i.e.,


<form method="GET" action="URL">
<input type="hidden" name="pname" value="pvalue" >
…
</form>
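As a rough sketch, a form like the one above could be built dynamically along these lines. The function name buildGetForm and its parameters are assumptions made for illustration, not the actual code from the project, and the sketch is written against the standard DOM without having been verified in IE7. The doc argument stands in for the browser's document.

```javascript
// Hypothetical helper: create a GET form with one hidden input per parameter,
// attach it to the page, and return it so the caller can submit it.
function buildGetForm(doc, url, params) {
  var form = doc.createElement('form');
  form.method = 'GET';
  form.action = url;

  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      var input = doc.createElement('input');
      input.type = 'hidden';
      input.name = name;
      input.value = params[name];
      form.appendChild(input);
    }
  }

  // The form must be part of the document before it is submitted.
  doc.body.appendChild(form);
  return form;
}

// In a page, usage might look like:
//   var form = buildGetForm(document, 'URL', { pname: 'pvalue' });
//   form.submit();
```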

Forms have an optional attribute called 'accept-charset', which HTML 4.01 describes as follows:

"This attribute specifies the list of character encodings for input data that is accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received. The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element."

The problem with IE7 is that it ignores the accept-charset attribute on forms entirely. Instead it falls back to the default behaviour for "UNKNOWN", i.e., "the character encoding that was used to transmit the document containing this form element." Actually, to be precise, it doesn't even do that: it turns out that IE7 encodes the form according to the current value of document.charset.

Fortunately it is possible to set document.charset using JavaScript, which means that it is possible to control the encoding of form input data by setting document.charset just before the form submits. And this works. Problem solved. Kind of …

Unfortunately, this hack turns out to trigger a somewhat strange and nasty bug: after form submission, IE navigates to the desired page and the input data is encoded according to document.charset, which is just what we want. However, if you click the 'back' button on the next page, IE navigates back to the original page, but the document.charset value somehow persists, which means that the page is not interpreted with the correct encoding.

OK, so we should clean up after ourselves. One would think that restoring document.charset to its original value after calling form.submit() would work: it doesn't. After some experimentation I found that the following works. To submit a form with input data encoded in a particular encoding:

on document load, store the original page encoding
on form submit, change the value of document.charset to the desired encoding, e.g., "ISO-8859-1"
on 'beforeunload', restore document.charset to its original value.
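The three steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names installCharsetWorkaround and submitWithCharset are made up, and it assumes that document.charset is writable, as described here for IE7. The doc and win arguments stand in for document and window so the logic can also be exercised outside a browser.

```javascript
// Hypothetical helper: call once when the document has loaded.
// Returns a function that submits a form in a chosen encoding.
function installCharsetWorkaround(doc, win) {
  // Step 1: remember the encoding the page was loaded with.
  var originalCharset = doc.charset;

  // Step 3: put the original value back just before the page unloads,
  // so that navigating 'back' finds the page in its own encoding.
  function restoreCharset() {
    doc.charset = originalCharset;
  }
  if (win.attachEvent) {
    win.attachEvent('onbeforeunload', restoreCharset); // IE event model
  } else {
    win.addEventListener('beforeunload', restoreCharset, false);
  }

  // Step 2: switch document.charset right before submitting.
  return function submitWithCharset(form, targetCharset) {
    doc.charset = targetCharset;
    form.submit();
  };
}

// In a page, usage might look like:
//   var submitWithCharset = installCharsetWorkaround(document, window);
//   submitWithCharset(someForm, 'ISO-8859-1');
```

Restoring in the 'beforeunload' handler rather than directly after form.submit() mirrors the observation above that the latter does not work.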

Script character encodings

Our JavaScript client app sometimes lives in a UTF-8 encoded page, sometimes in an ISO-8859-1 encoded page. The client app is loaded by including a <script> tag that points to a servlet that delivers the JavaScript code (customized for the requesting user and his preferred locale). By default, when IE loads a script, it interprets the bytes it receives as characters in the encoding of the current page. However, it doesn't tell the server which encoding that is. More specifically, our servlet receives a request for the JavaScript client with these HTTP headers:


Accept : */* 
Referer : http://localhost:10045/wps/portal/sn
Accept-Language : da
UA-CPU : x86
Accept-Encoding : gzip, deflate
User-Agent : Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Connection : Keep-Alive
Cache-Control : no-cache

Furthermore, even if we explicitly tell IE which encoding we have chosen, it will still ignore that and use the page encoding. More specifically, even if we return this HTTP header:


Content-Type: application/javascript; charset=UTF-8

IE ignores the Content-Type charset. In other words: if we choose as a default (when no Accept-Charset header is present) to deliver the script UTF-8 encoded, it breaks for ISO-8859-1 pages, and vice versa. So what to do? IE apparently doesn't speak HTTP. It turns out that one should use a script tag with a charset attribute to tell IE how to interpret the script. Ironically this means that on UTF-8 encoded pages one would need to say <script charset="ISO-8859-1"> for a script served as ISO-8859-1. In other words, it is not the HTTP headers describing the resource that describe its encoding. The encoding must be known in advance by all documents that link to the resource, which have to embed a charset attribute. Yet another reason that these are tough times to be RESTful…
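For pages that add the script tag from JavaScript rather than in markup, the same idea might look like the sketch below. The function name loadScriptWithCharset and the example URL are assumptions for illustration only. The charset property on script elements is a standard DOM property, but the sketch has not been verified in IE7. The doc argument stands in for document.

```javascript
// Hypothetical helper: add a script element that declares the encoding
// of the script resource explicitly, independent of the page encoding.
function loadScriptWithCharset(doc, url, charset) {
  var script = doc.createElement('script');
  script.type = 'text/javascript';
  // Declare the encoding before setting src, so it is known by the time
  // the browser starts fetching and decoding the script.
  script.charset = charset;
  script.src = url;
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}

// In a page, usage might look like (URL is made up):
//   loadScriptWithCharset(document, '/client.js', 'ISO-8859-1');
```

The value passed as charset has to match what the server actually delivers, which is exactly the advance knowledge complained about above.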

Higher Order Blog