on the client's intranet, including all legacy apps; essentially, the applications include this component by inserting a script tag in their pages. The component's most basic job is to help the user navigate to the existing multitude of intranet applications, supplying these applications with appropriate input (e.g., navigate to the customer management app with a customer reference as input).
Now these legacy applications are of different ages and have been developed in various technologies, with varying quality and standards-awareness. These requirements bring a bunch of problems, and I won't start complaining too much here. Instead I'll just focus on one issue: character encodings! One aspect of this scenario is that the applications use different character encodings, e.g., ISO-8859-1 and UTF-8. This gives the following requirements.
- Cross page (cross encoding) navigation. For example, our client app often has to navigate from a UTF-8 encoded page to an ISO-8859-1 encoded page, transferring properly encoded parameters from the first page to the second.
You'd expect these things to be pretty basic stuff for an HTTP based app; something that would be easy in a version 7 browser.
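To make the cross-encoding requirement concrete, here is a minimal sketch of how the same parameter value is percent-encoded differently for a UTF-8 and an ISO-8859-1 receiver. The helper encodeLatin1Component and the sample value are hypothetical names introduced only for this illustration; encodeURIComponent is the standard built-in, which always encodes as UTF-8.

```javascript
// Hypothetical helper: percent-encode a string as ISO-8859-1 bytes
// (one byte per character). Only the RFC 3986 unreserved characters
// are left as-is. Throws for characters that have no ISO-8859-1 byte.
function encodeLatin1Component(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    var code = text.charCodeAt(i);
    if (code > 0xFF) {
      throw new RangeError('Not representable in ISO-8859-1: ' + ch);
    }
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch;
    } else {
      var hex = code.toString(16).toUpperCase();
      out += '%' + (hex.length < 2 ? '0' + hex : hex);
    }
  }
  return out;
}

// Sample value containing U+00F8 (o with stroke), written as an escape so
// the example does not depend on the encoding of the file it lives in.
var customerName = 'J\u00f8rgensen';

var asUtf8 = encodeURIComponent(customerName);      // 'J%C3%B8rgensen'
var asLatin1 = encodeLatin1Component(customerName); // 'J%F8rgensen'
```

A receiving page that expects ISO-8859-1 would decode the two UTF-8 bytes as two separate characters, which is exactly the kind of garbling the navigation component has to avoid.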
Cross page (cross encoding) navigation
For various reasons irrelevant here, our client app navigates to an application by dynamically creating a form with method "GET", and plugging in the URL and the parameters for the receiving application, i.e.,
<form method="GET" action="URL">
<input type="hidden" name="pname" value="pvalue">
</form>
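Building such a form dynamically could look roughly like the sketch below. This is not the code from the actual component: the function name buildNavigationForm, its parameters, and the example URL and parameter name in the usage note are assumptions made for illustration, and the sketch has not been verified on IE7.

```javascript
// Hypothetical helper: build a GET form for `url` with one hidden input per
// entry in `params`, attach it to the document body, and return it.
// `doc` is the page's document object (passed in to keep the sketch testable).
function buildNavigationForm(doc, url, params) {
  var form = doc.createElement('form');
  form.method = 'GET';
  form.action = url;

  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      var input = doc.createElement('input');
      input.type = 'hidden';
      input.name = name;
      input.value = String(params[name]);
      form.appendChild(input);
    }
  }

  // The form must be part of the document before it can be submitted.
  doc.body.appendChild(form);
  return form;
}
```

In a page this would be used along the lines of buildNavigationForm(document, 'http://intranet.example/customers', { customerRef: '12345' }), followed by a call to submit() on the returned form.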
The form element has an optional attribute called 'accept-charset', which HTML 4.01 describes as follows:
"This attribute specifies the list of character encodings for input data that is accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received. The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element."
The problem with IE7 is that it ignores the accept-charset attribute on forms entirely. In fact, it instead does the default behaviour for "UNKNOWN", i.e., "the character encoding that was used to transmit the document containing this form element." Actually, to be precise, it doesn't even do that: it turns out that IE7 encodes the form according to the current value of document.charset. Fortunately, it is possible to set document.charset just before the form submits. And this works. Problem solved. Kind of …
Unfortunately, this hack turns out to trigger a somewhat strange and nasty bug: after form submission, IE navigates to the desired page and the input data is encoded according to document.charset, which is just what we want. However, if you click the 'back' button on the next page, IE navigates back to the original page, but the document.charset property somehow persists, which means that the page is not interpreted with the correct encoding. OK, so we should clean up after ourselves. One would think that restoring document.charset to its original value after calling form.submit() would work: it doesn't.
After some experimentation I found that the following works. To submit a form with input data encoded in a particular encoding:
- on document load, store the original page encoding
- on form submit change the value of document.charset to the desired encoding, e.g., "ISO-8859-1"
- on 'beforeunload' restore document.charset to its original value.
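The three steps above can be sketched as follows. This is a reconstruction under assumptions, not the original component code: the names addListener and createEncodedSubmitter are hypothetical, and the window and document objects are passed in as parameters so the logic can be exercised outside a browser. The attachEvent branch is there because IE7 has no addEventListener.

```javascript
// Attach an event handler using addEventListener where available,
// falling back to legacy IE's attachEvent.
function addListener(target, name, handler) {
  if (target.addEventListener) {
    target.addEventListener(name, handler, false);
  } else if (target.attachEvent) {
    target.attachEvent('on' + name, handler);
  }
}

// Hypothetical helper: returns a function that submits a form with its
// data encoded in a chosen charset, and arranges for the page's own
// charset to be restored before the page unloads.
function createEncodedSubmitter(win, doc) {
  var originalCharset = null;

  // Step 1: on load, remember the encoding the page was delivered in.
  addListener(win, 'load', function () {
    originalCharset = doc.charset;
  });

  // Step 3: on beforeunload, put the original encoding back so that
  // navigating 'back' shows the page correctly.
  addListener(win, 'beforeunload', function () {
    if (originalCharset !== null) {
      doc.charset = originalCharset;
    }
  });

  // Step 2: switch the encoding just before submitting.
  return function submitEncoded(form, targetCharset) {
    if (originalCharset === null) {
      // Called before the load event fired: capture the charset now.
      originalCharset = doc.charset;
    }
    doc.charset = targetCharset;
    form.submit();
  };
}
```

A page would call createEncodedSubmitter(window, document) once while loading and later invoke the returned function with the form and, e.g., "ISO-8859-1".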
Script character encodings
Accept : */*
Referer : http://localhost:10045/wps/portal/sn
Accept-Language : da
UA-CPU : x86
Accept-Encoding : gzip, deflate
User-Agent : Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Connection : Keep-Alive
Cache-Control : no-cache
Furthermore, even if we explicitly tell IE which encoding we have chosen, it will still ignore that and use the page encoding. More specifically, even if the HTTP response carries a Content-Type header with an explicit charset, IE ignores that charset. In other words: if we choose as a default (when no Accept-Charset header is present) to deliver the script encoded in UTF-8, it breaks for ISO-8859-1 pages, and vice versa. So what to do? IE apparently doesn't speak HTTP.
It turns out that one should use a script tag with a charset attribute to tell IE how to interpret the script. Ironically, this means that on UTF-8 encoded pages one would need to say <script charset="ISO-8859-1"> if the script is delivered in ISO-8859-1. In other words, it is not the HTTP headers accompanying the resource that describe its encoding. The encoding must be known in advance by every document that links to the resource, each of which has to embed a charset attribute. Yet another reason that these are tough times to be RESTful…
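For pages that add the script tag from code rather than from static markup, the same idea might be sketched like this. The function name includeScript and the URL in the usage note are hypothetical, and I am assuming that a charset set through the DOM is honoured the same way as the attribute in static markup; that has not been verified on IE7 here.

```javascript
// Hypothetical helper: add a <script> element to the page's <head>,
// declaring the encoding the script file is actually served in.
// `doc` is the page's document object.
function includeScript(doc, src, charset) {
  var script = doc.createElement('script');
  script.type = 'text/javascript';
  // Declare the encoding before setting src, so it is in place
  // by the time the browser starts fetching the file.
  script.charset = charset;
  script.src = src;
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}
```

For example, a UTF-8 page including a component delivered as ISO-8859-1 would call includeScript(document, 'http://intranet.example/component.js', 'ISO-8859-1').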