Support non utf8 encoding of HTTP headers injected by the SP

Description

The SP needs a control for the encoding to be used (currently utf8 by default) when encoding the header values it injects. The problem with utf8 is that certain servlet 2.3 containers implement the (silly) requirement that all headers must be encoded using ISO-8859-1 (!).

Environment

None

Activity

Scott Cantor, May 10, 2007 at 11:49 AM

Well, I think Chad has noted that it's not that simple. The rule appears to be that the header encoding depends on the content encoding, and that makes it very difficult to fix. The browser can send any encoding, so you can't just pick one and expect it to work.

Technically, I would have to be able to select the encoding based on the request, and that would have bad ripple effects, or I'd have to transcode the cached data every request, which is really slow.
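As a rough illustration of what "selecting the encoding based on the request" could mean, here is a minimal Java sketch that picks a charset from a request's Content-Type value and falls back to ISO-8859-1. The class name, the method name, and the fallback choice are all assumptions made for illustration; this is not how the SP works, and it says nothing about the cost of transcoding cached data on every request.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RequestCharsetSketch {

    // Hypothetical helper: derive a charset from a Content-Type value such as
    // "text/html; charset=UTF-8". Falls back to ISO-8859-1 (an assumption,
    // chosen because it is the servlet default discussed in this issue).
    public static Charset charsetFor(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .trim()
                                   .replace("\"", "");
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // Illegal or empty charset name: use the fallback.
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetFor("text/html"));                // ISO-8859-1
    }
}
```

Even with such a helper, every cached header value would still have to be re-encoded per request, which is the performance concern raised above.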

For the time being, it would help to know if Chad's suggestion of setting the encoding manually to UTF8 "works", at least in general. Not that that's the answer, but it would help us understand the problem.

I fear there's no real way to solve this in general, and we're stuck picking a bad solution and/or relying on the Java SP as an eventual better solution. But if I just started using ISO-8859 as an option, not only would you still lose many Unicode characters, but you still would break if the browser set a different encoding.

Leif Johansson, May 10, 2007 at 3:06 AM (edited)

Look in connectors/util/java/org/apache/tomcat/util/buf/ByteChunk.java (relative to the tomcat 5.5.17 source):

/** Default encoding used to convert to strings. It should be UTF8,
as most standards seem to converge, but the servlet API requires
8859_1, and this object is used mostly for servlets.
*/
public static final String DEFAULT_CHARACTER_ENCODING="iso-8859-1";

When we changed this to utf-8, headers from the SP containing non-7-bit ASCII characters turned up OK.

Leif Johansson, May 10, 2007 at 2:32 AM (edited)

Yes, clearly it is a very bad situation. However, currently for strict v2.3 servlet containers (like some recent versions of Tomcat), utf8 headers from the SP get encoded into utf8 (again), which loses all code points. I'll try to get you a precise reference.
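To make the failure mode concrete, here is a minimal Java sketch of what appears to happen, assuming the container decodes the SP's UTF-8 header bytes as ISO-8859-1 and the result is later written out as UTF-8 again. The class name, method names, and sample value are hypothetical; only standard java.nio.charset calls are used.

```java
import java.nio.charset.StandardCharsets;

public class HeaderDecodingDemo {

    // Decode raw header bytes the way a strict ISO-8859-1 container would.
    public static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Application-side workaround: turn the mis-decoded string back into the
    // original bytes, then decode those bytes as UTF-8. This assumes the
    // container really used ISO-8859-1 and did not alter the value further.
    public static String recoverUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "J\u00f6rg"; // hypothetical attribute value with one non-ASCII char

        // What the SP puts on the wire when it encodes the header as UTF-8.
        byte[] wire = value.getBytes(StandardCharsets.UTF_8);
        System.out.println(wire.length); // 5

        // What the servlet sees: one char per byte, so the text is garbled.
        String seen = decodeAsLatin1(wire);
        System.out.println(seen.equals(value)); // false

        // Writing the garbled string out as UTF-8 again double-encodes it.
        byte[] doubled = seen.getBytes(StandardCharsets.UTF_8);
        System.out.println(doubled.length); // 7

        // Reversing the mis-decoding restores the original value.
        System.out.println(recoverUtf8(seen).equals(value)); // true
    }
}
```

The recovery step works only because ISO-8859-1 maps every byte value to exactly one character, so no information is lost by the wrong decode itself.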

Scott Cantor, May 9, 2007 at 10:06 AM

Can you give us a reference to the specification language? I just want to make sure I understand the situation fully before I change something. ISO-8859 would destroy a lot of Unicode code points if I used it, and you wouldn't get them back inside the Java code, so this doesn't exactly seem like a solution.
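The loss described here can be shown with a short Java sketch. It assumes nothing about the SP itself; the class name, method names, and sample values are hypothetical. It relies on the documented behavior of String.getBytes(Charset), which substitutes the charset's replacement byte for characters it cannot map.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1LossDemo {

    // True if every character of the value can be represented in ISO-8859-1.
    public static boolean fitsLatin1(String value) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(value);
    }

    // Encode to ISO-8859-1 and decode again, as a header value would be if
    // the sender switched to ISO-8859-1. Unmappable characters become '?'.
    public static String roundTripLatin1(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String western = "Ren\u00e9";  // U+00E9 exists in ISO-8859-1
        String polish = "\u0141ukasz"; // U+0141 does not

        System.out.println(fitsLatin1(western)); // true
        System.out.println(fitsLatin1(polish));  // false

        // The original character is gone and cannot be restored downstream.
        System.out.println(roundTripLatin1(polish)); // ?ukasz
    }
}
```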

Details

Assignee: Scott Cantor
Reporter: Leif Johansson
Components: Attribute Resolution / Filtering
Affects versions: 1.3f

Created May 9, 2007 at 5:29 AM

Updated June 22, 2021 at 8:38 PM
