Encoding problem with Metadata Attribute Extractor

Description

We discovered a problem when using a Metadata Attribute Extractor [1] with values (such as the MDUI DisplayName) that contain non-ASCII characters. If there is an MDUI element like:

<mdui:UIInfo>
<mdui:DisplayName xml:lang="de">Universität Bern</mdui:DisplayName>
<mdui:DisplayName xml:lang="en">University of Bern</mdui:DisplayName>
...
</mdui:UIInfo>

and if the user's browser sends either a German Accept-Language header or none at all, the SP will use the first of the two DisplayNames above to populate the attribute in the environment. However, this currently results in the following error in native.log:

2013-01-24 14:56:41 ERROR XMLTooling.ParserPool [11234] shib_check_user: fatal error on line 1, column 4542, message: invalid character 0x1A

The invalid character is the umlaut "ä" in "Universität", which is UTF-8 encoded.
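The following Python sketch illustrates one plausible way such an error can arise. It takes its cue from the later comment that the code was using the local code page for this data. It is not the SP's actual code (which is C++). It assumes the local code page is plain ASCII and that unrepresentable characters are replaced with the SUB control character (0x1A). Both are assumptions made for illustration, and the helper names are hypothetical.

```python
import codecs
import xml.etree.ElementTree as ET

# Assumption: the transcoder substitutes SUB (0x1A) for characters
# that the target code page cannot represent.
SUB = "\x1a"


def _sub_on_error(err):
    # Replace every unencodable character with SUB and resume after it.
    return (SUB * (err.end - err.start), err.end)


codecs.register_error("sub_replace", _sub_on_error)


def to_local_code_page(text, code_page="ascii"):
    # Hypothetical stand-in for a lossy conversion to the local code page.
    return text.encode(code_page, errors="sub_replace").decode(code_page)


def parses_as_xml(value):
    # Wrap the value in an element and check whether an XML parser accepts it.
    try:
        ET.fromstring("<v>" + value + "</v>")
        return True
    except ET.ParseError:
        return False


display_name = "Universität Bern"
lossy = to_local_code_page(display_name)

print(repr(lossy))                  # the "ä" has become the 0x1A control character
print(parses_as_xml(display_name))  # the original Unicode value is well-formed
print(parses_as_xml(lossy))         # 0x1A is not a legal XML 1.0 character
```

If the value is kept as UTF-8 throughout rather than being passed through the local code page, no substitution takes place and the parser never sees a control character.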

[1] https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeExtractor#NativeSPAttributeExtractor-MetadataAttributeExtractorVersion25andAbove

Environment

None

Activity

Scott Cantor 
June 18, 2013 at 2:24 AM

Closing on release.

Scott Cantor 
May 16, 2013 at 8:36 PM

Tested successfully. Interestingly, older Apache versions can't handle UTF-8 in environment variables (headers work). Newer versions are showing the correct data for either method.

Scott Cantor 
May 15, 2013 at 9:18 PM

Sigh, no idea what I was thinking, the code is using local code page for this data extraction.

Fixed

Details

Created January 25, 2013 at 8:55 AM
Updated June 18, 2013 at 2:24 AM
Resolved May 16, 2013 at 8:36 PM