Encoding problem with Metadata Attribute Extractor
Description
We discovered that there is a problem when using a Metadata Attribute Extractor [1] with values (like the MDUI DisplayName) containing non-ASCII characters. If there is an MDUI element like:
<mdui:UIInfo>
<mdui:DisplayName xml:lang="de">Universität Bern</mdui:DisplayName>
<mdui:DisplayName xml:lang="en">University of Bern</mdui:DisplayName>
...
</mdui:UIInfo>
and if the user's browser sends either a German Accept-Language header or none at all, the SP uses the first of the two DisplayNames above to populate the attribute in the environment. However, this currently results in the following error in native.log:
2013-01-24 14:56:41 ERROR XMLTooling.ParserPool [11234] shib_check_user: fatal error on line 1, column 4542, message: invalid character 0x1A
The invalid character is the umlaut "ä" in "Universität", which is UTF-8 encoded.
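For readers unfamiliar with how one of several language-tagged DisplayName values gets chosen, here is a minimal Python sketch of the selection behaviour described above. The function `pick_display_name` and its simplified Accept-Language parsing are hypothetical, not the SP's actual code. The sketch only assumes what the report states: a German header or no header yields the first (German) value. Falling back to the first value when nothing matches is an additional assumption.

```python
def pick_display_name(names, accept_language):
    """Hypothetical sketch: choose a DisplayName by the browser's language preference.

    names: list of (lang, value) pairs in metadata document order.
    accept_language: raw Accept-Language header value, or None if absent.
    """
    if accept_language:
        # Parse e.g. "de-CH,de;q=0.8" into (q, primary language) pairs.
        prefs = []
        for part in accept_language.split(","):
            fields = part.strip().split(";")
            tag = fields[0].strip().lower()
            q = 1.0
            for param in fields[1:]:
                key, _, val = param.strip().partition("=")
                if key == "q":
                    try:
                        q = float(val)
                    except ValueError:
                        q = 0.0
            if tag and q > 0:
                prefs.append((q, tag.split("-")[0]))
        # Stable sort: highest q-value first, header order kept for ties.
        prefs.sort(key=lambda p: -p[0])
        for _, lang in prefs:
            for name_lang, value in names:
                if name_lang.lower() == lang:
                    return value
    # No header or no match (assumed): use the first DisplayName in document order.
    return names[0][1] if names else None


names = [("de", "Universität Bern"), ("en", "University of Bern")]
print(pick_display_name(names, "de-CH,de;q=0.8"))  # Universität Bern
print(pick_display_name(names, None))              # Universität Bern
print(pick_display_name(names, "en-US,en;q=0.9"))  # University of Bern
```

In both of the German cases the selected value contains the non-ASCII "ä", which is why these requests trigger the error.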
Tested successfully. Interestingly, older Apache versions cannot handle UTF-8 in environment variables (headers work), while newer versions show the correct data with either method.
Scott Cantor
May 15, 2013 at 9:18 PM
Sigh, no idea what I was thinking: the code is using the local code page for this data extraction.
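The 0x1A in the log is the ASCII SUB (substitute) control character, which transcoders commonly emit for characters the target encoding cannot represent. XML 1.0 does not allow that character, so the parser rejects the document. The following is a minimal Python sketch of this failure mode, assuming a 7-bit ASCII local code page. `to_local_code_page` and `parses_ok` are hypothetical helpers, not SP or XMLTooling functions.

```python
import xml.etree.ElementTree as ET

SUB = "\x1a"  # ASCII SUB (substitute) control character, reported as 0x1A in native.log


def to_local_code_page(text, codepage="ascii"):
    """Hypothetical lossy conversion: every character the code page cannot
    represent is replaced by SUB, as a substituting transcoder would do."""
    out = []
    for ch in text:
        try:
            ch.encode(codepage)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(SUB)
    return "".join(out)


def parses_ok(xml_text):
    """Return True if the string is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


value = "Universität Bern"
mangled = to_local_code_page(value)
print(repr(mangled))  # 'Universit\x1at Bern'

# SUB is not a legal XML 1.0 character, so parsing the mangled value fails.
print(parses_ok("<attr>" + mangled + "</attr>"))  # False

# Keeping the value as Unicode/UTF-8 end to end parses cleanly.
print(parses_ok("<attr>" + value + "</attr>"))  # True
```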
[1] https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeExtractor#NativeSPAttributeExtractor-MetadataAttributeExtractorVersion25andAbove