Thursday, February 4, 2010

Browser URL Encoding Decoding and XSS

This page has been (lightly) updated and moved to https://trustfoundry.net/browser-url-encoding-decoding-and-xss/

Cross-site scripting attacks can be very difficult to reproduce because browsers handle URLs inconsistently. This problem is made worse by the fact that there is very little information available about how browsers URL encode and decode input. Hopefully this post will help you understand the problem and the browser oddities that can make XSS difficult to reproduce.

First, most browsers URL encode special characters in the URL, so if you type < in a URL, the browser converts it to %3C (as required by RFC 1738 Section 2.2). Update: All browsers, including IE, now follow the RFC. As of IE11, IE's lack of URL encoding is no longer an attack vector (https://msdn.microsoft.com/library/bg182625(v=vs.85).aspx#utf8).
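A quick way to see this encoding is Python's urllib.parse.quote. This is only an approximation of what a browser sends: the safe="/()" argument is an assumption chosen to mimic browsers, which typically leave characters such as /, ( and ) alone in the query string while encoding < and >.

```python
from urllib.parse import quote

payload = "<script>alert(1)</script>"

# Approximate what an RFC-compliant browser sends in the query string:
# "<" and ">" are percent-encoded, while "/", "(" and ")" are left alone.
# (The safe="/()" choice is an assumption meant to mimic browser behavior.)
sent = quote(payload, safe="/()")

print(sent)  # %3Cscript%3Ealert(1)%3C/script%3E
```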

In most instances, web applications take that input and URL decode it so they can work with the actual user input rather than a URL encoded version. This is incidentally helpful to an attacker who can only pass in URL encoded input, because the application decodes it for them. If this is the case, the attacker does not need to worry about browsers URL encoding the input, because the application will decode the attack anyway.
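Here is a minimal Python sketch of that situation, assuming a hypothetical search page that reads a parameter named q. The standard library's parse_qs percent-decodes the value, so the encoded payload the browser sent comes back as live markup when the page echoes it without escaping.

```python
from urllib.parse import parse_qs

def vulnerable_search_page(query_string):
    # Hypothetical handler: decode the query string the way most frameworks do...
    params = parse_qs(query_string)
    q = params.get("q", [""])[0]
    # ...then echo the decoded value into HTML without escaping it.
    return "<p>Results for: " + q + "</p>"

# The URL encoded payload as an RFC-compliant browser would send it
encoded_query = "q=%3Cscript%3Ealert(1)%3C/script%3E"
print(vulnerable_search_page(encoded_query))
# <p>Results for: <script>alert(1)</script></p>
```

The browser's encoding offers no protection here: the application itself undoes it before building the page.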

In some instances the application will not decode the input. This means the attacker needs to find a way to bypass the browser's URL encoding. Internet Explorer versions before IE11 do not conform to RFC 1738 and send URLs without URL encoding them. Built-in XSS filters will commonly block the attack, but you shouldn't rely on a browser's XSS filter to prevent XSS in your site.
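The contrast can be sketched with a hypothetical page that echoes the raw query string without decoding it. The percent-encoded payload from an RFC-compliant browser is rendered as harmless text, while raw characters from a browser that skips URL encoding (as pre-IE11 Internet Explorer did, per the paragraph above) end up as a real script tag.

```python
def reflect_raw(raw_query):
    # Hypothetical page: echoes the query string exactly as received,
    # with no URL decoding and no HTML escaping.
    return "<p>You searched for: " + raw_query + "</p>"

# What an RFC-compliant browser sends: the payload stays percent-encoded,
# so the page displays the literal text "%3Cscript%3E..." and nothing runs.
from_compliant_browser = reflect_raw("%3Cscript%3Ealert(1)%3C/script%3E")

# What a non-encoding browser could send: the characters arrive raw,
# so the page contains a real <script> element.
from_raw_browser = reflect_raw("<script>alert(1)</script>")

print(from_compliant_browser)
print(from_raw_browser)
```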

A second way to prevent the browser from URL encoding the input is to set the enctype="text/plain" attribute on the form and submit it as a POST. According to the Browser Security Handbook, this is supported by current versions of IE, Firefox, and Opera. This only works with POST, but fortunately in almost every instance you will be able to convert GET requests to POST requests. Here is some HTML I use to submit the attack as a POST and prevent the browser from encoding it.
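The difference between the two form encodings can also be sketched in Python. encode_text_plain below is a hypothetical helper that follows the text/plain rule roughly as the HTML specification describes it (one name=value line per field, ending in CRLF, with no percent-encoding); the standard library's urlencode produces the default application/x-www-form-urlencoded body for comparison.

```python
from urllib.parse import urlencode

def encode_text_plain(fields):
    # Hypothetical helper approximating enctype="text/plain": each field becomes
    # "name=value" followed by CRLF, and the value is NOT percent-encoded.
    return "".join(f"{name}={value}\r\n" for name, value in fields)

fields = [("q", "<script>alert(1)</script>")]

plain_body = encode_text_plain(fields)
urlencoded_body = urlencode(fields)  # what a normal POST form would send

print(repr(plain_body))   # 'q=<script>alert(1)</script>\r\n'
print(urlencoded_body)    # q=%3Cscript%3Ealert%281%29%3C%2Fscript%3E
```

With text/plain, the raw < and > characters reach the server in the request body, which is what matters when the application never decodes its input.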


Another interesting oddity is that when you copy URLs out of Firefox or Chrome, they are URL encoded, which can be very annoying. To prevent this, simply type a character in the address bar and erase it before you copy the URL.

There is not much information about how browsers handle URL encoding, so please let me know if you find anything different or have anything to add.