Individual CAPEC Dictionary Definition (Release 1.1)
Using UTF-8 Encoding to Bypass Validation Logic

| Field | Details |
|---|---|
| Attack Pattern ID | 80 |
| Pattern Abstraction | Detailed |
| Typical Severity | High |
| Description | Summary: This attack is a specific variation on leveraging alternate encodings to bypass validation logic. The attacker encodes potentially harmful input in UTF-8 and submits it to applications that do not expect this encoding or do not validate it effectively, which makes input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early versions of the UTF-8 specification got some entries wrong (in some cases they permitted overlong characters). UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. For example, '/' (U+002F) is normally the single byte 2F, but a naive decoder also turns the overlong two-byte form C0 AF into '/'. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain illegal octet sequences as characters. |
| Attack Execution Flow | |
| Attack Prerequisites | The application's UTF-8 decoder accepts and interprets illegal UTF-8 characters or non-shortest forms of UTF-8 encoding. Input filtering and validation are not done properly, leaving the door open to characters that are harmful to the target host. |
| Typical Likelihood of Exploit | High |
| Methods of Attack | |
| Examples-Instances | Description: Perhaps the most famous UTF-8 attack was against unpatched Microsoft Internet Information Server (IIS) 4 and IIS 5 servers. If an attacker made a request that looked like this: http://servername/scripts/. Related vulnerability: CVE-2000-0884 |
| Attacker Skill or Knowledge Required | Low: an attacker can inject an alternate UTF-8 representation of a filtered character. Medium: an attacker may craft subtle encodings of input data using knowledge she has gathered about the target host. |
| Probing Techniques | An attacker may try to inject dangerous characters using alternate UTF-8 representations (for example, invalid or overlong UTF-8 sequences). The attacker hopes that the targeted system filters poorly across all the possible representations of the malicious characters. Malicious inputs can be sent through an HTML form or encoded directly in the URL. The attacker can use scripts or automated tools to probe for poor input filtering. |
| Indicators-Warnings of Attack | A web page that contains overlong UTF-8 codes constitutes a protocol anomaly and could indicate that an attacker is attempting to exploit a vulnerability on the target host. An attacker can use a fuzzer to probe for a UTF-8 encoding vulnerability; the fuzzer would likely generate suspicious network activity that an intrusion detection system (IDS) can notice. An IDS filtering network traffic may be able to detect illegal UTF-8 characters. |
| Obfuscation Techniques | According to OWASP, cross-site scripting attackers sometimes attempt to hide their attacks in Unicode encoding. |
| Solutions and Mitigations | The Unicode Consortium recognized multiple representations to be a problem and revised the Unicode Standard to make multiple UTF-8 representations of the same code point illegal. The UTF-8 Corrigendum lists the newly restricted UTF-8 range (see References). Many current applications may not have been revised to follow this rule. Verify that your application conforms to the latest UTF-8 encoding specification, and pay extra attention to the filtering of illegal characters. The exact response required from a UTF-8 decoder on invalid input is not uniformly defined by the standards. In general, a UTF-8 decoder might behave in one of several ways when it meets an invalid byte sequence: 1. Insert a replacement character (e.g. '?', '�'). 2. Ignore the bytes. 3. Interpret the bytes according to a different character encoding (often the ISO-8859-1 character map). 4. Not notice, and decode as if the bytes were some similar bit of UTF-8. 5. Stop decoding and report an error (possibly giving the caller the option to continue). A decoder may behave differently for different types of invalid input. RFC 3629 only requires that UTF-8 decoders must not decode "overlong sequences" (where a character is encoded in more bytes than needed but still follows the general UTF-8 byte patterns). The Unicode Standard requires a Unicode-compliant decoder to "…treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence." Overlong forms are one of the most troublesome types of UTF-8 data. The current RFC says they must not be decoded, but older specifications for UTF-8 only gave a warning, and many simpler decoders will happily decode them. Overlong forms have been used to bypass security validations in high-profile products, including Microsoft's IIS web server. Therefore, great care must be taken to avoid security issues if validation is performed before conversion from UTF-8; it is generally much simpler to handle overlong forms before any input validation is done. To maintain security in the case of invalid input, there are two options. The first is to decode the UTF-8 before doing any input validation checks. The second is to use a decoder that, in the event of invalid input, returns either an error or text that the application considers harmless. Another possibility is to avoid conversion out of UTF-8 altogether, but this relies on any other software that receives the data handling the invalid data safely. Another consideration is error recovery. To guarantee correct recovery after corrupt or lost bytes, decoders must be able to recognize the difference between lead and trail bytes, rather than just assuming that bytes will be of the type allowed in their position. For security reasons, a UTF-8 decoder must not accept UTF-8 sequences that are longer than necessary to encode a character. If you use a parser to decode the UTF-8 encoding, make sure that the parser filters out invalid UTF-8 characters (invalid forms or overlong forms). Look for overlong UTF-8 sequences that encode malicious patterns. You can also use a UTF-8 decoder stress test to test your UTF-8 parser (see Markus Kuhn's UTF-8 decoder capability and stress test and his UTF-8 and Unicode FAQ in the References). Assume all input is malicious. Create a white list that defines all valid input to the software system based on the requirements specifications. Input that does not match the white list should not be permitted to enter the system. Test your decoding process against malicious input. |
| Attack Motivation- | |
| Context Description | Bruce Schneier was one of the first to raise the security issues with Unicode, in the July 15, 2000 issue of the Crypto-Gram newsletter. He pointed out that with the Unicode character set there could be multiple representations of a single character. In a security context, it is essential to determine the meaning of a character. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain illegal octet sequences as characters. For example, a parser might prohibit the NUL character when encoded as the single-octet sequence 00, but erroneously allow the illegal two-octet sequence C0 80 and interpret it as a NUL character. Unicode can also cause problems when the application or operating system assigns the same interpretation to different code points: even though the Unicode specification dictates that these code points should be treated differently, the application actually treats them the same. |
| Injection Vector | The injection vector is an illegal sequence of bytes that a decoder maps to a UTF-8 character, or a "non-shortest form" UTF-8 encoding. |
| Payload | The interpretation of malicious characters can cause unexpected responses from the target host. |
| Activation Zone | The request or command interpreter is responsible for interpreting the request sent by the client. |
| Payload Activation Impact | The malicious characters can defeat the data filtering mechanism and lead to many different outcomes, such as path manipulation or remote code execution. |
| Related Weaknesses | |
| Related Attack Patterns | |
| Related Security Principles | |
| Related Guidelines | |
| Purpose | Penetration |
| CIA Impact | |
| Technical Context | |
| References | G. Hoglund and G. McGraw, Exploiting Software: How to Break Code, Addison-Wesley, February 2004; CWE, Input Validation; David Wheeler, http://www.dwheeler.com/sec; Michael Howard and David LeBlanc, Writing Secure Code, Chapter 12, Microsoft Press; Bruce Schneier, Crypto-Gram Newsletter, July 15, 2000, http://www.schneier.com/cry; Wikipedia, UTF-8, http://en.wikipedia.org/wik; RFC 3629, http://www.faqs.org/rfcs/rf; Eric Hacker, IDS Evasion with Unicode, January 3, 2001, http://www.securityfocus.co; Corrigendum #1: UTF-8 Shortest Form, http://www.unicode.org/vers; Markus Kuhn, UTF-8 and Unicode FAQ for Unix/Linux, http://www.cl.cam.ac.uk/~mg; Markus Kuhn, UTF-8 decoder capability and stress test, http://www.cl.cam.ac.uk/%7E |
| Source | |
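To make the Description and Context Description fields concrete, the following Python sketch contrasts a deliberately lenient decoder with Python's built-in strict UTF-8 codec. `naive_utf8_decode` and `strict_decode` are hypothetical helpers written only for illustration: the lenient one decodes by bit pattern alone, so it accepts the overlong forms C0 AF and E0 80 AF as '/' and C0 80 as NUL. It handles only 1- to 3-byte sequences and is not meant for real use.

```python
def naive_utf8_decode(data: bytes) -> str:
    """Hypothetical lenient decoder (illustration only).

    Decodes 1-, 2- and 3-byte sequences by bit pattern alone: it never checks
    for the shortest form and never validates trail bytes.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 0xxxxxxx
            cp, length = lead, 1
        elif (lead & 0xE0) == 0xC0:      # 110xxxxx 10xxxxxx
            cp, length = lead & 0x1F, 2
        elif (lead & 0xF0) == 0xE0:      # 1110xxxx 10xxxxxx 10xxxxxx
            cp, length = lead & 0x0F, 3
        else:
            raise ValueError(f"lead byte {lead:#04x} not handled by this sketch")
        for trail in data[i + 1:i + length]:
            cp = (cp << 6) | (trail & 0x3F)  # payload bits appended blindly
        out.append(chr(cp))
        i += length
    return "".join(out)


def strict_decode(data: bytes):
    """Python's built-in codec rejects overlong and other ill-formed input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


for raw in (b"/", b"\xc0\xaf", b"\xe0\x80\xaf", b"\xc0\x80"):
    print(raw, repr(naive_utf8_decode(raw)), repr(strict_decode(raw)))
```

A filter that scans the raw bytes for 2F (or for 00) never sees either character in the overlong inputs, yet the lenient decoder produces them afterwards; the strict codec returns `None` for all three overlong inputs.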
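The Probing Techniques field describes sending alternate representations of a filtered character. When testing your own filters, one way to produce those representations is to compute the overlong 2-, 3- and 4-byte encodings of an ASCII character and percent-encode them as they would appear in a URL. `overlong_forms` and `probe_strings` are hypothetical helpers for this sketch; only `urllib.parse.quote_from_bytes` comes from the standard library.

```python
from urllib.parse import quote_from_bytes


def overlong_forms(ch: str) -> list:
    """Return the 2-, 3- and 4-byte overlong encodings of one ASCII character.

    All three are ill-formed UTF-8: a conforming decoder must reject them.
    """
    cp = ord(ch)
    if cp >= 0x80:
        raise ValueError("this sketch only handles ASCII code points")
    hi6, lo6 = cp >> 6, cp & 0x3F         # ASCII fits in 7 bits: hi6 is 0 or 1
    return [
        bytes([0xC0 | hi6, 0x80 | lo6]),              # 110xxxxx 10xxxxxx
        bytes([0xE0, 0x80 | hi6, 0x80 | lo6]),        # 1110xxxx + two trail bytes
        bytes([0xF0, 0x80, 0x80 | hi6, 0x80 | lo6]),  # 11110xxx + three trail bytes
    ]


def probe_strings(ch: str) -> list:
    """Percent-encode each overlong form, as it would appear in a URL."""
    return [quote_from_bytes(form) for form in overlong_forms(ch)]


print(probe_strings("/"))
print(probe_strings("."))
```

Feeding each of these strings to your own input filter, alongside the plain character, checks whether the filter recognizes every representation rather than only the shortest one.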
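The Indicators-Warnings of Attack field suggests that an IDS may be able to detect illegal UTF-8 characters. A minimal sketch of such a check, assuming the byte ranges from the Unicode Standard's table of well-formed UTF-8 byte sequences (chapter 3), scans raw bytes and reports the offset of every byte that cannot begin a well-formed sequence. `find_ill_formed` is a hypothetical helper, not part of any IDS product.

```python
def find_ill_formed(data: bytes) -> list:
    """Return offsets of bytes that cannot start a well-formed UTF-8 sequence.

    The byte ranges follow the Unicode Standard's table of well-formed UTF-8
    byte sequences, which excludes overlong forms, surrogates and values above
    U+10FFFF.
    """
    bad = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:                       # ASCII
            i += 1
            continue
        # Allowed range of the second byte, and the sequence length.
        if 0xC2 <= lead <= 0xDF:
            lo, hi, length = 0x80, 0xBF, 2     # C0 and C1 would be overlong
        elif lead == 0xE0:
            lo, hi, length = 0xA0, 0xBF, 3     # E0 80..9F would be overlong
        elif 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
            lo, hi, length = 0x80, 0xBF, 3
        elif lead == 0xED:
            lo, hi, length = 0x80, 0x9F, 3     # ED A0..BF would be a surrogate
        elif lead == 0xF0:
            lo, hi, length = 0x90, 0xBF, 4     # F0 80..8F would be overlong
        elif 0xF1 <= lead <= 0xF3:
            lo, hi, length = 0x80, 0xBF, 4
        elif lead == 0xF4:
            lo, hi, length = 0x80, 0x8F, 4     # otherwise above U+10FFFF
        else:                                  # C0, C1, F5..FF or a stray trail byte
            bad.append(i)
            i += 1
            continue
        seq = data[i:i + length]
        well_formed = (
            len(seq) == length
            and lo <= seq[1] <= hi
            and all(0x80 <= b <= 0xBF for b in seq[2:])
        )
        if well_formed:
            i += length
        else:
            bad.append(i)
            i += 1
    return bad


print(find_ill_formed(b"..\xc0\xaf.."))
print(find_ill_formed("café €".encode("utf-8")))
```

Each overlong input is flagged at its first byte, and any trail bytes left orphaned are flagged too; legitimate multi-byte text such as 'é', '€' or an emoji produces no findings.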
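The Solutions and Mitigations field lists several ways a decoder may react to an invalid byte sequence. Python's standard codecs can demonstrate behaviors 1, 2, 3 and 5 on the same input; behavior 4 (silently decoding the bytes as similar UTF-8) is not offered by Python's standard error handlers and is what a lenient hand-written decoder does. `decoder_behaviours` is a hypothetical helper, and the input is an assumed sample containing an overlong '/'.

```python
def decoder_behaviours(raw: bytes) -> dict:
    """Show how Python's standard codecs treat the same invalid input."""
    results = {
        "replace": raw.decode("utf-8", errors="replace"),  # 1. substitute U+FFFD
        "ignore": raw.decode("utf-8", errors="ignore"),    # 2. drop the bytes
        "latin-1": raw.decode("latin-1"),                   # 3. other charset
    }
    try:                                                    # 5. strict: report error
        results["strict"] = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        results["strict"] = f"error: {exc.reason} at byte {exc.start}"
    return results


for name, value in decoder_behaviours(b"..\xc0\xaf..").items():
    print(f"{name:8} {value!r}")
```

Only strict mode lets the caller notice the attack. The "ignore" handler is particularly risky because removing bytes can join the surrounding characters into a new pattern: here ".." and ".." become "....".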
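The Solutions and Mitigations field recommends decoding before validating and checking the decoded text against a white list. The sketch below contrasts that order with the anti-pattern of checking undecoded bytes. The allowlist pattern `ALLOWED_NAME`, the helpers `flawed_check` and `safe_file_name`, and the request fragment are all hypothetical; `unquote_to_bytes` and `re` are standard library.

```python
import re
from typing import Optional
from urllib.parse import unquote_to_bytes

# Hypothetical allowlist: a file name of letters, digits, '_' or '-',
# optionally followed by one extension. Anything else is rejected.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def flawed_check(raw: str) -> bool:
    """Anti-pattern: searches the undecoded bytes for '../'.

    Overlong slashes (%c0%af) pass this test; if a lenient decoder later turns
    them into '/', the check has been bypassed.
    """
    return b"../" not in unquote_to_bytes(raw)


def safe_file_name(raw: str) -> Optional[str]:
    """Percent-decode, strictly UTF-8 decode, and only then validate."""
    try:
        text = unquote_to_bytes(raw).decode("utf-8")  # overlong forms raise here
    except UnicodeDecodeError:
        return None
    return text if ALLOWED_NAME.fullmatch(text) else None


attack = "..%c0%af..%c0%afsecret.txt"   # hypothetical request fragment
print(flawed_check(attack))             # the naive check lets it through
print(safe_file_name(attack))           # strict decoding rejects it
print(safe_file_name("report-2024.txt"))
```

The allowlist is applied to decoded text only, so every representation of a character is judged by its meaning, and anything the pattern does not explicitly permit, including a plain "../", is refused.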
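The Solutions and Mitigations field also notes that, to recover correctly after corrupt or lost bytes, a decoder must tell lead bytes from trail bytes instead of assuming each byte has the type its position suggests. A minimal sketch of that idea follows; `is_trail_byte` and `next_char_start` are hypothetical helper names, and the damaged input is an assumed example in which the lead byte of '€' was lost.

```python
def is_trail_byte(byte: int) -> bool:
    """UTF-8 trail (continuation) bytes always match the pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def next_char_start(data: bytes, i: int) -> int:
    """After an error at offset i, skip forward past any trail bytes to the next
    offset where a new character can begin (or len(data))."""
    i += 1
    while i < len(data) and is_trail_byte(data[i]):
        i += 1
    return i


intact = "a€b".encode("utf-8")        # 61 E2 82 AC 62
damaged = intact[:1] + intact[2:]     # lead byte E2 lost: 61 82 AC 62
print(next_char_start(damaged, 1))    # resynchronizes at 'b'
```

Because trail bytes are self-identifying, the decoder discards only the damaged character rather than misreading the orphaned trail bytes as the start of new characters.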