CAPEC

Common Attack Pattern Enumeration and Classification

A Community Knowledge Resource for Building Secure Software

Individual CAPEC Dictionary Definition (Release 1.1)

Using Unicode Encoding to Bypass Validation Logic
Attack Pattern ID

71

Pattern Abstraction

Detailed

Typical Severity

High

Description

Summary


An attacker may provide a Unicode string to a system component that is not Unicode aware and use it to circumvent a filter or cause the classifying mechanism to fail to properly understand the request. This may allow the attacker to slip malicious data past the content filter and/or cause the application to route the request incorrectly.

Unicode is a standard for encoding character strings so that characters from many different languages can be represented. Its common encodings use more than one byte for many characters (UTF-16 uses 16-bit units; UTF-8 uses one to four bytes) instead of the customary single byte found in ASCII encoding. Any system that is Unicode aware may be capable of converting Unicode strings into ASCII byte strings. If the native language of the system or the APIs being used require normal byte strings, then the system may provide a translation from Unicode.
    

Attack Execution Flow

  1. Try Unicode encoding for parts of the input in order to get past the filters. For instance, by encoding certain characters in the URL (e.g. dots and slashes), an attacker may try to gain access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An attacker can also attempt other injection-style attacks using this attack pattern: command injection, SQL injection, etc.

Attack Prerequisites

Filtering is performed on data that has not been properly canonicalized.

Typical Likelihood of Exploit

Medium

Methods of Attack
  • Modification of Resources
  • API Abuse
  • Injection
Examples-Instances

Description


Attack Example: Unicode Encodings in the IIS Server

A very common technique for a unicode attack involves traversing directories looking for interesting files. An example of this idea applied to the Web is

http://target.server/some_directory/../../../winnt

In this case, the attacker is attempting to traverse to a directory that is not supposed to be part of standard Web services. The trick is fairly obvious, so many Web servers and scripts prevent it. However, using alternate encoding tricks, an attacker may be able to get around badly implemented request filters.

In October 2000, a hacker publicly revealed that Microsoft’s IIS server suffered from a variation of this problem. In the case of IIS, all the attacker had to do was provide alternate encodings for the dots and/or slashes found in a classic attack. The unicode translations are

.    yields    C0 AE
/    yields    C0 AF
\    yields    C1 9C

Using this conversion, the previously displayed URL can be encoded as

http://target.server/some_directory/%C0%AE%C0%AE/%C0%AE%C0%AE/%C0%AE%C0%AE/winnt
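
The byte pairs listed above are overlong two-byte UTF-8 forms of ASCII characters. The following is a minimal sketch of why a decoder that skips the shortest-form check maps them back to a dot, a slash, and a backslash. The helper names decode2_lenient and decode2_strict are hypothetical and are not taken from IIS or any library; both return -1 for input they do not accept.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode one two-byte UTF-8 sequence WITHOUT checking
// for overlong forms, as a non-conforming decoder might.
int decode2_lenient(std::uint8_t b0, std::uint8_t b1)
{
    // A two-byte sequence looks like 110xxxxx 10xxxxxx.
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return -1;
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F);
}

// Same decoder with the shortest-form rule applied: a two-byte sequence must
// encode a code point of at least 0x80, so lead bytes C0 and C1 are rejected.
int decode2_strict(std::uint8_t b0, std::uint8_t b1)
{
    int cp = decode2_lenient(b0, b1);
    if (cp >= 0 && cp < 0x80)
        return -1;  // overlong encoding of an ASCII character
    return cp;
}
```

Under these assumptions, decode2_lenient(0xC0, 0xAE) yields 0x2E (the dot), while decode2_strict rejects the same bytes. A filter that inspects the raw bytes never sees a dot, but a lenient decoder further down the chain produces one.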

Related Vulnerability

CVE-2000-0884

Attacker Skill or Knowledge Required

Medium: An attacker needs to understand unicode encodings and have an idea (or be able to find out) what system components may not be unicode aware.

Resources Required

Indicators-Warnings of Attack

Unicode encoded data is passed to APIs where it is not expected

Solutions and Mitigations

Ensure that the system is Unicode aware and can properly process Unicode data. Do not make an assumption that data will be in ASCII.

Ensure that filtering or input validation is applied to canonical data.
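
As an illustration of the ordering issue, the sketch below contrasts a filter applied to raw input with one applied after canonicalization. All names (hex_value, percent_decode, naive_filter, is_safe_path) are hypothetical. Two simplifying assumptions are made: the only encoding layer is URL percent-encoding, and valid paths are printable ASCII, so any other byte left after decoding is rejected.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: value of a hex digit, or -1 if c is not one.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode %XX escapes once. Malformed escapes are copied through unchanged.
std::string percent_decode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}

// Validate-before-canonicalize: looks only at the raw request text.
bool naive_filter(const std::string &raw)
{
    return raw.find("..") == std::string::npos;
}

// Canonicalize first: decode until the string stops changing, then filter.
bool is_safe_path(const std::string &raw)
{
    std::string canon = raw;
    for (int pass = 0; pass < 4; ++pass) {
        std::string next = percent_decode(canon);
        if (next == canon) {
            for (unsigned char ch : canon)
                if (ch < 0x20 || ch >= 0x7F) return false;  // assumption: printable ASCII only
            return canon.find("..") == std::string::npos;
        }
        canon = next;
    }
    return false;  // still changing after several passes: treat as hostile
}
```

Decoding to a fixed point is a deliberately conservative choice: it rejects doubly encoded input even if the downstream component would decode only once.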

Assume all input is malicious. Create a white list that defines all valid input to the software system based on the requirements specifications. Input that does not match against the white list should not be permitted to enter into the system.
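
A white list can be as simple as a character and length check. The sketch below assumes a hypothetical file-name field that permits ASCII letters, digits, underscore, hyphen, and at most one dot; the function name is_allowed_filename and the 64-byte limit are illustrative only, and real rules would come from the requirements specification.

```cpp
#include <cassert>
#include <string>

// Hypothetical allow-list for a file-name field. Anything not explicitly
// permitted is rejected, including every non-ASCII byte.
bool is_allowed_filename(const std::string &name)
{
    if (name.empty() || name.size() > 64) return false;
    if (name[0] == '.') return false;  // no hidden or relative names
    int dots = 0;
    for (unsigned char ch : name) {
        bool alnum = (ch >= 'a' && ch <= 'z') ||
                     (ch >= 'A' && ch <= 'Z') ||
                     (ch >= '0' && ch <= '9');
        if (ch == '.') {
            if (++dots > 1) return false;  // at most one dot
        } else if (!alnum && ch != '_' && ch != '-') {
            return false;
        }
    }
    return true;
}
```

Because the check names what is allowed rather than what is forbidden, an overlong byte sequence such as C0 AE fails it without the filter needing to know that encoding exists. The check should still be applied to canonical data.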

Attack Motivation-Consequences
  • Privilege Escalation
  • Run Arbitrary Code
  • Data Modification
  • Denial of Service
Context Description


Building “Equivalent” Requests

A large number of commands are subject to parsing or filtering. In many cases a filter only considers one particular way to format a command. The fact is that the same command can usually be encoded in thousands of different ways. In many cases, an alternative encoding for the command will produce exactly the same results as the original command. Thus, two commands that look different from the logical perspective of a filter end up producing the same semantic result. In many cases, an alternatively encoded command can be used to attack a software system, because the alternative command allows an attacker to perform an operation that would otherwise be blocked.
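
To give a sense of how quickly equivalent forms multiply, the sketch below enumerates every way of writing a short string with each character either left as-is or replaced by its URL percent-escape. The function name percent_variants is hypothetical, and percent-encoding is only one of many possible encoding layers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: return all 2^n spellings of s in which each byte is
// either literal or written as an uppercase %XX escape.
std::vector<std::string> percent_variants(const std::string &s)
{
    assert(s.size() <= 16);  // the result doubles with every byte; keep n small
    std::vector<std::string> out;
    const std::size_t total = static_cast<std::size_t>(1) << s.size();
    for (std::size_t mask = 0; mask < total; ++mask) {
        std::string v;
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (mask & (static_cast<std::size_t>(1) << i)) {
                char buf[4];  // '%' + two hex digits + terminator
                std::snprintf(buf, sizeof buf, "%%%02X",
                              static_cast<unsigned>(static_cast<unsigned char>(s[i])));
                v += buf;
            } else {
                v += s[i];
            }
        }
        out.push_back(v);
    }
    return out;
}
```

For the three-byte string "../" this produces 8 spellings; a six-byte "../../" already has 64, before counting lowercase hex digits, double encoding, or Unicode forms. A filter that recognizes only one spelling misses all the others.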

Mapping the API Layer

A good approach to help identify and map possible alternate encodings involves writing a small program that loops through all possible inputs to a given API call. This program can, for example, attempt to encode filenames in a variety of ways. For each iteration of the loop, the “mungified” filename can be passed to the API call and the result noted.

The following code snippet loops through many possible values that can be used as a prefix to the string \test.txt. Results of running a program like this can help us to determine which characters can be used to perform a ../../ (dots and slashes) relative traversal attack.

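#include <cstdint>
#include <cstdio>

int main()
{
    // Try every 4-byte prefix from 0x01010101 up to 0xFFFFFFFE. A fixed-width
    // type keeps the loop bound the same on every platform.
    for (std::uint32_t c = 0x01010101u; c != 0xFFFFFFFFu; c++)
    {
        char _filepath[255];
        // Split c into its four bytes; a zero byte ends the string early.
        std::snprintf(_filepath, sizeof _filepath, "%c%c%c%c\\test.txt",
                      static_cast<int>((c >> 24) & 0xFF),
                      static_cast<int>((c >> 16) & 0xFF),
                      static_cast<int>((c >> 8) & 0xFF),
                      static_cast<int>(c & 0xFF));

        try
        {
            // fopen reports failure by returning NULL.
            FILE *in_file = std::fopen(_filepath, "r");

            if (in_file)
            {
                std::printf("checking path %s\n", _filepath);
                std::puts("file opened!");
                std::getchar();  // pause so the hit can be inspected
                std::fclose(in_file);
            }
        }
        catch (...)
        {
            // Catch-all kept from the original; skip this prefix and continue.
        }
    }
    return 0;
}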

Slight (but still automatic) modifications can be made to the string in creative ways. Ultimately, the modified string boils down to an attempt to use different tricks to obtain the same file. For example, one resulting attempt might try a command like this:

sprintf(_filepath, "..%c\\..%c\\..%c\\..%c\\scans2.txt", c, c, c, c);

A good way to think about this problem is to think of layers. The API call layer is what the examples shown here are mapping. If an engineer has placed any filters in front of the API call, then these filters can be considered additional layers, wrapping the original set of possibilities. By pondering all the possible inputs that can be provided at the API layer, we can begin uncovering and exercising any filters that the software has in place. If we know that the software definitely uses file API calls, we can try all kinds of filename encoding tricks that we know about. If we get lucky, eventually one set of encoding tricks will work, and we can get our data successfully through the filters and into the API call.

Drawing on the techniques described in Chapter 5, we can list a number of possible escape codes that can be injected into API calls (many of which help with the filter avoidance problem). If the data are eventually being piped into a shell, for example, we might be able to get control codes to take effect. A particular call may write data to a file or a stream that are eventually meant to be viewed on a terminal or in a client program. As a simple example, the following string contains two backspace characters that are very likely to show up in the terminal’s execution:

write("echo hey!\x08\x08");

When the terminal interprets the data we have passed in, the output will be missing the last two characters of the original string. This kind of trick has been used for ages to corrupt data in log files. Log files capture all kinds of data about a transaction. It may be possible to insert NULL characters (for example, %00 or '\0') or to add so many extra characters to the string that the request is truncated in the log. Imagine a request that has more than a thousand extra characters tacked on at the end. Ultimately, the string may be trimmed in the log file, and the important telltale data that expose an attack will be lost.
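
One way to blunt these log tricks is to escape control characters and to make truncation visible instead of silent. The sketch below is a minimal example under stated assumptions: the function name escape_for_log, the \xNN escape style, and the 256-byte default limit are all hypothetical choices, not part of any logging library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: make a request string safe to append to a text log.
// Control bytes, non-ASCII bytes, and backslashes are written as \xNN so
// that backspaces, NULs, and terminal escapes cannot alter what a reader
// sees. Output is capped near max_len with an explicit marker that records
// the original length.
std::string escape_for_log(const std::string &in, std::size_t max_len = 256)
{
    std::string out;
    for (unsigned char ch : in) {
        if (out.size() >= max_len) {
            out += "[truncated, " + std::to_string(in.size()) + " bytes total]";
            break;
        }
        if (ch < 0x20 || ch >= 0x7F || ch == '\\') {
            char buf[5];  // backslash, 'x', two hex digits, terminator
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(ch));
            out += buf;
        } else {
            out.push_back(static_cast<char>(ch));
        }
    }
    return out;
}
```

With this sketch the backspace example is logged with both backspaces visible as \x08, and an oversized request keeps its leading bytes plus a note of how long it really was. Which end of a long request to keep is a design choice; padding can be prepended as easily as appended.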

Character Conversion

Cases where one part of the software converts data before the data are passed on to the next part also make good targets. In these “data chains,” characters often get converted many times. For example, if a user supplies the + character to a standard-issue Web server, it will be converted into a space before it’s used on the file system.

From G. Hoglund and G. McGraw. Exploiting Software: How to Break Code. Addison-Wesley, February 2004.

Related Weaknesses
CWE-ID  Weakness Name                                                      Weakness Relationship Type
176     Failure to Handle Unicode Encoding                                 Targeted
171     Cleansing, Canonicalization, and Comparison Errors                 Targeted
179     Incorrect Behavior Order: Early Validation                         Targeted
180     Incorrect Behavior Order: Validate Before Canonicalize             Targeted
173     Failure to Handle Alternate Encoding                               Targeted
172     Encoding Error                                                     Targeted
184     Incomplete Blacklist                                               Targeted
183     Permissive Whitelist                                               Targeted
74      Failure to Sanitize Data into a Different Plane (aka 'Injection')  Targeted
20      Insufficient Input Validation                                      Targeted
Related Attack Patterns
ID  Name                                                                Relationship Type  Relationship Description
    Using Alternate Encodings to Bypass Validation Logic                More Detailed
64  Using Slashes and URL Encoding Combined to Bypass Validation Logic  Similar
79  Using Slashes in Alternate Encoding                                 Similar
72  URL Encoding                                                        Similar
43  Exploiting Multiple Input Interpretation Layers                     Similar
Relevant Security Requirements

Canonicalize data prior to performing any validation or filtering on it. Be aware of alternate encodings.

Purpose

Penetration

CIA Impact
Confidentiality Impact  Integrity Impact  Availability Impact
Medium                  High              Medium
Technical Context
Architectural Paradigm  Framework  Platform  Language
All                     All        All       All
References

G. Hoglund and G. McGraw. Exploiting Software: How to Break Code. Addison-Wesley, February 2004.

CWE – Input Validation

Source
Submission(s)
Submitter                                                                                         Organization  Date        Comment
G. Hoglund and G. McGraw. Exploiting Software: How to Break Code. Addison-Wesley, February 2004.  Cigital, Inc  2007-03-01
Modification(s)
Modifier          Organization  Date        Comment
Eugene Lebanidze  Cigital, Inc  2007-02-26  Fleshed out content to CAPEC schema from the original descriptions in "Exploiting Software"
Sean Barnum       Cigital, Inc  2007-03-05  Review and revise
Richard Struse    VOXEM, Inc    2007-03-26  Review and feedback leading to changes in Name, Related Attack Patterns
Sean Barnum       Cigital, Inc  2007-04-13  Modified pattern content according to review and feedback
 
Page Last Updated: April 18, 2008