CAPEC - CAPEC-71: Using Unicode Encoding to Bypass Validation Logic (Version 3.9)


Common Attack Pattern Enumeration and Classification A Community Resource for Identifying and Understanding Attacks


CAPEC-71: Using Unicode Encoding to Bypass Validation Logic

Attack Pattern ID: 71

Abstraction: Detailed


Description

An attacker may provide a Unicode string to a system component that is not Unicode aware and use it to circumvent a filter or cause the classifying mechanism to fail to properly understand the request. This may allow the attacker to slip malicious data past the content filter and/or cause the application to route the request incorrectly.
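To make the mismatch concrete, here is a minimal Python sketch. It assumes a hypothetical filter that rejects ASCII angle brackets in the raw string and a hypothetical downstream component that applies Unicode NFKC normalization afterwards; naive_filter and downstream_normalize are illustrative names, not part of any real product. Fullwidth brackets (U+FF1C, U+FF1E) pass the filter and only become ASCII "<" and ">" after the filter has already approved the input.

```python
import unicodedata

def naive_filter(value: str) -> bool:
    """Hypothetical filter: accept input only if it has no ASCII angle brackets."""
    return "<" not in value and ">" not in value

def downstream_normalize(value: str) -> str:
    """Hypothetical downstream step that applies NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", value)

# U+FF1C and U+FF1E are FULLWIDTH LESS-THAN SIGN and FULLWIDTH GREATER-THAN SIGN.
payload = "\uFF1Cscript\uFF1E"

passed = naive_filter(payload)          # the filter sees no ASCII '<' or '>'
after = downstream_normalize(payload)   # NFKC folds the fullwidth forms to ASCII

print(passed)  # True
print(after)   # <script>
```

The two components disagree about what the string "means": the filter classifies it as harmless, while the component that actually consumes it sees markup.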

Likelihood Of Attack

Medium

Typical Severity

High

Relationships

This table shows the other attack patterns and high level categories that are related to this attack pattern. These relationships are defined as ChildOf and ParentOf, and give insight into similar items that may exist at higher and lower levels of abstraction. In addition, relationships such as CanFollow, PeerOf, and CanAlsoBe are defined to show similar attack patterns that the user may want to explore.

Nature	Type	ID	Name
ChildOf	Standard Attack Pattern	267	Leverage Alternate Encoding
PeerOf	Detailed Attack Pattern	80	Using UTF-8 Encoding to Bypass Validation Logic

This table shows the views that this attack pattern belongs to and top level categories within that view.

View Name	Top Level Categories
Domains of Attack	Software
Mechanisms of Attack	Manipulate Data Structures

Execution Flow

Explore

Survey the application for user-controllable inputs: Using a browser or an automated tool, an attacker follows all public links and actions on a web site. They record all the links, the forms, the resources accessed, and all other potential entry points for the web application.

Techniques
Use a spidering tool to follow and record all links and analyze the web pages to find entry points. Make special note of any links that include parameters in the URL.
Use a proxy tool to record all user input entry points visited during a manual traversal of the web application.
Use a browser to manually explore the website and analyze how it is constructed. Many browser plugins are available to facilitate the analysis or automate the discovery.
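The core output of this survey step is an inventory of URLs and the parameters they accept. A minimal standard-library Python sketch, assuming the page HTML has already been fetched during an authorized test; a real spider or proxy would also fetch pages, follow in-scope links, and capture form field names. EntryPointCollector is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

class EntryPointCollector(HTMLParser):
    """Record link targets and form actions found in one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        # Each entry: (kind, absolute URL, sorted query-parameter names)
        self.entry_points: list[tuple[str, str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self._record("link", attributes["href"])
        elif tag == "form":
            # A form without an action submits back to the current page.
            self._record("form", attributes.get("action") or self.base_url)

    def _record(self, kind: str, target: str) -> None:
        url = urljoin(self.base_url, target)
        params = sorted(parse_qs(urlparse(url).query))  # names of URL parameters
        self.entry_points.append((kind, url, params))

page = """
<a href="/search?q=test&amp;page=2">Search</a>
<a href="/about">About</a>
<form action="/login" method="post"><input name="user"></form>
"""
collector = EntryPointCollector("http://target.server/")
collector.feed(page)
for kind, url, params in collector.entry_points:
    print(kind, url, params)
```

Entries with a non-empty parameter list correspond to the "links that include parameters in the URL" that the technique says to note.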

Experiment

Probe entry points to locate vulnerabilities: The attacker uses the entry points gathered in the "Explore" phase as a target list and injects various Unicode encoded payloads to determine if an entry point actually represents a vulnerability with insufficient validation logic and to characterize the extent to which the vulnerability can be exploited.

Techniques
Try to use Unicode encoding of content in scripts in order to bypass validation routines.
Try to use Unicode encoding of content in HTML in order to bypass validation routines.
Try to use Unicode encoding of content in CSS in order to bypass validation routines.
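Each of the three contexts has its own escape syntax that its parser decodes back to the original characters: \uXXXX in JavaScript string literals, numeric character references in HTML, and backslash-hex escapes in CSS. A minimal Python sketch of how a tester might generate these variants for a harmless probe string; the function names are hypothetical, and the JavaScript form assumes characters in the Basic Multilingual Plane.

```python
def js_unicode_escape(text: str) -> str:
    """JavaScript string-literal form: each character as a \\uXXXX escape.
    Assumes code points <= 0xFFFF (no surrogate-pair handling)."""
    return "".join(f"\\u{ord(ch):04x}" for ch in text)

def html_numeric_refs(text: str) -> str:
    """HTML hexadecimal numeric character references, e.g. &#x3c; for '<'."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)

def css_escape(text: str) -> str:
    """CSS form: backslash, hex code point, then one terminating space."""
    return "".join(f"\\{ord(ch):x} " for ch in text)

probe = "<b>"
variants = {
    "script": js_unicode_escape(probe),
    "html": html_numeric_refs(probe),
    "css": css_escape(probe),
}
for context, encoded in variants.items():
    print(f"{context:6} {encoded}")
```

If an entry point rejects the literal probe but accepts one of these variants, validation is likely running on text that the consuming parser will decode later.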

Prerequisites

Filtering is performed on data that has not been properly canonicalized.
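A minimal Python sketch of this prerequisite, assuming a hypothetical blocklist check (contains_script_tag) applied to a JSON request body before the body is decoded. JSON decoding plays the role of canonicalization here: it turns the \u003c and \u003e escapes into "<" and ">", so the check on the raw text and the check on the decoded value disagree.

```python
import json

def contains_script_tag(text: str) -> bool:
    """Hypothetical blocklist check for an HTML script tag."""
    return "<script" in text.lower()

# JSON request body whose value uses \u escapes for the angle brackets.
raw_body = '{"comment": "\\u003cscript\\u003e"}'

# Vulnerable order: the check runs on the raw, still-encoded text ...
print(contains_script_tag(raw_body))   # False, so the filter lets it through
# ... and decoding afterwards turns the escapes into '<' and '>'.
comment = json.loads(raw_body)["comment"]
print(comment)                         # <script>

# Safer order: decode (canonicalize) first, then validate the decoded value.
print(contains_script_tag(comment))    # True
```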

Skills Required

[Level: Medium]

An attacker needs to understand Unicode encodings and know (or be able to determine) which system components may not be Unicode aware.

Indicators

Unicode encoded data is passed to APIs where it is not expected
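A minimal Python sketch of heuristics that could surface this indicator in logs or monitoring. The function name and the three signals are assumptions, not a standard detector: legitimate non-ASCII input (names and text in many languages) will trigger the second and third signals, so they are grounds for closer review rather than automatic rejection.

```python
import unicodedata

def unicode_indicators(raw: bytes) -> list[str]:
    """Return heuristic reasons why raw input may contain unexpected Unicode."""
    try:
        text = raw.decode("utf-8")   # strict: rejects overlong forms such as C0 AE
    except UnicodeDecodeError:
        return ["invalid or overlong UTF-8"]
    reasons = []
    if not text.isascii():
        reasons.append("non-ASCII characters")
    if unicodedata.normalize("NFKC", text) != text:
        reasons.append("changes under NFKC normalization")
    return reasons

print(unicode_indicators(b"../winnt"))                            # []
print(unicode_indicators(b"\xc0\xae\xc0\xae/winnt"))              # ['invalid or overlong UTF-8']
print(unicode_indicators("\uFF0E\uFF0E/winnt".encode("utf-8")))   # ['non-ASCII characters', 'changes under NFKC normalization']
```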

Consequences

This table specifies different individual consequences associated with the attack pattern. The Scope identifies the security property that is violated, while the Impact describes the negative technical impact that arises if an adversary succeeds in their attack. The Likelihood provides information about how likely the specific consequence is expected to be seen relative to the other consequences in the list. For example, there may be high likelihood that a pattern will be used to achieve a certain impact, but a low likelihood that it will be exploited to achieve a different impact.

Scope	Impact	Likelihood
Confidentiality, Access Control, Authorization	Bypass Protection Mechanism
Confidentiality, Integrity, Availability	Execute Unauthorized Commands
Integrity	Modify Data
Availability	Unreliable Execution

Mitigations

Ensure that the system is Unicode aware and can properly process Unicode data. Do not assume that data will be ASCII.

Ensure that filtering or input validation is applied to canonical data.
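A minimal Python sketch of canonicalize-then-validate for a URL path, covering both of the mitigations above. It assumes a hypothetical policy that requests must stay under /some_directory; canonicalize_path and is_allowed are illustrative names. The path is percent-decoded once, decoded as strict UTF-8 (which rejects overlong forms), NFKC-normalized, and resolved with posixpath.normpath before any check runs.

```python
import posixpath
import unicodedata
from urllib.parse import unquote_to_bytes

def canonicalize_path(raw_path: str) -> str:
    """Bring a URL path to one canonical form before any validation runs."""
    data = unquote_to_bytes(raw_path)            # percent-decode exactly once
    text = data.decode("utf-8")                  # strict UTF-8: overlong forms raise
    text = unicodedata.normalize("NFKC", text)   # fold fullwidth/compatibility forms
    return posixpath.normpath(text)              # resolve "." and ".." segments

def is_allowed(raw_path: str, root: str = "/some_directory") -> bool:
    """Validate the canonical path, never the raw one."""
    try:
        path = canonicalize_path(raw_path)
    except UnicodeDecodeError:
        return False                             # undecodable input is rejected outright
    return path == root or path.startswith(root + "/")

print(is_allowed("/some_directory/page.html"))                   # True
print(is_allowed("/some_directory/../../../winnt"))              # False
print(is_allowed("/some_directory/%C0%AE%C0%AE/winnt"))          # False
print(is_allowed("/some_directory/%EF%BC%8E%EF%BC%8E/winnt"))    # False
```

The normalization form should match whatever the downstream component actually applies, and the caller must go on to use the canonical path rather than the raw one; otherwise the checked value and the used value can diverge again.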

Assume all input is malicious. Create an allowlist that defines all valid input to the software system based on the requirements specifications. Input that does not match against the allowlist should not be permitted to enter into the system.
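A minimal Python sketch of an allowlist check for one hypothetical field (a username limited to 3 to 32 ASCII letters, digits, underscores, or hyphens); the real allowlist must come from the requirements. The input is NFKC-normalized before matching, and re.ASCII keeps \w from silently matching non-ASCII letters and digits.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical allowlist: with re.ASCII, \w means [A-Za-z0-9_]; "-" is added explicitly.
USERNAME = re.compile(r"[\w-]{3,32}", re.ASCII)

def validate_username(value: str) -> Optional[str]:
    """Return the canonical username if it matches the allowlist, else None."""
    canonical = unicodedata.normalize("NFKC", value)
    if USERNAME.fullmatch(canonical) is None:
        return None
    return canonical  # downstream code should use this value, not the raw input

print(validate_username("alice_01"))     # alice_01
print(validate_username("\uFF41dmin"))   # admin
print(validate_username("jos\u00e9"))    # None
print(validate_username("al<ce"))        # None
```

Folding fullwidth characters to ASCII (as with "\uFF41dmin" becoming "admin") is a policy choice; a stricter allowlist could reject any input that changes under normalization.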

Example Instances

A very common technique for a Unicode attack involves traversing directories looking for interesting files. An example of this idea applied to the Web is:

http://target.server/some_directory/../../../winnt

In this case, the attacker is attempting to traverse to a directory that is not supposed to be part of standard Web services. The trick is fairly obvious, so many Web servers and scripts prevent it. However, using alternate encoding tricks, an attacker may be able to get around badly implemented request filters.

In October 2000, an adversary publicly revealed that Microsoft's IIS server suffered from a variation of this problem. In the case of IIS, all the attacker had to do was provide alternate encodings for the dots and/or slashes found in a classic attack. The Unicode translations (overlong UTF-8 byte sequences) are:

. yields C0 AE
/ yields C0 AF
\ yields C1 9C
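A two-byte UTF-8 sequence has the bit layout 110xxxxx 10yyyyyy: the code point is built from the low five bits of the first byte and the low six bits of the second. Working that arithmetic for the three pairs shows why a decoder that skips the shortest-form check maps each one back to an ASCII separator. RFC 3629 requires every code point below 0x80 to be encoded as a single byte, so a conforming decoder rejects all three pairs (the lead bytes C0 and C1 never occur in valid UTF-8).

```latex
\begin{aligned}
\mathrm{cp}(b_1, b_2) &= 64\,(b_1 \bmod 32) + (b_2 \bmod 64) \\
\mathrm{cp}(\mathtt{0xC0}, \mathtt{0xAE}) &= 64 \cdot 0 + 46 = 46 = \mathtt{0x2E} \quad (\texttt{.}) \\
\mathrm{cp}(\mathtt{0xC0}, \mathtt{0xAF}) &= 64 \cdot 0 + 47 = 47 = \mathtt{0x2F} \quad (\texttt{/}) \\
\mathrm{cp}(\mathtt{0xC1}, \mathtt{0x9C}) &= 64 \cdot 1 + 28 = 92 = \mathtt{0x5C} \quad (\backslash)
\end{aligned}
```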

Using this conversion, the previously displayed URL can be encoded as:

http://target.server/some_directory/%C0%AE%C0%AE/%C0%AE%C0%AE/%C0%AE%C0%AE/winnt
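A simplified Python simulation of this class of bug; it is not a reconstruction of IIS's actual code. It assumes a hypothetical filter (naive_traversal_check) that percent-decodes the path and looks for ../ in the resulting bytes, followed by a hypothetical decoder (lenient_utf8_decode) that accepts overlong two-byte sequences. Two variants are tried: one encoding each dot of ../../../winnt as C0 AE, and one encoding each slash as C0 AF.

```python
from urllib.parse import unquote_to_bytes

def lenient_utf8_decode(data: bytes) -> str:
    """Deliberately flawed decoder for illustration: handles 1- and 2-byte
    sequences only and never rejects overlong (non-shortest) forms."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                                   # ASCII byte
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b < 0xE0 and i + 1 < len(data):   # 2-byte lead, no overlong check
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:                                          # anything else: replacement char
            out.append("\uFFFD")
            i += 1
    return "".join(out)

def naive_traversal_check(raw_path: str) -> bool:
    """Vulnerable order: looks for '../' in the percent-decoded bytes,
    before the UTF-8 decoding step that runs later."""
    return b"../" not in unquote_to_bytes(raw_path)

for attack in ("/some_directory/%C0%AE%C0%AE/%C0%AE%C0%AE/%C0%AE%C0%AE/winnt",
               "/some_directory/..%C0%AF..%C0%AF..%C0%AFwinnt"):
    print(naive_traversal_check(attack))                    # True: the filter passes it
    print(lenient_utf8_decode(unquote_to_bytes(attack)))    # /some_directory/../../../winnt
```

Python's own strict decoder rejects these inputs (b"\xc0\xae".decode("utf-8") raises UnicodeDecodeError), which is why decoding strictly before filtering closes this particular hole.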

See also: CVE-2000-0884

Related Weaknesses

A Related Weakness relationship associates a weakness with this attack pattern. Each association implies a weakness that must exist for a given attack to be successful. If multiple weaknesses are associated with the attack pattern, then any of the weaknesses (but not necessarily all) may be present for the attack to be successful. Each related weakness is identified by a CWE identifier.

CWE-ID	Weakness Name
176	Improper Handling of Unicode Encoding
179	Incorrect Behavior Order: Early Validation
180	Incorrect Behavior Order: Validate Before Canonicalize
173	Improper Handling of Alternate Encoding
172	Encoding Error
184	Incomplete List of Disallowed Inputs
183	Permissive List of Allowed Inputs
74	Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection')
20	Improper Input Validation
697	Incorrect Comparison
692	Incomplete Denylist to Cross-Site Scripting

Taxonomy Mappings

CAPEC mappings to ATT&CK techniques leverage an inheritance model to streamline and minimize direct CAPEC/ATT&CK mappings. Inheritance of a mapping is indicated by text stating that the parent CAPEC has relevant ATT&CK mappings. Note that the ATT&CK Enterprise Framework does not use an inheritance model as part of the mapping to CAPEC.

Relevant to the ATT&CK taxonomy mapping (see parent)

Relevant to the OWASP taxonomy mapping

Entry Name
Unicode Encoding

References

[REF-1] G. Hoglund and G. McGraw. "Exploiting Software: How to Break Code". Addison-Wesley. 2004-02.

Content History

Submissions
Submission Date	Submitter	Organization
2014-06-23 (Version 2.6)	CAPEC Content Team	The MITRE Corporation
Modifications
Modification Date	Modifier	Organization	Comments
2017-01-09 (Version 2.9)	CAPEC Content Team	The MITRE Corporation	Updated Related_Attack_Patterns
2018-07-31 (Version 2.12)	CAPEC Content Team	The MITRE Corporation	Updated References
2020-07-30 (Version 3.3)	CAPEC Content Team	The MITRE Corporation	Updated Execution_Flow, Mitigations
2020-12-17 (Version 3.4)	CAPEC Content Team	The MITRE Corporation	Updated Taxonomy_Mappings
2021-06-24 (Version 3.5)	CAPEC Content Team	The MITRE Corporation	Updated Related_Weaknesses
2022-09-29 (Version 3.8)	CAPEC Content Team	The MITRE Corporation	Updated Example_Instances


Page Last Updated or Reviewed: July 31, 2018


Use of the Common Attack Pattern Enumeration and Classification (CAPEC) and the associated references from this website is subject to the Terms of Use. Copyright © 2007–2026, The MITRE Corporation. CAPEC and the CAPEC logo are trademarks of The MITRE Corporation.
