An attacker engages in activities to decipher and/or decode protocol
information for a network or application communication protocol used for
transmitting information between interconnected nodes or systems on a
packet-switched data network. While this type of analysis inherently
involves the analysis of a networking protocol, it does not require the
presence of an actual or physical network. Although certain techniques for
protocol analysis benefit from manipulating live 'on-the-wire' interactions
between communicating components, static or dynamic analysis techniques
applied to executables, as well as to device drivers such as network
interface drivers, can also be used to reveal the function and
characteristics of a communication protocol implementation.

Depending upon the methods used, protocol reverse engineering can involve
methods similar to those employed when reverse engineering an executable, or
the process may involve observing, interacting with, and modifying actual
communications occurring between hosts. The goal of protocol reverse
engineering is to derive the data transmission syntax, as well as to extract
the meaningful content, including packet or content delimiters used by the
protocol. This type of analysis is often performed on closed-specification
or proprietary protocols, but is also useful for analyzing protocols with
publicly available specifications to determine how particular
implementations deviate from those specifications.

There are several challenges inherent to protocol reverse engineering
depending upon the nature of the protocol being analyzed. Other factors,
such as encryption or ad hoc obfuscation of the protocol, may also
complicate the process. In general, there are two kinds of
networking protocols, each associated with its own challenges and analysis
approaches or methodologies. Some protocols are human-readable, which is to
say they are text-based protocols. Examples of these types of protocols
include HTTP, SMTP, and SOAP. Additionally, application-layer protocols can
be embedded or encapsulated within human-readable protocols in the data
portion of the packet. Typically, human-readable protocol implementations
are susceptible to automatic decoding by the appropriate tools, such as
Wireshark (formerly Ethereal), tcpdump, or similar protocol sniffers or
analyzers.
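
As a rough illustration of the text-based versus binary distinction, the
following sketch classifies a captured payload by the fraction of its bytes
that are printable ASCII. The function names and the 0.95 threshold are
assumptions made for this example, not part of any tool named above, and the
heuristic can misclassify compressed, encoded, or mixed payloads.

```python
import string

# Bytes treated as "text": printable ASCII, which includes CR, LF, and tab.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_ratio(payload: bytes) -> float:
    """Return the fraction of bytes in payload that are printable ASCII."""
    if not payload:
        return 0.0
    return sum(b in PRINTABLE for b in payload) / len(payload)


def looks_text_based(payload: bytes, threshold: float = 0.95) -> bool:
    """Heuristic guess: True if the payload is mostly printable ASCII.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    return printable_ratio(payload) >= threshold
```

A payload that passes this check is a candidate for manual reading or
automatic decoding; one that fails is more likely to need the binary
analysis approaches described later in this section.
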

The presence of well-known protocol specifications, together with easily
identified protocol delimiters such as Carriage Return and Line Feed
characters (CRLF), makes text-based protocols susceptible to direct
scrutiny through manual processes. Protocol reverse engineering against
implementations of protocols such as HTTP is often performed to identify
idiosyncratic implementations of a protocol by a server or client. In the
case of application-layer protocols which are embedded within text-based
protocols, analysis techniques typically benefit from the well-known nature
of the encapsulating protocols and can focus on discovering the semantic
characteristics of the proprietary protocol or API, since the syntax and
protocol delimiters of the underlying protocols can be readily
identified.
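
To make the role of CRLF delimiters concrete, here is a minimal sketch that
splits a captured HTTP-style message into its start line, header fields, and
body. The function name and the sample message are hypothetical. Header
order and capitalization are deliberately preserved, since such details are
the kind of implementation idiosyncrasy this analysis looks for.

```python
def split_text_message(raw: bytes):
    """Split a captured text-based message on its CRLF delimiters.

    Returns (start_line, headers, body), where headers is a list of
    (name, value) pairs in their original order and capitalization.
    """
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("ascii", "replace")

    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # not a "name: value" line; ignored in this sketch
        headers.append((name.decode("ascii", "replace").strip(),
                        value.decode("ascii", "replace").strip()))
    return start_line, headers, body
```

The body returned here is left as raw bytes, which is where an embedded
proprietary protocol or API payload would be found for further analysis.
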

When performing protocol analysis of machine-readable (non-text-based)
protocols, difficulties emerge because the protocol itself was designed to
be read by a computing process. Such protocols are typically composed entirely in
binary with no apparent syntax, grammar, or structural boundaries. Examples
of these types of protocols are IP, UDP, and TCP. Binary protocols with
published specifications can be automatically decoded by protocol analyzers,
but in the case of proprietary, closed-specification, binary protocols there
are no immediate indicators of packet syntax such as packet boundaries,
delimiters, or structure, or the presence or absence of encryption or
obfuscation. In these cases no single tool can extract or
reveal the structure of the packet on the wire, so it is necessary to use
trial-and-error approaches, introducing systematic mutations at the packet
level and observing how application behavior changes. Tools such as Protocol
Debug (PDB) or other packet injection suites are often employed. In cases
where the binary executable is available, protocol analysis can be augmented
with static and dynamic analysis techniques.
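
The systematic packet-level mutation described above can be sketched as a
generator that changes one byte at a time in a captured packet. This is an
illustration under stated assumptions rather than the behavior of any
particular injection suite: the function name and the choice of replacement
values (0x00 and 0xFF) are hypothetical, and sending each variant and
recording the application's response is left to the analyst's own tooling.

```python
from typing import Iterator, Sequence, Tuple


def single_byte_mutations(
    packet: bytes, values: Sequence[int] = (0x00, 0xFF)
) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (offset, value, mutated_packet) for each one-byte change.

    Every variant differs from the original packet at exactly one offset,
    so a change in application behavior can be attributed to that offset.
    """
    for offset in range(len(packet)):
        for value in values:
            if packet[offset] == value:
                continue  # skip variants identical to the original
            mutated = bytearray(packet)
            mutated[offset] = value
            yield offset, value, bytes(mutated)
```

Offsets whose mutation causes the application to reject the packet may
correspond to lengths, checksums, or type fields, while offsets that can be
changed freely are more likely to carry payload data.
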