.. currentmodule:: netzob

.. _overview:

Overview of Netzob
==================

Netzob was initiated by security auditors of
`AMOSSYS <http://www.amossys.fr>`_ and the `CIDre research team of
Supélec <http://www.rennes.supelec.fr/ren/rd/cidre/>`_ to address the
reverse engineering of communication protocols.

Originally, Netzob was developed to support security auditors and
evaluators in modeling and simulating undocumented protocols. The tool
was later extended to allow smart fuzzing of unknown protocols.

The following picture depicts the main modules of Netzob:

.. figure:: http://www.netzob.org/img/overview_archi.png
   :align: center
   :alt: Architecture of Netzob

   Architecture of Netzob

- **Import module:** Data import is available in two ways: either by
  leveraging the channel-specific captors (currently network and IPC,
  i.e. Inter-Process Communication), or by using specific importers
  (such as PCAP files, structured files and oSpy files).
- **Protocol inference modules:** The vocabulary and grammar inference
  methods constitute the core of Netzob. They provide both passive and
  active reverse engineering of communication flows through automated
  and manual mechanisms.
- **Simulation module:** Given previously inferred vocabulary and
  grammar models, Netzob can understand and generate communication
  traffic between multiple actors. It can act as a client, a server or
  both.
- **Export module:** This module exports an inferred model of a protocol
  in formats that are understandable by third-party software or by a
  human. Current work focuses on export formats compatible with the main
  traffic dissectors (Wireshark and Scapy) and fuzzers (Peach and
  Sulley).

Here is a screenshot of the main graphical interface:

.. figure:: https://dev.netzob.org/attachments/96/netzob_UI.png
   :align: center
   :alt: Main graphical interface of Netzob

The following sections describe the available mechanisms in more
detail.

Import and capture data
~~~~~~~~~~~~~~~~~~~~~~~

The first step in the inference process of a protocol in Netzob is to
capture and import messages as samples. There are different methods to
retrieve messages depending on the communication channel used (files,
network, IPC, USB, etc.) and the format (PCAP, hex, raw binary flows,
etc.).

The figure below describes the multiple communication channels, and
therefore the possible sniffing points, that Netzob aims to address.

.. figure:: http://www.netzob.org/img/overview_multipleFlows.png
   :align: center
   :alt: Multiple communication flows around an application

   Multiple communication flows around an application

The current version (version 0.4) of Netzob deals with the following
data sources:

- **Live network communications**
- **Captured network communications** (PCAPs)
- **Inter-Process Communications** (IPCs)
- **Text and binary files**
- **API flows** through `oSpy <http://code.google.com/p/ospy/>`_ file
  format support

Otherwise, if you plan to reverse a protocol implemented over an
unsupported communication channel, Netzob can manipulate any
communication flow through an XML representation. This situation
therefore only requires a specific development to capture the targeted
flow and to save it as compatible XML.

.. figure:: http://www.netzob.org/img/overview_extraImport.png
   :align: center
   :width: 800 px
   :alt: Importing data from an unknown communication channel using the XML definition

   Importing data from an unknown communication channel using the XML
   definition

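As a minimal sketch of the capture-to-XML idea described above, the
snippet below serializes captured messages to an XML document and reads
them back. The element and attribute names (``messages``, ``message``,
``timestamp``, ``source``) are hypothetical and are not Netzob's actual
XML schema; only the Python standard library is used.

```python
import binascii
import xml.etree.ElementTree as ET


def messages_to_xml(messages):
    """Serialize (timestamp, source, payload_bytes) tuples to an XML string.

    The schema is a hypothetical illustration, not Netzob's real format.
    """
    root = ET.Element("messages")
    for timestamp, source, payload in messages:
        node = ET.SubElement(root, "message",
                             timestamp=str(timestamp), source=source)
        # Payloads are hex-encoded so that binary data survives in XML.
        node.text = binascii.hexlify(payload).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def xml_to_messages(xml_text):
    """Parse the XML produced above back into tuples."""
    root = ET.fromstring(xml_text)
    return [(float(node.get("timestamp")), node.get("source"),
             binascii.unhexlify(node.text or ""))
            for node in root.iter("message")]


# Two messages captured on a hypothetical, unsupported channel.
captured = [(0.0, "usb0", b"\x01\x02hello"), (0.5, "usb0", b"\x01\x03bye")]
xml_text = messages_to_xml(captured)
restored = xml_to_messages(xml_text)
```

Hex-encoding the payload is one possible design choice: it keeps
arbitrary binary data valid inside an XML text node.
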
Inferring message format and state machine with Netzob
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vocabulary of a communication protocol defines all the words that
belong to it. For example, the vocabulary of a malware's communication
protocol looks like a set of possible commands: {"attack
`www.google.fr <http://www.google.fr>`_", "dnspoison
this.dns.server.com", "execute 'uname -a'", ...}. Another example of a
vocabulary is the set of valid words in the HTTP protocol: {"GET
/images/logo.png HTTP/1.1 ...", "HTTP/1.1 200 OK ...", ...}.

Netzob's vocabulary inference process has been designed to retrieve the
set of all possible words used in a targeted protocol and to identify
their structure. Indeed, words are made of different fields, which are
defined by their values and types. Hence a word can be described using
the structure of its fields.

We describe the learning process implemented in Netzob to
semi-automatically infer the vocabulary and the grammar of a protocol.
This process, illustrated in the following picture, is performed in
three main steps:

#. **Clustering messages and partitioning these messages in fields.**
#. **Characterizing message fields and abstracting similar messages in
   symbols.**
#. **Inferring the transition graph of the protocol.**

.. figure:: http://www.netzob.org/img/overview_inferenceSteps.png
   :align: center
   :width: 800 px
   :alt: The main functionalities

   The main functionalities

Step 1: clustering Messages and Partitioning in Fields
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
To discover the format of a symbol, Netzob supports different
partitioning approaches. Here we describe the most accurate one, which
leverages sequence alignment. This technique aligns the invariants in a
set of messages. The `Needleman-Wunsch
algorithm <http://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm>`_
computes an optimal alignment of two sequences for a given scoring
scheme. Needleman-Wunsch is particularly effective on protocols where
dynamic fields have variable lengths (as shown in the following
picture).

.. figure:: http://www.netzob.org/img/overview_needleman.png
   :align: center
   :alt: Sequence alignment with the Needleman-Wunsch algorithm

   Sequence alignment with the Needleman-Wunsch algorithm

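The alignment step can be illustrated with a textbook Needleman-Wunsch
implementation. This is a minimal sketch on two character strings, with
assumed scores (match ``+1``, mismatch ``-1``, gap ``-1``); it is not
Netzob's own implementation, which may use different scores and data
types.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


# The gap marks where the shorter message lacks a byte of the longer one.
aligned_a, aligned_b, best = needleman_wunsch("ac", "abc")
```

Positions where both aligned strings carry the same character are
candidates for invariant fields; positions holding gaps or differing
characters belong to dynamic fields.
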
Once the partitioning and clustering processes are done, we obtain a
relevant first approximation of the overall message formats. The next
step consists in determining the characteristics of the fields.

If the size of those fields is fixed, as in TCP and IP headers, it is
preferable to apply a basic partitioning, also provided by Netzob. Such
partitioning works by left-aligning the messages, then separating
successive fixed columns from successive dynamic columns.

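The basic partitioning can be sketched as follows. The function name
``partition_fixed`` and the sample bytes are illustrative assumptions;
the sketch treats a column as static only when every message holds the
same byte at that offset.

```python
def partition_fixed(messages):
    """Split left-aligned messages into runs of static and dynamic columns.

    Returns a list of (start, end, kind) tuples, with end exclusive and
    kind being either "static" or "dynamic".
    """
    width = max(len(m) for m in messages)
    kinds = []
    for col in range(width):
        # Slicing yields b"" when a message is shorter than this column.
        column = {m[col:col + 1] for m in messages}
        is_static = len(column) == 1 and b"" not in column
        kinds.append("static" if is_static else "dynamic")

    fields = []
    for col, kind in enumerate(kinds):
        if fields and fields[-1][2] == kind:
            start = fields[-1][0]
            fields[-1] = (start, col + 1, kind)  # extend the current run
        else:
            fields.append((col, col + 1, kind))
    return fields


# Two made-up 6-byte headers sharing their first three bytes.
headers = [bytes.fromhex("4500003c1c46"), bytes.fromhex("450000281d00")]
fields = partition_fixed(headers)
```

On these two samples the first three bytes form one static field and
the remaining three bytes form one dynamic field.
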
To group aligned messages by similarity, the Needleman-Wunsch algorithm
is used in conjunction with a clustering algorithm. The applied
algorithm is `UPGMA <http://en.wikipedia.org/wiki/UPGMA>`_.

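A minimal sketch of UPGMA (average-linkage) clustering is given below.
To keep it short, the distance between two messages is computed with
``difflib.SequenceMatcher`` from the Python standard library as a
stand-in for an alignment score; the ``threshold`` parameter and the
sample messages are assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a, b):
    """Dissimilarity in [0, 1]; a stand-in for an alignment-based score."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def upgma(messages, threshold):
    """Merge the two closest clusters until they are farther than threshold."""
    clusters = [[m] for m in messages]
    while len(clusters) > 1:
        def average_link(pair):
            # UPGMA distance: mean of all pairwise distances between clusters.
            left, right = clusters[pair[0]], clusters[pair[1]]
            total = sum(distance(x, y) for x in left for y in right)
            return total / (len(left) * len(right))

        i, j = min(combinations(range(len(clusters)), 2), key=average_link)
        if average_link((i, j)) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


messages = ["GET /a", "GET /b", "POST /x y", "POST /z y"]
clusters = upgma(messages, threshold=0.5)
```

With this threshold the two ``GET`` messages end up in one cluster and
the two ``POST`` messages in another; each cluster is a candidate
symbol.
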
Step 2 : characterization of Fields
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The field type identification partially derives from the partitioning
inference step. For fields containing only invariants, the type merely
corresponds to the invariant value. For other fields, the type is
automatically materialized, as a first approximation, by a regular
expression, as shown in the next figure. This form makes it easy to
validate that data complies with a specific type. Moreover, Netzob can
visualize the definition domain of a field, which helps to manually
refine the type associated with it.

.. figure:: http://www.netzob.org/img/overview_fieldType.png
   :align: center
   :alt: Characterization of field type

   Characterization of field type

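The regular-expression view of a symbol can be sketched as follows. The
helper names and the sample HTTP-like field values are assumptions; the
sketch uses the literal value for invariant fields and a length-bounded
wildcard for the others, which is only one possible first
approximation.

```python
import re


def field_pattern(values):
    """Return a regex fragment describing one field across all messages."""
    if len(set(values)) == 1:
        # Invariant field: its type is simply its literal value.
        return re.escape(values[0])
    lengths = [len(v) for v in values]
    # Dynamic field: any characters, bounded by the observed lengths.
    return "(.{%d,%d})" % (min(lengths), max(lengths))


def symbol_regex(columns):
    """Concatenate the per-field fragments into one compiled regex."""
    pattern = "".join(field_pattern(col) for col in columns)
    return re.compile(pattern, re.DOTALL)


# Each inner list holds the values of one field, one entry per message.
columns = [
    ["GET /", "GET /"],
    ["a.png", "logo.png"],
    [" HTTP/1.1", " HTTP/1.1"],
]
regex = symbol_regex(columns)
is_valid = regex.fullmatch("GET /x.gif HTTP/1.1") is not None
```

A new message can then be checked against the symbol with
``regex.fullmatch``; a message that does not match either belongs to
another symbol or shows that the inferred type is too narrow.
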
Some intra-symbol dependencies are automatically identified. The size
field, present in many protocol formats, is an example of an
intra-symbol dependency. A search algorithm has been designed to look
for potential size fields and their associated payloads. By extension,
this technique makes it possible to discover encapsulated protocol
payloads.

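A size-field search can be sketched as a brute-force scan. The sketch
below rests on simplifying assumptions that are not taken from Netzob:
the size is an unsigned big-endian integer of 1, 2 or 4 bytes, and the
payload runs from the end of the size field to the end of the message.

```python
def find_size_fields(messages, widths=(1, 2, 4)):
    """Return (offset, width) pairs whose big-endian value equals the
    number of bytes following the candidate field, in every message."""
    shortest = min(len(m) for m in messages)
    candidates = []
    for width in widths:
        for offset in range(shortest - width + 1):
            end = offset + width
            if all(int.from_bytes(m[offset:end], "big") == len(m) - end
                   for m in messages):
                candidates.append((offset, width))
    return candidates


# Made-up messages: one type byte, one size byte, then the payload.
messages = [b"\x10\x03abc", b"\x11\x05hello"]
size_fields = find_size_fields(messages)
```

Requiring the relation to hold for every message of a symbol reduces
accidental matches; a real search would also consider little-endian
values and payloads that stop before the end of the message.
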
Environmental dependencies are also identified by looking for specific
values retrieved during message capture. Such specific values consist of
characteristics of the underlying hardware, operating system and network
configuration. During the dependency analysis, these characteristics are
searched for in various encodings.

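As an illustration of searching one environmental value in several
encodings, the sketch below looks for an IPv4 address inside a message.
The four encodings and the sample message are assumptions chosen for
the example, not the list of encodings Netzob actually tries.

```python
import ipaddress


def encodings_of_ip(ip_text):
    """Map an encoding label to the byte string that encoding produces."""
    packed = ipaddress.IPv4Address(ip_text).packed  # 4 bytes, network order
    return {
        "ascii dotted": ip_text.encode("ascii"),
        "binary big-endian": packed,
        "binary little-endian": packed[::-1],
        "ascii hex": packed.hex().encode("ascii"),
    }


def find_environment_value(message, ip_text):
    """Return sorted (offset, label) hits of the IP address in the message."""
    hits = []
    for label, needle in encodings_of_ip(ip_text).items():
        offset = message.find(needle)
        if offset != -1:
            hits.append((offset, label))
    return sorted(hits)


# Made-up message embedding 192.168.1.10 as four raw bytes at offset 1.
message = b"\x00\xc0\xa8\x01\x0aHELLO"
hits = find_environment_value(message, "192.168.1.10")
```

The same approach applies to other captured characteristics, such as a
host name or a port number, each with its own set of encodings.
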
Step 3: inferring the Transition Graph of the Protocol
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The third step of the learning process discovers and extracts the
transition graph (also called the grammar) of a targeted protocol.
More formally, the grammar of a communication protocol defines the set
of valid sentences which can be produced by a communication. A sentence
is an ordered sequence of words which may be received or emitted by a
protocol handler. An example of a simple sentence is:

::

    ["attack www.google.fr", "attack has failed", "attack www.kernel.org", "root access granted."]

which can be described using the following simple automaton, with S0 as
the initial state:

.. figure:: http://www.netzob.org/img/overview_exampleSimpleGrammar.png
   :align: center
   :alt: Schema of a simple grammar

   Schema of a simple grammar

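Such an automaton can be written down as a transition table. The sketch
below encodes the example sentence as a plain Mealy machine; the state
names ``S1`` and ``S2`` and the exact shape of the graph are
assumptions, since only ``S0`` is named in the text.

```python
# Transition table of a Mealy machine: (state, input) -> (output, next state).
# State names other than S0 are hypothetical.
TRANSITIONS = {
    ("S0", "attack www.google.fr"): ("attack has failed", "S1"),
    ("S1", "attack www.kernel.org"): ("root access granted.", "S2"),
}


def run(inputs, state="S0"):
    """Feed input symbols to the automaton and collect the whole sentence."""
    sentence = []
    for symbol in inputs:
        output, state = TRANSITIONS[(state, symbol)]
        sentence.extend([symbol, output])
    return sentence, state


sentence, final_state = run(["attack www.google.fr", "attack www.kernel.org"])
```

Running the two input symbols from ``S0`` reproduces the example
sentence, with each input immediately followed by the output attached
to the transition taken.
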
This learning step is achieved by a set of active experiments that
stimulate a real client or server implementation using successive
sequences of input symbols and analyze its responses.

In Netzob, the automaton used to model a communication protocol is an
extended version of a Mealy automaton which includes semi-stochastic
transitions as well as contextualized and parametrized inputs and
outputs. The first academic presentation of this model is included in a
dedicated scientific paper provided in the documentation section.

The model is inferred through a dedicated **active** process which
consists in stimulating an implementation and analyzing its responses.
In this process, we use the previously inferred vocabulary to discover
and learn the grammar of the communication protocol. Each stimulation
is computed following an extension of the **Angluin L\*** algorithm.

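The principle of active stimulation can be sketched with a toy black
box. The code below is deliberately not an implementation of L\*: it
queries every input sequence up to a fixed length, whereas L\* selects
its queries from an observation table. ``toy_server`` and its two input
symbols are hypothetical.

```python
from itertools import product


def toy_server(inputs):
    """Hypothetical black box: replies depend on whether we are logged in."""
    logged_in, outputs = False, []
    for symbol in inputs:
        if symbol == "login":
            logged_in = True
            outputs.append("ok")
        elif symbol == "get" and logged_in:
            outputs.append("data")
        else:
            outputs.append("error")
    return outputs


def observe(black_box, alphabet, max_length):
    """Query every input sequence up to max_length and record the last output.

    The target is assumed to be reset before each query.
    """
    table = {}
    for length in range(1, max_length + 1):
        for sequence in product(alphabet, repeat=length):
            table[sequence] = black_box(list(sequence))[-1]
    return table


table = observe(toy_server, alphabet=("login", "get"), max_length=2)
```

The table shows that ``get`` answers ``error`` on its own but ``data``
after ``login``, which is the kind of evidence from which distinct
states are inferred.
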
Protocol simulation
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
One of our main goals is to generate realistic network traffic from
undocumented protocols. Therefore, we have implemented a dedicated
module that, given previously inferred vocabulary and grammar models,
can simulate a communication protocol between multiple bots and masters.
Although they share the same model, each actor is independent from the
others and is organized around three main stages.

The first stage is a dedicated library that reads from and writes to the
network channel. It also splits the flow into messages according to the
underlying protocol layers. The second stage uses the vocabulary to
abstract received messages into symbols and, conversely, to specialize
emitted symbols into messages. A memory buffer is also available to
manage dependency relations. The last stage implements the grammar model
and computes which symbols must be emitted or received according to the
current state and time.

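The second stage (abstraction and specialization with a memory) can be
sketched as follows. The vocabulary, the symbol names and the ``user``
field are hypothetical and only serve to show how a value received in
one message can be reused in an emitted one.

```python
import re

# Hypothetical vocabulary: symbol name -> (regex for parsing, format for emitting).
VOCABULARY = {
    "LOGIN": (re.compile(r"login (?P<user>\w+)"), "login {user}"),
    "WELCOME": (re.compile(r"welcome (?P<user>\w+)"), "welcome {user}"),
}


def abstract(message, memory):
    """Map a received message to a symbol name and store its fields in memory."""
    for name, (pattern, _) in VOCABULARY.items():
        match = pattern.fullmatch(message)
        if match:
            memory.update(match.groupdict())
            return name
    return "UNKNOWN"


def specialize(name, memory):
    """Turn a symbol into a concrete message, filling fields from memory."""
    return VOCABULARY[name][1].format(**memory)


memory = {}
received = abstract("login alice", memory)
reply = specialize("WELCOME", memory)
```

The grammar stage would sit on top of these two functions: it receives
the symbol name returned by ``abstract`` and decides which symbol to
pass to ``specialize`` next.
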
Smart fuzzing with Netzob
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Robustness testing is a typical example of dynamic vulnerability
analysis. It can reveal programming errors that can lead to security
vulnerabilities. These tests provide an efficient and almost automated
way to identify and study the exposed surfaces of systems. Nevertheless,
to be fully effective, fuzzing approaches must cover the complete
definition domain and the combinations of all the variables that exist
in a protocol (IP addresses, serial numbers, size fields, payloads,
message identifiers, etc.). Fuzzing a typical communication interface
therefore requires too many test cases, because of the complex variation
domains introduced by the semantic layer of a protocol. In addition,
efficient fuzzing should also cover the state machine of the protocol,
which brings another huge set of variations. The time required is nearly
always too high and therefore limits the efficiency of this approach.

With all these constraints, achieving robustness tests on a target is
feasible only if the expert has access to a tool specially designed for
the targeted protocol. Hence the emergence of a large number of tools to
verify the behavior of an application on one or more communication
protocols. However, in the context of proprietary communication
protocols for which no specification is published, fuzzers do not
provide optimal results.

Netzob helps the security evaluator by simplifying the creation of a
dedicated fuzzer for a proprietary or undocumented protocol. It gives
the expert the means to run a semi-automated inference process to create
a model of the targeted protocol. This model can afterward be refined by
the evaluator. Finally, the created model is included in the fuzzing
module of Netzob, which considers the vocabulary and the grammar of the
protocol to generate optimized and specific test cases. Both mutation
and generation are available for fuzzing.

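The difference between generation and mutation can be sketched on a toy
field model. The model layout (``MODEL``) and the bit-flip strategy are
assumptions made for the example and do not describe Netzob's fuzzing
module.

```python
import random

# Hypothetical model of one symbol: an ordered list of fields. A static field
# carries its value; a dynamic field carries its minimum and maximum length.
MODEL = [("static", b"CMD "), ("dynamic", 1, 8), ("static", b"\r\n")]


def generate(model, rng):
    """Generation: build a fresh message from the model alone."""
    parts = []
    for field in model:
        if field[0] == "static":
            parts.append(field[1])
        else:
            length = rng.randint(field[1], field[2])
            parts.append(bytes(rng.randrange(256) for _ in range(length)))
    return b"".join(parts)


def mutate(message, rng, flips=1):
    """Mutation: flip random bits of an existing, valid message."""
    data = bytearray(message)
    for _ in range(flips):
        index = rng.randrange(len(data))
        data[index] ^= 1 << rng.randrange(8)
    return bytes(data)


rng = random.Random(0)  # fixed seed so that test cases are reproducible
generated = generate(MODEL, rng)
mutated = mutate(b"CMD status\r\n", rng)
```

Generation keeps the static fields intact and only varies the dynamic
ones, so test cases stay close to the inferred format; mutation starts
from a captured message and may also corrupt the static parts.
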
Export protocol model
~~~~~~~~~~~~~~~~~~~~~

The following export formats are currently provided by Netzob:

- XML format
- human readable (Wireshark-like)
- Peach fuzzer export: this enables the efficient use of the Peach
  fuzzer on previously undocumented protocols.

Besides, you can write your own exporter to manipulate the inferred
protocol model in your favorite tool.

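A custom exporter can be as small as a function that walks the model
and renders it. The sketch below assumes a hypothetical in-memory model
(a dictionary of symbols and field descriptions) and does not use
Netzob's actual model classes or exporter interface.

```python
# Hypothetical in-memory model: symbol name -> list of (field name, description).
MODEL = {
    "LOGIN": [("command", "static 'login '"), ("user", "dynamic, 1 to 16 chars")],
    "WELCOME": [("banner", "static 'welcome '"), ("user", "copied from LOGIN.user")],
}


def export_text(model):
    """Render the model as an indented, human-readable text report."""
    lines = []
    for symbol, fields in model.items():
        lines.append("Symbol %s" % symbol)
        for position, (name, description) in enumerate(fields):
            lines.append("  field %d: %-8s %s" % (position, name, description))
    return "\n".join(lines)


report = export_text(MODEL)
```

The same traversal could emit another target format instead of plain
text by changing only the lines that build the output.
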