CCSDS_study project

This commit is contained in:
2026-05-05 21:54:35 +08:00
commit 9be41f9270
585 changed files with 91275 additions and 0 deletions

View File

@@ -0,0 +1,300 @@
.. currentmodule:: netzob
.. _overview:
Overview of Netzob
==================
Netzob has been initiated by security auditors of
`AMOSSYS <http://www.amossys.fr>`_ and the `CIDre research team of
Supélec <http://www.rennes.supelec.fr/ren/rd/cidre/>`_ to address the
reverse engineering of communication protocols.
Originaly, the development of Netzob has been initiated to support
security auditors and evaluators in their activities of modeling and
simulating undocumented protocols. The tool has then been extended to
allow smart fuzzing of unknown protocol.
The following picture depicts the main modules of Netzob:
.. figure:: http://www.netzob.org/img/overview_archi.png
:align: center
:alt: Architecture of Netzob
Architecture of Netzob
- **Import module:** Data import is available in two ways: either by
leveraging the channel-specific captors (currently network and IPC --
Inter-Process Communication), or by using specific importers (such as
PCAP files, structured files and OSpy files).
- **Protocol inference modules:** The vocabulary and grammar inference
methods constitute the core of Netzob. It provides both passive and
active reverse engineering of communication flows through automated
and manuals mechanisms.
- **Simulation module:** Given vocabulary and grammar models previously
inferred, Netzob can understand and generate communication traffic
between multiple actors. It can act as either a client, a server or
both.
- **Export module:** This module permits to export an inferred model of
a protocol in formats that are understandable by third party software
or by a human. Current work focuses on export format compatible with
main traffic dissectors (Wireshark and Scapy) and fuzzers (Peach and
Sulley).
And here is a screenshot of the main graphical interface:
.. figure:: https://dev.netzob.org/attachments/96/netzob_UI.png
:align: center
:alt:
The following sections will describe in more details the available
mechanisms.
Import and capture data
~~~~~~~~~~~~~~~~~~~~~~~
The first step in the inferring process of a protocol in Netzob is to
capture and to import messages as samples. There are different methods
to retrieve messages depending of the communication channel used (files,
network, IPC, USB, etc.) and the format (PCAP, hex, raw binary flows,
etc.).
The figure below describes the multiple communication channels and
therefore possible sniffing point's Netzob aims at addressing.
.. figure:: http://www.netzob.org/img/overview_multipleFlows.png
:align: center
:alt: Multiple communication flows arround an application
Multiple communication flows arround an application
The current version (version 0.4) of Netzob deals with the following
data sources :
- **Live network communications**
- **Captured network communications** (PCAPs)
- **Inter-Process Communications** (IPCs)
- **Text and binary files**
- **API flows** through `oSpy <http://code.google.com/p/ospy/>`_ file
format support
Otherwise, if you plan to reverse a protocol implemented over an
supported communication channel, Netzob's can manipulates any
communications flow through an XML representation. Therefore, this
situation only requires a specific development to capture the targeted
flow and to save it using a compatible XML.
.. figure:: http://www.netzob.org/img/overview_extraImport.png
:align: center
:width: 800 px
:alt: Importing data from an unknown communication channel using the XML definition
Importing data from an unknown communication channel using the XML
definition
Inferring message format and state machine with Netzob
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The vocabulary of a communication protocol defines all the words which
are integrated in it. For example, the vocabulary of a malware's
communication protocol looks like a set of possible commands : {"attack
`www.google.fr <http://www.google.fr>`_", "dnspoison
this.dns.server.com", "execute 'uname -a'", ...}. Another example of a
vocabulary is the set of valids words in the HTTP protocol : { "GET
/images/logo.png HTTP/1.1 ...", "HTTP/1.1 200 OK ...", ...}.
Netzob's vocabulary inferring process has been designed in order to
retrieve the set of all possible words used in a targeted protocol and
to identify their structures. Indeed words are made of different fields
which are defined by their value and types. Hence a word can be
described using the structure of its fields.
We describe the learning process implemented in Netzob to
semi-automatically infer the vocabulary and the grammar of a protocol.
This process, illustrated in the following picture, is performed in
three main steps:
#. **Clustering messages and partitioning these messages in fields.**
#. **Characterizing message fields and abstracting similar messages in
symbols.**
#. **Inferring the transition graph of the protocol.**
.. figure:: http://www.netzob.org/img/overview_inferenceSteps.png
:align: center
:width: 800 px
:alt: The main functionalities
The main functionalities
Step 1: clustering Messages and Partitioning in Fields
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To discover the format of a symbol, Netzob supports different
partitioning approaches. In this article we describe the most accurate
one, that leverages sequence alignment processes. This technique permits
to align invariants in a set of messages. The `Needleman-Wunsh
algorithm <http://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm>`_
performs this task optimally. Needleman-Wunsh is particularly effective
on protocols where dynamic fields have variable lengths (as shown on the
following picture).
.. figure:: http://www.netzob.org/img/overview_needleman.png
:align: center
:alt: Sequence alignment with Needleman-Wunsh algorithm
Sequence alignment with Needleman-Wunsh algorithm
When partitioning and clustering processes are done, we obtain a
relevant first approximation of the overall message formats. The next
step consists in determining the characteristics of the fields.
If the size of those fields is fixed, as in TCP and IP headers, it is
preferable to apply a basic partitioning, also provided by Netzob. Such
partitioning works by aligning each message by the left, then
separating successive fixed columns from successive dynamic columns.
To regroup aligned messages by similarity, the Needleman-Wunsh algorithm
is used in conjunction with a clustering algorithm. The applied
algorithm is `UPGMA <http://en.wikipedia.org/wiki/UPGMA>`_.
Step 2 : characterization of Fields
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The field type identification partially derives from the partitioning
inference step. For fields containing only invariants, the type merely
corresponds to the invariant value. For other fields, the type is
automatically materialized, in first approximation, with a regular
expression, as shown on next figure. This form enables easy validation of
the data compliance with a specific type. Moreover, Netzob offers the
possibility to visualize the definition domain of a field. This helps to
manually refine the type associated with a field.
.. figure:: http://www.netzob.org/img/overview_fieldType.png
:align: center
:alt: Characterization of field type
Characterization of field type
Some intra-symbol dependencies are automatically identified. The size
field, present in many protocol formats, is an example of intra-symbol
dependency. A search algorithm has been designed to look for potential
size fields and their associated payloads. By extension, this technique
permits to discover encapsulated protocol payloads.
Environmental dependencies are also identified by looking for specific
values retrieved during message capture. Such specific values consist of
characteristics of the underlying hardware, operating system and network
configuration. During the dependency analysis, these characteristics are
searched in various encoding.
Step 3: inferring the Transition Graph of the Protocol
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The third step of the learning process discovers and extracts the
transition graph from a targeted protocol (also called the grammar).
More formally, the grammar of a communication protocol defines the set
of valid sentences which can be produced by a communication. A sentence
is a sorted set of words which may be received or emmited by a protocol
handler. An exemple of a simple sentence is :
::
["attack www.google.fr", "attack has failed", "attack www.kernel.org", "root access granted."]
which can be described using the following simple automata with S0 the
initial state :
.. figure:: http://www.netzob.org/img/overview_exampleSimpleGrammar.png
:align: center
:alt: Schema of a simple grammar
Schema of a simple grammar
The learning process step is achieved by a set of active experiments
that stimulate a real client or server implementation using successive
sequences of input symbols and analyze its responses.
In Netzob, the automata used to represent or model a communication
protocol is an extended version of a Mealy automata which includes
semi-stochastic transitions, contextualized and parametrized inputs and
outputs. The first academic presention of this model is included in a
dedicated scientific paper provided in the documentation section.
The model is inferred through a dedicated **active** process which
consists in stimulating an implementation and to analyze its responses.
In this process, we use the previously infered vocabulary to discover
and to learn the grammar of the communication protocol. Each stimulation
is computed following an extension of the **Angluin L** algorithm\*.
Protocol simulation
~~~~~~~~~~~~~~~~~~~
One of our main goal is to generate realistic network traffic from
undocummented protocols. Therefore, we have implemented a dedicated
module that, given vocabulary and grammar models previously infered, can
simulate a communication protocol between multiple bots and masters.
Besides their use of the same model, each actors is independent from the
others and is organized around three main stages.
The first stage is a dedicated library that reads and writes from the
network channel. It also parses the flow in messages according to
previous protocols layers. The second stage uses the vocabulary to
abstract received messages into symbols and vice-versa to specialize
emitted symbols into messages. A memory buffer is also available to
manage dependency relations. The last stage implements the grammar model
and computes which symbols must be emitted or received according to the
current state and time.
Smart fuzzing with Netzob
~~~~~~~~~~~~~~~~~~~~~~~~~
A typical example of dynamic vulnerability analysis is the robustness
tests. It can be used to reveal software programming errors which can
leads to software security vulnerabilities. These tests provide an
efficient and almost automated solution to easily identify and study
exposed surfaces of systems. Nevertheless, to be fully efficient, the
fuzzing approaches must cover the complete definition domain and
combination of all the variables which exist in a protocol (IP adresses,
serial numbers, size fields, payloads, message identifer, etc.). But
fuzzing typical communication interface requires too many test cases due
to the complex variation domains introduced by the semantic layer of a
protocol. In addition to this, an efficient fuzzing should also cover
the state machine of a protocol which also brings another huge set of
variations. The necessary time is nearly always too high and therefore
limits the efficiency of this approach.
With all these contraints, achieving robustness tests on a target is
feasible only if the expert has access to a specially designed tool for
the targeted protocol. Hence the emergence of a large number of tools to
verify the behavior of an application on one or more communication
protocols. However in the context of proprietary communications
protocols for which no specifications are published, fuzzers do not
provide optimal results.
Netzob helps the security evaluator by simplifying the creation of a
dedicated fuzzer for a proprietary or undocumented protocol. It provides
to the expert means to execute a semi-automated inferring process to create a
model of the targeted protocol. This model can afterward be refined by
the evaluator. Finally, the created model is included in the fuzzing
module of Netzob which considers the vocabulary and the grammar of the
protocol to generate optimized and specific test cases. Both mutation
and generation are available for fuzzing.
Export protocol model
~~~~~~~~~~~~~~~~~~~~~
The following export formats are currently provided by Netzob:
- XML format
- human readable (Wireshark like)
- Peach fuzzer export: this enables efficiency combination of Peach
Fuzzer on previously undocumented protocols.
Besides, you can write your own exporter to manipulate the inferred
protocol model in your favorite tool.