CCSDS_study project

2026-05-05 21:54:35 +08:00
commit 9be41f9270
585 changed files with 91275 additions and 0 deletions
--- a/netzob-030/doc/documentation/source/overview/index.rst
+++ b/netzob-030/doc/documentation/source/overview/index.rst
@@ -0,0 +1,300 @@
+.. currentmodule:: netzob
+
+.. _overview:
+
+
+Overview of Netzob
+==================
+
+Netzob was initiated by security auditors of
+`AMOSSYS <http://www.amossys.fr>`_ and the `CIDre research team of
+Supélec <http://www.rennes.supelec.fr/ren/rd/cidre/>`_ to address the
+reverse engineering of communication protocols.
+
+Originally, Netzob was developed to support security auditors and
+evaluators in modeling and simulating undocumented protocols. The tool
+was later extended to allow smart fuzzing of unknown protocols.
+
+The following picture depicts the main modules of Netzob:
+
+.. figure:: http://www.netzob.org/img/overview_archi.png
+   :align: center
+   :alt: Architecture of Netzob
+
+   Architecture of Netzob
+
+-  **Import module:** Data import is available in two ways: either by
+   leveraging the channel-specific captors (currently network and IPC --
+   Inter-Process Communication), or by using specific importers (such as
+   PCAP files, structured files and OSpy files).
+-  **Protocol inference modules:** The vocabulary and grammar inference
+   methods constitute the core of Netzob. They provide both passive and
+   active reverse engineering of communication flows through automated
+   and manual mechanisms.
+-  **Simulation module:** Given vocabulary and grammar models previously
+   inferred, Netzob can understand and generate communication traffic
+   between multiple actors. It can act as either a client, a server or
+   both.
+-  **Export module:** This module exports an inferred model of a
+   protocol in formats that can be understood by third-party software
+   or by a human. Current work focuses on export formats compatible
+   with the main traffic dissectors (Wireshark and Scapy) and fuzzers
+   (Peach and Sulley).
+
+And here is a screenshot of the main graphical interface:
+
+.. figure:: https://dev.netzob.org/attachments/96/netzob_UI.png
+   :align: center
+   :alt: 
+
+The following sections describe the available mechanisms in more
+detail.
+
+
+Import and capture data
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The first step in inferring a protocol with Netzob is to capture and
+import messages as samples. There are different methods to retrieve
+messages, depending on the communication channel used (files, network,
+IPC, USB, etc.) and the format (PCAP, hex, raw binary flows, etc.).
+
+The figure below describes the multiple communication channels, and
+therefore the possible sniffing points, that Netzob aims to address.
+
+.. figure:: http://www.netzob.org/img/overview_multipleFlows.png
+   :align: center
+   :alt: Multiple communication flows around an application
+
+   Multiple communication flows around an application
+
+The current version (version 0.4) of Netzob deals with the following
+data sources:
+
+-  **Live network communications**
+-  **Captured network communications** (PCAPs)
+-  **Inter-Process Communications** (IPCs)
+-  **Text and binary files**
+-  **API flows** through `oSpy <http://code.google.com/p/ospy/>`_ file
+   format support
+
+Otherwise, if you plan to reverse a protocol implemented over an
+unsupported communication channel, Netzob can manipulate any
+communication flow through an XML representation. This situation
+therefore only requires specific development to capture the targeted
+flow and save it in a compatible XML format.
+
+.. figure:: http://www.netzob.org/img/overview_extraImport.png
+   :align: center
+   :width: 800 px
+   :alt: Importing data from an unknown communication channel using the XML definition
+
+   Importing data from an unknown communication channel using the XML
+   definition
+
+
+Inferring message format and state machine with Netzob
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The vocabulary of a communication protocol defines all the words that
+belong to it. For example, the vocabulary of a malware's communication
+protocol looks like a set of possible commands: {"attack
+`www.google.fr <http://www.google.fr>`_", "dnspoison
+this.dns.server.com", "execute 'uname -a'", ...}. Another example of a
+vocabulary is the set of valid words in the HTTP protocol: {"GET
+/images/logo.png HTTP/1.1 ...", "HTTP/1.1 200 OK ...", ...}.
+
+Netzob's vocabulary inference process is designed to retrieve the set
+of all possible words used in a targeted protocol and to identify
+their structures. Words are made of different fields, which are
+defined by their values and types. Hence a word can be described using
+the structure of its fields.
+
+We describe the learning process implemented in Netzob to
+semi-automatically infer the vocabulary and the grammar of a protocol.
+This process, illustrated in the following picture, is performed in
+three main steps:
+
+#. **Clustering messages and partitioning these messages in fields.**
+#. **Characterizing message fields and abstracting similar messages in
+   symbols.**
+#. **Inferring the transition graph of the protocol.**
+
+.. figure:: http://www.netzob.org/img/overview_inferenceSteps.png
+   :align: center
+   :width: 800 px
+   :alt: The main functionalities
+
+   The main functionalities
+
+
+Step 1: clustering Messages and Partitioning in Fields
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To discover the format of a symbol, Netzob supports different
+partitioning approaches. We describe here the most accurate one, which
+leverages sequence alignment. This technique aligns the invariants in
+a set of messages. The `Needleman-Wunsch
+algorithm <http://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm>`_
+performs this task optimally. Needleman-Wunsch is particularly
+effective on protocols where dynamic fields have variable lengths (as
+shown in the following picture).
+
+.. figure:: http://www.netzob.org/img/overview_needleman.png
+   :align: center
+   :alt: Sequence alignment with Needleman-Wunsch algorithm
+
+   Sequence alignment with Needleman-Wunsch algorithm
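+
+As an illustration, here is a minimal sketch of global sequence
+alignment in the Needleman-Wunsch style, written in plain Python. It is
+not Netzob's implementation: the function name ``needleman_wunsch``,
+the scoring values and the two sample messages are assumptions chosen
+for the example.
+
```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Two hypothetical messages sharing invariants around a variable-length field.
row_a, row_b, total = needleman_wunsch("USER bob END", "USER alice END")
print(row_a)
print(row_b)
```
+
+In the printed rows, the ``-`` character marks a gap. Columns where
+both rows agree are candidates for invariant fields, and the remaining
+columns are candidates for dynamic fields.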
+
+When partitioning and clustering processes are done, we obtain a
+relevant first approximation of the overall message formats. The next
+step consists in determining the characteristics of the fields.
+
+If the size of those fields is fixed, as in TCP and IP headers, it is
+preferable to apply a basic partitioning, also provided by Netzob. Such
+partitioning works by left-aligning each message, then separating
+successive fixed columns from successive dynamic columns.
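+
+The basic partitioning can be sketched as follows. This is only an
+illustration under simple assumptions (byte-wise columns, a
+hypothetical ``partition_left_aligned`` helper and made-up sample
+messages), not the code used by Netzob.
+
```python
from itertools import groupby


def partition_left_aligned(messages):
    """Split left-aligned messages into runs of static and dynamic byte columns."""
    width = max(len(m) for m in messages)

    def is_static(col):
        # A column is static when every message holds the same byte there.
        # Messages that are too short contribute b"", which makes it dynamic.
        return len({m[col:col + 1] for m in messages}) == 1

    fields = []
    for static, columns in groupby(range(width), key=is_static):
        cols = list(columns)
        kind = "static" if static else "dynamic"
        fields.append((cols[0], cols[-1] + 1, kind))
    return fields


# Hypothetical fixed-size messages: 2-byte header, 4-byte body, 1-byte trailer.
samples = [b"\x01\x02AAAA\xff", b"\x01\x02BBCC\xff"]
fields = partition_left_aligned(samples)
print(fields)
```
+
+Each tuple gives the start offset, the end offset (exclusive) and the
+kind of one field.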
+
+To group aligned messages by similarity, the Needleman-Wunsch algorithm
+is used in conjunction with a clustering algorithm. The applied
+algorithm is `UPGMA <http://en.wikipedia.org/wiki/UPGMA>`_.
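+
+A minimal sketch of average-linkage (UPGMA-style) clustering is given
+below. Two points are assumptions made for the example: the
+dissimilarity is computed with ``difflib.SequenceMatcher`` from the
+Python standard library as a stand-in for an alignment score, and
+merging stops at an arbitrary ``threshold``.
+
```python
from difflib import SequenceMatcher


def distance(a, b):
    # Stand-in dissimilarity in [0, 1]; an alignment score could be used instead.
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def upgma_clusters(messages, threshold):
    """Merge the two closest clusters until none is closer than threshold."""
    clusters = [[i] for i in range(len(messages))]

    def average(c1, c2):
        # UPGMA distance: mean of all pairwise distances between two clusters.
        total = sum(distance(messages[i], messages[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, x, y = min(
            (average(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical messages; the result lists message indexes per cluster.
samples = ["GET /a", "GET /b", "PING", "PONG"]
groups = upgma_clusters(samples, threshold=0.5)
print(groups)
```
+
+With these samples, the two ``GET`` messages end up in one cluster and
+``PING`` and ``PONG`` in another.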
+
+
+Step 2: characterization of Fields
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The field type identification partially derives from the partitioning
+inference step. For fields containing only invariants, the type merely
+corresponds to the invariant value. For other fields, the type is
+automatically expressed, as a first approximation, as a regular
+expression, as shown in the next figure. This form makes it easy to
+check that data complies with a specific type. Moreover, Netzob can
+visualize the definition domain of a field, which helps to manually
+refine the type associated with that field.
+
+.. figure:: http://www.netzob.org/img/overview_fieldType.png
+   :align: center
+   :alt: Characterization of field type
+
+   Characterization of field type
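+
+To make the idea of a regular expression as a first-approximation type
+concrete, here is a small sketch. The ``infer_field_regex`` helper and
+its three character classes are hypothetical simplifications, not the
+rules applied by Netzob.
+
```python
import re
import string


def infer_field_regex(values):
    """Return a first-approximation regular expression covering all values."""
    if len(set(values)) == 1:
        # Invariant field: the type is the value itself.
        return re.escape(values[0])
    lengths = [len(v) for v in values]
    low, high = min(lengths), max(lengths)
    if all(c in string.digits for v in values for c in v):
        char_class = "[0-9]"
    elif all(c in string.hexdigits for v in values for c in v):
        char_class = "[0-9A-Fa-f]"
    else:
        char_class = "."
    return f"{char_class}{{{low},{high}}}"


# Values observed for one dynamic field across several messages (made up).
pattern = infer_field_regex(["42", "1337", "7"])

# The inferred expression can then validate new data against the type.
print(pattern, bool(re.fullmatch(pattern, "123")), bool(re.fullmatch(pattern, "12a")))
```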
+
+Some intra-symbol dependencies are automatically identified. The size
+field, present in many protocol formats, is an example of intra-symbol
+dependency. A search algorithm has been designed to look for potential
+size fields and their associated payloads. By extension, this technique
+makes it possible to discover encapsulated protocol payloads.
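+
+One simple way to search for size fields is sketched below. It assumes
+that a size field counts the bytes that follow it up to the end of the
+message, which is only one of several possible conventions. The
+``find_size_fields`` helper and the sample messages are hypothetical.
+
```python
def find_size_fields(messages, widths=(1, 2, 4)):
    """Return (offset, width, byte_order) candidates valid for every message."""
    candidates = []
    shortest = min(len(m) for m in messages)
    for width in widths:
        # Byte order is irrelevant for single-byte fields.
        orders = ("big",) if width == 1 else ("big", "little")
        for offset in range(shortest - width + 1):
            for order in orders:
                if all(
                    int.from_bytes(m[offset:offset + width], order)
                    == len(m) - offset - width
                    for m in messages
                ):
                    candidates.append((offset, width, order))
    return candidates


# Made-up messages: 2-byte header, 1-byte size, then the payload.
samples = [b"\x01\x07\x03abc", b"\x01\x07\x05hello"]
size_fields = find_size_fields(samples)
print(size_fields)
```
+
+Requiring the relation to hold for every message keeps accidental
+matches rare, but a real search would also have to consider sizes that
+cover only part of the message.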
+
+Environmental dependencies are also identified by looking for specific
+values retrieved during message capture. Such specific values consist of
+characteristics of the underlying hardware, operating system and network
+configuration. During the dependency analysis, these characteristics are
+searched for in various encodings.
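+
+As an example of such a search, the sketch below looks for a known IPv4
+address in a message under a few encodings. The set of encodings, the
+helper names and the sample message are assumptions made for the
+illustration.
+
```python
import ipaddress


def ip_encodings(ip):
    """Return several byte encodings of an IPv4 address given as text."""
    packed = ipaddress.IPv4Address(ip).packed
    return {
        "ascii": ip.encode("ascii"),
        "raw_big_endian": packed,
        "raw_little_endian": packed[::-1],
        "hex_ascii": packed.hex().encode("ascii"),
    }


def find_env_value(message, encodings):
    """List (encoding_name, offset) for every encoding found in the message."""
    return [
        (name, message.find(encoded))
        for name, encoded in encodings.items()
        if encoded in message
    ]


# Made-up message embedding the capturing host's address as 4 raw bytes.
message = b"\x00\x10" + bytes([192, 168, 1, 10]) + b"hello"
hits = find_env_value(message, ip_encodings("192.168.1.10"))
print(hits)
```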
+
+
+Step 3: inferring the Transition Graph of the Protocol
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The third step of the learning process discovers and extracts the
+transition graph (also called the grammar) of a targeted protocol.
+More formally, the grammar of a communication protocol defines the set
+of valid sentences which can be produced by a communication. A sentence
+is an ordered sequence of words which may be received or emitted by a
+protocol handler. An example of a simple sentence is:
+
+::
+
+    ["attack www.google.fr", "attack has failed", "attack www.kernel.org", "root access granted."]
+
+which can be described using the following simple automaton, with S0 as
+the initial state:
+
+.. figure:: http://www.netzob.org/img/overview_exampleSimpleGrammar.png
+   :align: center
+   :alt: Schema of a simple grammar
+
+   Schema of a simple grammar
+
+The learning process step is achieved by a set of active experiments
+that stimulate a real client or server implementation using successive
+sequences of input symbols and analyze its responses.
+
+In Netzob, the automaton used to represent or model a communication
+protocol is an extended version of a Mealy automaton which includes
+semi-stochastic transitions and contextualized and parametrized inputs
+and outputs. The first academic presentation of this model is included
+in a dedicated scientific paper provided in the documentation section.
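+
+For readers unfamiliar with Mealy automata, here is a sketch of a plain
+one, without any of the extensions mentioned above. The states and
+transitions are a guess loosely based on the example sentence and may
+differ from the figure. The ``MealyMachine`` class is hypothetical.
+
```python
class MealyMachine:
    """Deterministic Mealy machine: each (state, input) gives (next_state, output)."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions

    def step(self, symbol):
        # Follow the transition and emit the output attached to it.
        self.state, output = self.transitions[(self.state, symbol)]
        return output


# Assumed transitions: a failed attack stays in S0, a successful one reaches S1.
transitions = {
    ("S0", "attack www.google.fr"): ("S0", "attack has failed"),
    ("S0", "attack www.kernel.org"): ("S1", "root access granted."),
}

machine = MealyMachine("S0", transitions)
outputs = [machine.step(s) for s in ("attack www.google.fr", "attack www.kernel.org")]
print(outputs, machine.state)
```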
+
+The model is inferred through a dedicated **active** process which
+consists of stimulating an implementation and analyzing its responses.
+In this process, we use the previously inferred vocabulary to discover
+and learn the grammar of the communication protocol. Each stimulation
+is computed following an extension of the **Angluin L\* algorithm**.
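+
+The sketch below does not implement L\* or Netzob's extension of it. It
+only illustrates the two building blocks that learners of this family
+rely on: an output query, which resets the target and replays a
+sequence of input symbols, and an observation table, whose identical
+rows point to the same state. The ``BlackBox`` class is a hypothetical
+stand-in for a real implementation.
+
```python
class BlackBox:
    """Hypothetical target: GET is only served after LOGIN."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.logged_in = False

    def send(self, symbol):
        if symbol == "LOGIN":
            self.logged_in = True
            return "OK"
        if symbol == "GET":
            return "DATA" if self.logged_in else "DENIED"
        return "ERROR"


def output_query(target, inputs):
    # Each experiment starts from the initial state of the target.
    target.reset()
    return tuple(target.send(symbol) for symbol in inputs)


def observation_table(target, prefixes, suffixes):
    """Map each prefix to the outputs produced by every suffix played after it."""
    table = {}
    for prefix in prefixes:
        row = []
        for suffix in suffixes:
            outputs = output_query(target, prefix + suffix)
            row.append(outputs[len(prefix):])  # keep only the suffix outputs
        table[prefix] = tuple(row)
    return table


prefixes = [(), ("LOGIN",), ("GET",)]
suffixes = [("LOGIN",), ("GET",)]
table = observation_table(BlackBox(), prefixes, suffixes)

# Prefixes with identical rows cannot be told apart by the chosen suffixes.
print(len(set(table.values())), "distinct rows")
```
+
+Here the empty prefix and ``("GET",)`` share the same row, while
+``("LOGIN",)`` has a different one, which suggests two states.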
+
+Protocol simulation
+~~~~~~~~~~~~~~~~~~~
+
+One of our main goals is to generate realistic network traffic from
+undocumented protocols. Therefore, we have implemented a dedicated
+module that, given previously inferred vocabulary and grammar models,
+can simulate a communication protocol between multiple bots and
+masters. Although they share the same model, each actor is independent
+of the others and is organized around three main stages.
+
+The first stage is a dedicated library that reads from and writes to
+the network channel. It also splits the flow into messages according
+to the underlying protocol layers. The second stage uses the vocabulary
+to abstract received messages into symbols and, conversely, to
+specialize emitted symbols into messages. A memory buffer is also
+available to manage dependency relations. The last stage implements the
+grammar model and computes which symbols must be emitted or received
+according to the current state and time.
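+
+The second stage can be pictured with the following sketch. The
+vocabulary, the symbol names and the use of a dictionary as memory are
+all hypothetical; the sketch only shows how abstraction and
+specialization mirror each other.
+
```python
import re

# Hypothetical vocabulary: each symbol has a parsing pattern and a template.
VOCABULARY = {
    "LOGIN_REQUEST": {
        "pattern": re.compile(r"LOGIN (?P<user>\w+)"),
        "template": "LOGIN {user}",
    },
    "LOGIN_REPLY": {
        "pattern": re.compile(r"WELCOME (?P<user>\w+)"),
        "template": "WELCOME {user}",
    },
}


def abstract(message, memory):
    """Map a received message to a symbol name and store its field values."""
    for name, symbol in VOCABULARY.items():
        match = symbol["pattern"].fullmatch(message)
        if match:
            memory.update(match.groupdict())
            return name
    return "UNKNOWN"


def specialize(symbol_name, memory):
    """Build a concrete message for a symbol from the values held in memory."""
    return VOCABULARY[symbol_name]["template"].format(**memory)


memory = {}
received = abstract("LOGIN alice", memory)
reply = specialize("LOGIN_REPLY", memory)
print(received, "->", reply)
```
+
+The memory is what lets the reply reuse a value seen in the request,
+which is one simple form of dependency between messages.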
+
+Smart fuzzing with Netzob
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Robustness testing is a typical example of dynamic vulnerability
+analysis. It can reveal programming errors that can lead to software
+security vulnerabilities. These tests provide an efficient and almost
+automated way to identify and study the exposed surfaces of systems.
+Nevertheless, to be fully efficient, a fuzzing approach must cover the
+complete definition domain and the combinations of all the variables
+that exist in a protocol (IP addresses, serial numbers, size fields,
+payloads, message identifiers, etc.). Fuzzing a typical communication
+interface therefore requires too many test cases, because of the
+complex variation domains introduced by the semantic layer of a
+protocol. In addition, efficient fuzzing should also cover the state
+machine of a protocol, which brings another huge set of variations. The
+time required is nearly always too high and therefore limits the
+efficiency of this approach.
+
+With all these constraints, robustness testing of a target is feasible
+only if the expert has access to a tool specially designed for the
+targeted protocol. This explains the emergence of a large number of
+tools that verify the behavior of an application on one or more
+communication protocols. However, in the context of proprietary
+communication protocols for which no specifications are published,
+fuzzers do not provide optimal results.
+
+Netzob helps the security evaluator by simplifying the creation of a
+dedicated fuzzer for a proprietary or undocumented protocol. It gives
+the expert the means to run a semi-automated inference process that
+creates a model of the targeted protocol. This model can afterward be
+refined by the evaluator. Finally, the created model is included in the
+fuzzing module of Netzob, which considers the vocabulary and the grammar
+of the protocol to generate optimized and specific test cases. Both
+mutation and generation are available for fuzzing.
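+
+The difference between the two approaches can be sketched as follows.
+The message layout (a fixed type byte, a one-byte size field and a
+payload) and both helper functions are assumptions made for the
+example, not Netzob's fuzzing module.
+
```python
import random


def mutate(message, rng):
    """Mutation: flip one random bit of a captured message."""
    data = bytearray(message)
    position = rng.randrange(len(data))
    data[position] ^= 1 << rng.randrange(8)
    return bytes(data)


def generate(rng, max_payload=8):
    """Generation: build a message from the model, keeping the size field valid."""
    length = rng.randrange(max_payload + 1)
    payload = bytes(rng.randrange(256) for _ in range(length))
    return b"\x01" + bytes([len(payload)]) + payload


rng = random.Random(0)  # seeded so that runs are reproducible
captured = b"\x01\x03abc"
mutated = mutate(captured, rng)
generated = generate(rng)
print(mutated, generated)
```
+
+Mutation may break the relation between the size field and the
+payload, whereas generation from a model keeps that relation intact and
+varies only the fields it chooses to vary.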
+
+Export protocol model
+~~~~~~~~~~~~~~~~~~~~~
+
+The following export formats are currently provided by Netzob:
+
+-  XML format
+-  Human-readable format (Wireshark-like)
+-  Peach fuzzer export: this enables efficient use of the Peach fuzzer
+   on previously undocumented protocols.
+
+In addition, you can write your own exporter to manipulate the inferred
+protocol model in your favorite tool.