CCSDS_study project

2026-05-05 21:54:35 +08:00
commit 9be41f9270
585 changed files with 91275 additions and 0 deletions
--- a/netzob-030/doc/documentation/source/user_guide/import/index.rst
+++ b/netzob-030/doc/documentation/source/user_guide/import/index.rst
@@ -0,0 +1,72 @@
+.. currentmodule:: netzob
+
+.. _import:
+
+Importing Data
+==============
+
+Communication protocols can be found in every part of a system, as shown in the following picture:
+
+.. image:: netzob_comprot.png
+    :width: 750px
+    :alt: Communication protocols found throughout a system
+
+Netzob can handle multiple kinds of input data. Hence, you can analyze network traffic, IPC communications, file structures, etc.
+
+Import can either be done by using a dedicated captor or by providing already captured messages in a specific format.
+
+Currently accepted formats are:
+
+* PCAP files
+* Structured files
+* Netzob XML files (used by Netzob for its internal representation of messages)
+
+Currently supported captors are:
+
+* Network captor, based on the XXX library 
+* Intra-process communication captor (API calls), based on API hooking
+* Inter-process communication captor (pipes, shared memory and local sockets), based on system call hooking
+
+Netzob manipulates imported messages through dedicated Python objects,
+which carry metadata describing contextual parameters (a timestamp, or
+the source and destination IP addresses, for example). All the Python
+objects that describe messages derive from a single abstract object:
+AbstractMessage.
+
+The next part of this section details the composition of each message
+object.
+
+AbstractMessage
+---------------
+All messages inherit from this definition and therefore have the following parameters:
+
+* a unique ID
+* a data field represented as an array of hexadecimal values
+
+NetworkMessage
+--------------
+A network message is defined with the following parameters:
+
+* a timestamp
+* the source IP address
+* the target IP address
+* the protocol (TCP/UDP/ICMP...)
+* the layer 4 source port
+* the layer 4 target port
+
+
+Definition of a NetworkMessage :
+
+ 
+FileMessage
+--------------
+A file message is defined with the following parameters:
+
+* a filename
+* the line number in the file
+* the creation date of the file
+* the last modification date of the file
+* the owner of the file
+* the size of the file
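The three message types described above can be pictured as a small class hierarchy. The following is only an illustrative sketch built from the parameter lists in this section: the attribute names, types and default values are assumptions and are not taken from the Netzob source code.

```python
from dataclasses import dataclass, field
from typing import List
import uuid


@dataclass
class AbstractMessage:
    # Payload stored as a list of byte values (the "array of hex" above).
    data: List[int]
    # Unique ID, generated automatically when none is supplied.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def hex_data(self) -> str:
        """Return the payload as a lowercase hexadecimal string."""
        return "".join(f"{byte:02x}" for byte in self.data)


@dataclass
class NetworkMessage(AbstractMessage):
    # Hypothetical attribute names mirroring the NetworkMessage list.
    timestamp: float = 0.0
    ip_source: str = ""
    ip_target: str = ""
    protocol: str = ""
    l4_source_port: int = 0
    l4_target_port: int = 0


@dataclass
class FileMessage(AbstractMessage):
    # Hypothetical attribute names mirroring the FileMessage list.
    filename: str = ""
    line_number: int = 0
    creation_date: float = 0.0
    modification_date: float = 0.0
    owner: str = ""
    size: int = 0
```

For instance, ``NetworkMessage(data=[0xde, 0xad], protocol="UDP")`` yields an object whose ``hex_data()`` is ``"dead"`` and which is also an ``AbstractMessage``.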
+
+
--- a/netzob-030/doc/documentation/source/user_guide/import/netzob_comprot.png
+++ b/netzob-030/doc/documentation/source/user_guide/import/netzob_comprot.png
--- a/netzob-030/doc/documentation/source/user_guide/inference/ExampleOfAligning.png
+++ b/netzob-030/doc/documentation/source/user_guide/inference/ExampleOfAligning.png
--- a/netzob-030/doc/documentation/source/user_guide/inference/ExampleOfMultipleAlignment.png
+++ b/netzob-030/doc/documentation/source/user_guide/inference/ExampleOfMultipleAlignment.png
--- a/netzob-030/doc/documentation/source/user_guide/inference/grammar.rst
+++ b/netzob-030/doc/documentation/source/user_guide/inference/grammar.rst
@@ -0,0 +1,12 @@
+.. currentmodule:: netzob
+
+.. _grammar:
+
+Grammar inference
+#################
+
+Identification of the automata of the protocol
+**********************************************
+
+Field dependencies with messages of previous states
+***************************************************
--- a/netzob-030/doc/documentation/source/user_guide/inference/index.rst
+++ b/netzob-030/doc/documentation/source/user_guide/inference/index.rst
@@ -0,0 +1,73 @@
+.. currentmodule:: netzob
+
+.. _inference:
+
+Protocol inference
+==================
+
+Definition of a communication protocol
+--------------------------------------
+
+A communication protocol is a language. A language is defined
+through:
+
+* its vocabulary (the set of valid words or, in our context, the set
+  of valid messages);
+* its grammar (the set of valid sentences which, in our context, can
+  be represented as a protocol state machine, like the TCP state
+  machine).
+
+A word of the vocabulary is called a symbol. A symbol represents an
+abstract view of a set of similar messages. Similar messages are
+messages that have the same semantics (for example, a TCP SYN message, an
+SMTP HELO message, an ICMP ECHO REQUEST message, etc.).
+
+A symbol is structured following a format, which specifies a sequence
+of fields (like the IP format). A field can be split into
+sub-fields. For example, a payload is a field of a TCP
+message. Therefore, by defining a layer as a kind of payload (which is
+a specific field), we can retrieve the so-called Ethernet, IP, TCP and
+HTTP layers from a raw packet, each layer having its own vocabulary
+and grammar.
+
+A field's size can be fixed or variable.
+A field's content can be static or dynamic.
+A field's content can be basic (a 32-bit integer) or complex (an array).
+A field has four attributes:
+
+* the type defines its definition domain or set of valid values (16-bit integer, string, etc.);
+* the data description defines the structure of the field (ASN.1, TSN.1, EBML, etc.);
+* the data encoding defines ... (ASCII, little endian, big endian, XML, EBML, DER, XER, PER, etc.);
+* the semantics defines ... (IP address, port number, URL, email, checksum, etc.).
+
+A field's content can be:
+
+* static;
+* dependent on another field (or a set of fields) of the same message (intra-message dependency);
+* dependent on a field (or a set of fields) of a previous message in the grammar (inter-message dependency);
+* dependent on the environment;
+* dependent on the application behaviour (which could depend on the user behaviour);
+* random (the initial value of the TCP sequence number, for example).
+
+Modeling in Netzob
+------------------
+
+Netzob provides a framework for the semi-automated modeling (inference) of communication protocols, i.e. inferring their vocabulary and grammar.
+
+* Vocabulary inference
+   * Message structure inference (based on sequence alignment)
+   * Grouping of similar message structures
+   * Field type inference
+   * Field dependencies from the same message and from the environment
+   * Field semantic inference
+* Grammar inference
+   * Identification of the automata of the protocol
+   * Field dependencies with messages of previous states
+
+All the functionalities of the framework are detailed in this chapter.
+
+.. toctree::
+   :maxdepth: 2
+
+   vocabular
+   grammar
--- a/netzob-030/doc/documentation/source/user_guide/inference/message_abstraction.png
+++ b/netzob-030/doc/documentation/source/user_guide/inference/message_abstraction.png
--- a/netzob-030/doc/documentation/source/user_guide/inference/payload_extraction.png
+++ b/netzob-030/doc/documentation/source/user_guide/inference/payload_extraction.png
--- a/netzob-030/doc/documentation/source/user_guide/inference/vocabular.rst
+++ b/netzob-030/doc/documentation/source/user_guide/inference/vocabular.rst
@@ -0,0 +1,182 @@
+.. currentmodule:: netzob
+
+.. _vocabular:
+
+Vocabulary inference
+####################
+
+Structure inference
+*******************
+
+Grouping of similar structures
+******************************
+
+Options during alignment process
+================================
+
+* "Read-only" process (does not require participating in the
+  communication).
+* Identifies the fixed and dynamic fields of all the messages.
+* Groups equivalent messages according to their field structures.
+
+
+* Clustering (grouping of equivalent messages), using:
+	* a UPGMA algorithm to group similar messages
+	* an OpenMP and MPI implementation
+
+* Sequencing and alignment (identification of fields in messages), using:
+	* a Needleman-Wunsch implementation
+
+
+Needleman and Wunsch algorithm
+==============================
+
+* Originally a bioinformatics algorithm (used to align DNA sequences)
+* Aligns two messages and identifies common patterns and field structure
+* Computes an alignment score representing the quality of the
+  alignment
+
+The following picture shows the sequence alignment of two messages.
+
+.. image:: ExampleOfAligning.png
+    :alt: Example of sequence alignment
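To make the alignment step more concrete, here is a minimal sketch of the textbook Needleman-Wunsch algorithm applied to two byte strings. It is not Netzob's implementation: the function name and the scoring values (``match``, ``mismatch``, ``gap``) are assumptions chosen for illustration.

```python
def needleman_wunsch(a: bytes, b: bytes, match=1, mismatch=-1, gap=-1):
    """Globally align two messages; return (score, aligned_a, aligned_b).

    Aligned sequences are lists of byte values, with None marking a gap.
    The scoring parameters are illustrative defaults, not Netzob's.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap inserted in b
                score[i][j - 1] + gap,      # gap inserted in a
            )

    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append(None)
            i -= 1
        else:
            aligned_a.append(None)
            aligned_b.append(b[j - 1])
            j -= 1
    aligned_a.reverse()
    aligned_b.reverse()
    return score[n][m], aligned_a, aligned_b
```

Columns where both aligned sequences hold the same byte are candidates for static fields, while columns containing a gap or differing bytes hint at dynamic fields.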
+
+UPGMA algorithm
+===============
+
+* Identify equivalent messages based on their alignment score.
+* Build a hierarchical organization of the messages with the UPGMA
+  algorithm (Unweighted Pair Group Method with Arithmetic Mean)
+
+The following picture shows a grouping of similar messages based on the result of the clustering process.
+
+.. image:: ExampleOfMultipleAlignment.png
+    :alt: Example of clustering
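The hierarchical grouping can be sketched as follows. This is a generic UPGMA implementation, not the OpenMP/MPI one mentioned above. It assumes that the pairwise alignment scores have already been converted into distances (smaller means more similar); the ``upgma`` name and the input layout are hypothetical.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree as nested tuples.

    labels: list of message names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    """
    clusters = {label: 1 for label in labels}  # cluster -> number of members
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # The distance to the merged cluster is the size-weighted mean of the
        # distances to its two parts, i.e. the arithmetic mean over all members.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))
```

With three messages where ``m1`` and ``m2`` are the closest pair, the returned tree nests ``("m1", "m2")`` inside the root, which mirrors the hierarchical organization described above.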
+
+Abstraction of a set of messages
+================================
+
+Abstraction is the process of substituting the dynamic fields with their representation as a regex. An example of abstraction is shown in the following picture.
+
+.. image:: message_abstraction.png
+    :alt: Example of message abstraction
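As a rough illustration of abstraction, the sketch below takes messages that have already been aligned (equal-length sequences of byte values, with ``None`` marking a gap), keeps static columns as hexadecimal text and replaces each run of dynamic columns with a bounded wildcard. The function name and the ``(.{min,max})`` notation are assumptions made for this example, not necessarily the regex syntax Netzob produces.

```python
from itertools import groupby


def abstract(aligned):
    """Return a regex-like abstraction of a set of aligned messages."""
    columns = list(zip(*aligned))

    def is_static(column):
        # A column is static when every message holds the same byte there.
        return column[0] is not None and all(v == column[0] for v in column)

    pieces = []
    for static, group in groupby(columns, key=is_static):
        group = list(group)
        if static:
            pieces.append("".join(f"{column[0]:02x}" for column in group))
        else:
            # Count, per message, how many real bytes fall in this dynamic run.
            lengths = [sum(v is not None for v in row) for row in zip(*group)]
            pieces.append("(.{%d,%d})" % (min(lengths), max(lengths)))
    return "".join(pieces)
```

For two aligned messages ``01 02 41 42 ff`` and ``01 02 43 -- ff``, this yields ``0102(.{1,2})ff``: a static two-byte prefix, a dynamic field of one to two bytes, and a static trailer.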
+
+Analyses after alignment process
+================================
+aaa
+
+Message contextual menu
+=======================
+aaa
+
+Group contextual menu
+=====================
+aaa
+
+Refine regexes
+==============
+aaa
+
+Slick regexes
+=============
+aaa
+
+Concatenate
+===========
+aaa
+
+Split column
+============
+aaa
+
+Merge columns
+=============
+aaa
+
+Delete message
+==============
+aaa
+
+Field type inference
+********************
+
+Visualization options
+=====================
+aaa
+
+Type structure contextual menu
+==============================
+aaa
+
+Messages distribution
+=====================
+
+This function shows a graphical representation of the distribution of bytes per offset for each message of the current group. It helps to identify the entropy variation of each field. Entropy variation combined with byte distribution helps the user infer the field type.
+
+[INCLUDE GRAPH]
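The idea behind this view can be sketched numerically: for each offset, count the byte values observed across the messages of a group and compute the Shannon entropy of that distribution. This is an illustrative helper with a hypothetical name, not the code behind the Netzob graph. It only considers offsets present in every message.

```python
import math
from collections import Counter


def entropy_per_offset(messages):
    """Return the Shannon entropy (in bits) of the byte values at each offset."""
    total = len(messages)
    shortest = min(len(message) for message in messages)
    entropies = []
    for offset in range(shortest):
        counts = Counter(message[offset] for message in messages)
        entropy = -sum(
            (count / total) * math.log2(count / total) for count in counts.values()
        )
        entropies.append(entropy)
    return entropies
```

An entropy of zero at an offset means every message carries the same byte there (a likely static field), whereas values close to 8 bits suggest random or compressed content.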
+
+Data typing
+===========
+
+* Primary types: binary, ascii, num, base64...
+	* Definition domain, unique elements and intervals
+	* Data carving (tar gz, png, jpg, ...)
+	* Semantic data identification (emails, IP ...)
+
+Domain of definition
+====================
+aaa
+
+Change type representation
+==========================
+aaa
+
+Field dependencies from the same message and from the environment
+*****************************************************************
+
+Field dependency identification
+===============================
+
+* Length fields and associated payloads
+* Identification of encapsulated messages
+
+And from the environment...
+
+Payload extraction
+==================
+
+The function "Find Size Fields", as its name suggests, finds fields that contain a length value, as well as the associated payload. It is applied to each group. Netzob supports several encodings of the size field: big-endian and little-endian binary values of 1, 2 and 4 bytes. The algorithm used to find the size fields and their associated payloads is described in table XXX.
+
+[INCLUDE ALGORITHM]
+
+The following picture represents the application of the function on a trace example. It shows the automated extraction of the IP and UDP payloads from an Ethernet frame.
+
+.. image:: payload_extraction.png
+    :alt: Payload extraction
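One simple way to search for size fields is a brute-force scan, sketched below. It is an assumption made for illustration and not the algorithm Netzob uses: for every offset, width (1, 2 or 4 bytes) and byte order, the candidate value is decoded and kept only if, in every message of the group, it equals the number of bytes that follow the field. The ``find_size_fields`` name is hypothetical.

```python
def find_size_fields(messages):
    """Return (offset, width, byteorder) candidates valid for all messages."""
    candidates = []
    shortest = min(len(message) for message in messages)
    for width in (1, 2, 4):
        # Byte order is irrelevant for single-byte fields.
        byteorders = ("big",) if width == 1 else ("big", "little")
        for byteorder in byteorders:
            for offset in range(shortest - width + 1):
                if all(
                    int.from_bytes(message[offset:offset + width], byteorder)
                    == len(message) - offset - width
                    for message in messages
                ):
                    candidates.append((offset, width, byteorder))
    return candidates
```

A real implementation would also have to handle size fields that cover only part of the remaining data, or that include their own length. This sketch deliberately ignores those cases.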
+
+Field semantic inference
+************************
+
+Data carving
+============
+
+Data carving is the process of extracting semantic information from fields or messages. Netzob can extract the following semantic information:
+
+* URL
+* email
+* IP address
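A minimal sketch of this kind of carving uses regular expressions over the raw bytes of a message. The patterns below are deliberately simplified assumptions (for example, only ``http``/``https`` URLs and dotted IPv4 addresses) and are not the ones Netzob relies on.

```python
import re

# Simplified, illustrative patterns; real-world carving needs stricter rules.
_OCTET = rb"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "url": re.compile(rb"https?://[^\s<>]+"),
    "email": re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ip": re.compile(rb"(?<![\d.])(?:" + _OCTET + rb"\.){3}" + _OCTET + rb"(?![\d.])"),
}


def carve(data: bytes):
    """Return the URLs, email addresses and IPv4 addresses found in data."""
    return {
        name: [m.group(0).decode("ascii", "replace") for m in pattern.finditer(data)]
        for name, pattern in PATTERNS.items()
    }
```

Calling ``carve`` on a captured payload returns a dictionary with one list per kind of semantic information, which can then be attached to the corresponding fields.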
+
+[INCLUDE FIGURE]
+
+Search
+======
+aaa
+
+
+
+
+
+
+Properties
+==========
+aaa