TL;DR: This paper implemented and evaluated ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols to infer message format specifications of SMB and SMTP.
Abstract: Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we first discover the latent relationship among n-grams by grouping protocol messages with the same semantics, and then infer message formats by keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
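The skewed n-gram distribution mentioned in the abstract can be illustrated with a minimal sketch. This is not ProDecoder's implementation; the function name `ngram_counts`, the window size, and the SMTP-like sample messages are all assumptions made for illustration.

```python
from collections import Counter

def ngram_counts(messages, n=4):
    """Count every contiguous n-byte window across all messages."""
    counts = Counter()
    for msg in messages:
        # Messages shorter than n contribute nothing (empty range).
        for i in range(len(msg) - n + 1):
            counts[msg[i:i + n]] += 1
    return counts

# Hypothetical SMTP-like trace, invented for this example.
messages = [
    b"MAIL FROM:<a@example.com>\r\n",
    b"MAIL FROM:<b@example.org>\r\n",
    b"RCPT TO:<c@example.net>\r\n",
    b"RCPT TO:<d@example.com>\r\n",
]

counts = ngram_counts(messages, n=4)
# Keyword-like n-grams recur across messages; most others occur once.
for gram, freq in counts.most_common(5):
    print(gram, freq)
```

In this toy trace, n-grams taken from keywords and fixed syntax recur in several messages while n-grams covering variable data are rare, which is the kind of skew the abstract describes.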
TL;DR: A novel approach called PRE-Bin is proposed that automatically extracts binary-type fields of binary protocols based on fine-grained bits and outperforms the existing algorithms.
Abstract: Protocol message format extraction is a principal process of automatic network protocol reverse engineering when target protocol specifications are not available. However, binary protocol reverse engineering has been a new challenge in recent years for approaches that traditionally have dealt with text-based protocols rather than binary protocols. In this study, the authors propose a novel approach called PRE-Bin that automatically extracts binary-type fields of binary protocols based on fine-grained bits. First, a silhouette coefficient is introduced into the hierarchical clustering to confirm the optimal clustering number of binary frames. Second, a modified multiple sequence alignment algorithm, in which the matching process and back-tracing rules are redesigned, is also proposed to analyse binary field features. Finally, a Bayes decision model is invoked to describe field features and determine bit-oriented field boundaries. The maximum a posteriori criterion is leveraged to complete an optimal protocol format estimation of binary field boundaries. The authors implemented a prototype system of PRE-Bin to infer the specification of binary protocols from actual traffic traces. Experimental results indicate that PRE-Bin effectively extracts binary fields and outperforms the existing algorithms.
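As a rough illustration of using a silhouette coefficient to choose the number of clusters, the sketch below scores candidate labelings of a few binary frames by bit-level Hamming distance. It is not the PRE-Bin implementation: the hierarchical clustering step is omitted, and the frames, the candidate labelings, and the helper names `hamming` and `silhouette` are assumptions.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def silhouette(frames, labels):
    """Mean silhouette coefficient of a labeling (higher is better)."""
    n = len(frames)
    scores = []
    for i in range(n):
        # Distances from frame i to every other frame, grouped by cluster.
        by_cluster = {}
        for j in range(n):
            if i != j:
                d = hamming(frames[i], frames[j])
                by_cluster.setdefault(labels[j], []).append(d)
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Hypothetical two-byte frames and two candidate cuts of a dendrogram.
frames = [b"\x01\x00", b"\x01\x01", b"\xf0\xff", b"\xf0\xfe"]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k = max(candidates, key=lambda k: silhouette(frames, candidates[k]))
print(best_k)
```

In practice the candidate labelings would come from cutting a hierarchical clustering at different levels; here they are written by hand to keep the example short.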
TL;DR: A novel protocol reverse engineering method to extract an intuitive and clear protocol specification by using a contiguous sequential pattern algorithm three times hierarchically and defining four types of the field formats.
Abstract: As the amount of Internet traffic increases due to newly emerging applications and their malicious behaviors, the amount of traffic that must be analyzed is rapidly increasing. Many protocols that occur under these situations are unknown and undocumented. For efficient network management and security, a deep understanding of these protocols is required. Although many protocol reverse engineering methods have been introduced in the literature, there is still no single standardized method to completely extract a protocol specification, and each of the existing methods has some limitations. In this paper, we propose a novel protocol reverse engineering method to extract an intuitive and clear protocol specification. The proposed method extracts field formats, message formats, and flow formats as protocol syntax by using a contiguous sequential pattern algorithm three times hierarchically and defining four types of field formats. Moreover, the proposed method can extract protocol semantics and a protocol finite state machine. The proposed method sufficiently compresses input messages into a small number of message formats in order to easily identify the intuitive structure of an unknown protocol. We implemented our method in a prototype system and evaluated the method to infer message formats of HTTP (a text protocol) and DNS (a binary protocol). The experimental results show that the proposed method infers HTTP with 100% correctness and 99% coverage. For DNS, the proposed method achieves 100% correctness and coverage.
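A single level of contiguous sequential pattern extraction might look like the sketch below. It is a simplified stand-in, not the authors' algorithm: it enumerates byte substrings up to an assumed `max_len`, keeps those present in at least `min_support` messages, and retains only maximal ones. The function name, parameters, and sample HTTP request lines are hypothetical.

```python
from collections import Counter

def contiguous_patterns(messages, min_support, max_len=9):
    """Maximal byte substrings that occur in at least min_support messages."""
    support = Counter()
    for msg in messages:
        seen = set()  # count each substring once per message
        for n in range(1, max_len + 1):
            for i in range(len(msg) - n + 1):
                seen.add(msg[i:i + n])
        support.update(seen)
    frequent = {p for p, c in support.items() if c >= min_support}
    # Drop any pattern contained in a longer frequent pattern.
    return {p for p in frequent
            if not any(p != q and p in q for q in frequent)}

# Hypothetical HTTP request lines, invented for this example.
messages = [
    b"GET /a HTTP/1.1",
    b"GET /b HTTP/1.1",
    b"GET /cc HTTP/1.1",
]
patterns = contiguous_patterns(messages, min_support=3)
print(sorted(patterns))
```

The surviving patterns play the role of static fields; the abstract's method applies the pattern algorithm again at the message and flow levels, which this sketch does not attempt.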
TL;DR: This draft describes a Kademlia-based resource lookup protocol for P2PSIP built on dSIP, a SIP-based framework for a distributed SIP Location Service; the authors keep the SIP-based approach for its implementation simplicity, reuse of available SIP stack implementations, easy integration into existing UAs, fewer protocols required in a P2P UA, and widespread support for the SIP standard.
Abstract: This draft describes a Kademlia-based protocol for Resource Lookup in
P2PSIP. The proposed protocol is based on dSIP, a SIP-based protocol
proposed by other authors as generic framework for a distributed SIP
Location Service. Although the dSIP authors have obsoleted the draft
by a newer approach based on a binary protocol named RELOAD, we are
still considering this SIP-based approach, due to implementation
simplicity, possibility of reuse of already available SIP stack
implementations, easy integration into existing UAs, minimization of
the number of required protocols for a P2P UA, and widespread support
for (and relative maturity of) the SIP standard.
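For readers unfamiliar with Kademlia, lookup is driven by an XOR distance between identifiers. The sketch below shows that metric only; it does not reproduce the draft's message formats. Deriving IDs from SHA-1 of a SIP URI, and the names `node_id`, `xor_distance`, and `closest`, are assumptions for illustration.

```python
import hashlib

def node_id(name):
    """160-bit identifier from a string (assumed here: SHA-1 of a SIP URI)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a, b):
    """Kademlia distance between two identifiers."""
    return a ^ b

def closest(key, ids, k=2):
    """The k identifiers nearest to key under the XOR metric."""
    return sorted(ids, key=lambda n: xor_distance(n, key))[:k]

# Small 4-bit identifiers keep the arithmetic easy to follow.
peers = [0b1000, 0b0010, 0b1111, 0b0001]
print(closest(0b1010, peers, k=2))

# Hypothetical resource key for a registered user.
key = node_id("sip:alice@example.com")
print(key.bit_length() <= 160)
```

A resource registered under a key is stored at, and later fetched from, the peers whose identifiers are closest to that key.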
TL;DR: A pulse position modulation protocol for remote controls is provided in which the position of a single pulse, such as an infrared pulse, is located in time in one of three or more locations.
Abstract: Methods and apparatus for use in communication between a remote control and a receiving unit provide power saving modes of operation and enable remote control use of high data generating devices, such as a trackball. A pulse position modulation protocol is provided in which the position of a single pulse, such as an infrared pulse, is located in time in one of three or more locations. In the preferred embodiment, the pulse is provided at least in one of eight states, and more preferably in one of sixteen states. In the latter arrangement, a single pulse conveys one hex digit of data. In this way, a single pulse may substitute for what otherwise would have been multiple pulses in a binary protocol. Power savings may be achieved through this method. A time base compensation method is provided in which the remote control provides two detectable events separated by a predetermined time as measured by the remote control time base or clock. The receiving unit measures the time difference between the two events as determined by the receiver's time base or clock. A correction factor is then applied to subsequent detections of time differences between events as sent by the remote control. In the preferred embodiment, a multiplicative factor is applied. Dual protocol remote control devices may be provided wherein a first protocol is utilized in conjunction with a second protocol comprising the pulse position modulated system, such as the hex based system. In this manner, a binary protocol may be utilized for lower channel or lower data transfer arrangements and the pulse position modulation protocol may be utilized for relatively high data generating devices, such as a trackball. The protocol is advantageously utilized with high key identification numbers, such as those above 255.
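The sixteen-state encoding and the multiplicative time base correction can be sketched as follows. The slot width, the 1000 microsecond calibration interval, the 5% clock mismatch, and all function names are assumptions chosen for the example; the abstract does not specify them.

```python
SLOTS = 16      # sixteen pulse positions, one hex digit per pulse
SLOT_US = 100   # assumed slot width in microseconds

def encode(data):
    """Pulse offsets from frame start, two pulses (hex digits) per byte."""
    offsets = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            offsets.append(nibble * SLOT_US)
    return offsets

def correction_factor(expected_us, measured_us):
    """Multiplicative factor from two events a known time apart."""
    return expected_us / measured_us

def decode(measured_offsets, factor):
    """Rescale receiver-measured offsets, then map each to a hex digit."""
    nibbles = [round(t * factor / SLOT_US) for t in measured_offsets]
    return bytes((hi << 4) | lo
                 for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

data = b"\x3f\xa5"
sent = encode(data)

# Assume the receiver's clock makes every interval look 5% longer.
measured = [t * 1.05 for t in sent]
factor = correction_factor(1000, 1000 * 1.05)  # calibration events
print(decode(measured, factor) == data)
```

Without the correction, the pulse sent in slot 15 would be measured near slot 15.75 in this example and misread, which is the kind of error the calibration interval is meant to remove.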