TL;DR: Evaluation results show that cryptographic operations perform better in MicroTEE than in Linux when the data size is small, and the design prevents a single vulnerable kernel service from compromising the whole TEE OS.
Abstract: ARM TrustZone technology is widely used to provide Trusted Execution Environments (TEEs) for mobile devices. However, most TEE OSes are implemented as monolithic kernels. In such designs, device drivers, kernel services, and kernel modules all run in the kernel, which results in a large kernel. In a monolithic kernel architecture, it is difficult to guarantee that no kernel component contains a security vulnerability; examples include the integer overflow vulnerability in Qualcomm QSEE TrustZone and the TZDriver vulnerability in the HUAWEI Hisilicon TEE architecture. This paper presents MicroTEE, a TEE OS based on the microkernel architecture. In MicroTEE, the microkernel provides strong isolation for the TEE OS's basic services, such as the crypto service and the platform key management service. The kernel is responsible only for core services such as address space management, thread management, and inter-process communication. Other fundamental services, such as the crypto service and the platform key management service, are implemented as applications at the user layer. The crypto and key management services provide Trusted Applications (TAs) with sensitive information encryption, data signing, and platform attestation functions. This design prevents a single vulnerable kernel service from compromising the whole TEE OS. A monitor is also added to perform the switch between the secure world and the normal world. Finally, we implemented a MicroTEE prototype on the Freescale i.MX6Q Sabre Lite development board and tested its performance. Evaluation results show that cryptographic operations perform better in MicroTEE than in Linux when the data size is small.
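The fault-containment idea in the abstract above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names, the message format, and the demo key are hypothetical and are not MicroTEE's API, and Python objects do not provide the hardware-enforced address-space isolation that a real microkernel relies on. The point is only that the "kernel" routes messages and holds no service logic, so a failing service does not take the others down.

```python
import hashlib
import hmac


class KeyService:
    """Hypothetical user-level service; its key is private to the service."""

    def __init__(self):
        self._platform_key = b"demo-platform-key"  # illustrative value only

    def handle(self, request):
        if request["op"] == "mac":
            return hmac.new(self._platform_key, request["data"],
                            hashlib.sha256).hexdigest()
        raise ValueError("unsupported operation")


class BuggyService:
    """Hypothetical service that always fails, standing in for a vulnerable one."""

    def handle(self, request):
        raise RuntimeError("simulated vulnerability triggered")


class Microkernel:
    """Toy kernel: it only routes IPC messages to registered endpoints."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, service):
        self._endpoints[name] = service

    def ipc_call(self, name, request):
        # A failure inside one service is reported to the caller and
        # does not affect the kernel or the other services.
        try:
            return ("ok", self._endpoints[name].handle(request))
        except Exception as exc:
            return ("fault", str(exc))


if __name__ == "__main__":
    kernel = Microkernel()
    kernel.register("key", KeyService())
    kernel.register("buggy", BuggyService())
    print(kernel.ipc_call("buggy", {"op": "anything"}))
    print(kernel.ipc_call("key", {"op": "mac", "data": b"hello"}))
```

In this sketch the second call still succeeds after the first one faults, which mirrors the containment property the abstract claims for user-level services.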
TL;DR: The proposed work develops the Big-Sensor-Cloud Infrastructure (BSCI), which enhances the usability and management of physical sensor devices that are otherwise manufactured in proprietary, vendor-specific designs.
Abstract: The proposed work relates to the development of the Big-Sensor-Cloud Infrastructure (BSCI), which enhances the usability and management of physical sensor devices. Traditional sensor networks are manufactured in proprietary, vendor-specific designs and cannot switch applications dynamically at runtime because of their monolithic kernels. As a result, the applications are inaccessible to people who do not own physical sensor devices. Recently, sensor-cloud infrastructure has been viewed as a substitute for traditional sensor networks. However, with the growing velocity, variety, and variability of data, data management becomes a serious concern, and existing systems cannot capture, analyze, and control the data efficiently in real time. BSCI is a distributed framework for "Big" sensor-data storage, processing, virtualization, leveraging, and efficient remote management. The framework interfaces between the physical and cyber worlds, acquiring real-time data from the physical WSNs into the cloud platform. These data are processed and delivered to end users as a simple service, Sensors-as-a-Service (Se-aaS). Multiple organizations with heterogeneous demands can be served with Se-aaS. From a user perspective, BSCI allows naive users to regard typical sensor devices as simple, accessible services like electricity and water.
TL;DR: It is demonstrated that, owing to their isolated software contexts, most virtualized applications consistently outperform their bare-metal counterparts when executing on 64 nodes of a multi-tenant, kernel-intensive cloud system.
Abstract: Isolation is a desirable property for applications executing in multi-tenant computing systems. On the performance side, hardware resource isolation via partitioning mechanisms is commonly applied to achieve QoS, a necessary property for many noise-sensitive parallel workloads. On the software side, partitioning is used, usually in the form of virtual machines, to provide secure environments with smaller attack surfaces than those present in shared software stacks. In this paper, we identify a further benefit of isolation, one that is currently less appreciated in most parallel computing settings: isolation of system software stacks, including OS kernels, can lead to significant performance benefits through a reduction in variability. To highlight the existing problem in shared software stacks, we first developed a new systematic approach to measure and characterize latent sources of variability in the Linux kernel. Using this approach, we find that hardware VMs are effective substrates for limiting the kernel-level interference that otherwise occurs in monolithic kernel systems. Furthermore, by enabling reductions in variability, virtualized environments often have better worst-case performance characteristics than native or containerized environments. Finally, we demonstrate that, owing to their isolated software contexts, most virtualized applications consistently outperform their bare-metal counterparts when executing on 64 nodes of a multi-tenant, kernel-intensive cloud system.
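To make "variability" and "worst-case performance" concrete, the sketch below summarizes a list of latency samples by mean, worst case, and coefficient of variation. It is an illustration only: the function name and the choice of metrics are assumptions, not the measurement approach developed in the paper.

```python
import statistics


def variability_summary(latencies_us):
    """Summarize latency samples (hypothetical helper, microseconds assumed)."""
    mean = statistics.mean(latencies_us)
    # Population standard deviation of the observed samples.
    stdev = statistics.pstdev(latencies_us)
    return {
        "mean": mean,
        "worst": max(latencies_us),
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean else 0.0,
    }


if __name__ == "__main__":
    print(variability_summary([8.0, 12.0]))
```

Two environments with the same mean can differ sharply in `worst` and `cv`, which is the kind of difference the abstract attributes to kernel-level interference.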
TL;DR: This work divides up kernel objects into areas of responsibility to introduce additional address spaces which will prevent information leakage, even in the case of a successful attack on the kernel.
Abstract: Monolithic kernel design mandates the use of a single address space for kernel data and code. While this design is easy to understand and performs well, it does not provide much in the way of protection from exploitable bugs in the interface. By dividing up kernel objects into areas of responsibility, we can introduce additional address spaces which will prevent information leakage, even in the case of a successful attack on the kernel. We are exploring several possible implementations with the goal of increasing security while minimizing the impact on performance.
TL;DR: It is reported that microkernel OSes surpass monolithic kernel OSes in terms of performance of distributed processing in a multicore environment.
Abstract: To use a multicore processor effectively, processes should be distributed across the cores. Furthermore, the circumstances under which the distribution effect does or does not appear must be clarified. Operating systems (OSes) have two types of architectures: microkernel and monolithic kernel. A microkernel OS implements minimal OS functions in the kernel and most other OS functions as processes, which are referred to as OS servers. A monolithic kernel OS implements all OS functions in the kernel. These two types of OSes distribute processing in different ways. This paper evaluates the basic performance of a microkernel OS and compares distributed processing between microkernel and monolithic kernel OSes. We employed AnT as the microkernel OS and Linux as the monolithic kernel OS. AnT is a microkernel OS developed for a multicore environment, and Linux is a widely used monolithic kernel OS. Finally, this paper reports that microkernel OSes surpass monolithic kernel OSes in terms of performance of distributed processing in a multicore environment.
TL;DR: The characteristics of the Azalea multi-kernel are described, and its feasibility in terms of performance and scalability compared with existing Linux is demonstrated through an experiment that deploys a full-weight kernel and three unikernels in a single node.
Abstract: In a manycore environment, the monolithic kernel has scalability problems. To solve them, we propose the Azalea multi-kernel, which combines the advantages of multi-kernels and unikernels. First, intensive I/O is offloaded to the full-weight kernel to improve the locality of the kernel service. Second, application performance is enhanced by removing kernel noise. In this paper, we describe the characteristics of the Azalea multi-kernel and demonstrate its feasibility in terms of performance and scalability compared with existing Linux through an experiment that deploys a full-weight kernel (Linux) and three unikernels (Azalea unikernel) in a single node.
TL;DR: A prototype of LeMo, a novel architecture that confines a module to the least privilege in the kernel, is implemented, and it is shown that LeMo is able to defeat a malicious module with an acceptable performance overhead.
Abstract: The Linux kernel is a monolithic kernel, and the boundaries among the objects in the kernel are not particularly clear. Once a malicious module is loaded into the kernel, it can access almost the entire kernel. This breaks the principle of least privilege. To overcome this, we propose LeMo, a novel architecture that confines a module to the least privilege in the kernel. In LeMo, modules are restricted to accessing only the kernel objects they need. To this end, before a module is loaded into the kernel, the patched kernel builds a new page table for the module. With page-based access control, the patched kernel is capable of preventing malicious modules from arbitrarily accessing the kernel. We have implemented a prototype of LeMo, which provides tools to load and unload modules. Our evaluation shows that LeMo is able to defeat a malicious module with an acceptable performance overhead.
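Page-based access control of the kind the abstract describes can be sketched as a per-module table that maps page numbers to permitted access types. This is a minimal model under assumptions: the class name, the 4 KiB page size, and the permission letters are illustrative and are not taken from LeMo, whose real mechanism is a hardware page table built by the patched kernel.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


class ModulePageTable:
    """Hypothetical per-module table: page number -> allowed access types."""

    def __init__(self):
        self._perms = {}

    def map_range(self, addr, size, perms):
        """Grant `perms` (a string such as "r" or "rw") on every page the range touches."""
        first = addr // PAGE_SIZE
        last = (addr + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self._perms.setdefault(page, set()).update(perms)

    def check(self, addr, access):
        """Return True only if the page holding `addr` allows `access`."""
        return access in self._perms.get(addr // PAGE_SIZE, set())


if __name__ == "__main__":
    table = ModulePageTable()
    table.map_range(0x1000, 100, "r")   # module may read one kernel object
    print(table.check(0x1000, "r"))     # mapped page, read access
    print(table.check(0x1000, "w"))     # write was never granted
    print(table.check(0x3000, "r"))     # unmapped page
```

One consequence visible even in this toy model is that permissions apply to whole pages, so an object that shares a page with other data exposes that data at the same permission level.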
TL;DR: This paper presents a method to auto-generate verification code from an automaton; the code can be integrated into a module and dynamically added to the kernel for efficient on-the-fly verification of the system using in-kernel tracing features.
Abstract: Formal verification of the Linux kernel has been receiving increasing attention in recent years, with the development of many models, from memory subsystems to the synchronization primitives of the real-time kernel. The effort in developing formal verification methods is justified by the large code base, the complexity of the synchronization required in a monolithic kernel, and the support for multiple architectures, along with the use of Linux in critical systems, from high-frequency trading to self-driving cars. Despite recent developments in the area, none of the proposed approaches is suitable and flexible enough to be applied efficiently to a running kernel. Aiming to fill this gap, this paper proposes a formal verification approach for the Linux kernel based on automata models. It presents a method to auto-generate verification code from an automaton; the code can be integrated into a module and dynamically added to the kernel for efficient on-the-fly verification of the system using in-kernel tracing features. Finally, a set of experiments demonstrates verification of three models, along with a performance analysis of the impact of the verification in terms of latency and throughput of the system, showing the efficiency of the approach.
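The core of automata-based runtime verification can be sketched as a table-driven monitor: each traced event either follows a transition of the model or is recorded as a violation. This is an illustration under assumptions only. The class name, the two-state lock model, and the event names are hypothetical; the paper generates verification code for in-kernel use, whereas this is a user-space Python model of the same idea.

```python
class AutomatonMonitor:
    """Hypothetical monitor driven by a transition table {(state, event): next_state}."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions
        self.violations = []

    def on_event(self, event):
        """Advance on an expected event; record a violation otherwise."""
        key = (self.state, event)
        if key not in self.transitions:
            self.violations.append(key)  # event not allowed in this state
            return False
        self.state = self.transitions[key]
        return True


if __name__ == "__main__":
    # Illustrative two-state model: a lock must be taken before it is released.
    model = {
        ("unlocked", "lock"): "locked",
        ("locked", "unlock"): "unlocked",
    }
    monitor = AutomatonMonitor("unlocked", model)
    for event in ["lock", "unlock", "unlock"]:
        monitor.on_event(event)
    print(monitor.violations)
```

Because each event costs one dictionary lookup, the check is cheap per event, which is the property that makes on-the-fly verification plausible; the actual overhead in a kernel depends on the tracing infrastructure.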
TL;DR: A new OS structure in which a lightweight virtual executor called transkernel offloads specific phases from a monolithic kernel; the transkernel translates stateful kernel execution through cross-ISA dynamic binary translation (DBT), which can enable efficiency gains even on off-the-shelf hardware.
Abstract: Smart devices see a large number of ephemeral tasks driven by background activities. To execute such a task, the OS kernel wakes up the platform beforehand and puts it back to sleep afterwards. In doing so, the kernel operates various IO devices and orchestrates their power state transitions. Such kernel executions are inefficient because they are a poor match for typical CPU hardware. They are better off running on a low-power, microcontroller-like core, i.e., a peripheral core, relieving the CPU of the inefficiency.
We therefore present a new OS structure, in which a lightweight virtual executor called transkernel offloads specific phases from a monolithic kernel. The transkernel translates stateful kernel execution through cross-ISA, dynamic binary translation (DBT); it emulates a small set of stateless kernel services behind a narrow, stable binary interface; it specializes for hot paths; it exploits ISA similarities for lowering DBT cost.
Through an ARM-based prototype, we demonstrate transkernel's feasibility and benefit. We show that while cross-ISA DBT is typically used under the assumption of efficiency loss, it can enable efficiency gain, even on off-the-shelf hardware.
TL;DR: A hardware-assisted OS primitive, XPC (Cross Process Call), for fast and secure synchronous IPC, which is compatible with the traditional address space based isolation mechanism and can be easily integrated into existing microkernels and monolithic kernels.
Abstract: Microkernels have many intriguing features, such as security, fault tolerance, modularity, and customizability, which have recently stimulated resurgent interest in both academia and industry (including seL4, QNX, and Google's Fuchsia OS). However, IPC (inter-process communication), known as the Achilles' heel of microkernels, is still the major factor in poor overall OS performance. IPC also plays a vital role in monolithic kernels like Android Linux, as mobile applications frequently communicate with many user-level services through IPC. Previous software optimizations of IPC usually cannot bypass the kernel, which is responsible for domain switching and message copying or remapping; hardware solutions like tagged memory or capabilities replace page tables for isolation, but usually require non-trivial modifications to the existing software stack to adapt to the new hardware primitives. In this paper, we propose a hardware-assisted OS primitive, XPC (Cross Process Call), for fast and secure synchronous IPC. XPC enables a direct switch between the IPC caller and callee without trapping into the kernel, and supports message passing across multiple processes through the invocation chain without copying. The primitive is compatible with the traditional address-space-based isolation mechanism and can be easily integrated into existing microkernels and monolithic kernels. We have implemented a prototype of XPC based on a Rocket RISC-V core with FPGA boards and ported two microkernel implementations, seL4 and Zircon, and one monolithic kernel implementation, Android Binder, for evaluation. We also implement XPC on the GEM5 simulator to validate its generality. The results show that XPC can reduce IPC call latency from 664 to 21 cycles, achieve up to 54.2x improvement on Android Binder, and improve the performance of real-world applications on microkernels by 1.6x on Sqlite3 and 10x on an HTTP server with minimal hardware resource cost.
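As a quick sanity check on the figures quoted above, the cycle counts can be turned into a speedup factor. The helper name is hypothetical, and the abstract does not say which system the 664 and 21 cycle figures were measured on, so the result should not be conflated with the separately reported 54.2x improvement on Android Binder.

```python
def speedup(before_cycles, after_cycles):
    """Ratio of old latency to new latency (hypothetical helper)."""
    return before_cycles / after_cycles


if __name__ == "__main__":
    # 664 and 21 cycles are the IPC call latencies quoted in the abstract.
    print(round(speedup(664, 21), 1))  # about 31.6
```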