TL;DR: Lightweight Authentication for Secure Automotive Networks (LASAN) as discussed by the authors is a full lifecycle authentication approach for automotive networks, which can be used in all aspects of automotive product lifecycle, including manufacturing, vehicle maintenance, and software updates.
Abstract: With the increasing amount of interconnections between vehicles, the attack surface of internal vehicle networks is rising steeply. Although these networks are shielded against external attacks, they often do not have any internal security to protect against malicious components or adversaries who can breach the network perimeter. To secure the in-vehicle network, all communicating components must be authenticated, and only authorized components should be allowed to send and receive messages. This is achieved through the use of an authentication framework. Cryptography is widely used to authenticate communicating parties and provide secure communication channels (e.g., Internet communication). However, the real-time performance requirements of in-vehicle networks restrict the types of cryptographic algorithms and protocols that may be used. In particular, asymmetric cryptography is computationally infeasible during vehicle operation. In this work, we address the challenges of designing authentication protocols for automotive systems. We present Lightweight Authentication for Secure Automotive Networks (LASAN), a full lifecycle authentication approach. We describe the core LASAN protocols and show how they protect the internal vehicle network while complying with the real-time constraints and low computational resources of this domain. By leveraging the fixed structure of automotive networks, we minimize bandwidth and computation requirements. Unlike previous work, we also explain how this framework can be integrated into all aspects of the automotive product lifecycle, including manufacturing, vehicle maintenance, and software updates. We evaluate LASAN in two different ways: First, we analyze the security properties of the protocols using established protocol verification techniques based on formal methods. Second, we evaluate the timing requirements of LASAN and compare these to other frameworks using a new highly modular discrete event simulator for in-vehicle networks, which we have developed for this evaluation.
TL;DR: Measurement-Based Probabilistic Timing Analysis using the Coefficient of Variation (MBPTA-CV), a new mixture-distribution aware, WCET-suited MBPTA method that builds on recent EVT developments in other fields to automatically select the distribution parameters that best fit the maxima of the observed execution times.
Abstract: Extreme Value Theory (EVT) has been historically used in domains such as finance and hydrology to model worst-case events (e.g., major stock market incidences). EVT takes as input a sample of the distribution of the variable to model and fits the tail of that sample to either the Generalised Extreme Value (GEV) or the Generalised Pareto Distribution (GPD). Recently, EVT has become popular in real-time systems to derive worst-case execution time (WCET) estimates of programs. However, the application of EVT is not straightforward and requires a detailed analysis of, and customisation for, the particular problem at hand. In this article, we tailor the application of EVT to timing analysis. To that end, (1) we analyse the response time of different hardware resources (e.g., cache memories) and identify those that may lead to radically different types of execution time distributions. (2) We show that one of these distributions, known as mixture distribution, causes problems in the use of EVT. In particular, mixture distributions challenge not only properly selecting GEV/GPD parameters (i.e., location, scale and shape) but also determining the size of the sample to ensure that enough tail values are passed to EVT and that only tail values are used by EVT to fit GEV/GPD. Failing to select these parameters has a negative impact on the quality of the derived WCET estimates. We tackle these problems, by (3) proposing Measurement-Based Probabilistic Timing Analysis using the Coefficient of Variation (MBPTA-CV), a new mixture-distribution aware, WCET-suited MBPTA method that builds on recent EVT developments in other fields (e.g., finance) to automatically select the distribution parameters that best fit the maxima of the observed execution times. Our results on a simulation environment and a real board show that MBPTA-CV produces high-quality WCET estimates.
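The statistic named in MBPTA-CV can be illustrated with a small sketch. It relies on a standard fact: for an exponential tail, the excesses over any threshold have a coefficient of variation (standard deviation divided by mean) equal to 1. The abstract does not say how MBPTA-CV uses this statistic to choose its parameters, so the function name `residual_cv`, the thresholds, and the synthetic samples below are assumptions for illustration only, not the authors' procedure.

```python
import random
import statistics


def residual_cv(samples, threshold):
    """Coefficient of variation of the excesses (x - threshold) for x > threshold."""
    excesses = [x - threshold for x in samples if x > threshold]
    if len(excesses) < 2:
        raise ValueError("need at least two observations above the threshold")
    return statistics.stdev(excesses) / statistics.mean(excesses)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    # Synthetic "execution times" with an exponential tail (hypothetical data).
    exp_times = [rng.expovariate(1.0) for _ in range(20000)]
    # Bounded (uniform) data has a lighter tail, so its residual CV is below 1.
    uni_times = [rng.random() for _ in range(20000)]
    print("exponential tail CV:", residual_cv(exp_times, 1.0))
    print("uniform tail CV:", residual_cv(uni_times, 0.5))
```

A residual CV close to 1 is consistent with an exponential tail, while values clearly below or above 1 point to lighter or heavier tails respectively.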
TL;DR: A novel radio-frequency identification (RFID)-based system suitable for counterfeit detection, traceability, and authentication in the IoT supply chain called CDTA is developed, composed of different types of on-chip sensors and in-system structures that collect necessary information to detect multiple counterfeit IC types, track and trace IoT devices, and verify the overall system authenticity.
Abstract: The Internet of Things (IoT) is transforming the way we live and work by increasing the connectedness of people and things on a scale that was once unimaginable. However, the vulnerabilities in the IoT supply chain have raised serious concerns about the security and trustworthiness of IoT devices and components within them. Testing for device provenance, detection of counterfeit integrated circuits (ICs) and systems, and traceability of IoT devices are challenging issues to address. In this article, we develop a novel radio-frequency identification (RFID)-based system suitable for counterfeit detection, traceability, and authentication in the IoT supply chain called CDTA. CDTA is composed of different types of on-chip sensors and in-system structures that collect necessary information to detect multiple counterfeit IC types (recycled, cloned, etc.), track and trace IoT devices, and verify the overall system authenticity. Central to CDTA is an RFID tag employed as storage and a channel to read the information from different types of chips on the printed circuit board (PCB) in both power-on and power-off scenarios. CDTA sensor data can also be sent to the remote server for authentication via an encrypted Ethernet channel when the IoT device is deployed in the field. A novel board ID generator is implemented by combining outputs of physical unclonable functions (PUFs) embedded in the RFID tag and different chips on the PCB. A light-weight RFID protocol is proposed to enable mutual authentication between RFID readers and tags. We also implement a secure interchip communication on the PCB. Simulations and experimental results using Spartan 3E FPGAs demonstrate the effectiveness of this system. The efficiency of the radio-frequency (RF) communication has also been verified via a PCB prototype with a printed slot antenna.
TL;DR: This work presents a fast multi-threaded method to find the smallest micro-architecture for a given BIP and target latency by discriminating between all different exploration knobs and exploring these concurrently.
Abstract: This work presents a Design Space Exploration (DSE) method for Behavioral IPs (BIPs) given in ANSI-C or SystemC to find the smallest micro-architecture for a specific target latency. Previous work on High-Level Synthesis (HLS) DSE mainly focused on finding a tradeoff curve with Pareto-optimal designs. HLS is, however, a single process (component) synthesis method. Very often, components must meet a specific fixed latency when inserted within a larger system. This work presents a fast multi-threaded method to find the smallest micro-architecture for a given BIP and target latency by discriminating between all different exploration knobs and exploring these concurrently. Experimental results show that our proposed method is very effective, and comprehensive results compare the quality of results versus the speedup of our proposed explorer.
TL;DR: A mathematical modeling framework that is rich in expressivity to capture IoT characteristics from a global perspective is proposed and a set of fundamental challenges in sensing, decentralized computation, robustness, energy efficiency, and hardware security based on the proposed modeling framework are set forward.
Abstract: Constantly advancing integration capability is paving the way for the construction of the extremely large scale continuum of the Internet where entities or things from vastly varied domains are uniquely addressable and interacting seamlessly to form a giant networked system of systems known as the Internet-of-Things (IoT). In contrast to this visionary networked system paradigm, prior research efforts on the IoT are still very fragmented and confined to disjoint explorations of different applications, architecture, security, services, protocol, and economical domains, thus preventing design exploration and optimization from a unified and global perspective. In this context, this survey article first proposes a mathematical modeling framework that is rich in expressivity to capture IoT characteristics from a global perspective. It also sets forward a set of fundamental challenges in sensing, decentralized computation, robustness, energy efficiency, and hardware security based on the proposed modeling framework. Possible solutions are discussed to shed light on future development of the IoT system paradigm.
TL;DR: This article focuses on detecting a special type of attack called code reuse attacks (CRA), which use a recently introduced technique that allows attackers to perform arbitrary computation without injecting their code by reusing only existing code fragments.
Abstract: The ARM CoreSight Program Trace Macrocell (PTM) has been widely deployed in recent ARM processors for real-time debugging and tracing of software. Using PTM, the external debugger can extract execution behaviors of applications running on an ARM processor. Recently, some researchers have been using this feature for other purposes, such as fault-tolerant computation and security monitoring. This motivated us to develop an external security monitor that can detect control hijacking attacks, whose goal is to maliciously manipulate the control flow of victim applications. This article focuses on detecting a special type of attack called code reuse attacks (CRA), which use a recently introduced technique that allows attackers to perform arbitrary computation without injecting their code by reusing only existing code fragments. Our external monitor is attached to the outside of the host system via the system bus and ARM CoreSight PTM, and is fed with execution traces of a victim application running on the host. As a majority of CRAs violate the normal execution behaviors of a program, our monitor constantly watches and analyzes the execution traces of the victim application and detects a symptom of attacks when the execution behaviors violate certain rules that normal applications are known to adhere to. We present two different implementations for this purpose: a hardware-based solution in which all CRA detection components are implemented in hardware, and a hardware/software mixed solution that can be employed in a more resource-constrained environment where the deployment of full hardware-level CRA detection is burdensome.
TL;DR: This work introduces a novel design approach—the Ψ-chart—where communication patterns and protocols are designed independently of a system’s functionality and resources, via dedicated models, and its implementation is presented in TTool/DIPLODOCUS, a Unified Modeling Language (UML)/SysML framework for the modeling, simulation, formal verification and automatic code generation of data-flow embedded systems.
Abstract: In Model-Driven Engineering system-level approaches, the design of communication protocols and patterns is subject to the design of processing operations (computations) and to their mapping onto execution resources. However, this strategy allows us to capture simple communication schemes (e.g., processor-bus-memory) and prevents us from evaluating the performance of both computations and communications (e.g., impact of application traffic patterns onto the communication interconnect) in a single step. To solve these issues, we introduce a novel design approach—the Ψ-chart—where we design communication patterns and protocols independently of a system’s functionality and resources, via dedicated models. At the mapping step, both application and communication models are bound to the platform resources and transformed to explore design alternatives for both computations and communications. We present the Ψ-chart and its implementation (i.e., communication models and Design Space Exploration) in TTool/DIPLODOCUS, a Unified Modeling Language (UML)/SysML framework for the modeling, simulation, formal verification and automatic code generation of data-flow embedded systems. The effectiveness of our solution in terms of better design quality (e.g., portability, time) is demonstrated with the design of the physical layer of a ZigBee (IEEE 802.15.4) transmitter onto a multi-processor architecture.
TL;DR: This article integrates tradeoff awareness into the CP model and introduces a two-step solving approach that utilizes the advantages of heuristics, while still keeping the completeness property of CP.
Abstract: Due to its complexity, the problem of mapping and scheduling streaming applications on heterogeneous MPSoCs under real-time and performance constraints has traditionally been tackled by incomplete heuristic algorithms. In recent years, approaches based on Constraint Programming (CP) have shown promising results as complete methods for finding optimal mappings, in particular concerning throughput. However, so far none of the available CP approaches consider the tradeoff between throughput and buffer requirements or throughput and power consumption. This article integrates tradeoff awareness into the CP model and introduces a two-step solving approach that utilizes the advantages of heuristics, while still keeping the completeness property of CP. With a number of experiments considering several streaming applications and different platform models, the article illustrates not only the efficiency of the presented model but also its suitability for solving different problems with various combinations of performance constraints.
TL;DR: A dynamically adaptive scrubbing mechanism is proposed that adapts the scrubbing process to heterogeneous applications (composed of periodic/sporadic and streaming/DSP (Digital Signal Processing) tasks), as well as their reconfigurations and modifications at runtime.
Abstract: Commercial off-the-shelf (COTS) reconfigurable devices have been recognized as one of the most suitable processing devices to be applied in nano-satellites, since they can satisfy and combine their most important requirements, namely processing performance, reconfigurability, and low cost. However, COTS reconfigurable devices, in particular Static-RAM Field Programmable Gate Arrays, can be affected by cosmic radiation, compromising the overall nano-satellite reliability. Scrubbing has been proposed as a mechanism to repair faults in configuration memory. However, the current scrubbing mechanisms are predominantly static, unable to adapt to heterogeneous applications and their runtime variations. In this article, a dynamically adaptive scrubbing mechanism is proposed. Through a window-based scrubbing scheduling, this mechanism adapts the scrubbing process to heterogeneous applications (composed of periodic/sporadic and streaming/DSP (Digital Signal Processing) tasks), as well as their reconfigurations and modifications at runtime. Conducted simulation experiments show the feasibility and the efficiency of the proposed solution in terms of system reliability metric and memory overhead.
TL;DR: This study applies the random region covering technique to circuit design automation and proposes a theory to explain why this technique is efficient at searching for the global optimum.
Abstract: Random region covering is a global optimization technique that explores the landscape by introducing multiple random starting points to initiate the local optimization solvers. This study applies the random region covering technique to circuit design automation and proposes a theory to explain why this technique is efficient at searching for the global optimum. In addition to analyzing the efficiency of the random region covering algorithm, the theory gives a probability-based estimation of the goodness of the optimization result. To enhance the efficiency of the random region covering technique, this work evaluates the boundary of top performance regions and proposes a modified random region covering method that only performs the global optimization on the top design region. The results from a large number of mathematical experiments verify the proposed methodology. The optimized designs of a class-E power amplifier and a wide load range operational amplifier outperform both manual designs and other state-of-the-art optimization techniques.
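The basic idea described above, starting a local solver from several random points and keeping the best result, can be sketched as a generic multi-start search. This is only an illustration under stated assumptions: the one-dimensional pattern search `local_descent`, the test function `two_wells`, and all parameter values are hypothetical stand-ins, not the solver, circuits, or region-selection method used in the study.

```python
import random


def local_descent(f, x, step=0.5, tol=1e-6):
    """Simple 1-D pattern search: move while a neighbour improves, else halve the step."""
    fx = f(x)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx, moved = cand, fc, True
                break
        if not moved:
            step /= 2
    return x, fx


def multi_start(f, lo, hi, n_starts=30, seed=0):
    """Run the local solver from n_starts random points in [lo, hi]; keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x, fx = local_descent(f, rng.uniform(lo, hi))
        if best is None or fx < best[1]:
            best = (x, fx)
    return best


def two_wells(x):
    """Hypothetical objective with a local minimum near x = 2 and the global one near x = -2."""
    return (x * x - 4) ** 2 + x


if __name__ == "__main__":
    print(multi_start(two_wells, -4.0, 4.0))
```

A single local run started in the right-hand basin stops at the poorer minimum near x = 2; covering the interval with random starts makes it very likely that at least one run lands in the basin of the global minimum.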
TL;DR: TEI-power is presented, a dynamic voltage and frequency scaling--based dynamic thermal management technique that considers the TEI phenomenon and also the superlinear dependencies of power consumption components on the temperature and outlines a real-time trade-off between delay and power consumption as a function of the chip temperature to provide significant energy savings.
Abstract: FinFETs have emerged as a promising replacement for planar CMOS devices in sub-20nm technology nodes. However, based on the temperature effect inversion (TEI) phenomenon observed in FinFET devices, the delay characteristics of FinFET circuits in sub-, near-, and superthreshold voltage regimes may be fundamentally different from those of CMOS circuits with nominal voltage operation. For example, FinFET circuits may run faster in higher temperatures. Therefore, the existing CMOS-based and TEI-unaware dynamic power and thermal management techniques would not be applicable. In this article, we present TEI-power, a dynamic voltage and frequency scaling--based dynamic thermal management technique that considers the TEI phenomenon and also the superlinear dependencies of power consumption components on the temperature and outlines a real-time trade-off between delay and power consumption as a function of the chip temperature to provide significant energy savings, with no performance penalty—namely, up to 42% energy savings for small circuits where the logic cell delay is dominant and up to 36% energy savings for larger circuits where the interconnect delay is considerable.
TL;DR: This article proposes an affine register file design for GPUs that is energy efficient because it reduces the redundant executions of both the uniform and affine vectors.
Abstract: A modern GPU can simultaneously process thousands of hardware threads. These threads are grouped into fixed-size SIMD batches executing the same instruction on vectors of data in lockstep to achieve high throughput and performance. The register files are huge because each SIMD group accesses a dedicated set of vector registers for fast context switching, and consequently the power consumption of register files has become an important issue. One proposed solution is to replace some of the vector registers by scalar registers, as different threads in the same SIMD group operate on scalar values and so the redundant computations and accesses of these scalar values can be eliminated. However, it has been observed that a significant number of registers contain affine vectors u such that u[i] = b + i × s, which can be represented by base b and stride s. Therefore, this article proposes an affine register file design for GPUs that is energy efficient because it reduces the redundant executions of both the uniform and affine vectors. This design uses a pair of registers to store the base and stride of each affine vector and provides specific affine ALUs to execute affine instructions. A method of compiler analysis has been developed to detect scalars and affine vectors and annotate instructions for facilitating their corresponding scalar and affine computations. Furthermore, a priority-based register allocation scheme has been implemented to assign scalars and affine vectors to appropriate scalar and affine register files. Experimental results show that this design was able to dispatch 43.56% of the computations to scalar and affine ALUs when using eight scalar and four affine registers per warp. The design also reduced the energy consumption of the register files and ALUs to 21.86% and 26.54%, respectively, and it reduced the overall energy consumption of the GPU by an average of 5.18%.
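The affine representation u[i] = b + i × s can be made concrete with a short sketch. The helper names `encode_affine`, `affine_add`, and `expand` are hypothetical and model only the arithmetic idea in software; they say nothing about the register file hardware or the compiler analysis described in the article.

```python
def encode_affine(vec):
    """Return (base, stride) if vec[i] == base + i * stride for every lane, else None."""
    if not vec:
        return None
    base = vec[0]
    stride = vec[1] - vec[0] if len(vec) > 1 else 0
    if all(v == base + i * stride for i, v in enumerate(vec)):
        return base, stride
    return None


def affine_add(a, b):
    """Add two affine vectors using two scalar additions instead of one per lane."""
    return a[0] + b[0], a[1] + b[1]


def expand(affine, width):
    """Materialise the full vector from its (base, stride) pair."""
    base, stride = affine
    return [base + i * stride for i in range(width)]


if __name__ == "__main__":
    addresses = [10, 14, 18, 22]   # e.g. per-thread addresses with stride 4
    uniform = [7, 7, 7, 7]         # a uniform (scalar) vector has stride 0
    a, u = encode_affine(addresses), encode_affine(uniform)
    print(a, u, expand(affine_add(a, u), 4))
```

A uniform vector is the special case with stride 0, which is why a single (base, stride) pair covers both the scalar and the affine case in this sketch.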
TL;DR: This article proposes a novel content-aware bit shuffling (CABS) technique that minimizes bit flips and evenly distributes them to maximize the lifetime of PCM at the bit level and introduces two additional optimizations, namely, addition of an inversion bit and use of an XOR key, to further reduce bit flips.
Abstract: Recently, phase change memory (PCM) has been emerging as a strong replacement for DRAM owing to its many advantages such as nonvolatility, high capacity, low leakage power, and so on. However, PCM is still restricted for use as main memory because of its limited write endurance. There have been many methods introduced to resolve the problem by either reducing or spreading out bit flips. Although many previous studies have significantly contributed to reducing bit flips, they still have the drawback that lower bits are flipped more often than higher bits because the lower bits frequently change their bit values. Also, interblock wear-leveling schemes are commonly employed for spreading out bit flips by shifting input data, but they increase the number of bit flips per write. In this article, we propose a novel content-aware bit shuffling (CABS) technique that minimizes bit flips and evenly distributes them to maximize the lifetime of PCM at the bit level. We also introduce two additional optimizations, namely, addition of an inversion bit and use of an XOR key, to further reduce bit flips. Moreover, CABS is capable of recovering from stuck-at faults by restricting the change in values of stuck-at cells. Experimental results showed that CABS outperformed the existing state-of-the-art methods in terms of PCM lifetime extension with minimal overhead. CABS achieved up to 485% enhanced lifetime compared to the data comparison write (DCW) method with only a few metadata bits. Moreover, CABS obtained approximately 97% higher write throughput than DCW because it significantly reduced bit flips and evenly distributed them. Also, CABS reduced write dynamic energy by about 54% compared to DCW. Finally, we have also confirmed that CABS is fully applicable to BCH codes as it was able to reduce the maximum number of bit flips in metadata cells by 321%.
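Two of the ideas mentioned above can be shown in a few lines: counting the cells that actually change on a write (the quantity a data comparison write pays for) and using an inversion bit to store either the data or its complement, whichever flips fewer cells. This is a generic sketch of an inversion-bit write, not the CABS shuffling scheme or its XOR key; the function names and the 8-bit width are assumptions.

```python
def bit_flips(old, new, width=8):
    """Number of cells whose value changes when `old` is overwritten by `new`."""
    mask = (1 << width) - 1
    return bin((old ^ new) & mask).count("1")


def write_with_inversion(old_cells, old_flag, new, width=8):
    """Store `new` plain or inverted, whichever flips fewer cells.

    Returns (cells, flag, flips); the flag cell itself counts as a flip when it changes.
    """
    mask = (1 << width) - 1
    inverted = ~new & mask
    plain_cost = bit_flips(old_cells, new, width) + int(old_flag != 0)
    inv_cost = bit_flips(old_cells, inverted, width) + int(old_flag != 1)
    if inv_cost < plain_cost:
        return inverted, 1, inv_cost
    return new & mask, 0, plain_cost


def read_with_inversion(cells, flag, width=8):
    """Recover the logical value from the stored cells and the inversion flag."""
    mask = (1 << width) - 1
    return (~cells & mask) if flag else (cells & mask)


if __name__ == "__main__":
    cells, flag, flips = write_with_inversion(0b00000000, 0, 0b11111110)
    print(bin(cells), flag, flips, bin(read_with_inversion(cells, flag)))
```

In this example a plain write would flip seven cells, while the inverted form flips one data cell plus the flag cell, so the inversion bit costs one extra cell per word but caps the flips per write at roughly half the word width.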
TL;DR: This work proposes a dynamic runtime control which adapts its observations to online temporal properties, further increasing the dynamism of the approach, and mitigating the unnecessary overhead implied by existing static approaches.
Abstract: In real-time mixed-critical systems, Worst-Case Execution Time (WCET) analysis is required to guarantee that timing constraints are respected—at least for high-criticality tasks. However, the WCET is pessimistic compared to the real execution time, especially for multicore platforms. As WCET computation considers the worst-case scenario, it means that whenever a high-criticality task accesses a shared resource in multicore platforms, it is considered that all cores use the same resource concurrently. This pessimism in WCET computation leads to a dramatic underutilization of the platform resources, or even failing to meet the timing constraints. In order to increase resource utilization while guaranteeing real-time guarantees for high-criticality tasks, previous works proposed a runtime control system to monitor and decide when the interferences from low-criticality tasks cannot be further tolerated. However, in the initial approaches, the points where the controller is executed were statically predefined. In this work, we propose a dynamic runtime control which adapts its observations to online temporal properties, further increasing the dynamism of the approach, and mitigating the unnecessary overhead implied by existing static approaches. Our dynamic adaptive approach allows one to control the ongoing execution of tasks based on runtime information, and further increases the gains in terms of resource utilization compared with static approaches.
TL;DR: This survey provides a comprehensive description of the existing parametric dataflow MoCs (constructs, constraints, properties, static analyses) and compares them using a common example to help designers of streaming applications choose the most suitable model for their needs and pave the way for the design of new parametric MoCs.
Abstract: Dataflow models of computation (MoCs) are widely used to design embedded signal processing and streaming systems. Dozens of dataflow MoCs have been proposed in the past few decades. More recently, several parametric dataflow MoCs have been presented as an interesting tradeoff between analyzability and expressiveness. They offer a controlled form of dynamism under the form of parameters (e.g., parametric rates), along with runtime parameter configuration. This survey provides a comprehensive description of the existing parametric dataflow MoCs (constructs, constraints, properties, static analyses) and compares them using a common example. The main objectives are to help designers of streaming applications choose the most suitable model for their needs and pave the way for the design of new parametric MoCs.
TL;DR: A topological reduction method is introduced that is able to automatically generate interpretable macromodel circuits in symbolic form; that is, the circuit elements in the compact model maintain analytical relations of the parameters of the original full circuit.
Abstract: In the field of analog integrated circuit (IC) design, small-signal macromodels play indispensable roles for developing design insight and sizing reference. However, the subject of automatically generating symbolic low-order macromodels in human readable circuit form has not been well studied. Traditionally, work has been published on reducing full-scale symbolic transfer functions to simpler forms but without the guarantee of interpretability. On the other hand, methodologies developed for interconnect circuits (mainly resistor-capacitor-inductor (RCL) networks) are not suitable for analog ICs. In this work, a topological reduction method is introduced that is able to automatically generate interpretable macromodel circuits in symbolic form; that is, the circuit elements in the compact model maintain analytical relations of the parameters of the original full circuit. This type of symbolic macromodel has several benefits that other traditional modeling methods do not offer: First, reusability, namely that a designer need not repeatedly generate macromodels for the same circuit even if it is re-sized or re-biased; second, interpretability, namely that a designer may directly identify circuit parameters (in the original circuit) that are closely related to the dominant frequency characteristics, such as dc gain, gain/phase margins, and dominant poles/zeros. The effectiveness and computational efficiency of the proposed method have been validated by several operational amplifier (opamp) circuit examples.
TL;DR: An optimized charge and drive management (OCDM) methodology that selects the optimal driving route, schedules daily trips, and optimizes the EV charging process while considering the driver’s timing preference is proposed.
Abstract: Electric vehicles (EVs) have been considered as a solution to the environmental issues caused by transportation, such as air pollution and greenhouse gas emission. However, limited energy capacity, scarce EV supercharging stations, and long recharging time have brought anxiety to drivers who use EVs as their main means of transportation. Furthermore, EV owners need to deal with a huge battery replacement cost when the battery capacity degrades. In addition, in-house EV chargers affect the pattern of the power grid load, which is not favorable to the utilities. The driving route, departure/arrival time of daily trips, and electricity price influence the EV energy consumption, battery lifetime, electricity cost, and EV charger load on the power grid. The EV driving range and battery lifetime issues have been addressed by battery management systems and route optimization methodologies. In this article, however, we propose an optimized charge and drive management (OCDM) methodology that selects the optimal driving route, schedules daily trips, and optimizes the EV charging process while considering the driver’s timing preference. Our methodology will improve the EV driving range, extend the battery lifetime, reduce the recharging cost, and diminish the influence of EV chargers on the power grid. The performance of our methodology compared to the state of the art has been analyzed by experimenting on three benchmark EVs and three drivers. Our methodology has decreased EV energy consumption by 27%, improved the battery lifetime by 24.8%, reduced the electricity cost by 35%, and diminished the power grid peak load by 17% while increasing daily driving time by less than 20 minutes. Moreover, the scalability of our OCDM methodology for different parameters (e.g., time resolution and multiday cycles) in terms of execution time and memory usage has been analyzed.
TL;DR: Experimental results show that both the word- and partition-level designs can improve the lifetime of the non-volatile caches effectively with low performance and energy overheads.
Abstract: Non-volatile memory technologies are among the most promising technologies for implementing the main memories and caches in future microprocessors and replacing the traditional DRAM and SRAM technologies. However, one of the most challenging design issues of the non-volatile memory technologies is their limited write endurance. In this article, we first propose to exploit the narrow-width values to improve the lifetime of non-volatile last-level caches with word-level write variation reduction. A leading-zeros masking scheme is proposed to reduce the write stress to the upper half of the narrow-width data. To balance the write variations between the upper half and the lower half of the narrow-width data, two swapping schemes, the swap on write (SW) and swap on replacement (SRepl), are proposed. Two existing optimization schemes, the multiple dirty bit (MDB) and read before write (RBW), are adopted with our word-level swapping design. To further reduce the write variation on the partition level, we propose to exploit the cache partitioning design to improve the lifetime. Based on the observation that different applications demonstrate different cache access (write) behaviors, we propose to partition the last-level cache for different applications and balance the write variations by partition swapping. Both software-based and hardware-based partitioning and swapping schemes are proposed and evaluated for different situations. Our experimental results show that both our word- and partition-level designs can improve the lifetime of the non-volatile caches effectively with low performance and energy overheads.
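The word-level idea above can be illustrated with a toy wear model: a narrow-width value has an all-zero upper half, so only one physical half needs to be written, and swapping which half holds the data spreads the wear. The 16-bit half size, the toggle-on-every-narrow-write policy, and the function names are assumptions chosen for illustration; the abstract does not specify when SW and SRepl trigger a swap, and the flag bit that records the swap state is not modelled here.

```python
HALF_BITS = 16                      # assumed half-word size (32-bit words)
LOW_MASK = (1 << HALF_BITS) - 1


def is_narrow(word):
    """True when the upper half of the word is all zeros."""
    return 0 <= word <= LOW_MASK


def simulate_wear(words, swap_enabled):
    """Count writes landing on each physical half of one cache word."""
    wear = {"lower": 0, "upper": 0}
    swapped = False
    for w in words:
        if is_narrow(w):
            # Leading zeros are masked: only the half holding real data is written.
            wear["upper" if swapped else "lower"] += 1
            if swap_enabled:
                swapped = not swapped   # hypothetical policy: alternate halves
        else:
            wear["lower"] += 1
            wear["upper"] += 1
    return wear


if __name__ == "__main__":
    narrow_stream = [5] * 100
    print(simulate_wear(narrow_stream, swap_enabled=False))
    print(simulate_wear(narrow_stream, swap_enabled=True))
```

Without swapping, every narrow write wears the same half; with swapping, the same stream is split evenly, which is the kind of write-variation balancing the two swapping schemes aim for.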
TL;DR: This article formulates two mixed integer-linear programs for optimal multi-project, multi-resource allocation with task precedence and resource co-constraints and demonstrates the capability of scheduling over two dozen chip development projects at the design center level, subject to resource and datacenter capacity limits.
Abstract: A large semiconductor product company spends hundreds of millions of dollars each year on design infrastructure to meet tapeout schedules for multiple concurrent projects. Resources (servers, electronic design automation tool licenses, engineers, and so on) are limited and must be shared -- and the cost per day of schedule slip can be enormous. Co-constraints between resource types (e.g., one license per every two cores (threads)) and dedicated versus shareable resource pools make scheduling and allocation hard. In this article, we formulate two mixed integer-linear programs for optimal multi-project, multi-resource allocation with task precedence and resource co-constraints. Application to a real-world three-project scheduling problem extracted from a leading-edge design center of anonymized Company X shows substantial compute and license costs savings. Compared to the product company, our solution shows that the makespan of schedule of all projects can be reduced by seven days, which not only saves ∼ 2.7% of annual labor and infrastructure costs but also enhances market competitiveness. We also demonstrate the capability of scheduling over two dozen chip development projects at the design center level, subject to resource and datacenter capacity limits as well as per-project penalty functions for schedule slips. The design center ended up purchasing 600 additional servers, whereas our solution demonstrates that the schedule can be met without having to purchase any additional servers. Application to a four-project scheduling problem extracted from a leading-edge design center in a non-US location shows availability of up to ∼ 37% headcount reduction during a half-year schedule for just one type of chip design activity.
TL;DR: Experimental results show that the routability can be improved significantly by applying congestion-based virtual sizing, and that the slicing representation can improve the regularity of the placement solutions and hence improve the routability with higher efficiency compared to the nonslicing representation.
Abstract: The exponential increase in scale and complexity of very large-scale integrated circuits (VLSIs) poses a great challenge to current electronic design automation (EDA) techniques. As an essential step in the whole EDA layout synthesis, placement is attracting more and more attention, especially for analog and mixed-signal integrated circuits. Recently, experts in this field have observed a variety of analog-specific layout constraints to obtain high-performance placement solutions. These constraints include symmetry, alignment, boundary, preplace, abutment, range and maximum separation, and routability of the placement solutions. In this article, the effectiveness of slicing and nonslicing representation is investigated. Additionally, the technique of congestion-based virtual sizing is proposed. Experimental results show that the routability can be improved significantly by applying congestion-based virtual sizing. Results also show that the slicing representation can improve the regularity of the placement solutions and hence improve the routability with higher efficiency compared to the nonslicing representation.
TL;DR: A novel hybrid approach based on integrating the Logic-Based Benders Decomposition principle with a pure Integer Linear Programming (ILP) model is introduced for mapping applications described by Directed Acyclic Graphs (DAGs) on platforms consisting of heterogeneous cores.
Abstract: The proper mapping of an application on a multi-core platform and the scheduling of its tasks are key elements to achieve the maximum performance. In this article, a novel hybrid approach based on integrating the Logic-Based Benders Decomposition (LBBD) principle with a pure Integer Linear Programming (ILP) model is introduced for mapping applications described by Directed Acyclic Graphs (DAGs) on platforms consisting of heterogeneous cores. The LBBD approach combines two optimization techniques with complementary strengths, namely ILP and Constraint Programming (CP), and is employed as a cut generation scheme. The generated constraints are utilized by the ILP model to cut possible assignment combinations aiming at improving the solution or proving the optimality of the best-found one. The introduced approach was applied both on synthetic DAGs and on DAGs derived from real applications. Through the proposed approach, many problems were optimally solved that could not be solved by any of the above methods (ILP, LBBD) alone within a time limit of 2 hours, while the overall solution time was also significantly decreased. Specifically, the hybrid method exhibited speedups equal to 4.2× for the synthetic instances and 10× for the real-application DAGs over the LBBD approach and two orders of magnitude over the ILP model.
TL;DR: This article presents a survey of existing hardware-enabled pharmaceutical supply chain security schemes and their limitations, highlights the current challenges, and points out future research directions.
Abstract: The pharmaceutical supply chain is the pathway through which prescription and over-the-counter (OTC) drugs are delivered from manufacturing sites to patients. Technological innovations, price fluctuations of raw materials, as well as tax, regulatory, and market demands are driving change and making the pharmaceutical supply chain more complex. Traditional supply chain management methods struggle to protect the pharmaceutical supply chain, maintain its integrity, enhance customer confidence, and aid regulators in tracking medicines. To develop effective measures that secure the pharmaceutical supply chain, it is important that the community is aware of the state-of-the-art capabilities available to the supply chain owners and participants. In this article, we present a survey of existing hardware-enabled pharmaceutical supply chain security schemes and their limitations. We also highlight the current challenges and point out future research directions. This survey should be of interest to government agencies, pharmaceutical companies, hospitals and pharmacies, and all others involved in the provenance and authenticity of medicines and the integrity of the pharmaceutical supply chain.
TL;DR: This article introduces, implements, and evaluates novel algorithms for effective integration of voltage assignment into the inner floorplanning loops, and achieves results that surpass naïve low-power and high-performance voltage assignment by 17% and 10%, on average.
Abstract: Voltage assignment is a well-known technique for circuit design, which has been applied successfully to reduce power consumption in classical 2D integrated circuits (ICs). Its usage in the context of 3D ICs has not been fully explored yet although reducing power in 3D designs is of crucial importance, for example, to tackle the ever-present challenge of thermal management. In this article, we investigate the effective and efficient partitioning of 3D designs into multiple voltage domains during the floorplanning step of physical design. In particular, we introduce, implement, and evaluate novel algorithms for effective integration of voltage assignment into the inner floorplanning loops. Our algorithms are compatible not only with the traditional objectives of 2D floorplanning but also with the additional objectives and constraints of 3D designs, including the planning of through-silicon vias (TSVs) and the thermal management of stacked dies. We test our 3D floorplanner extensively on the GSRC benchmarks as well as on an augmented version of the IBM-HB+ benchmarks. The 3D floorplans are shown to achieve effective trade-offs for power and delays throughout different configurations—our results surpass naive low-power and high-performance voltage assignment by 17% and 10%, on average. Finally, we release our 3D floorplanning framework as open-source code.
TL;DR: Experimental results indicate that the K-means clustering heuristic significantly reduces the clock power by clustering modules with similar switching behavior and close proximity, and the SA algorithm effectively inserts the shutdown gates to a 3D clock tree, while considering control TSV’s placement.
Abstract: We propose efficient algorithms to construct a low-power clock tree for through-silicon-via (TSV)-based 3D-ICs. We use shutdown gates to save clock trees’ dynamic power, which selectively turn off certain clock tree branches to avoid unnecessary clock activities when the modules in these tree branches are inactive. While this clock gating technique has been extensively studied in 2D circuits, its application in 3D-ICs is unclear. In 3D-ICs, a shutdown gate is connected to a control signal unit through control TSVs, which may cause placement conflicts with existing clock TSVs in the layout due to TSV’s large physical dimension. We develop a two-phase clock tree synthesis design flow for 3D-ICs: (1) 3D abstract clock tree generation based on K-means clustering and (2) clock tree embedding with simultaneous shutdown gates’ insertion based on simulated annealing (SA) and a force-directed TSV placer. Experimental results indicate that (1) the K-means clustering heuristic significantly reduces the clock power by clustering modules with similar switching behavior and close proximity, and (2) the SA algorithm effectively inserts the shutdown gates to a 3D clock tree, while considering control TSV’s placement. Compared with previous 3D clock tree synthesis techniques, our K-means clustering-based approach achieves larger reduction in clock tree power consumption while ensuring zero clock skew.
TL;DR: This article presents a technique to track changes in the dynamic loop characteristics of DC-DC converters without disturbing the normal mode of operation using a white noise–based excitation and correlation, and shows that the degraded part can be diagnosed to take remedial action.
Abstract: Complex electronic systems include multiple power domains and drastically varying dynamic power consumption patterns, requiring the use of multiple power conversion and regulation units. High-frequency switching converters have been gaining prominence in the DC-DC converter market due to their high efficiency and smaller form factor. Unfortunately, they are also subject to higher process variations, and faster in-field degradation, jeopardizing stable operation of the power supply. This article presents a technique to track changes in the dynamic loop characteristics of DC-DC converters without disturbing the normal mode of operation using a white noise–based excitation and correlation. Using multiple points for injection and analysis, we show that the degraded part can be diagnosed to take remedial action. White noise excitation is generated via a pseudo-random disturbance at reference, load current, and pulse-width modulation (PWM) nodes of the converter with the test signal energy being spread over a wide bandwidth, without significantly affecting the converter noise and ripple floor. The impulse response is extracted by correlating the random input sequence with the disturbed output generated. Test signal analysis is achieved by correlating the pseudo-random input sequence with the output response and thereby accumulating the desired behavior over time and pulling it above the noise floor of the measurement set-up. An off-the-shelf power converter, LM27402, is used as the device-under-test (DUT) for experimental verification. Experimental results show that the proposed technique can estimate converter natural frequency and quality factor (Q-factor) within ±2.5% and ±0.7% error margins, respectively, over changes in load inductance and capacitance. For diagnosis, the inductor's DC resistance (DCR), which is the inductor's series resistance and is indicative of the degradation in the inductor's Q-factor, is estimated to within a ±1.6% error margin.
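The correlation step described in this abstract can be sketched in a few lines. This is a generic illustration of impulse-response identification by cross-correlation, not the authors' measurement flow: the 7-bit LFSR (PRBS7), the four-tap response `h_true`, the noise-free circular model, and all function names are assumptions chosen for the example. Because a maximal-length ±1 sequence has a nearly impulse-like circular autocorrelation, correlating the input with the output recovers each tap up to a small bias on the order of 1/127 of the tap sum.

```python
def prbs7(length=127):
    """+/-1 pseudo-random sequence from a 7-bit LFSR (polynomial x^7 + x^6 + 1)."""
    state = 0x7F
    out = []
    for _ in range(length):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(1.0 if bit else -1.0)
    return out


def circular_response(h, x):
    """Steady-state output of an FIR system h driven by the periodic input x."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(len(h))) for i in range(n)]


def estimate_impulse_response(x, y, taps):
    """Cross-correlate input and output over one period to estimate h[0..taps-1]."""
    n = len(x)
    return [sum(x[(i - k) % n] * y[i] for i in range(n)) / n for k in range(taps)]


if __name__ == "__main__":
    h_true = [0.5, 0.3, 0.15, 0.05]      # hypothetical loop impulse response
    x = prbs7()                          # small pseudo-random disturbance
    y = circular_response(h_true, x)     # disturbed output (noise omitted here)
    h_est = estimate_impulse_response(x, y, len(h_true))
    for true_tap, est_tap in zip(h_true, h_est):
        print(f"{true_tap:.3f}  {est_tap:.3f}")
```

In a real measurement the output also carries ripple and noise; accumulating the correlation over many periods averages those terms down, which is the sense in which the abstract speaks of pulling the response above the noise floor.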
TL;DR: In this article, a multiprocessor scheduling technique for multi-mode dataflow graphs that allows task migration between modes is proposed; a genetic algorithm schedules all SDF graphs in all modes simultaneously to minimize the resource requirement.
Abstract: The Synchronous Data Flow (SDF) model is widely used for specifying signal processing or streaming applications. As modern embedded applications become more complex with dynamic behavior changes at runtime, several extensions of the SDF model have been proposed to specify the dynamic behavior changes while preserving static analyzability of the SDF model. They assume that an application has a finite number of behaviors (or modes), and each behavior (mode) is represented by an SDF graph. They are classified as multi-mode dataflow models in this article. While there exist several scheduling techniques for multi-mode dataflow models, none allows task migration between modes. By observing that the resource requirement can be additionally reduced if task migration is allowed, we propose a multiprocessor scheduling technique of a multi-mode dataflow graph considering task migration between modes. Based on a genetic algorithm, the proposed technique schedules all SDF graphs in all modes simultaneously to minimize the resource requirement. To satisfy the throughput constraint, the proposed technique calculates the actual throughput requirement of each mode and the output buffer size for tolerating throughput jitter. We compare the proposed technique with a method that analyzes SDF graphs in each execution mode separately, a method that does not allow task migration, and a method that does not allow mode-overlapped schedules, on synthetic examples and five real applications: H.264 decoder, lane detection, vocoder, MP3 decoder, and printer pipeline.
TL;DR: It is argued that symbolic analyses are more appropriate because they express the system performance as a function of parameters (i.e., input and output rates, execution times), and such functions can be quickly evaluated for each different configuration or checked with respect to different quality-of-service requirements.
Abstract: The synchronous dataflow model of computation is widely used to design embedded stream-processing applications under strict quality-of-service requirements (e.g., buffering size, throughput, input-output latency). The required analyses can either be performed at compile time (for design space exploration) or at runtime (for resource management and reconfigurable systems). However, these analyses have an exponential time complexity, which may cause a huge runtime overhead or make design space exploration unacceptably slow. In this article, we argue that symbolic analyses are more appropriate since they express the system performance as a function of parameters (i.e., input and output rates, execution times). Such functions can be quickly evaluated for each different configuration or checked with respect to different quality-of-service requirements. We provide symbolic analyses for computing the maximal throughput of acyclic synchronous dataflow graphs, the minimum required buffers for which as-soon-as-possible (ASAP) scheduling achieves this throughput, and finally, the corresponding input-output latency of the graph. The article first investigates these problems for a single parametric edge. The results are extended to general acyclic graphs using linear approximation techniques. We assess the proposed analyses experimentally on both synthetic and real benchmarks.
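To make "performance as a function of parameters" concrete, the following sketch evaluates closed-form expressions for a single SDF edge A -> B. It is a textbook-style illustration, not the analysis from the article: the function names are hypothetical, and the throughput bound assumes unbounded buffers and actors that fire sequentially (no auto-concurrency). With production rate `p` and consumption rate `q`, the balance equation r_A * p = r_B * q gives the repetition counts, and the iteration period is limited by the slower actor.

```python
from math import gcd


def repetition_vector(p: int, q: int) -> tuple:
    """Smallest positive (r_a, r_b) with r_a * p == r_b * q for edge A -(p, q)-> B."""
    g = gcd(p, q)
    return q // g, p // g


def iteration_period(p: int, q: int, t_a: float, t_b: float) -> float:
    """Time per graph iteration, assuming unbounded buffers and no auto-concurrency."""
    r_a, r_b = repetition_vector(p, q)
    return max(r_a * t_a, r_b * t_b)


if __name__ == "__main__":
    # A produces 2 tokens per firing, B consumes 3; execution times 1.0 and 2.0.
    print(repetition_vector(2, 3))
    print(iteration_period(2, 3, 1.0, 2.0))
```

Once such expressions are available, re-evaluating them for a new rate or execution time is a constant-time computation, which is the practical advantage the abstract claims over re-running an exponential analysis.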
TL;DR: This work proposes a multiharmonic model that provides a complete small-signal characterization of both DC averages and high-order harmonic responses and corrects the misleading results of the existing models by providing truthful characterization of the overall converter AC response.
Abstract: Small-signal models of pulse-width modulation (PWM) converters are widely used for analyzing stability and play an important role in converter design and control. However, existing small-signal models either are based on averaged DC behaviors, and hence are unable to capture frequency responses that are faster than the switching frequency, or greatly approximate these high-frequency responses. We address the severe limitations of the existing models by proposing a multiharmonic model that provides a complete small-signal characterization of both DC averages and high-order harmonic responses. The proposed model captures important high-frequency overshoots and undershoots of the converter response, which are otherwise unaccounted for by the existing techniques. In two converter examples, the proposed model corrects the misleading results of the existing models by providing truthful characterization of the overall converter AC response and offers important guidance for converter design and closed-loop control.
TL;DR: A novel Model-Based Design (MBD) methodology to model, co-simulate, design, and optimize a microgrid and its multi-level controllers is presented, and it is illustrated that implementing a hierarchical controller reduces the average power consumption and shifts the peak load for cost saving.
Abstract: In power systems, the traditional, non-interactive, and manually controlled power grid has been transformed to a cyber-dominated smart grid. This cyber-physical integration has provided the smart grid with communication, monitoring, computation, and controlling capabilities to improve its reliability, energy efficiency, and flexibility. A microgrid is a localized and semi-autonomous group of smart energy systems that utilizes the above-mentioned capabilities to drive modern technologies such as electric vehicle charging, home energy management, and smart appliances. Design, upgrading, test, and verification of these microgrids can get too complicated to handle manually. The complexity is due to the wide range of solutions and components that are intended to address the microgrid problems. This article presents a novel Model-Based Design (MBD) methodology to model, co-simulate, design, and optimize microgrid and its multi-level controllers. This methodology helps in the design, optimization, and validation of a microgrid for a specific application. The application rules, requirements, and design-time constraints are met in the designed/optimized microgrid while the implementation cost is minimized. Based on our novel methodology, a design automation, co-simulation, and analysis tool, called GridMAT, is implemented. Our experiments have illustrated that implementing a hierarchical controller reduces the average power consumption by 8% and shifts the peak load for cost saving. Moreover, optimizing the microgrid design using our MBD methodology considering smart controllers has decreased the total implementation cost. Compared to the conventional methodology, the cost decreases by 14% and compared to the MBD methodology where smart controllers are not considered, it decreases by 5%.
TL;DR: An algorithm that directly constructs two scheduling tables without using a priority order is proposed; it outperforms both the OCBP-based algorithm and MCEDF in terms of the number of instances scheduled in a randomly generated set of instances.
Abstract: Real-time and embedded systems are moving from the traditional design paradigm to integration of multiple functionalities onto a single computing platform. Some of the functionalities are safety critical and subject to certification. The rest of the functionalities are nonsafety critical and do not need to be certified. Designing efficient scheduling algorithms which can be used to meet the certification requirement is challenging. Our research considers the time-triggered approach to scheduling of mixed-criticality jobs with two criticality levels. The first proposed algorithm for the time-triggered approach is based on the OCBP scheduling algorithm which finds a fixed-priority order of jobs. Based on this priority order, the existing algorithm constructs two scheduling tables SLOoc and SHIoc. The scheduler uses these tables to find a scheduling strategy. Another time-triggered algorithm called MCEDF was proposed as an improvement over the OCBP-based algorithm. Here we propose an algorithm which directly constructs two scheduling tables without using a priority order. Furthermore, we show that our algorithm schedules a strict superset of instances which can be scheduled by the OCBP-based algorithm as well as by MCEDF. We show that our algorithm outperforms both the OCBP-based algorithm and MCEDF in terms of the number of instances scheduled in a randomly generated set of instances. We generalize our algorithm for jobs with m criticality levels. Subsequently, we extend our algorithm to find scheduling tables for periodic and dependent jobs. Finally, we show that our algorithm is also applicable to mixed-criticality synchronous programs upon uniprocessor platforms and schedules a bigger set of instances than the existing algorithm.