TL;DR: In this article, an integrated circuit, such as included as a portion of a sensor node, can include a regulator circuit having an input coupleable to an energy harvesting transducer.
Abstract: An integrated circuit, such as included as a portion of a sensor node, can include a regulator circuit having an input coupleable to an energy harvesting transducer. The integrated circuit can include a wireless receiver circuit coupled to the regulator circuit and configured to wirelessly receive at least enough operating energy to establish operation of the sensor node without requiring the energy harvesting transducer. The integrated circuit can include a digital processor circuit coupled to the regulator circuit and a power management processor circuit. The digital processor circuit or one or more other circuits can include a subthreshold operational mode established by the power management processor circuit based on the selected energy consumption level. For example, establishing the subthreshold operational mode can include adjusting or selecting a supply voltage so as to establish subthreshold operation of a field effect transistor (FET) in the digital processor circuit or other circuits.
TL;DR: A half-rate clock and data recovery circuit and a deserializer employ charge-steering logic to reduce power consumption and are realized in 65-nm technology.
Abstract: The demand for higher data rates in serial links has exacerbated the problem of power consumption, motivating extensive work on receiver and transmitter building blocks. This paper presents a half-rate clock and data recovery circuit and a deserializer that employ charge-steering logic to reduce the power consumption. Realized in 65-nm technology, the overall circuit draws 5 mW from a 1-V supply, producing a clock with an rms jitter of 1.5 ps and a jitter tolerance of 0.5 UIpp at 5 MHz jitter frequency.
TL;DR: An aging-aware logic synthesis approach is proposed to increase circuit lifetime with respect to a specific guardband; the approach improves circuit lifetime on average by more than 3X with negligible impact on area.
Abstract: As CMOS technology scales down into the nanometer regime, designers have to add pessimistic timing margins to the circuit as guardbands to avoid timing violations due to various reliability effects, in particular accelerated transistor aging. Since aging is workload-dependent, the aging rates of different paths are non-uniform, and hence circuits that are delay-balanced at design time become significantly unbalanced after some operational time. In this paper, an aging-aware logic synthesis approach is proposed to increase circuit lifetime with respect to a specific guardband. Our main objective is to optimize the design timing with respect to post-aging delay in a way that all paths reach the assigned guardband at the same time. In this regard, in an iterative process, after computing the post-aging delays, the lifetime is improved by putting tighter timing constraints on paths with a higher aging rate and looser constraints on paths which have less post-aging delay than the desired guardband. The experimental results show that the proposed approach improves circuit lifetime on average by more than 3X with negligible impact on area. Our approach is implemented on top of a commercial synthesis toolchain, and hence scales very well.
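The iterative rebalancing described in this abstract can be illustrated with a toy model. The Python sketch below is not the authors' algorithm: it assumes each path has a fixed fractional aging rate, that synthesis meets each per-path constraint exactly, and that the names `rebalance_constraints` and `post_aging_delays` are purely illustrative. It only shows how tightening constraints on fast-aging paths and loosening them on slow-aging paths makes every path reach the guardband limit together.

```python
from typing import List


def post_aging_delays(constraints: List[float], aging_rates: List[float]) -> List[float]:
    """Toy aging model (assumption): delay grows by a fixed fraction per path."""
    return [c * (1.0 + r) for c, r in zip(constraints, aging_rates)]


def rebalance_constraints(aging_rates: List[float],
                          nominal_period: float,
                          guardband: float,
                          iterations: int = 3) -> List[float]:
    """Iteratively rescale per-path timing constraints.

    Paths whose post-aging delay exceeds the guardband limit get a tighter
    constraint; paths below the limit get a looser one.
    """
    limit = nominal_period * (1.0 + guardband)
    constraints = [nominal_period] * len(aging_rates)
    for _ in range(iterations):
        aged = post_aging_delays(constraints, aging_rates)
        # Scale each constraint by how far its aged delay is from the limit.
        constraints = [c * limit / d for c, d in zip(constraints, aged)]
    return constraints


if __name__ == "__main__":
    rates = [0.05, 0.10, 0.20]  # hypothetical per-path aging rates
    result = rebalance_constraints(rates, nominal_period=1.0, guardband=0.10)
    print(result)
    print(post_aging_delays(result, rates))
```

In this simplified model the loop converges after one pass; a real flow would re-run synthesis and aging analysis in each iteration because the aging rate itself changes with the synthesized netlist.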
TL;DR: This tutorial will cover the basic principles and advantages of asynchronous logic, some insights on new research challenges, and will present the GALS scheme as an intermediate design style with recent results in asynchronous Network-on-Chip for future Many Core architectures.
Abstract: The growing variability and complexity of advanced CMOS technologies make the physical design of clocked logic in large Systems-on-Chip more and more challenging. Asynchronous logic has been studied for many years and has become an attractive solution for a broad range of applications, from massively parallel multi-media systems to systems with ultra-low power & low-noise constraints, like cryptography, energy autonomous systems, and sensor-network nodes. The objective of this embedded tutorial is to give a comprehensive and recent overview of asynchronous logic. The tutorial will cover the basic principles and advantages of asynchronous logic, some insights on new research challenges, and will present the GALS scheme as an intermediate design style with recent results in asynchronous Network-on-Chip for future Many Core architectures. Regarding industrial acceptance, recent asynchronous logic applications within the microelectronics industry will be presented, with a main focus on the commercial CAD tools available today.
TL;DR: A Sub-threshold (Sub-Vt) Self-Adaptive VDD Scaling (SSAVS) system is proposed for a Wireless Sensor Network with the objective of the lowest possible power dissipation for the prevailing throughput and circuit conditions, yet with high robustness and minimal overheads.
Abstract: We propose a Sub-threshold (Sub-Vt) Self-Adaptive VDD Scaling (SSAVS) system for a Wireless Sensor Network with the objective of lowest possible power dissipation for the prevailing throughput and circuit conditions, yet high robustness and with minimal overheads. The effort to achieve the lowest possible power operation is by means of adjusting VDD to the minimum voltage (within 50 mV) for said conditions. High robustness is achieved by adopting the Quasi-Delay-Insensitive (QDI) asynchronous-logic protocols where the circuits therein are self-timed, and by the embodiment of our proposed Pre-Charged-Static-Logic (PCSL) design approach; when compared against competing approaches, the PCSL is most competitive in terms of energy/operation, delay and IC area. By exploiting the already existing request and acknowledge signals of the QDI protocols, the ensuing overhead of the SSAVS is very modest. The filter bank embodied in the SSAVS is shown to be ultra-low power and highly robust. When benchmarked against the competing conventional Dynamic-Voltage-Frequency-Scaling (DVFS) synchronous-logic counterpart, no one system is particularly advantageous when the operating conditions are known. However, when the competing DVFS system is designed for the worst-case condition, the proposed SSAVS system is somewhat more competitive, including uninterrupted operation while its VDD self-adjusts to the varying conditions.
TL;DR: This work designs and implements clock-gated sequential circuits on a Virtex-6 FPGA to confirm power reduction, and shows a reduction in dynamic power, especially a significant reduction in clock power.
Abstract: In this work, our focus is on the study and analysis of various clock gating techniques and on the design and analysis of clock-gating-based low-power sequential circuits at the RTL level. Virtex-6 is a 40-nm FPGA, on which we implement our circuit to confirm power reduction in sequential circuits. Clock gating is implemented on a smaller circuit, a D flip-flop, and on a larger circuit, a 16-bit register. The percentage reduction in dynamic power, especially clock power, is verified for different device operating frequencies. Here, we achieved 87.09%, 88.02%, 88.02%, and 88.01% clock power reduction when the clock period is 1ns, 0.1ns, 0.01ns and 0.001ns respectively. Design and implementation results show that there is a reduction in dynamic power, especially a significant reduction in clock power. We also achieved 15%, 14.22%, 14.58%, 14.57% and 14.57% dynamic power reduction when the clock period is 10ns, 1ns, 0.1ns, 0.01ns, and 01ps respectively.
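As a rough behavioural illustration of why clock gating cuts clock activity, the Python sketch below models a 16-bit register with an enable input and counts how many clock edges actually reach the flip-flops with and without gating. It is not the RTL from the work above and does not model FPGA power; the function name `run_register` and the traces are hypothetical.

```python
from typing import List, Tuple


def run_register(enables: List[bool], data: List[int], gated: bool) -> Tuple[int, int]:
    """Simulate a 16-bit register over a sequence of clock cycles.

    Returns (final stored value, number of clock edges seen by the flip-flops).
    With gating, cycles where enable is low deliver no clock edge at all.
    """
    value = 0
    edges = 0
    for en, d in zip(enables, data):
        if gated and not en:
            continue  # gated clock: the edge never reaches the register
        edges += 1
        if en:
            value = d & 0xFFFF  # load only when enabled
    return value, edges


if __name__ == "__main__":
    enables = [True, False, False, True]
    data = [1, 2, 3, 0x1ABCD]
    print(run_register(enables, data, gated=False))
    print(run_register(enables, data, gated=True))
```

Both variants end with the same stored value, which is the functional-equivalence property clock gating must preserve; only the number of delivered clock edges differs.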
TL;DR: This paper presents the first BDD-based majority logic decomposition method and a logic decomposition system, BDS-MAJ, that enables efficient logic synthesis for both random control and datapath circuits.
Abstract: Despite the impressive advance of logic synthesis during the past decades, a general methodology capable of efficiently synthesizing both control and datapath logic is still missing. Indeed, while synthesis techniques for random control logic (AND/OR-intensive) are well established, no dominant method for automated synthesis of datapath logic (XOR/MAJ-intensive) has yet emerged. Recently, Binary Decision Diagrams (BDDs) have been adopted to create an optimization system, named BDS, that supports integrated synthesis of both AND/OR- and XOR-intensive functions through functional logic decomposition on the BDD structure. However, it does not support direct decomposition and manipulation of majority logic which, instead, is widely used in datapath circuits. In this paper, we present the first BDD-based majority logic decomposition method and a logic decomposition system, BDS-MAJ, that enables efficient logic synthesis for both random control and datapath circuits. Experimental results show that logic synthesis based on BDS-MAJ produces CMOS circuits having on average 28.8% and 26.4% less area and, at the same time, 12.8% and 20.9% smaller delay with respect to academic ABC and BDS synthesis tools. Compared to commercial Synopsys Design Compiler synthesis tool, BDS-MAJ reduces on average the circuit area by 6.0% and decreases the delay by 7.8%.
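To make concrete why majority (MAJ) and XOR functions dominate datapath logic, the following Python sketch checks exhaustively that the carry-out of a full adder is the 3-input majority function and that its sum is a 3-input XOR. This is a standard identity and is not part of BDS-MAJ; the function names are illustrative.

```python
from itertools import product
from typing import Tuple


def maj(a: int, b: int, c: int) -> int:
    """3-input majority: 1 when at least two of the inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Reference full adder computed arithmetically: returns (sum, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1


def check_full_adder_identities() -> bool:
    """Verify carry_out == MAJ(a, b, cin) and sum == a XOR b XOR cin for all inputs."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        if cout != maj(a, b, cin) or s != (a ^ b ^ cin):
            return False
    return True


if __name__ == "__main__":
    print(check_full_adder_identities())
```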
TL;DR: This paper presents a robust and energy-efficient computation architecture exploiting an asynchronous timing strategy to dynamically minimize leakage and to self-adapt to process variations and different operating conditions.
Abstract: Further power and energy reductions via technology and voltage scaling have become extremely difficult due to leakage and variability issues. In this paper, we present a robust and energy-efficient computation architecture exploiting an asynchronous timing strategy to dynamically minimize leakage and to self-adapt to process variations and different operating conditions. Based on a logic topology with built-in leakage suppression, the prototype asynchronous neural signal processor demonstrates robust sub-threshold operation down to 0.25 V, while consuming only 460 nW in 0.03 mm² in a 65 nm CMOS technology. These results represent a 4.4× reduction in power, a 3.7× reduction in energy and a 2.2× reduction in power density, when compared to the state-of-the-art processors.
TL;DR: The experimental verification of noise-enhanced logic behaviour in an electronic analog of a synthetic genetic network, composed of two repressors and two constitutive promoters, finds that the input-output characteristics of a logic gate are reproduced faithfully under moderate noise.
Abstract: We report the experimental verification of noise-enhanced logic behaviour in an electronic analog of a synthetic genetic network, composed of two repressors and two constitutive promoters. We observe good agreement between circuit measurements and numerical prediction, with the circuit allowing for robust logic operations in an optimal window of noise. Namely, the input-output characteristics of a logic gate are reproduced faithfully under moderate noise, which is a manifestation of the phenomenon known as Logical Stochastic Resonance. The two dynamical variables in the system yield complementary logic behaviour simultaneously. The system is easily morphed from AND/NAND to OR/NOR logic.
TL;DR: A constant delay (CD) logic style is proposed in this paper, targeting full-custom high-speed applications, and exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready.
Abstract: A constant delay (CD) logic style is proposed in this paper, targeting full-custom high-speed applications. The CD characteristic of this logic style, regardless of the logic type, makes it suitable for implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature offers a performance advantage over static and dynamic domino logic styles in a single-cycle multistage circuit block. Several design considerations including timing window width adjustment and clock distribution are discussed. Using 65-nm general-purpose CMOS technology, the proposed logic demonstrates an average speedup of 94% and 56% over static and dynamic domino logic, respectively, in five different logic gates. Simulation results of 8-bit ripple carry adders show that CD logic is 39% and 23% faster than the static and dynamic-based adders, respectively. CD logic also demonstrates 39% speedup and 64% (22%) energy-delay product (EDP) reduction from static logic at 100% (10%) data activity in 32-bit carry lookahead adders. For an 8-bit Wallace tree multiplier, CD logic achieves a similar speedup with at least 50% EDP reduction across all data activities.
TL;DR: Results indicate that concurrent error masking based on approximate logic circuits can mask 88% of targeted logical errors for 34% area overhead and 17% power overhead, and 100% timing errors on all timing paths within 20% of the critical path delay.
Abstract: With technology scaling, logical errors arising due to single-event upsets and timing errors arising due to dynamic variability effects are increasing in logic circuits. Existing techniques for online resilience to logical and timing errors are limited to detection of errors, and often result in significant performance penalty and high area/power overhead. This paper proposes approximate logic circuits as a design approach for low cost concurrent error masking. An approximate logic circuit predicts the value of the outputs of a given logic circuit for a specified portion of the input space, and can indicate uncertainty about the outputs over the rest of the input space. Using portions of the input space that are most vulnerable to errors as the specified input space, we show that approximate logic circuits can be used to provide low overhead concurrent error masking support for a given logic circuit. We describe efficient algorithms for synthesizing approximate circuits for concurrent error masking of logical and timing errors. Results indicate that concurrent error masking based on approximate logic circuits can mask 88% of targeted logical errors for 34% area overhead and 17% power overhead, 100% timing errors on all timing paths within 10% of the critical path delay for 23% area overhead and 8% power overhead, and 100% timing errors on all timing paths within 20% of the critical path delay for 42% area overhead and 26% power overhead.
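The idea of an approximate circuit that predicts outputs on part of the input space and signals uncertainty elsewhere can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the original function, the covered region (inputs with `c == 1`), and the selection rule in `masked_output` are all hypothetical and are not taken from the paper's synthesis algorithms.

```python
from itertools import product
from typing import Optional


def original(a: int, b: int, c: int) -> int:
    """Hypothetical logic function to protect: f = (a AND b) OR c."""
    return (a & b) | c


def approximate(a: int, b: int, c: int) -> Optional[int]:
    """Approximate circuit: predicts f only where c == 1 (assumed vulnerable region).

    Returns None to indicate uncertainty over the rest of the input space.
    """
    return 1 if c == 1 else None


def masked_output(observed: int, prediction: Optional[int]) -> int:
    """Use the prediction when it is certain; otherwise pass the observed value."""
    return observed if prediction is None else prediction


def masked_fraction() -> float:
    """Fraction of input vectors for which a flipped output of f is corrected."""
    vectors = list(product((0, 1), repeat=3))
    masked = 0
    for a, b, c in vectors:
        good = original(a, b, c)
        faulty = good ^ 1  # inject a single output error
        if masked_output(faulty, approximate(a, b, c)) == good:
            masked += 1
    return masked / len(vectors)


if __name__ == "__main__":
    print(masked_fraction())
```

Here the approximate circuit is a single wire, which hints at why masking only a chosen portion of the input space can cost far less than full duplication.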
TL;DR: In this article, the clock distribution network (CDN) is separated from the rest of the logic to improve the clock tree design and reduce the area footprint, and the CDN is connected to the logic tier(s) via high-density inter-tier vias.
Abstract: Exemplary embodiments of the invention are directed to systems and methods for designing a clock distribution network for an integrated circuit. The embodiments identify critical sources of clock skew, tightly control the timing of the clock and build that timing into the overall clock distribution network and integrated circuit design. The disclosed embodiments separate the clock distribution network (CDN), i.e., clock generation circuitry, wiring, buffering and registers, from the rest of the logic to improve the clock tree design and reduce the area footprint. In one embodiment, the CDN is separated to a separate tier of a 3D integrated circuit, and the CDN is connected to the logic tier(s) via high-density inter-tier vias. The embodiments are particularly advantageous for implementation with monolithic 3D integrated circuits.
TL;DR: A single-cycle issue queue circuit architecture that simplifies the wakeup and selection logic is proposed, allowing simulated circuit operation at over 4 GHz in a foundry 45 nm SOI fabrication process.
Abstract: In this paper a single-cycle issue queue circuit architecture that simplifies the wakeup and selection logic is proposed. The micro-architecture and fully static CMOS circuits are presented for a 32-entry queue that issues four instructions per cycle. The instruction-ready signals are divided into groups and processed in parallel to issue the four oldest ready instructions. The complete issue queue and prioritization logic requires 20 inversions, allowing simulated circuit operation at over 4 GHz in a foundry 45 nm SOI fabrication process.
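The selection step described above, where ready signals are split into groups and the four oldest ready instructions are issued, can be modelled behaviourally. The Python sketch below is an assumption-laden illustration, not the paper's circuit: it assumes entries are stored in age order with index 0 the oldest, uses a hypothetical group size of 8, and the function name `select_oldest_ready` is invented for the example.

```python
from typing import List


def select_oldest_ready(ready: List[bool], width: int = 4, group_size: int = 8) -> List[int]:
    """Return the indices of up to `width` oldest ready entries.

    Stage 1: each group independently nominates its own oldest ready entries
             (these per-group scans could run in parallel in hardware).
    Stage 2: nominees are merged in age order and the first `width` are granted.
    """
    nominees: List[List[int]] = []
    for base in range(0, len(ready), group_size):
        group = [base + i for i, r in enumerate(ready[base:base + group_size]) if r]
        nominees.append(group[:width])  # a group can never supply more than `width`
    merged = [idx for group in nominees for idx in group]
    return merged[:width]


if __name__ == "__main__":
    ready = [False] * 32
    for i in (3, 5, 9, 20, 31):
        ready[i] = True
    print(select_oldest_ready(ready))
```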
TL;DR: A very-high-speed integrated circuits HDL (VHDL) behavioral model for NML circuits, which allows the evaluation of not only the logic behavior but also its power dissipation, based on a technological solution called “snake-clock.”
Abstract: The interest in emerging nanotechnologies has been recently focused on nanomagnetic logic (NML), which has unique appealing features. NML circuits have very low power consumption and, because of their magnetic nature, maintain the information safely stored even without power supply. The nature of these circuits is much different from that of CMOS circuits. As a consequence, to better understand NML logic, complex circuits and not only simple gates must be designed. This constraint calls for a new design and simulation methodology. It should efficiently encompass manifold properties: 1) being based on commonly used hardware description language (HDL) in order to easily manage complexity and hierarchy; 2) maintaining a clear link with physical characteristics; and 3) modeling performance aspects such as speed and power, together with logic behavior. In this paper, we present a very-high-speed integrated circuits HDL (VHDL) behavioral model for NML circuits, which allows the evaluation of not only the logic behavior but also its power dissipation. It is based on a technological solution called “snake-clock.” We demonstrate this model using a case study which offers the right variety of internal substructures to test the method: a 4-bit microprocessor designed using asynchronous logic. The model enables a hierarchical bottom-up evaluation of the processor logic behavior, area, and power dissipation, which we evaluate using a benchmark division algorithm. The results highlight the flexibility and the efficiency of this model, as well as the remarkable improvements that it brings to the analysis of NML circuits.
TL;DR: This work proposes a complete design infrastructure to physically implement an asynchronous digital net list with orders of magnitude time savings over expert human effort and evaluates this flow against several asynchronous circuit benchmarks.
Abstract: Asynchronous circuits are an attractive option to overcome many challenges currently faced by chip designers, such as increased process variation. However, the lack of CAD tools to generate asynchronous circuits limits the adoption of this promising technology. In this absence of CAD tools, the most time consuming part of chip design is the back-end (physical design) effort. We propose a complete design infrastructure to physically implement an asynchronous digital net list with orders of magnitude time savings over expert human effort. The core of this flow is the ability to generate customized logic that is compatible with available ASIC flows. We evaluate our flow against several asynchronous circuit benchmarks for which full custom physical implementations exist. Compared to hand-optimized custom designs, our flow produces layout that has, on average, a 51% area overhead, with a 12% increase in energy and a 9% increase in delay.
TL;DR: A unified capture scheme is proposed to generate programmable clock signals for the detection of both SDDs and circuit aging and the proposed aging-resistant design method enables the offline test circuit to be reused in online operations.
Abstract: Small delay defect (SDD) and aging-induced circuit failure are both prominent reliability concerns for nanoscale integrated circuits. Faster-than-at-speed testing is effective on SDD detection in manufacturing testing, which is always implemented by designing a suite of test signal generation circuits on the chip. Meanwhile, the integration of online aging sensors is becoming attractive in monitoring aging-induced delay degradation in the runtime. These design requirements, if implemented in separate ways, will increase the complexity of a reliable design and consume more die area. In this paper, a unified capture scheme is proposed to generate programmable clock signals for the detection of both SDDs and circuit aging. Our motivation arises from the observations that SDD detection and online aging prediction both need to capture circuit response ahead of the functional clock. The proposed aging-resistant design method enables the offline test circuit to be reused in online operations. Reversed short channel effect is also exploited to make the underlying circuit resilient to process variations. The proposed scheme is validated by intensive HSPICE simulations. Experimental results demonstrate the effectiveness in terms of low area, power, and performance overheads.
TL;DR: The results show that an asynchronous router for a time-division-multiplexed network-on-chip (NOC) that is being developed for a multi-processor platform for hard real-time systems is 2 times smaller, marginally slower and with roughly the same energy consumption, while offering a robust solution to the clock distribution problem.
Abstract: In this paper we explore the design of an asynchronous router for a time-division-multiplexed (TDM) network-on-chip (NOC) that is being developed for a multi-processor platform for hard real-time systems. TDM inherently requires a common time reference, and existing TDM-based NOC designs are either synchronous or mesochronous, but both approaches have their limitations: a globally synchronous NOC is no longer feasible in today's sub-micron technologies and a mesochronous NOC requires special FIFO-based synchronizers in all input ports of all routers in order to accommodate clock phase differences. This adds hardware complexity and increases area and power consumption. We propose to use asynchronous routers in order to achieve a simpler, more robust and globally-asynchronous NOC, and this represents an unexplored point in the design space. The paper presents a range of alternative router designs. All routers have been synthesized for a 65nm CMOS technology, and the paper reports post-layout figures for area, speed and energy and compares the asynchronous designs with an existing mesochronous clocked router. The results show that an asynchronous router is 2 times smaller, marginally slower and with roughly the same energy consumption, while offering a robust solution to the clock distribution problem. The paper further explores "clock-gating" of the individual pipeline stages in the asynchronous routers, and shows that this can lead to significant power savings.
TL;DR: A novel placement flow with clock-tree aware flip-flop merging and MBFF generation is introduced, and the corresponding algorithms to simultaneously minimize flip-flop power and clock latency when applying MBFFs during placement are proposed.
Abstract: Utilizing multi-bit flip-flops (MBFFs) is one of the most effective power optimization techniques in modern nanometer integrated circuit (IC) design. Most previous work applies MBFFs without doing placement refinement of combinational logic cells. Such a problem formulation may result in less power reduction due to tight timing constraints with fixed combinational logic cells. This paper introduces a novel placement flow with clock-tree aware flip-flop merging and MBFF generation, and proposes the corresponding algorithms to simultaneously minimize flip-flop power and clock latency when applying MBFFs during placement. Experimental results based on the IWLS-2005 benchmark show that our approach is very effective in minimizing not only flip-flop power but also clock latency without degrading circuit performance. To the best of our knowledge, this is also the first work in the literature which considers clock trees during flip-flop merging and MBFF generation.
TL;DR: In this paper, a method of physical clock topology planning for designing integrated circuits is described, which includes reading an initial placed netlist and a floorplan of an integrated circuit design and analyzing the design to determine potential enable signals to gate clock signals that clock a plurality of flip flops.
Abstract: In one embodiment of the invention, a method of physical clock topology planning for designing integrated circuits is disclosed. The method includes reading an initial placed netlist of an integrated circuit design and a floorplan of the integrated circuit design, analyzing the integrated circuit design to determine potential enable signals to gate clock signals that clock the plurality of flip flops to reduce power consumption; simultaneously optimizing and placing the clock enable logic gates to gate clock signals to the plurality of flip flops; and minimizing timing variation of the clock signals to the plurality of flip flops.
TL;DR: A microfluidic circuit can automatically sort deformable particles based on the hydrodynamic resistance that the particles induce in a constrained microfluidic channel while flowing through it.
Abstract: A microfluidic circuit can automatically sort deformable particles based on the hydrodynamic resistance that the particles induce in a constrained microfluidic channel while flowing through it.
TL;DR: Experimental results indicate that peak power can be reduced significantly to at least 72% depending on the number of clusters and the phase-shifted clock identified as suitable for the given circuit by the proposed algorithms.
Abstract: Peak power reduction has been a critical challenge in the design of integrated circuits impacting the chip's performance and reliability. The reduction of peak power also reduces the power density of integrated circuits. Due to large IR-voltage drops in circuits, transistor switching slows down giving rise to timing violations and logic failures. In this paper, we present a new clock control strategy for peak-power reduction in VLSI circuits. In the proposed method, the simultaneous switching of combinational paths is minimized by taking advantage of the delay slacks among the paths and clustering the paths with similar slack values. Once the paths are identified based on the path delays and their slack values, the clustering algorithm determines the ideal number of clusters for the given circuit and for each cluster the maximum possible phase shift that can be applied to the clock. The paths are assigned to clusters in a load balanced manner based on the slack values and each cluster will have a phase shift possible on its clock depending on the slack. Thus, the proposed register-transfer level (RTL) method takes advantage of the logic-path timing slack to re-schedule circuit activities at optimal intervals within the unaltered clock period. When switching activities are redistributed more evenly across the clock period, the IC supply-current consumption is also spread across a wider range of time within the clock period. This has the beneficial effect of reducing peak-current draw in addition to reducing RMS power draw without having to change the operating frequency and without utilizing additional power supply voltages as in dual or multi VT approaches. The proposed method is implemented and tested through simulations using an experimental setup with Synopsys Tools Suite and Cadence Tools on the ISCAS'85 benchmark circuits, OpenCore circuits and LEON processor multiplier circuit. Experimental results indicate that peak power can be reduced significantly to at least 72% depending on the number of clusters and the phase-shifted clock identified as suitable for the given circuit by the proposed algorithms. Although the proposed method incurs some power overhead compared to the traditional clocking method, the overhead can be made negligible compared to the peak-power reduction as seen in the experimental results presented.
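The slack-based clustering described in this abstract can be sketched as follows. This Python example is a simplified illustration rather than the paper's algorithm: it takes the number of clusters as an input instead of determining it, balances clusters by path count only, and assumes the largest safe phase shift for a cluster equals the smallest slack among its paths. The name `cluster_paths` and the path labels are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_paths(slacks: Dict[str, float], num_clusters: int) -> List[Tuple[List[str], float]]:
    """Group paths with similar slack into nearly equal-sized clusters.

    Returns a list of (path names, phase shift) pairs, where the phase shift is
    the minimum slack in the cluster (assumed to be the largest shift that does
    not cause any path in the cluster to violate timing).
    """
    ordered = sorted(slacks, key=slacks.get)  # path names, smallest slack first
    size, extra = divmod(len(ordered), num_clusters)
    clusters: List[Tuple[List[str], float]] = []
    start = 0
    for k in range(num_clusters):
        end = start + size + (1 if k < extra else 0)  # spread the remainder evenly
        members = ordered[start:end]
        start = end
        if members:
            clusters.append((members, min(slacks[m] for m in members)))
    return clusters


if __name__ == "__main__":
    # Hypothetical slack values, in nanoseconds.
    example = {"p0": 0.1, "p1": 0.9, "p2": 0.2, "p3": 0.8, "p4": 0.5}
    for members, shift in cluster_paths(example, num_clusters=2):
        print(members, shift)
```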
TL;DR: In this paper, a method for managing the operation of a circuit operating in a slave mode is presented, where the circuit is connected to a bus having at least two wires and a priority logic level.
Abstract: A method is provided for managing the operation of a circuit operating in a slave mode. The circuit is connected to a bus having at least two wires and a priority logic level. The slave circuit imposes the priority logic level on a first wire of the bus. While imposing, the slave circuit detects a possible conflict on the first wire resulting from a forcing, external to the slave circuit, of the first wire to another logic level. Upon detecting a conflict, the slave circuit is placed in a state stopping the sending by the circuit of any data over the bus while leaving the circuit listening to the bus.
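The conflict-handling behaviour described above can be modelled as a small state machine. The Python sketch below is a behavioural illustration only: the class and method names are hypothetical, and it assumes a priority logic level of 0 (as on an open-drain style bus), which the abstract does not specify.

```python
from enum import Enum


class SlaveState(Enum):
    ACTIVE = "active"            # may send and listen
    LISTEN_ONLY = "listen_only"  # sending stopped, still listening


class SlaveCircuit:
    def __init__(self, priority_level: int = 0) -> None:
        # Assumption: level 0 is the priority level on this bus.
        self.priority_level = priority_level
        self.state = SlaveState.ACTIVE

    def impose_and_check(self, observed_level: int) -> bool:
        """Call while the slave imposes the priority level on the first wire.

        `observed_level` is the level sampled back from the wire. If it differs
        from the imposed priority level, something external is forcing the
        wire, so the slave stops sending but keeps listening.
        Returns True when no conflict has been detected so far.
        """
        if observed_level != self.priority_level:
            self.state = SlaveState.LISTEN_ONLY
        return self.state is SlaveState.ACTIVE

    def can_send(self) -> bool:
        return self.state is SlaveState.ACTIVE

    def can_listen(self) -> bool:
        return True  # listening is kept in both states


if __name__ == "__main__":
    slave = SlaveCircuit()
    print(slave.impose_and_check(observed_level=0), slave.state)
    print(slave.impose_and_check(observed_level=1), slave.state)
```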
TL;DR: An 8 × 8 multiplier using DPTAAL is designed and simulated, which exhibits low power and reliable logical operations, and double pass transistor logic (DPL) is introduced to improve the circuit performance at reduced voltage level.
Abstract: Asynchronous adiabatic logic (AAL) is a novel low-power design technique which combines the energy saving benefits of asynchronous systems with adiabatic benefits. In this paper, an energy-efficient full adder using double pass transistor with asynchronous adiabatic logic (DPTAAL) is used to design a low-power multiplier. Asynchronous adiabatic circuits are very low power circuits that preserve energy for reuse, which reduces the amount of energy drawn directly from the power supply. In this work, an 8 × 8 multiplier using DPTAAL is designed and simulated, which exhibits low power and reliable logical operations. To improve the circuit performance at reduced voltage level, double pass transistor logic (DPL) is introduced. The power results of the proposed multiplier design are compared with the conventional CMOS implementation. Simulation results show significant improvement in power for clock rates ranging from 100MHz to 300MHz.
TL;DR: This work evaluates different types of XOR cells in different voltage conditions and shows that the dual pass transistor XOR has a better performance than the complementary CMOS XOR in 0.6V operation, while the complementary CMOS XOR has a better performance in 1.2 V operation.
Abstract: The performance of standard cells has a strong impact on the performance of a circuit synthesized with the cells. Although a complementary CMOS logic is usually used in the standard cells, it is known that a pass transistor logic can improve the performance of a circuit with a smaller area in some cases. We evaluate different types of XOR cells in different voltage conditions. Results show that the dual pass transistor XOR has a better performance than the complementary CMOS XOR in 0.6V operation, while the complementary CMOS XOR has a better performance in 1.2 V operation. More specifically, the area and the power consumption of a benchmark circuit composed of the dual pass transistor XOR can be reduced by 24% and 35%, respectively, compared to those of the same circuit composed of the complementary CMOS XOR in 0.6V operation.
TL;DR: In this paper, an adaptive gate drive circuit that can generate a gate bias voltage with temperature compensation for a MOSFET is disclosed, which can combat higher gate leakage current at higher temperature.
Abstract: An adaptive gate drive circuit that can generate a gate bias voltage with temperature compensation for a MOSFET is disclosed. The adaptive gate drive circuit may generate the gate bias voltage with variable drive capability to combat higher gate leakage current of the MOSFET at higher temperature. The apparatus includes a control circuit and a gate drive circuit. The control circuit generates at least one control signal having a variable frequency determined based on a temperature of the MOSFET sensed by a temperature sensor. For example, a clock divider ratio may be determined based on the sensed temperature of the MOSFET, an input clock signal may be divided based on the clock divider ratio to obtain a variable clock signal, and the control signal(s) may be generated based on the variable clock signal. The gate drive circuit generates a bias voltage for the MOSFET based on the control signal(s).
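The temperature-to-frequency mapping in the example above (sensed temperature, then clock divider ratio, then variable clock) can be sketched with a lookup table. The Python snippet below is illustrative only: the thresholds, divider ratios, and function names are hypothetical, and it assumes that a higher temperature should select a smaller divider (a faster control signal) to provide more drive capability.

```python
import bisect
from typing import List

# Hypothetical temperature thresholds in degrees Celsius and the divider ratio
# used in each resulting band (one more divider than thresholds).
THRESHOLDS_C: List[float] = [25.0, 85.0, 125.0]
DIVIDERS: List[int] = [16, 8, 4, 2]


def divider_for(temp_c: float) -> int:
    """Pick the clock divider ratio for the sensed MOSFET temperature."""
    band = bisect.bisect_right(THRESHOLDS_C, temp_c)
    return DIVIDERS[band]


def control_frequency_hz(input_clock_hz: float, temp_c: float) -> float:
    """Frequency of the variable clock obtained by dividing the input clock."""
    return input_clock_hz / divider_for(temp_c)


if __name__ == "__main__":
    for t in (20.0, 60.0, 100.0, 150.0):
        print(t, divider_for(t), control_frequency_hz(16e6, t))
```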
TL;DR: This paper presents a novel low-power logic family, called asynchronous fine-grain power-gated logic (AFPL), in which each pipeline stage comprises efficient charge recovery logic (ECRL) gates and a handshake controller that handles handshaking with the neighboring stages and provides power to the ECRL gates.
Abstract: This paper presents a novel low-power logic family, called asynchronous fine-grain power-gated logic (AFPL). Each pipeline stage in the AFPL circuit is comprised of efficient charge recovery logic (ECRL) gates, which implement the logic function of the stage, and a handshake controller, which handles handshaking with the neighboring stages and provides power to the ECRL gates. In the AFPL circuit, ECRL gates acquire power and become active only when performing useful computations, and idle ECRL gates are not powered and thus have negligible leakage power dissipation. The partial charge reuse (PCR) mechanism can be incorporated in the AFPL circuit. With the PCR mechanism, part of the charge on the output nodes of an ECRL gate entering the discharge phase can be reused to charge the output nodes of another ECRL gate about to evaluate, reducing the energy dissipation required to complete the evaluation of an ECRL gate. Moreover, AFPL-PCR adopts an enhanced C-element, called C*-element, in its handshake controllers such that an ECRL gate in AFPL-PCR can enter the sleep mode early once its output has been received by the downstream pipeline stage. To mitigate the hardware overhead of the AFPL circuit, two techniques of circuit simplification have been developed.
TL;DR: A multi-level simulator for laser-induced fault simulation in digital circuits automatically performs the simulation of laser-induced faults using layout information and laser spot information in order to locate affected gates and derive fault models.
Abstract: This paper presents a multi-level simulator for laser-induced fault simulation in digital circuits. It automatically performs the simulation of laser-induced faults using layout information and laser spot information in order to locate affected gates and derive fault models. The paper mainly focuses on multi-level simulation for obtaining high accuracy of the fault simulation at transistor level and high speed for the simulation of the rest of the circuit. This multi-level process allows handling natural and maliciously induced physical phenomena leading to circuit misbehavior, while dealing with large circuits.
TL;DR: This work proposes a new asynchronous logic template, NCL+, which is a modification of NCL to support the return-to-one protocol, and results suggest that a trade-off of power versus forward propagation delay exists.
Abstract: Asynchronous paradigms are a way to deal with hard problems in newer technologies. Among the templates for ensuring efficient asynchronous design, Null Convention Logic (NCL) appears as a fast and relatively low area and power option, enabling semi-custom design. This work proposes a new asynchronous logic template, NCL+, which is a modification of NCL to support the return-to-one protocol. A basic library of NCL+ standard cells with different driving strengths enables comparison between NCL and NCL+. While no significant differences in area arise, results suggest that a trade-off of power versus forward propagation delay exists. Accordingly, NCL+ provides more power efficiency and NCL provides smaller forward propagation delays.
TL;DR: In this paper, a display device includes a timing control circuit, a first data driving circuit, and a second data driving circuit; each data driving circuit performs clock training to adjust its work frequency to the frequency of its own clock signal, and the first and second clock signals have different frequencies.
Abstract: A display device includes a timing control circuit, a first data driving circuit, and a second data driving circuit. The first data driving circuit receives first clock-embedded training data from the timing control circuit, performs a first clock training to adjust a work frequency of the first data driving circuit to be equal to the frequency of a first clock signal, and receives first clock-embedded image data from the timing control circuit. The second data driving circuit receives second clock-embedded training data from the timing control circuit, performs a second clock training to adjust a work frequency of the second data driving circuit to be equal to the frequency of a second clock signal, and receives second clock-embedded image data from the timing control circuit. The frequency of the first clock signal is different from that of the second clock signal.
TL;DR: A BGR circuit controls a switch circuit in synchronization with a clock signal and its inverted signal, alternately switching between a differential input terminal receiving a voltage VIM and one receiving a voltage VIP, and an LPF circuit calculates a moving average value of the output voltage of the BGR circuit over the most recent clock cycle.
Abstract: A BGR circuit controls a switch circuit in synchronization with a clock signal from a control signal generating circuit and an inverted signal thereof, and thereby, alternately switches between a differential input terminal receiving a voltage VIM and a differential input terminal receiving a voltage VIP. An LPF circuit includes capacitive elements, a switch connected between an input node and each capacitive element, and a switch connected between an output node and each capacitive element. The LPF circuit controls ON/OFF of the switches in synchronization with a clock signal CLK, and thereby, calculates a moving average value of an output voltage of the BGR circuit in the most recent one clock cycle.
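The moving-average behavior of the LPF circuit can be sketched as a simple behavioral model. This is an illustration under assumptions, not the disclosed circuit: it assumes ideal switches, equal capacitive elements, one sample per clock half-cycle, and that swapping the differential inputs flips the sign of an amplifier offset. The function names and numeric values are hypothetical.

```python
from collections import deque


def chopped_bgr_output(v_ref, v_offset, n_half_cycles):
    """Model the BGR output per clock half-cycle.

    Assumption: alternating the differential inputs flips the sign of
    the offset error, so the output toggles between v_ref + v_offset
    and v_ref - v_offset.
    """
    return [v_ref + (v_offset if k % 2 == 0 else -v_offset)
            for k in range(n_half_cycles)]


def moving_average_lpf(samples, taps=2):
    """Average the most recent `taps` samples (one full clock cycle when taps=2).

    Each sample stands for the voltage held on one capacitive element;
    averaging equal capacitors corresponds to ideal charge sharing.
    """
    window = deque(maxlen=taps)
    averaged = []
    for v in samples:
        window.append(v)
        averaged.append(sum(window) / len(window))
    return averaged
```

Under these assumptions, once the window holds one full clock cycle of samples, the positive and negative offset contributions cancel and the averaged output settles at the reference value.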