TL;DR: This work addresses several issues related to the design of optimal test access architectures that minimize testing time, including the assignment of cores to test buses, distribution of test data width between multiple test buses, and analysis of test data width required to satisfy an upper bound on the testing time.
Abstract: Test access is a major problem for core-based system-on-a-chip (SOC) designs. Since embedded cores in an SOC are not directly accessible via chip inputs and outputs, special access mechanisms are required to test them at the system level. An efficient test access architecture should also reduce test cost by minimizing test application time. We address several issues related to the design of optimal test access architectures that minimize testing time, including the assignment of cores to test buses, distribution of test data width between multiple test buses, and analysis of test data width required to satisfy an upper bound on the testing time. Even though the decision versions of all these problems are shown to be NP-complete, they can be solved exactly for practical instances using integer linear programming (ILP). As a case study, the ILP models for two hypothetical but nontrivial systems are solved using a public-domain ILP software package.
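To make the core-to-bus assignment problem concrete, here is a minimal sketch that enumerates every assignment for a tiny instance instead of calling an ILP solver. The timing model (cores on the same bus are tested serially, buses run in parallel), the function name `assign_cores`, and all numbers are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product

def assign_cores(test_time, num_buses):
    """Exhaustively assign cores to buses, minimizing the system testing time.

    test_time[i][j] is a hypothetical time to test core i over bus j.
    Cores sharing a bus are tested one after another and buses operate in
    parallel, so the system testing time is the largest per-bus total.
    """
    num_cores = len(test_time)
    best_time, best_assignment = None, None
    for assignment in product(range(num_buses), repeat=num_cores):
        bus_load = [0] * num_buses
        for core, bus in enumerate(assignment):
            bus_load[bus] += test_time[core][bus]
        total = max(bus_load)
        if best_time is None or total < best_time:
            best_time, best_assignment = total, assignment
    return best_time, best_assignment

# Made-up instance: 4 cores, 2 buses (bus 0 is wider, so tests run faster on it).
times = [[10, 20], [8, 16], [6, 12], [4, 8]]
best_time, best_assignment = assign_cores(times, 2)
print(best_time, best_assignment)
```

Enumeration grows exponentially with the number of cores, which is why an exact method such as ILP is the natural tool for realistic instances.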
TL;DR: A technique used to do power analysis of processors at the architecture level provides cycle-by-cycle power consumption data of the architecture on the basis of the instruction/data flow stream to characterize the power dissipation of control units.
Abstract: Architecture-level power estimation has received more attention recently because of its efficiency. This article presents a technique used to do power analysis of processors at the architecture level. It provides cycle-by-cycle power consumption data of the architecture on the basis of the instruction/data flow stream. To characterize the power dissipation of control units, a novel hierarchical method has been developed. Using this technique, a power estimator is implemented for a commercial processor. The accuracy of the estimator is validated by comparing the power values it produces against measurements made by a gate-level power simulator for the same benchmark set. Our estimation approach is shown to provide very efficient and accurate power analysis at the architecture level. The energy models built for first-pass estimation (such as ALU, MAC unit, register files) are reusable for future architecture design modification. In this article, we demonstrate the application of the technique. Furthermore, this technique can evaluate various kinds of software to achieve hardware/software codesign for low power.
TL;DR: In this article, the problem of test pattern generation for single stuck-at faults in combinational circuits, under the additional constraint that the number of specified primary input assignments is minimized, is addressed.
Abstract: This article addresses the problem of test pattern generation for single stuck-at faults in combinational circuits, under the additional constraint that the number of specified primary input assignments is minimized. This problem has different applications in testing, including the identification of "don't care" conditions to be used in the synthesis of Built-In Self-Test (BIST) logic. The proposed solution is based on an integer linear programming (ILP) formulation which builds on an existing Propositional Satisfiability (SAT) model for test pattern generation. The resulting ILP formulation is linear in the size of the original SAT model for test generation, which is linear in the size of the circuit. Nevertheless, the resulting ILP instances represent complex optimization problems that require dedicated ILP algorithms. Preliminary results on benchmark circuits validate the practical applicability of the test pattern minimization model and associated ILP algorithm.
TL;DR: This work presents a method to optimize clocked circuits by relocating and changing the time of activation of registers to maximize the throughput, based on a modulo scheduling algorithm for software pipelining, instead of retiming.
Abstract: We present a method to optimize clocked circuits by relocating and changing the time of activation of registers to maximize the throughput. Our method is based on a modulo scheduling algorithm for software pipelining, instead of retiming. It optimizes the circuit without the constraint on the clock phases that retiming has, which makes it possible to always achieve the optimal clock period. The two methods have the same overall time complexity, but we avoid the computation of all-pairs shortest paths, which is a heavy burden regarding both space and time. From the optimal schedule found, registers are placed in the circuit without looking at where the original registers were. The resulting circuit is a multi-phase clocked circuit, where all the clocks have the same period and the phases are automatically determined by the algorithm. Edge-triggered flip-flops are used where the combinational delays exactly match that period, whereas level-sensitive latches are used elsewhere, improving the area occupied by the circuit. Experiments on existing and newly developed benchmarks show a substantial performance improvement compared to previously published work.
TL;DR: An algorithm that substantially reduces the computational effort required to obtain the exact solution to the Resource Constrained Scheduling (RCS) problem is presented, using a branch-and-bound search technique and efficient techniques to accurately estimate the possible time-steps at which each operation can be scheduled.
Abstract: This article presents an algorithm that substantially reduces the computational effort required to obtain the exact solution to the Resource Constrained Scheduling (RCS) problem. The reduction is obtained by (a) using a branch-and-bound search technique, which computes both upper and lower bounds, and (b) using efficient techniques to accurately estimate the possible time-steps at which each operation can be scheduled and using this to prune the search space. Results on several benchmarks with varying resource constraints indicate the clear superiority of the algorithm presented here over traditional approaches using integer linear programming, with speed-ups of several orders of magnitude.
TL;DR: The resulting on-chip fault latencies are one ten-thousandth of previously reported system level concurrent error detection and diagnosis latencies.
Abstract: We report a register transfer level technique for concurrent error detection and diagnosis in data dominated designs called Introspection. Introspection uses idle computation cycles in the data path and idle data transfer cycles in the interconnection network in a synergistic fashion for concurrent error detection and diagnosis (CEDD). The resulting on-chip fault latencies are one ten-thousandth (10^-4) of previously reported system level concurrent error detection and diagnosis latencies. The associated area overhead and performance penalty are negligible. We derive a cost function that considers introspection constraints such as (i) executing an operation on three disjoint function units for diagnosis and (ii) promoting function units to participate in at least one CEDD operation. We formulate integration of introspection constraints into the operation-to-operator binding phase of high-level synthesis as a weighted bipartite matching problem. The effectiveness of introspection and its implementation are illustrated on numerous industrial strength benchmarks.
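The binding step above is cast as weighted bipartite matching. The sketch below shows only the shape of that problem: operations on one side, function units on the other, and a minimum-cost perfect matching between them. It uses brute-force enumeration in place of a real matching algorithm, and the cost matrix is invented; the paper's actual cost function with its introspection constraints is not reproduced here.

```python
from itertools import permutations

def min_cost_binding(cost):
    """Return (total_cost, binding) for a square cost matrix.

    cost[op][fu] is a hypothetical cost of binding operation `op` to
    function unit `fu`. All perfect matchings are enumerated, which is
    only practical for very small examples.
    """
    n = len(cost)
    best = None
    for perm in permutations(range(n)):
        total = sum(cost[op][perm[op]] for op in range(n))
        if best is None or total < best[0]:
            best = (total, perm)
    return best

# Made-up costs for 3 operations and 3 function units.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
total, binding = min_cost_binding(cost)
print(total, binding)  # binding[op] is the function unit chosen for op
```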
TL;DR: This paper addresses the problem of verifying the equivalence of two sequential circuits by identifying equivalent flip-flop pairs using an induction-based algorithm and generalizing the idea of exploring the structural similarity between circuits to perform verification in stages.
Abstract: In this paper we address the problem of verifying the equivalence of two sequential circuits. State-of-the-art sequential optimization techniques such as retiming and sequential redundancy removal can handle designs with up to hundreds or even thousands of flip-flops. However, the BDD-based approaches for verifying sequential equivalence can easily run into memory explosion for such designs. In an attempt to handle larger circuits, we modify test pattern-generation techniques for verification. The suggested approach utilizes the popular efficient backward-justification technique used in most sequential ATPG programs. We present several techniques to enhance the efficiency of this approach by (1) identifying equivalent flip-flop pairs using an induction-based algorithm, and (2) generalizing the idea of exploring the structural similarity between circuits to perform verification in stages. This ATPG-based framework is suitable for verifying circuits either with or without a reset state. In order to extend this approach to verify retimed circuits, we introduce a delay-compensation-based algorithm for preprocessing the circuits. The experimental results of verifying the correctness of circuits after sequential redundancy removal and retiming with up to several hundred flip-flops are presented.
TL;DR: The C-1-D property can be used directly on specifications for which it naturally holds, a condition that has not been exploited thus far in FSM verification, and can be enforced on arbitrary FSMs by exposing some of the latch outputs as pseudo-primary outputs during synthesis and verification.
Abstract: This article introduces the notion of a Complete-1-Distinguishability (C-1-D) property for simplifying equivalence checking of finite state machines (FSMs). When a specification machine has the C-1-D property, the traversal of the product machine can be eliminated. Instead, a much simpler check suffices. The check consists of first obtaining a 1-equivalence mapping between the individually reachable states of the specification and the implementation machines, and then checking that it is a bisimulation relation. The C-1-D property can be used directly for specification machines on which it naturally holds, a condition that has not been exploited thus far in FSM verification. We also show how this property can be enforced on an arbitrary FSM by exposing some of its latch outputs as pseudo-primary outputs during synthesis and verification. In this sense, our synthesis/verification methodology provides another point in the trade-off curve between constraints-on-synthesis versus complexity-of-verification. Practical experiences with this methodology have resulted in success with several examples for which it is not possible to complete verification using existing implicit state space traversal techniques.
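The two-step check described above (build a 1-equivalence mapping, then verify that it is a bisimulation) can be sketched on small explicit-state Mealy machines. This is an illustrative reading of the abstract, not the paper's implementation: the dictionary encoding `machine[state][input] = (next_state, output)`, the function names, and the example machines are all assumptions, and both machines are assumed to be completely specified over the same input alphabet.

```python
def reachable(machine, start):
    """States reachable from `start` in machine[state][inp] = (next, out)."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for nxt, _ in machine[state].values():
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def signature(machine, state):
    """One-step behaviour of a state: its output for every input."""
    return tuple(sorted((inp, out) for inp, (_, out) in machine[state].items()))

def equivalent_via_c1d(spec, spec_init, impl, impl_init):
    # C-1-D: every reachable specification state has a unique signature.
    by_sig = {}
    for s in reachable(spec, spec_init):
        sig = signature(spec, s)
        if sig in by_sig:
            raise ValueError("specification is not C-1-D")
        by_sig[sig] = s
    # 1-equivalence mapping from implementation states to specification states.
    mapping = {}
    for t in reachable(impl, impl_init):
        sig = signature(impl, t)
        if sig not in by_sig:
            return False
        mapping[t] = by_sig[sig]
    if mapping[impl_init] != spec_init:
        return False
    # Bisimulation check: mapped states must step to mapped states.
    for t, s in mapping.items():
        for inp, (t_next, _) in impl[t].items():
            if mapping[t_next] != spec[s][inp][0]:
                return False
    return True

# Made-up example: the implementation duplicates specification state B as Y and Z.
spec = {"A": {0: ("A", 0), 1: ("B", 1)},
        "B": {0: ("A", 1), 1: ("B", 0)}}
impl = {"X": {0: ("X", 0), 1: ("Y", 1)},
        "Y": {0: ("X", 1), 1: ("Z", 0)},
        "Z": {0: ("X", 1), 1: ("Y", 0)}}
print(equivalent_via_c1d(spec, "A", impl, "X"))
```

Each machine is traversed on its own, so the product machine is never built, which is the saving the abstract points to.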
TL;DR: It is shown that the incompleteness problem in transformational design is closely related to the syntactic variance problem in high-level synthesis and that this latter problem is not solvable in general either.
Abstract: The completeness of a collection of design transformations is an important aspect in transformational design. Completeness guarantees that any correct design can in principle be explored using the transformation system. In the field of transformational design the problem of incompleteness is not well understood and it is often believed that complete transformation systems can be constructed. In this article, we show, using a formal framework based on the theory of computation, that this is not the case if the transformation system is based on an expressive general-purpose design language such as VHDL. Only when restrictions are imposed on the design language and correctness relation can a transformation system be made complete in theory, but this is expected to result in serious practical problems. It is shown that the incompleteness problem in transformational design is closely related to the syntactic variance problem in high-level synthesis and that this latter problem is not solvable in general either.
TL;DR: An algorithm for automatic matching of a design function to a device from a component database that generates an interface that can automatically adapt the device to behave as the function.
Abstract: Component reuse techniques have been a recent focus of research because they are seen as the next-generation techniques to handle increasing system complexities. However, there are several unresolved issues to be addressed and prominent among them is the issue of component matching. As the number of reusable components in a component database grows, the task of manually matching a component to the user requirements becomes infeasible. Automating this matching can help in rapid system prototyping, improving quality and reducing cost. In addition, if the matching algorithm is sound, this approach can also reduce precious validation effort. In this article, we propose an algorithm for automatic matching of a design function to a device from a component database. The distinguishing feature of the algorithm is that when successful, it generates an interface that can automatically adapt the device to behave as the function. The algorithm is based on a new simulation relation called forced simulation that is shown to be a necessary and sufficient condition for component matching to be possible for a given pair of function and device. We demonstrate the application of the algorithm by reusing some programmable components of the Intel family.
TL;DR: An algorithm for propagating constraints and hierarchically pipelining a given throughput-constrained system is presented and it is indicated that it may be efficiently used for synthesizing or estimating within system-level design.
Abstract: Behavioral specifications of DSP systems generally contain a number of nested loops. In order to obtain high data rates for such systems, it is necessary to pipeline the system within the behavior, within the loop bodies, and also within the operations. In order to hierarchically pipeline a performance-constrained system, an important step consists of distributing the performance constraint among the loops in such a manner that the constraint is satisfied and design cost is minimized. This paper presents an algorithm for propagating constraints and hierarchically pipelining a given throughput-constrained system. Along with pipelining, the algorithm schedules the operations within the loop bodies and selects components for them, with the aim of minimizing cost while satisfying the constraint propagated to the loop body. Results demonstrate the necessity of pipelining across the three granularity levels in order to obtain high performance designs. They also demonstrate the feasibility and quality of our approach, and indicate that it may be efficiently used for synthesizing or estimating within system-level design.
TL;DR: It is shown that the constrained polygon transformation problem is NP-hard and several fast algorithms that produce results within a few percent of a theoretical lower bound on several floorplans are presented.
Abstract: A productivity-driven methodology for incremental floorplanning is described and the constrained polygon transformation problem, a key step of this methodology, is formulated. The input to the problem consists of a floorplan computed using area estimates and the actual area required for each subcircuit of the floorplan. Informally, the objective is to change the areas of the modules without drastically changing their shapes or locations. We show that the constrained polygon transformation problem is NP-hard and present several fast algorithms that produce results within a few percent of a theoretical lower bound on several floorplans.
TL;DR: The automatic synthesis algorithm in this paper combines exact (MILP-based) and heuristic techniques to solve the problem of manual synthesis of large time-constrained applications, striking a good balance between quality of results and synthesis time.
Abstract: Large time-constrained applications are highly computer-intensive and are often implemented as a complex organization of pipelined data parallel tasks on a pool of embedded processors, DSP processors, and FPGAs. The large number of design alternatives available at each task level, the application as a whole, and the special needs of the reconfigurable devices (such as the FPGA) make the manual synthesis of such systems very tedious. The automatic synthesis algorithm in this paper combines exact (MILP-based) and heuristic techniques to solve this problem, which basically involves (1) propagation of timing constraints; (2) pipelining the loops to meet throughput requirements; (3) resource selection and allocation, keeping the processing requirements and the timing constraints in view; (4) scheduling the resources across the tasks to ensure maximum utilization; and (5) hiding the reconfiguration delays of the FPGAs. While the use of MILP techniques helps in getting high-quality results, combining them with heuristics ensures acceptable synthesis times, striking a good balance between quality of results and synthesis time. Our experimental evaluation of the algorithm shows an average 40% reduction in resource cost (compared to manual synthesis) with synthesis times from minutes to as low as a few seconds in some cases.
TL;DR: This paper proposes transformation of a behavior before scheduling and assignment, namely introducing redundant computations such that the resulting data path is testable using few BIST resources.
Abstract: The need for considering BIST requirements during the scheduling and assignment stages of behavioral synthesis has been demonstrated in previous research and techniques for reducing BIST resources of a data path during these stages of synthesis have been developed. However, the degree of freedom that can be exploited during scheduling and assignment to minimize these resources is often limited by the data and control dependencies of a behavior. In this paper, we propose transformation of a behavior before scheduling and assignment, namely introducing redundant computations such that the resulting data path is testable using few BIST resources. The transformation makes use of spare capacity of modules to add redundancy that enables test paths to be shared among the modules. A technique for identifying potential BIST resource sharing problems in a behavior and resolving them by redundant computations is presented. Introduction of redundant computations is performed without compromising the latency and functional resource requirement of the behavior.
TL;DR: An on-chip test pattern generator that uses a one-dimensional cellular automaton (CA) to generate either a precomputed sequence of test patterns or pairs of test patterns for path delay faults is proposed, the first approach that guarantees successful on-chip generation of a given test pattern sequence using a finite number of CA cells.
Abstract: We propose an on-chip test pattern generator that uses a one-dimensional cellular automaton (CA) to generate either a precomputed sequence of test patterns or pairs of test patterns for path delay faults. To our knowledge, this is the first approach that guarantees successful on-chip generation of a given test pattern sequence (or a given test set for path delay faults) using a finite number of CA cells. Given a pair of columns (Cu, Cv) of the test matrix, the proposed method uses alternative “link procedures” Pj that compute the number of extra CA cells to enable the generation of (Cu, Cv) by the CA. A systematic approach uses the link procedures to minimize the total number of needed CA cells. The performance of the scheme depends on an appropriate choice of link procedures Pj.
TL;DR: A technique for automatic exploration of architectural alternatives in the design of complex electronic embedded systems and systems-on-a-chip by transforming the problem into a set of simple model-to-model operations and a mapping algorithm that becomes the core of the entire design process.
Abstract: We present a technique for automatic exploration of architectural alternatives in the design of complex electronic embedded systems and systems-on-a-chip. The technique transforms the problem into a set of simple model-to-model operations and a mapping algorithm that becomes the core of the entire design process. The mapping algorithm is formulated as an assignment-type problem (ATP), which is, in turn, solved by a straightforward optimization method. The result is a design assistance tool, which is demonstrated through a telecommunication systems example.
TL;DR: A notion of equivalence is established for gate-level netlists containing black boxes, and a procedure is introduced that computes the complete don't care set and can achieve more minimization than conventional synthesis.
Abstract: We are concerned with optimizing gate-level netlists containing “black boxes,” that is, components whose functionality is not available to the optimization tool. We establish a notion of equivalence for gate-level netlists containing black boxes, and prove that it is sound and complete. We show that conventional approaches to optimizing such netlists fail to fully exploit the don't care flexibility available for synthesis. Based on our new notion of equivalence, we introduce a procedure that computes the complete don't care set. Experiments indicate that our procedure can achieve more minimization than conventional synthesis.
TL;DR: The paper provides the first reports on pessimistic and optimistic diagnostic measures for all faults of the large ISCAS circuits with known deterministic tests; the diagnostic fault simulator has also been modified to diagnose defects, given the output responses of failing devices.
Abstract: This article describes a diagnostic fault simulator for stuck-at faults in sequential circuits that is both time and space efficient. The simulator represents indistinguishable classes of faults as memory efficient lists. The use of lists reduces the number of output response comparisons between faults and hence speeds up the simulation process. The lists also make it easy to drop faults when they are fully distinguished from other faults. Experimental results on the ISCAS89 circuits show that the simulator runs significantly faster than an earlier work based on distinguishability matrices, and for large circuits is faster and more memory efficient than a recent method based on lists of indistinguishable faults. The paper provides the first reports on pessimistic and optimistic diagnostic measures for all faults of the large ISCAS circuits with known deterministic tests. The diagnostic fault simulator has also been modified to diagnose defects, given the output responses of failing devices. Results on simulated bridging defects show that the diagnosis time is comparable to the time for fault simulation with fault dropping.
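The idea of keeping indistinguishable faults in lists and dropping a fault once it is fully distinguished can be sketched as a simple refinement step. This is only an illustration under stated assumptions: the function name `refine_classes` is hypothetical, the per-fault output responses are supplied as precomputed dictionaries with made-up values, and the sequential fault simulation that would produce them is omitted.

```python
def refine_classes(classes, response):
    """Split each class of indistinguishable faults by one test's responses.

    classes  : list of lists of fault names, each list one class
    response : dict mapping fault name -> observed output response
    Faults that end up alone are fully distinguished and are dropped.
    """
    refined = []
    for cls in classes:
        groups = {}
        for fault in cls:
            groups.setdefault(response[fault], []).append(fault)
        refined.extend(g for g in groups.values() if len(g) > 1)
    return refined

# Made-up responses of four faults to two test vectors.
classes = [["f1", "f2", "f3", "f4"]]
after_t1 = refine_classes(classes, {"f1": "00", "f2": "00", "f3": "01", "f4": "01"})
after_t2 = refine_classes(after_t1, {"f1": "1", "f2": "0", "f3": "1", "f4": "1"})
print(after_t1, after_t2)
```

Responses are compared only within a class, so the number of comparisons shrinks as classes split and faults are dropped.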
TL;DR: The key to the approach is the addition of logic to the system that interacts with the existing controller to push the effects of controller faults into the data flow, so that they can be observed at the datapath registers rather than directly at the controller outputs.
Abstract: In systems consisting of interacting datapaths and controllers and utilizing built-in self test (BIST), the datapaths and controllers are traditionally tested separately by isolating each component from the environment of the system during test. This work facilitates the testing of datapath/controller pairs in an integrated fashion. The key to the approach is the addition of logic to the system that interacts with the existing controller to push the effects of controller faults into the data flow, so that they can be observed at the datapath registers rather than directly at the controller outputs. The result is to reduce the BIST overhead over what is needed if the datapath and controller are tested independently, and to allow a more complete test of the interface between datapath and controller, including the faults that do not manifest themselves in isolation. Fault coverage and overhead results are given for four example circuits.
TL;DR: POSE can be applied especially to system-level synthesis, whose targets can be parallel computer architectures, systems-on-chip, or embedded systems, and can be easily integrated with other heuristic design methodologies to allow increased design efficiency.
Abstract: Design automation tools and methodologies always encounter a problem of how systems may be designed efficiently, including issues such as static modeling and dynamic manipulation of system parts. With the rapid progress of design technology, the continuously increasing number of different choices per system part and the growing complexity of today's systems, the efficiency of the design environment is not only a major concern now, but will also be a demanding problem in the near future. In contrast to heuristic methods, a novel environment called POSE is proposed that increases efficiency during design without losing optimality in the final design results. System parts are modeled using the popular object-oriented modeling technique and are dynamically manipulated using the parallel design technique. A complete integration of object-oriented and parallel techniques is one of the major features of POSE. Common problems related to parallel design such as emptiness and deadlock are also elegantly solved within POSE. Experimental results and formal analysis based on POSE all show its practical and theoretical usefulness. POSE can be used at any level of synthesis as long as off-the-shelf building-blocks manipulation is required. POSE can be applied especially to system-level synthesis, whose targets can be parallel computer architectures, systems-on-chip, or embedded systems. We will show how POSE has been applied to ICOS, a recently proposed synthesis methodology. Furthermore, POSE can be easily integrated with other heuristic design methodologies to allow increased design efficiency.
TL;DR: These new techniques enable us to reduce both the time and space complexities of the previously best known approximation algorithms by more than a factor of n and n^2 for rectangular and L-shaped subfloorplans, respectively.
Abstract: As the sizes of many IC design problems become increasingly larger, approximation has become a valuable approach for arriving at satisfactory results without incurring exorbitant computational cost. In this paper, we present several approximation techniques for solving floorplan area minimization problems. These new techniques enable us to reduce both the time and space complexities of the previously best known approximation algorithms by more than a factor of n and n^2 for rectangular and L-shaped subfloorplans, respectively (where n is the number of given implementations). The improvements in the time and space complexities of such approximation techniques are critical to their applicability in floorplan area minimization algorithms. The techniques are quite general, and may be applicable to other classes of approximation problems.
TL;DR: The approach synthesizes the pipeline structure from a given instruction set architecture specification and generates a set of reordering constraints that guides the compiler back-end to properly schedule instructions so that possible pipeline hazards are avoided and throughput is improved.
Abstract: This paper presents a hardware/software co-synthesis approach to pipelined ISP (instruction set processor) design. The approach synthesizes the pipeline structure from a given instruction set architecture (behavioral) specification. In addition, it generates a set of reordering constraints that guides the compiler back-end (reorderer) to properly schedule instructions so that possible pipeline hazards are avoided and throughput is improved. Co-synthesis takes place while resolving pipeline hazards, which can be attributed to interinstruction dependencies (IIDs). An extended taxonomy of IIDs has been proposed for the systematic analysis of pipeline hazards. Hardware/software methods are developed to resolve IIDs. Algorithms based on taxonomy and resolutions are constructed and integrated into the pipeline synthesis process to explore hardware and software design space. Application benchmarks are used to evaluate possible designs and guide the design decision. The power of the co-synthesis tool PIPER is demonstrated through pipeline synthesis of one illustrative example and two ISPs, including an industrial one (TDY-43). In comparison with other related approaches, our approach achieves higher throughput and provides a systematic way to explore the hardware/software trade-off.
TL;DR: The tests reveal that the proposed algorithm can not only remove the parasitic effects of the test buses but also tolerate test signal variations and is robust enough to handle loud environmental noise and the nonlinearity of the switching devices.
Abstract: A parasitic effect removal methodology is proposed to handle the large parasitic effects in analog testability buses. The removal is done by an on-chip test generation technique and an intrinsic response extraction algorithm. On-chip test generation creates test signals on-chip to avoid the parasitic effects of the test application bus. The intrinsic response extraction cross-checks and cancels the parasitic effects of both test application and response observation paths. The tests using both SPICE simulation and MNABST-1 P1149.4 test chip reveal that the proposed algorithm can not only remove the parasitic effects of the test buses but also tolerate test signal variations. Furthermore, it is robust enough to handle loud environmental noise and the nonlinearity of the switching devices.
TL;DR: A survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems, covering a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity.
Abstract: We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more out of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with memory addressing related issues.
TL;DR: A memory exploration procedure based on three performance metrics, namely, cache size, the memory access time and the energy consumption, is presented and the importance of including energy in the performance metrics is shown.
Abstract: In embedded system design, the designer has to choose an on-chip memory configuration that is suitable for a specific application. To aid in this design choice, we present a memory exploration procedure based on three performance metrics, namely, cache size, the memory access time and the energy consumption. We show the importance of including energy in the performance metrics, since an increase in the cache size and line size reduces the memory access time but does not necessarily reduce the energy consumption. The memory exploration procedures enable us to find the cache configuration (cache size, line size) that satisfies the area and time constraints while minimizing the energy consumption, and the cache configuration that satisfies the area and energy constraints while minimizing the memory access time. The exploration procedures for cache configuration are very efficient since they consider only a selected set of candidate points. Finally, we validate our exploration procedures by running simulation experiments on MediaBench applications.
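The two exploration goals stated above (minimum energy under area and time bounds, and minimum access time under area and energy bounds) amount to constrained selection over a set of candidate cache configurations. The sketch below illustrates that selection only. The `CacheConfig` record, the use of cache size as the area measure, and every number are assumptions for illustration; the paper's candidate-point selection and its time and energy models are not reproduced.

```python
from typing import NamedTuple, Optional, Sequence

class CacheConfig(NamedTuple):
    cache_size: int     # bytes, used here as a stand-in for area
    line_size: int      # bytes
    access_time: float  # arbitrary units (hypothetical)
    energy: float       # arbitrary units (hypothetical)

def min_energy_config(candidates: Sequence[CacheConfig],
                      max_size: int, max_time: float) -> Optional[CacheConfig]:
    """Lowest-energy candidate that meets the size and time bounds."""
    feasible = [c for c in candidates
                if c.cache_size <= max_size and c.access_time <= max_time]
    return min(feasible, key=lambda c: c.energy) if feasible else None

def min_time_config(candidates: Sequence[CacheConfig],
                    max_size: int, max_energy: float) -> Optional[CacheConfig]:
    """Fastest candidate that meets the size and energy bounds."""
    feasible = [c for c in candidates
                if c.cache_size <= max_size and c.energy <= max_energy]
    return min(feasible, key=lambda c: c.access_time) if feasible else None

# Made-up candidates: larger caches are faster but not always lower energy.
candidates = [
    CacheConfig(1024, 16, 9.0, 5.0),
    CacheConfig(2048, 16, 7.0, 4.0),
    CacheConfig(4096, 32, 6.0, 4.5),
    CacheConfig(8192, 32, 5.0, 6.5),
]
low_energy = min_energy_config(candidates, max_size=4096, max_time=8.0)
fast = min_time_config(candidates, max_size=8192, max_energy=5.0)
print(low_energy, fast)
```

With these invented numbers the two goals pick different configurations, which mirrors the abstract's point that a larger cache can cut access time without cutting energy.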
TL;DR: New tighter sufficiency conditions for slicibility of rectangular graphs are postulated and utilized in the generation of area-optimal floorplans to help in reducing the total effort for topology generation, and in solving problems of larger size.
Abstract: The rectangular dualization method of floorplanning usually involves topology generation followed by sizing. Slicible topologies are often preferred for their simplicity and efficiency. While slicible topologies can be obtained efficiently, existing linear-time algorithms for topology generation from a given rectangular graph do not guarantee a slicible topology even if one exists. Moreover, the class of rectangular graphs known as inherently nonslicible graphs does not have any slicible topologies. In this article, new tighter sufficiency conditions for slicibility of rectangular graphs are postulated and utilized in the generation of area-optimal floorplans. These graph-theoretic conditions not only capture a larger class of slicible rectangular graphs but also help in reducing the total effort for topology generation, and in solving problems of larger size.
TL;DR: This paper considers the delay minimization problem of an interconnect wire by simultaneously considering buffer insertion, buffer sizing and wire sizing and provides elegant closed form optimal solutions for all three problems.
Abstract: In this paper, we consider the delay minimization problem of an interconnect wire by simultaneously considering buffer insertion, buffer sizing and wire sizing. We consider three cases, namely using no buffer (i.e., wire sizing alone), using a given number of buffers, and using the optimal number of buffers. We provide elegant closed form optimal solutions for all three problems. These closed form solutions are useful in early stages of the VLSI design flow such as logic synthesis and floorplanning.
TL;DR: This work developed a processor model that captures the connectivity, the parallelism, and all architectural peculiarities of an embedded processor, and implemented a retargetable and optimizing compiler working on this model.
Abstract: Embedded processors in electronic systems typically are tuned to a few applications. Development of processor-specific compilers is prohibitively expensive and, as a result, such compilers, if existing, yield code of an unacceptable quality. To improve this code quality, we developed a processor model that captures the connectivity, the parallelism, and all architectural peculiarities of an embedded processor. We also implemented a retargetable and optimizing compiler working on this model. We present the graph-based processor model, and formally define the code generation task as binding the intermediate representation of an application to this model. We also present a new method for code selection, based on this processor model, that is capable of handling directed acyclic graphs instead of trees.