Conference
International Conference on Application Specific Array Processors
About: International Conference on Application Specific Array Processors is an academic conference. The conference publishes mainly in the areas of very-large-scale integration and parallel algorithms. Over its lifetime, the conference has published 291 papers, which have received 2888 citations.
Topics: Very-large-scale integration, Parallel algorithm, Systolic array, Signal processing, Algorithm design
Papers
25 Oct 1993
TL;DR: The presented optimization criterion jointly minimizes context-switching overhead caused by an activation of a block and maximizes the degree of vector processing of the important class of "single appearance minimum activation schedules" (SAMAS).
Abstract: For the design of complex digital signal processing systems, block-diagram-oriented synthesis of real-time software for programmable target processors has become an important design aid. The synthesis approach discussed in this paper is based on multirate block diagrams with scalable synchronous dataflow (SSDF) semantics. For this class of dataflow graphs optimum vectorization techniques are introduced. Vectorization is treated as a transformation on an SSDF graph which increases the number of samples consumed or produced per activation of a block according to a specific optimization criterion. The presented optimization criterion jointly minimizes context-switching overhead caused by an activation of a block and maximizes the degree of vector processing of the important class of "single appearance minimum activation schedules" (SAMAS). This class comprises schedules in which each block appears exactly once and is activated a minimum number of times. First, "single appearance" implies the most compact implementation of a schedule in terms of program memory. Second, "minimum activation" implies increased throughput according to optimum vectorization and minimal context-switching.
107 citations
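The multirate setting in the abstract above rests on balancing sample rates across blocks. As a rough illustration only, the sketch below solves the standard synchronous dataflow balance equations to find the smallest number of activations per block in one schedule period. It is not the paper's SSDF vectorization procedure; the function name `repetition_vector`, the edge tuple layout, and the example graph are assumptions made for this sketch.

```python
from fractions import Fraction
from math import lcm  # requires Python 3.9+


def repetition_vector(actors, edges):
    """Smallest positive integer firing counts that balance every edge.

    edges is a list of (src, dst, produced, consumed) tuples: src writes
    `produced` samples per activation, dst reads `consumed` per activation.
    (Hypothetical layout chosen for this sketch.)
    """
    rate = {actors[0]: Fraction(1)}  # relative firing rate of each actor
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for src, dst, produced, consumed in edges:
            if src == a:
                other, r = dst, rate[a] * produced / consumed
            elif dst == a:
                other, r = src, rate[a] * consumed / produced
            else:
                continue
            if other not in rate:
                rate[other] = r
                pending.append(other)
            elif rate[other] != r:
                raise ValueError("inconsistent sample rates")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational rates to the smallest all-integer solution.
    scale = lcm(*(f.denominator for f in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# Assumed example: A produces 2 / B consumes 3, then B produces 1 / C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

In this example A must run three times and B twice for every activation of C. Roughly speaking, vectorization in the abstract's sense would then let a block handle more samples per activation so that fewer activations are needed.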
2 Sep 1991
TL;DR: An analytical model for the behavior of dataflow graphs with data-dependent control flow that can be analyzed to construct an annotated schedule, or a static schedule that annotates each firing of an actor with the Boolean conditions under which that firing occurs.
Abstract: This paper describes an analytical model for the behavior of dataflow graphs with data-dependent control flow. The number of tokens produced or consumed by each actor is given as a symbolic function of the Booleans in the system. Long term averages can be analyzed to determine consistency of token flow rates, which in turn determines whether memory requirements are bounded. Short-term behavior can be analyzed to construct an annotated schedule, or a static schedule that annotates each firing of an actor with the Boolean conditions under which that firing occurs. Annotated schedules can be used to generate efficient implementations of the algorithms given by the dataflow graphs.
106 citations
25 Oct 1993
TL;DR: The authors present a multiplier, the reduced area multiplier, with a novel reduction scheme which results in fewer components and less interconnect overhead than either Wallace or Dadda multipliers.
Abstract: As developed by Wallace (1964) and Dadda (1965), a high-speed method for the parallel multiplication of two binary numbers is to reduce their partial products to two numbers whose sum is equal to the product. The resulting two numbers are then summed using a fast carry-propagate adder. The authors present a multiplier, the reduced area multiplier, with a novel reduction scheme which results in fewer components and less interconnect overhead than either Wallace or Dadda multipliers. This reduction scheme is especially useful for pipelined multipliers, because it minimizes the number of latches required in the reduction of the partial products. Equations are given for determining the number of components and a method is presented for estimating the interconnect overhead for Wallace, Dadda and reduced area multipliers. Area estimates indicate that pipelined reduced area multipliers require 3 to 8% less area than equivalent Wallace multipliers and 15 to 25% less area than equivalent Dadda multipliers.
95 citations
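The two-step structure described in the abstract above (reduce the partial products to two numbers, then add them once) can be sketched at word level with 3:2 carry-save compressors. This is a generic Wallace-style illustration, not the reduced area scheme the authors propose, and it says nothing about component counts or interconnect. All function names are made up for this sketch.

```python
def partial_products(x, y, width):
    """One shifted copy of x for every set bit among the low `width` bits of y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def compress_3_to_2(a, b, c):
    """Carry-save step: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                 # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1    # carries, moved up one bit
    return s, carry


def reduce_to_two(rows):
    """Repeatedly compress groups of three rows until at most two remain."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3          # rows that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(compress_3_to_2(*rows[i:i + 3]))
        nxt.extend(rows[full:])                   # leftover rows pass through
        rows = nxt
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply(x, y, width):
    """Unsigned multiply; y must fit in `width` bits."""
    a, b = reduce_to_two(partial_products(x, y, width))
    return a + b                                  # stands in for the final carry-propagate adder


print(multiply(13, 11, 4))
```

Only the final `a + b` propagates carries across the full word; every earlier stage is carry-free, which is where the speed of this family of multipliers comes from.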
4 Aug 1992
TL;DR: Two architectures for high-speed VLSI implementations of the soft-output Viterbi algorithm are proposed, area estimates are given for both, and the well-known trade-off between computational complexity and storage requirements is exploited.
Abstract: During the last few years, decoding algorithms that not only use soft-quantized inputs but also deliver soft-decision outputs have attracted considerable attention because additional coding gains are obtainable in concatenated systems. A prominent member of this class of algorithms is the soft-output Viterbi algorithm. In this paper two architectures for high-speed VLSI implementations of the soft-output Viterbi algorithm are proposed and area estimates are given for both architectures. The well-known trade-off between computational complexity and storage requirements is exploited to obtain new VLSI architectures with increased implementation efficiency. Area savings in excess of 40% in comparison to straightforward solutions are reported.
60 citations
22 Aug 1994
TL;DR: The number of multipliers and adders required for both the folded and digit-serial lattice-based architectures approaches one-half the number required to implement similar systems based on direct-form filter implementations as the order of the FIR filters becomes large.
Abstract: This paper presents efficient single-rate architectures for the orthonormal discrete wavelet transform (DWT). Folded and digit-serial architectures are derived from an efficient lattice implementation of two-channel FIR paraunitary systems known as the quadrature mirror filter (QMF) lattice. Folded architectures are derived by applying systematic folding techniques to multirate systems. For digit-serial architectures, we show that any two-channel subband system can be implemented using digit-serial processing techniques by utilizing the polyphase decomposition. Using this result, we describe an orthonormal DWT architecture which uses the QMF lattice structure and digit-serial processing techniques. The number of multipliers and adders required for both the folded and digit-serial lattice-based architectures approaches one-half the number required to implement similar systems based on direct-form filter implementations as the order of the FIR filters becomes large. This makes folded and digit-serial QMF lattice structures attractive choices for applications of the orthonormal DWT which require low area and low power dissipation.
59 citations
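The abstract above relies on the polyphase decomposition to run a two-channel subband system at a single rate. As a small illustration of that identity only (not of the QMF lattice, folding, or digit-serial hardware), the sketch below checks that filtering and then keeping every second output equals the sum of two half-rate branches. The helper names and the integer test signal are assumptions for this sketch.

```python
def convolve(h, x):
    """Full linear convolution of two finite sequences."""
    y = [0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            y[i + j] += hv * xv
    return y


def filter_then_decimate(h, x):
    """Direct form: filter at the full rate, then keep every second output."""
    return convolve(h, x)[::2]


def polyphase_decimate(h, x):
    """Polyphase form: two half-length filters, each fed half of the input samples."""
    even_branch = convolve(h[0::2], x[0::2])
    # The odd phase is delayed by one sample at the low rate.
    odd_branch = [0] + convolve(h[1::2], x[1::2])
    n = max(len(even_branch), len(odd_branch))
    even_branch += [0] * (n - len(even_branch))
    odd_branch += [0] * (n - len(odd_branch))
    return [e + o for e, o in zip(even_branch, odd_branch)]


h = [1, 2, 3, 4]                  # assumed example filter taps
x = [1, 2, 3, 4, 5, 6, 7, 8]      # assumed example input
assert filter_then_decimate(h, x) == polyphase_decimate(h, x)
```

The direct form computes every full-rate output and discards half of them, while the polyphase form computes only the outputs that are kept. That saving is what makes single-rate implementations of such multirate systems attractive.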
Performance Metrics
| Year | Papers |
|---|---|
| 1995 | 38 |
| 1994 | 41 |
| 1993 | 61 |
| 1992 | 50 |
| 1991 | 33 |
| 1990 | 68 |