TL;DR: In this article, a detailed cycle-accurate interconnection network model (GARNET) is proposed to simulate a CMP architecture with virtual channel (VC) flow control.
Abstract: Until very recently, microprocessor designs were computation-centric. On-chip communication was frequently ignored. This was because of fast, single-cycle on-chip communication. The interconnect power was also insignificant compared to the transistor power. With uniprocessor designs providing diminishing returns and the advent of chip multiprocessors (CMPs) in mainstream systems, the on-chip network that connects different processing cores has become a critical part of the design. Transistor miniaturization has led to high global wire delay, and interconnect power comparable to transistor power. CMP design proposals can no longer ignore the interaction between the memory hierarchy and the interconnection network that connects various elements. This necessitates a detailed and accurate interconnection network model within a full-system evaluation framework. Ignoring the interconnect details might lead to inaccurate results when simulating a CMP architecture. It also becomes important to analyze the impact of interconnection network optimization techniques on full system behavior. In this light, we developed a detailed cycle-accurate interconnection network model (GARNET), inside the GEMS full-system simulation framework. GARNET models a classic five-stage pipelined router with virtual channel (VC) flow control. Microarchitectural details, such as flit-level input buffers, routing logic, allocators and the crossbar switch, are modeled. GARNET, along with GEMS, provides a detailed and accurate memory system timing model. To demonstrate the importance and potential impact of GARNET, we evaluate a shared and private L2 CMP with a realistic state-of-the-art interconnection network against the original GEMS simple network. The objective of the evaluation was to figure out which configuration is better for a particular workload. We show that not modeling the interconnect in detail might lead to an incorrect outcome. 
We also evaluate Express Virtual Channels (EVCs), an on-chip network flow control proposal, in a full-system fashion. We show that in improving on-chip network latency-throughput, EVCs do lead to better overall system runtime; however, the impact varies widely across applications.
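To illustrate why a single-cycle network model and a pipelined-router model can disagree, here is a minimal sketch of a zero-load latency estimate. It is not GARNET code: the function names, the `per_hop_cycles=5` default (one cycle per stage of the five-stage router mentioned above), and the one-flit-per-cycle serialization are all assumptions made for illustration.

```python
# Hypothetical zero-load latency estimate; not taken from GARNET or GEMS.
# Assumption: each hop costs `per_hop_cycles` cycles (5 for a five-stage
# pipelined router) and the remaining flits follow the head one per cycle.

def zero_load_latency(hops, flits_per_packet, per_hop_cycles=5):
    """Cycles until the tail flit arrives when there is no contention."""
    head_latency = hops * per_hop_cycles
    serialization = flits_per_packet - 1
    return head_latency + serialization


def simple_network_latency(hops, flits_per_packet):
    """Same packet under an idealized one-cycle-per-hop network model."""
    return zero_load_latency(hops, flits_per_packet, per_hop_cycles=1)


if __name__ == "__main__":
    # A 5-flit packet crossing 4 hops under each model.
    print(zero_load_latency(4, 5))       # pipelined-router estimate
    print(simple_network_latency(4, 5))  # idealized estimate
```

Even without contention, the two models differ by a factor that grows with hop count, which is one plausible reason a shared-versus-private L2 comparison could come out differently under each model.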
TL;DR: It is shown that the controllability of a multi-agent system can be uniquely determined by the topology structure of the interconnection graph, for which the investigation comes down to that for a multi-agent system with a connected interconnection graph.
TL;DR: In this paper, the authors present a method for producing a semiconductor device comprising the steps of: providing a semiconductor substrate (1) comprising active components on the surface of said substrate, depositing a top layer of dielectric material on the substrate or on other dielectric layers present on said surface, etching at least one first opening (7) at least through said top layer, filling said opening(s) at least with a first conductive material (8), and performing a first CMP step to form the first conductive structures (3, 26).
Abstract: The present disclosure is related to a method for producing a semiconductor device comprising the steps of: providing a semiconductor substrate (1), comprising active components on the surface of said substrate, depositing a top layer (2) of dielectric material on the surface of said substrate or on other dielectric layers present on said surface, etching at least one first opening (7) at least through said top layer, filling said opening(s) at least with a first conductive material (8), and performing a first CMP step, to form said first conductive structures (3, 26), etching at least one second opening (13) at least through said top layer, filling said opening(s) at least with a second conductive material (10), and performing a second CMP step, to form said second conductive structures (4, 24), wherein the method comprises the step of depositing a common CMP stopping layer (5, 25) on said dielectric top layer, before the steps of etching and filling said first opening(s), so that said same CMP stopping layer is used for stopping the CMP process after filling of the first opening(s) as well as the CMP process after filling of the second opening(s). The disclosure is equally related to devices obtainable by the method of the disclosure.
TL;DR: In this paper, an approach is presented for high density packaging of semiconductor chips using silicon space transformer chip level package structures, which allow high density chip interconnection and/or integration of multiple chips or chip stacks high I/O interconnection, and heterogeneous chip or function integration.
Abstract: Apparatus and methods are provided for high density packaging of semiconductor chips using silicon space transformer chip level package structures, which allow high density chip interconnection and/or integration of multiple chips or chip stacks high I/O interconnection and heterogeneous chip or function integration.
TL;DR: In this article, the authors attach at least two isolated electronic components to an elastomeric substrate, and arrange an electrical interconnection between the components in a boustrophedonic pattern interconnecting the two isolated components with the interconnection.
Abstract: In embodiments, the present invention may attach at least two isolated electronic components to an elastomeric substrate, and arrange an electrical interconnection between the components in a boustrophedonic pattern interconnecting the two isolated electronic components with the electrical interconnection. The elastomeric substrate may then be stretched such that the components separate relative to one another, where the electrical interconnection maintains substantially identical electrical performance characteristics during stretching, and where the stretching may extend the separation distance between the electrical components to many times that of the unstretched distance.
TL;DR: A polymer-based flexible tactile sensor array integrated with interconnection terminals was fabricated with polymer MEMS technologies and electroplating; the array can be easily connected to the sensor array mounting PCB by inserting its flexible cable into the pluggable connector socket on the board.
Abstract: A polymer-based flexible tactile sensor array integrated with interconnection terminals was fabricated with polymer MEMS technologies and electroplating. The tactile sensing arrays were made as 4 × 4, 8 × 8, 16 × 16 and 32 × 32 sensing elements connected with a pluggable connector. Both the tactile sensor arrays and the interconnection terminals were fabricated together on the same polymer substrate in order to integrate them as a sensor module. The tactile sensor array with FFC (flexible flat cable) can be easily connected with the sensor array mounting PCB board by inserting the flexible cable into the pluggable connector socket on the board. The number and pitch of the fabricated interconnection terminals for 32 × 32 arrays were 68 and 500 μm, respectively. Continuous normal force ranging 0–1 N was applied to a unit tactile sensor for evaluating the tactile sensor characteristics through the integrated interconnection terminals. The measured resistance increased linearly with normal force in the range of 0–0.6 N but the resistance variation rate decreased above 0.6 N. The variation rate of the resistance is about 2.0%/N in the range of 0–0.6 N and 1%/N in the range of 0.6–1 N. The characterization of the tactile sensor array for the rod bar and a fingertip was successfully evaluated using the custom designed controller and the window-based 3D display software.
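The two reported variation rates can be read as a piecewise-linear response. The sketch below is an assumption-laden illustration rather than the authors' calibration: it treats 2.0%/N and 1%/N as exact slopes, assumes the curve is continuous at 0.6 N, and uses a hypothetical function name.

```python
# Illustrative piecewise-linear reading of the reported sensor response.
# Assumptions: slopes of 2.0 %/N (0 to 0.6 N) and 1.0 %/N (0.6 to 1 N) are
# exact, and the response is continuous at the 0.6 N breakpoint.

def resistance_change_percent(force_n):
    """Relative resistance increase (%) for a normal force in newtons."""
    if not 0.0 <= force_n <= 1.0:
        raise ValueError("model only covers the measured 0 to 1 N range")
    low_part = min(force_n, 0.6)          # portion of force in the steep segment
    high_part = max(force_n - 0.6, 0.0)   # portion in the shallower segment
    return 2.0 * low_part + 1.0 * high_part


if __name__ == "__main__":
    for force in (0.3, 0.6, 1.0):
        print(force, round(resistance_change_percent(force), 3))
```

Under these assumptions the full 1 N range produces a total change of roughly 1.6%, most of it accumulated below 0.6 N.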
TL;DR: A case is made for a new approach to designing on-chip interconnection networks that addresses the challenges of energy, space, and design complexity.
Abstract: Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection netwo...
TL;DR: This paper designs a low-diameter 3D network using low-radix routers and implements a small-to-medium sized clique network in different layers of a 3D chip, taking advantage of the state-of-the-art one-hop vertical communication design and utilizing lateral long wires to shorten network paths.
Abstract: Interconnection plays an important role in performance and power of CMP designs using deep sub-micron technology. Networks-on-chip (NoCs) have been proposed as a scalable and high-bandwidth fabric for interconnect design. The advent of 3D technology has provided further opportunity to reduce on-chip communication delay. However, the design of 3D NoC topologies has important distinctions from 2D NoCs or off-chip interconnection networks. First, current 3D stacking technology allows only vertical inter-layer links. Hence, there cannot be direct connections between arbitrary nodes in different layers; the vertical connection topology is essentially fixed. Second, the 3D NoC is highly constrained by the complexity and power of routers and links. Hence, low-radix routers are preferred over high-radix routers for lower power and better heat dissipation. This implies long network latency due to high hop counts in network paths. In this paper, we design a low-diameter 3D network using low-radix routers. Our topology leverages long wires to connect remote intra-layer nodes. We take advantage of the state-of-the-art one-hop vertical communication design and utilize lateral long wires to shorten network paths. Effectively, we implement a small-to-medium sized clique network in different layers of a 3D chip. The resulting topology yields a network with a diameter of only 3 hops, using routers of the same radix as 3D mesh routers. The proposed network shows up to 29% network latency reduction, up to 10% throughput improvement, and up to 24% energy reduction, when compared to a 3D mesh network.
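For a sense of scale, the 3-hop diameter can be compared with that of a plain 3D mesh, whose diameter is the sum over dimensions of (size − 1). The helper below is a generic sketch; the function name and the 4×4×4 example size are illustrative choices, not figures from the paper.

```python
# Diameter of a k1 x k2 x ... mesh: the farthest pair of nodes sit at opposite
# corners, so the hop count is the sum of (size - 1) over all dimensions.
# The 4x4x4 example size is an arbitrary illustration, not from the paper.

def mesh_diameter(dims):
    """Worst-case hop count between two nodes of a mesh with given sizes."""
    if any(d < 1 for d in dims):
        raise ValueError("every dimension must contain at least one node")
    return sum(d - 1 for d in dims)


if __name__ == "__main__":
    print(mesh_diameter((4, 4, 4)))  # 3D mesh baseline
    print(mesh_diameter((8, 8)))     # 2D mesh with the same node count
```

A 4×4×4 mesh already needs 9 hops in the worst case, so a fixed 3-hop diameter at the same router radix is a substantial reduction.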
TL;DR: The developments of ultra fine pitch and high density solder microbumps for advanced 3D stacking technologies are highlighted and various reliability tests were carried out for mechanical characterization of microbump interconnections.
Abstract: Memory bandwidth has become a bottleneck to processor performance for tera-scale computing needs. To reduce this obstacle, a revolution in package technologies is required for tera-scale computing requirements. 3D TSV (Through Silicon Via) stacking is believed to be one of the technologies that can meet those requirements. In advanced 3D stacking technologies, one of the important steps is to develop and assemble fine pitch, high density solder microbumps. This type of solder microbump in flip chip interconnection provides a high wiring density in silicon die with a high-performance signal and power connection. There is a growing interest in the development and study of this new type of chip stacking and bonding approach for both existing and future devices. This paper will highlight the developments of ultra fine pitch and high density solder microbumps for advanced 3D stacking technologies. A Cu/SnAg solder microbump with 50/40 µm in pitch was fabricated at the silicon wafer level by an electroplating method. The total thickness of the plated Cu and SnAg microbump was 20 µm. The under bump metallurgy (UBM) layer on the Si carrier used thin film based metal layers. The assembly of the Si chip and the Si carrier was conducted with the thermocompression flip chip bonder at different temperatures, times and pressures and the optimized bonding conditions were obtained. After assembly, the underfill process was carried out to fill the gap and achieve a void free underfilling using a material with a fine filler size. Finally, various reliability tests were carried out for mechanical characterization of microbump interconnections.
TL;DR: Experimental result shows that history-based DFS successfully adjusts link frequency to track actual link utilization over time, demonstrating the feasibility of the proposed link as a power-aware interconnection network for system-on-chip (SoC).
TL;DR: In this article, the power oscillation damping (POD) controllers implemented in the two thyristor controlled series compensators of the Brazilian North-South (NS) interconnection, in the year 1999, were solely intended to damp the low frequency NS oscillation mode.
Abstract: The power oscillation damping (POD) controllers implemented in the two thyristor controlled series compensators of the Brazilian North-South (NS) interconnection, in the year 1999, were solely intended to damp the low-frequency NS oscillation mode. These controllers are still under operation and are derived from the modulus of the active power flow in the NS line that is phase-lagged at the frequency of the NS mode and may experience relatively large excursions generated by exogenous disturbances. This paper utilizes the same 1999 data to compare the performance of a proposed robust POD controller design with those of two conventional designs. A recent robust control synthesis algorithm used in this work is based on a nonsmooth optimization technique and has the capability to handle various controller structures, including reduced-order, and to deal with time-domain constraints on both controlled and measured outputs. Moreover, the nonsmooth design technique encompasses multiple operating conditions subject to various test signals, hence building a truly time-domain multi-scenarios approach. According to the results discussed hereafter, this is a key advantage in the industrial context of increasing demand for performance and robustness. The described results relate to a large-scale system model used in the feasibility studies for that interconnection.
Abstract: A power converter apparatus that includes a substrate, plate-like positive and negative interconnection members, and capacitors is disclosed. Pairs of groups of switching elements are mounted on the substrate. Each of the positive interconnection member and the negative interconnection member has a terminal portion. The terminal portion has a joint portion that is electrically joined to a circuit pattern on the substrate. The switching elements are arranged in the same number on both sides of the joint portion of at least the positive interconnection member of the positive and negative interconnection members.
TL;DR: Optical fiber interconnection devices, which can take the form of a module, are disclosed that include an array of optical fibers and multi-fiber optical-fiber connectors, for example, two twelve-port connectors or multiples thereof, and three eight-port connectors or multiples thereof.
Abstract: Optical fiber interconnection devices, which can take the form of a module, are disclosed that include an array of optical fibers and multi-fiber optical-fiber connectors, for example, two twelve-port connectors or multiples thereof, and three eight-port connectors or multiples thereof. The array of optical fibers is color-coded and is configured to optically interconnect the ports of the two twelve-port connectors to the three eight-port connectors in a manner that preserves transmit and receive polarization. In one embodiment, the interconnection devices provide optical interconnections between twelve-fiber optical connector configurations to eight-fiber optical connector configurations, such as from twelve-fiber line cards to eight-fiber line cards, without having to make structural changes to cabling infrastructure. In one aspect, the optical fiber interconnection devices provide a migration path from duplex optics to parallel optics.
TL;DR: In this paper, the authors proposed to use two sets of interconnection lines in row-column fashion with all the sensor elements having one of their ends connected to a row line and other end to a column line.
Abstract: In this paper, we present a method that simplifies the interconnect complexity of N × M resistive sensor arrays from N × M to N + M. In this method, we propose to use two sets of interconnection lines in row–column fashion with all the sensor elements having one of their ends connected to a row line and other end to a column line. This interconnection overloading results in crosstalk among all the elements. This crosstalk causes the spreading of information over the whole array. The proposed circuit in this method takes care of this effect by minimizing the crosstalk. The circuit makes use of the concept of virtual same potential at the inputs of an operational amplifier in negative feedback to obtain a sufficient isolation among various elements. We theoretically present the suitability of the method for small/moderate sized sensor arrays and experimentally verify the predicted behavior by lock-in-amplifier based measurements on a light dependent resistor (LDR) in a 4 × 4 resistor array. Finally, we present a successful implementation of this method on a 16 × 16 imaging array of LDR.
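The wiring saving from row-column addressing is simple arithmetic, sketched below. The function and key names are hypothetical; the counts follow the abstract's statement that complexity drops from N × M to N + M.

```python
# Interconnect line counts for an N x M resistive sensor array, following the
# abstract's N*M versus N+M comparison. Function and key names are illustrative.

def interconnect_lines(rows, cols):
    """Return line counts for per-element wiring and row-column wiring."""
    if rows < 1 or cols < 1:
        raise ValueError("array dimensions must be positive")
    return {
        "per_element": rows * cols,  # complexity before the simplification
        "row_column": rows + cols,   # one line per row plus one per column
    }


if __name__ == "__main__":
    for n in (4, 16):
        counts = interconnect_lines(n, n)
        print(n, counts["per_element"], counts["row_column"])
```

For the 16 × 16 imaging array this is 32 lines instead of 256. The price is the crosstalk described above, which the op-amp virtual-potential circuit is there to suppress.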
TL;DR: In this paper, a multistrata dynamic random access memory (DRAM) vertically integrated with a complementary metal oxide semiconductor (CMOS) logic device using through-silicon vias (TSVs) and a unique interposer technology was developed for high-performance, power-efficient, and scalable computing.
Abstract: A multistrata dynamic random access memory (DRAM) vertically integrated with a complementary metal oxide semiconductor (CMOS) logic device using through-silicon vias (TSVs) and a unique interposer technology was developed for high-performance, power-efficient, and scalable computing. SMAFTI (SMArt chip connection with FeedThrough Interposer) technology, featuring an ultra-thin organic interposer with high-density feedthrough conductive vias, was used for interconnecting the three-dimensionally stacked DRAM and the CMOS logic device. A DRAM-compatible TSV manufacturing process was realized through the use of a "via-first" process and highly doped poly-Si TSVs for vertical traces inside memory dice. A multilayer ultra-thin die stacking process with micro-bump interconnection using a solid-liquid interdiffusion technique was also developed. The thermal aging reliability of the micro-bump interconnection was evaluated by a unique analysis method and its basic reliability was confirmed. Finally, we fabricated a prototype package including stacked DRAM and a CMOS logic device, and observed the combined operation. High-speed 3 Gbit/s signals were successfully transmitted through the fine interposer between the memory and logic.
TL;DR: In this article, a post-encapsulation grinding (PEG) method was developed for direct chip connection (C2) interconnections, where the die is ground to less than 70 µm after joining and underfilling.
Abstract: PoP structures have been used widely in digital consumer electronics products such as digital still cameras and mobile phones. However, the final stack height from the top to the bottom package for these structures is higher than that of the current stacked die packages. To reduce the height of the package, a flip chip technology is used. Since the logic chips of mobile applications use a pad pitch of 80 µm or less, an ultra-fine-pitch flip chip interconnection technique is required. C4 flip chip technology is widely used in area array flip chip packages, but it is not suitable for ultra-fine-pitch flip chips because the C4 solder bumps melt and collapse on the wide opening Cu pads. Although the industry uses ultra-fine-pitch interconnections between Au stud bumps on a chip and Sn/Ag pre-solder on a carrier, this flip chip technique has two major problems. One is that the need for bumps on both die and carrier drives up material costs. The other is that the long bonding process time required in the individual flip chip bonding process with associated heating and cooling steps demands large investments in equipment. To address these problems, we developed mount-and-reflow processes with no-clean flux, and new interconnection techniques with Cu pillars and Sn/Ag solder bumps on Al pads for wirebonding. It is very easy to control the gap between die and substrate by adjusting the Cu pillar height. Since it is unnecessary to control the collapse of the solder bumps, we call this the C2 process, for direct Chip Connection. The C2 bumps are connected to Cu substrate pads, which are surface treated with OSP (Organic Solder Preservative), with reflow and no-clean processes. This technology creates the SMT/Flip Chip hybrid assembly for SoP (System on Package) use. We have produced 50 µm-pitch C2 interconnections and tested their reliability. 
The interconnection resistance increase caused by the reliability testing is quite small. It is clear that C2 flip chip technology provides robust solder connections at low cost. Also the C2 structure with a low-k device was evaluated and no failures were observed at 1,500 cycles in the thermal cycle test. This indicates that low-k C2 structures seem robust. For finer pitch flip chip interconnections, a wafer-level underfill process is needed to overcome the limitations of the standard capillary underfill process for ultra-narrow spaces. To date, a wafer-level underfill process exists for the C2 process with an 80-µm pitch. In addition to fine pitch interconnections, a die thickness of 70 µm is required to reduce the final stack height. Such thin dies cannot be processed by the C2 process because they slip too easily during the reflow process. To resolve this issue, a Post-Encapsulation Grinding (PEG) method was developed. In this method the die is ground to less than 70 µm after joining and underfilling. This report presents the PEG method and reliability test results for die thicknesses of 20 µm, 70 µm and 150 µm.
TL;DR: In this paper, the authors present the micro fabrication process and wafer-level integration of a silicon carrier, which consists of two Si chips that are bonded together with evaporated AuSn-solder.
Abstract: This paper presents the micro fabrication process and wafer-level integration of a silicon carrier, which consists of two Si chips that are bonded together with evaporated AuSn-solder. Micro fins and channels are fabricated in the Si chip and form the embedded cooling layer after bonding. The embedded cooling layer is connected with an inlet and an outlet to form a fluidic path for heat transfer enhancement. In addition, the silicon carrier contains through silicon vias (TSVs) with metal film on the sidewall for electrical interconnection. Two or more carriers can then be stacked together with a silicon interposer in between to make up a stacked cooling module for high power heat dissipation. The advantage of this 3-D stacking method is that it provides a method of simultaneously realizing electrical interconnection and fluidic path and it can extract heat from the constraints of 3-D silicon module chips to the surface without external liquid circulation.
TL;DR: In this article, a scalable photonic interconnection network architecture is proposed whereby a Clos network is populated with broadcast-and-select stages, where a low distortion space switch technology based on recently demonstrated quantum-dot semiconductor optical amplifier technology is used as the base switch element.
Abstract: A scalable photonic interconnection network architecture is proposed whereby a Clos network is populated with broadcast-and-select stages. This enables the efficient exploitation of an emerging class of photonic integrated switch fabric. A low distortion space switch technology based on recently demonstrated quantum-dot semiconductor optical amplifier technology, which can be operated uncooled, is used as the base switch element. The viability of these switches in cascaded networks is reviewed, and predictions are made through detailed physical layer simulation to explore the potential for larger-scale network connectivity. Optical signal degradation is estimated as a function of data capacity and network size. Power efficiency and physical layer complexity are addressed for high end-to-end bandwidth, nanosecond-reconfigurable switch fabrics, to highlight the potential for scaling to several tens of connections. The proposed architecture is envisaged to facilitate high-capacity, low-latency switching suited to computing systems, backplanes, and data networks. Broadband operation through wavelength division multiplexing is studied to identify practical interconnection networks scalable to 100 Gbits/s per path and a power consumption of the order of 20 mW/(Gbits/s) for a 64×64 size interconnection network.
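The closing figures can be combined into a rough aggregate estimate. This back-of-the-envelope sketch assumes that all 64 paths run at the full 100 Gbit/s simultaneously and that the 20 mW/(Gbit/s) figure applies uniformly to aggregate throughput; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
# Rough aggregate figures for the 64x64 network described above.
# Assumptions (not from the abstract): every path is active at full rate and
# the per-(Gbit/s) power figure scales linearly with aggregate throughput.

def aggregate_estimate(paths=64, gbps_per_path=100.0, mw_per_gbps=20.0):
    """Return (aggregate throughput in Tbit/s, total power in watts)."""
    total_gbps = paths * gbps_per_path
    total_watts = total_gbps * mw_per_gbps / 1000.0  # mW to W
    return total_gbps / 1000.0, total_watts


if __name__ == "__main__":
    tbps, watts = aggregate_estimate()
    print(f"{tbps:.1f} Tbit/s aggregate, about {watts:.0f} W")
```

Under these assumptions the fabric would carry 6.4 Tbit/s at roughly 128 W, which gives a feel for why power efficiency is treated as a scaling constraint.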
TL;DR: In this paper, the authors proposed a pairings method between active assembly-wise connector ports and active assembly transmit and receive ports for high-speed data-rate optical transport systems.
Abstract: Optical interconnection methods for high-speed data-rate optical transport systems are disclosed that optically interconnect active assemblies to a fiber optic cable in a polarization-preserving manner. The methods include defining active-assembly-wise connector ports that connect to active assembly transmit and receive ports, and defining or establishing a pairings method between the active-assembly-wise connector ports. In a first optical interconnection assembly, an active-assembly-wise port is optically connected to a cable-wise port. In the second optical interconnection assembly, the cable-wise port that corresponds to the connected cable-wise port in the first optical interconnection assembly is optically connected to a select active-assembly-wise port as defined by the pairings method. The optical connection process is then repeated from the second to the first optical interconnection assembly. The optical interconnection acts are repeated until all of the active-assembly-wise ports are connected.
TL;DR: In this paper, a wafer-level package with simultaneous through silicon via (TSV) connection and cavity hermetic sealing by low-temperature solder bonding for microelectromechanical system (MEMS) device such as resonator is presented.
Abstract: In this paper, a wafer-level package with simultaneous through silicon via (TSV) connection and cavity hermetic sealing by low-temperature solder bonding for microelectromechanical system (MEMS) devices such as resonators is presented. A wet etching technique combined with a dry etching technique is utilized to achieve a "Y-shaped" through wafer interconnection structure to shorten the TSV in order to reduce cost. The Ansoft HFSS™ 3-D electromagnetic simulator is used to assess the transition properties of signal with frequency of the new interconnection structure. Sn solder bonding is utilized to achieve simultaneous TSV connection and cavity hermetic sealing. An average shear strength of 19.5 MPa and an excellent leak rate of around 1.9 × 10⁻⁹ atm cc/s have been achieved, which meet the requirements of MIL-STD-883E. A Kelvin structure is also fabricated to measure the resistance of the metallized TSV, the resistance of the "Y-shaped" through wafer interconnection and the contact resistance of the Cu/Sn IMC bond joint.
TL;DR: In this article, a method is presented for creating "interconnection blocks" that are re-usable and provide multiple, aligned and planar microfluidic interconnections, made from polydimethylsiloxane.
Abstract: In this paper a method is presented for creating 'interconnection blocks' that are re-usable and provide multiple, aligned and planar microfluidic interconnections. Interconnection blocks made from polydimethylsiloxane allow rapid testing of microfluidic chips and unobstructed microfluidic observation. The interconnection block method is scalable, flexible and supports high interconnection density. The average pressure limit of the interconnection block was near 5.5 bar and all individual results were well above the 2 bar threshold considered applicable to most microfluidic applications.
TL;DR: In this article, a first trigger MOS transistor is used as a trigger device to detect a voltage generated in the first resistance element by a gate of the first trigger MOS transistor and to allow the ESD protection device to operate in response to the detected voltage.
Abstract: A semiconductor integrated circuit includes: an output pad from which an output signal is outputted; an output signal line connected with the output pad; a first pad configured to function as a ground terminal or a power supply terminal; a first wiring connected with the first pad; an output driver connected with the output pad and configured to generate the output signal; an ESD protection device connected with the output signal line and having a function to discharge surge applied to the output pad; and a first trigger MOS transistor used as a trigger device. The output driver includes: a first protection target device connected between the output signal line and the first interconnection; and a first resistance element connected between the first protection target device and the first interconnection. The first trigger MOS transistor is configured to detect a voltage generated in the first resistance element by a gate of the first trigger MOS transistor and to allow the ESD protection device to operate in response to the detected voltage.
TL;DR: This work aims at providing an in-depth assessment of physical synthesis efficiency of fat-trees and at extrapolating silicon-aware performance figures to back-annotate in the system-level performance analysis.
Abstract: Most past evaluations of fat-trees for on-chip interconnection networks rely on oversimplified or even unrealistic architecture and traffic pattern assumptions, and very few layout analyses are available to relieve practical feasibility concerns in nanoscale technologies. This work aims at providing an in-depth assessment of physical synthesis efficiency of fat-trees and at extrapolating silicon-aware performance figures to back-annotate in the system-level performance analysis. A 2D mesh is used as a reference architecture for comparison, and a 65 nm technology is targeted by our study. Finally, in an attempt to mitigate the implementation cost of k-ary n-tree topologies, we also review an alternative unidirectional multi-stage interconnection network which is able to simplify the fat-tree architecture and to minimally impact performance.
TL;DR: A chip package transmitting slow speed signals via edge connectors and high speed signals by means of through-silicon-vias is described in this paper, where edge connectors are formed in recesses formed in the sidewalls of the package.
Abstract: A chip package transmitting slow speed signals via edge connectors and high speed signals by means of through-silicon-vias. The edge connectors are formed in recesses formed in the sidewalls of the package.
TL;DR: This paper describes an innovative regular non-blocking, point-to-point, point-to-multipoint, low latency interconnection network scheme with sliding window connectivity, which allows arbitrary parallelism among large sub-systems.
Abstract: This paper describes an innovative regular non-blocking, point-to-point, point-to-multipoint, low latency interconnection network scheme with sliding window connectivity, which allows arbitrary parallelism among large sub-systems. The area overhead of the interconnect is only 30% of the chip area, which is much smaller than the 80% in the case of an FPGA. The interconnection scheme is partially and dynamically reconfigurable. The configware is reduced 5.6 times by using binary encoding, which allows energy efficient dynamic reconfiguration.
TL;DR: A stack package comprises a substrate having a circuit pattern and at least two semiconductor chips stacked on the substrate, each having a plurality of through-via interconnection plugs and a plurality of guard rings which surround the respective through-via interconnection plugs, the chips being connected with each other by means of the through-via interconnection plugs.
Abstract: A stack package comprises a substrate having a circuit pattern; at least two semiconductor chips stacked on the substrate, having a plurality of through-via interconnection plugs and a plurality of guard rings which surround the respective through-via interconnection plugs, and connected with each other by the medium of the through-via interconnection plugs; a molding material for molding an upper surface of the substrate including the stacked semiconductor chips; and solder balls mounted to a lower surface of the substrate.
TL;DR: A new hierarchical interconnection network topology, called Rectangular Twisted Torus Meshes (RTTM network), is proposed, which allows the exploitation of computational locality as well as easy expansion up to a million processors.
Abstract: Since reducing the diameter is likely to improve the performance of an interconnection network, the problem of designing interconnection networks with low diameter remains an active research topic. Another important issue in the design of interconnection networks for massively parallel computers is scalability. A new hierarchical interconnection network topology, called Rectangular Twisted Torus Meshes (RTTM network), is proposed. At the lowest level of the RTTM network, the Level-1 sub-network, also called a Basic Module, consists of a mesh connection of 2m×2m nodes. Successively higher level networks are built by recursively interconnecting a×2a next lower level sub-networks in the form of a Rectangular Twisted Torus. An appealing property of the RTTM network is its smaller diameter and shorter average distance, which implies a reduction in communication delays. The RTTM network allows the exploitation of computational locality as well as easy expansion up to a million processors.
TL;DR: It is shown that on-chip interconnection networks can provide higher bandwidth between processors and shared first-level cache than previously considered possible, facilitating greater scalability of memory architectures that require it.
Abstract: In single-chip parallel processors, it is crucial to implement a high-throughput low-latency interconnection network to connect the on-chip components, especially the processing units and the memory units. In this paper, we propose a new mesh of trees (MoT) implementation of the interconnection network and evaluate it relative to metrics such as wire complexity, total register count, single switch delay, maximum throughput, tradeoffs between throughput and latency, and post-layout performance. We show that on-chip interconnection networks can provide higher bandwidth between processors and shared first-level cache than previously considered possible, facilitating greater scalability of memory architectures that require it. MoT is also compared, both analytically and experimentally, to some other traditional network topologies, such as hypercube, butterfly, fat trees and butterfly fat trees. When we evaluate a 64-terminal MoT network at 90-nm technology, concrete results show that MoT provides higher throughput and lower latency, especially when the input traffic (or the on-chip parallelism) is high, at comparable area. A recurring problem in networking and communication is that of achieving good sustained throughput, in contrast to just high theoretical peak performance that does not materialize for typical workloads. Our quantitative results demonstrate a clear advantage of the proposed MoT network in the context of single-chip parallel processing.
TL;DR: The salient feature of the proposed design is the ability to support multicasting and many-to-one communication efficiently (without arbitration), which makes it suitable for implementing cache coherency protocols and on-chip interconnect in future many-core processors.
Abstract: With recent advances in silicon nanophotonics, optical crossbars based on CMOS-compatible microring resonators have emerged as viable on-chip optical interconnection networks to deliver high-bandwidth communication at low power dissipation with a small footprint. This paper describes the design, fabrication and evaluation of an arbitration-free passive crossbar based on a microring resonator matrix that can be used to route wavelength division multiplexing (WDM) signals across the chip. The salient feature of the proposed design is the ability to support multicasting and many-to-one communication efficiently (without arbitration), which makes it suitable for implementing cache coherency protocols and on-chip interconnect in future many-core processors.
TL;DR: In this paper, the thermal resistance of interconnections of a 3D chip stack is measured and the thermal conductivity of C4 is derived to be 18–24 W/(m·°C).
Abstract: As device-scaling challenges increase, three-dimensional (3D) integrated circuits (ICs) are receiving more attention for system performance enhancements, due to their higher interconnect densities and shorter interconnect lengths. However, because of the limited contact area and the higher circuit density, the cooling of 3D ICs is more challenging. In order to assess appropriate cooling solutions for 3D chip stacks in various uses, we need a better understanding of the total thermal resistance of 3D chip stacks. This calls for precise thermal resistance measurements and thermal modeling for each component of a 3D chip stack. A 3D chip stack is composed of interconnections, silicon substrates, back-end-of-the-line (BEOL) and front-end-of-the-line (FEOL) layers. In this study, the thermal resistance of interconnections is the primary focus because interconnections are regarded as one of the thermal resistance bottlenecks of a 3D chip stack. With regard to the thermal resistance measurements of interconnections, Yamaji et al. found it difficult to measure the thermal resistance of interconnections with the laser-flash method and pointed out that care was necessary for uniform temperature distribution in the sample when using the laser-flash method on heterogeneous specimens, such as stacked chips with interconnections. Considering this concern, we use a steady-state method for the thermal resistance measurements of the interconnections. The thermal resistance of 200 μm pitch C4 (Pb97Sn3) joined samples is measured and the thermal conductivity of C4 is derived to be 18–24 W/(m·°C). The thermal resistance of silicon with various interconnection pitches and diameters is also modeled, and the relationship of thermal resistance to interconnection pitch and diameter is obtained. The thermal resistance reduction by underfill with various interconnection pitches and diameters is also studied.
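As a minimal sketch of how a steady-state measurement relates to a conductivity figure, the code below applies one-dimensional Fourier conduction: R = ΔT / Q, and an effective conductivity k = t / (R · A) for a layer of thickness t and cross-sectional area A. The function names and the sample numbers are hypothetical and chosen only for easy arithmetic; they are not measurements or the exact derivation procedure from the paper.

```python
def thermal_resistance(delta_t_k: float, heat_flow_w: float) -> float:
    """Steady-state thermal resistance in K/W from a temperature drop and heat flow."""
    if heat_flow_w <= 0:
        raise ValueError("heat flow must be positive")
    return delta_t_k / heat_flow_w


def effective_conductivity(thickness_m: float, area_m2: float,
                           resistance_k_per_w: float) -> float:
    """Effective conductivity in W/(m*K), assuming uniform 1D conduction through the layer."""
    if area_m2 <= 0 or resistance_k_per_w <= 0:
        raise ValueError("area and resistance must be positive")
    return thickness_m / (resistance_k_per_w * area_m2)


if __name__ == "__main__":
    # Made-up inputs: 2 K drop at 10 W across a 100 um layer of 1 cm^2 area.
    r = thermal_resistance(delta_t_k=2.0, heat_flow_w=10.0)
    k = effective_conductivity(thickness_m=100e-6, area_m2=1e-4,
                               resistance_k_per_w=r)
    print(f"R = {r:.3f} K/W, k_eff = {k:.2f} W/(m*K)")
```

A temperature difference of one kelvin equals one degree Celsius, so W/(m·K) and W/(m·°C) are interchangeable here. For a real bump array, the conducting area would have to account for the fraction of the footprint actually occupied by solder, which this sketch does not model.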