TL;DR: A class of partitionings is presented that encompasses previous techniques and provides enough flexibility to adapt code to multiprocessors with two levels of parallelism and two levels of memory.
Abstract: Supercompilers must reschedule computations defined by nested DO-loops in order to make efficient use of supercomputer features (vector units, multiple elementary processors, cache memory, etc.). Many rescheduling techniques, such as loop interchange, loop strip-mining, and rectangular partitioning, have been described to speed up program execution. We present here a class of partitionings that encompasses previous techniques and provides enough flexibility to adapt code to multiprocessors with two levels of parallelism and two levels of memory.
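To make the rescheduling techniques named in the abstract concrete, the following minimal sketch records the order in which a two-deep loop nest visits its iterations under strip-mining and under rectangular partitioning (tiling). The function names and the parameters `strip`, `bi`, and `bj` are illustrative assumptions, not taken from the paper, and the sketch ignores data dependences, which a real compiler must check before reordering.

```python
def original(n, m):
    """Iteration order of the untransformed nest: for i, for j."""
    return [(i, j) for i in range(n) for j in range(m)]


def strip_mined(n, m, strip):
    """Strip-mine the inner loop into strips of at most `strip` iterations.

    The visiting order is unchanged; only the loop structure differs,
    which exposes a fixed-length inner loop (e.g. for a vector unit).
    """
    order = []
    for i in range(n):
        for jj in range(0, m, strip):                 # loop over strips
            for j in range(jj, min(jj + strip, m)):   # loop inside one strip
                order.append((i, j))
    return order


def tiled(n, m, bi, bj):
    """Rectangular partitioning: visit the iteration space in bi x bj blocks.

    The same iterations are executed, but in a different order, so that
    each block can be sized to fit a faster level of memory.
    """
    order = []
    for ii in range(0, n, bi):                        # loops over blocks
        for jj in range(0, m, bj):
            for i in range(ii, min(ii + bi, n)):      # loops inside a block
                for j in range(jj, min(jj + bj, m)):
                    order.append((i, j))
    return order
```

Strip-mining alone preserves the execution order, whereas tiling permutes it; both cover exactly the same set of iterations.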
TL;DR: An algorithm for partitioning the nodes of a graph into supernodes is presented, which improves the performance of the multifrontal method for the factorization of large, sparse matrices on vector computers, and factorizes the extremely sparse electric power matrices faster than the general sparse algorithm.
Abstract: In this paper we present an algorithm for partitioning the nodes of a graph into supernodes, which improves the performance of the multifrontal method for the factorization of large, sparse matrices on vector computers. This new algorithm first partitions the graph into fundamental supernodes. Next, using a specified relaxation parameter, the supernodes are coalesced in a careful manner to create a coarser supernode partition. Using this coarser partition in the factorization generally introduces logically zero entries into the factor. This is accompanied by a decrease in the amount of sparse vector computations and data movement and an increase in the number of dense vector operations. The amount of storage required for the factor is generally increased by a small amount. On a collection of moderately sized 3-D structures matrices, speedups of 3 to 20 percent on the Cray X-MP are observed over the fundamental supernode partition, which allows no logically zero entries in the factor. Using this relaxed supernode partition, the multifrontal method now factorizes the extremely sparse electric power matrices faster than the general sparse algorithm. In addition, there is potential for considerably reducing the communication requirements for an implementation of the multifrontal method on a local memory multiprocessor.
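The abstract does not spell out how fundamental supernodes are found. The sketch below assumes a commonly used definition: consecutive columns j and j+1 belong to the same fundamental supernode when j+1 is the elimination-tree parent of j, j is its only child, and the column of the factor for j has exactly one more nonzero than the column for j+1. The inputs `parent` (elimination tree, -1 for a root) and `colcount` (nonzeros per factor column, diagonal included) are assumed to be precomputed for a postordered matrix. The relaxed coalescing step described in the abstract is not shown.

```python
def fundamental_supernodes(parent, colcount):
    """Group consecutive columns into fundamental supernodes.

    parent[j]   -- elimination-tree parent of column j, or -1 for a root
    colcount[j] -- number of nonzeros in column j of the factor
    Returns a list of lists of column indices.
    """
    n = len(parent)
    if n == 0:
        return []

    # Count the children of every node in the elimination tree.
    nchildren = [0] * n
    for p in parent:
        if p >= 0:
            nchildren[p] += 1

    supernodes = []
    current = [0]
    for j in range(1, n):
        prev = j - 1
        same_supernode = (
            parent[prev] == j                       # j is the parent of j-1
            and nchildren[j] == 1                   # ... and j-1 is its only child
            and colcount[prev] == colcount[j] + 1   # identical structure below the diagonal
        )
        if same_supernode:
            current.append(j)
        else:
            supernodes.append(current)
            current = [j]
    supernodes.append(current)
    return supernodes
```

For a fully dense 4 x 4 factor (`parent = [1, 2, 3, -1]`, `colcount = [4, 3, 2, 1]`) all four columns fall into a single supernode, since no logically zero entries are needed to treat them as one dense block.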
TL;DR: In this paper, a fixed prefix peer-to-peer network has a number of physical nodes and the nodes are logically divided into storage slots, such that no physical node has more than one original and/or redundant fragment.
Abstract: A fixed prefix peer-to-peer network has a number of physical nodes. The nodes are logically divided into a number of storage slots. Blocks of data are erasure coded into original and redundant data fragments, and the resultant fragments of data are stored in slots on separate physical nodes such that no physical node has more than one original and/or redundant fragment. The storage locations of all of the fragments are organized into a logical virtual node (e.g., a supernode). Thus, the supernode and the original block of data can be recovered even if some of the physical nodes are lost.
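The placement rule in this abstract can be illustrated with a small sketch. As a loudly stated simplification, it uses a single XOR parity fragment in place of a general erasure code, so it survives the loss of only one physical node; the function names and the dictionary standing in for a supernode are hypothetical and not taken from the source.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block, k):
    """Split `block` into k original fragments plus one XOR parity fragment."""
    size = -(-len(block) // k)                    # ceiling division
    padded = block.ljust(size * k, b"\0")         # zero-pad to a multiple of k
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def place(fragments, physical_nodes):
    """Map each fragment to a distinct physical node (one fragment per node)."""
    if len(physical_nodes) < len(fragments):
        raise ValueError("need at least one physical node per fragment")
    return {node: (idx, frag)
            for idx, (node, frag) in enumerate(zip(physical_nodes, fragments))}


def recover(supernode, k, length):
    """Rebuild the block from the surviving fragments of a supernode."""
    fragments = [None] * (k + 1)
    for idx, frag in supernode.values():
        fragments[idx] = frag
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity tolerates only one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(fragments[:k])[:length]
```

A usage example under the same assumptions: encode a block into three original fragments plus parity, place them on four nodes, drop one node, and recover the block.

```python
block = b"hello supernode"
supernode = place(encode(block, 3), ["n0", "n1", "n2", "n3"])
del supernode["n1"]                               # one physical node is lost
assert recover(supernode, 3, len(block)) == block
```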
TL;DR: Flow-based measurements of broadband traffic spanning several months, gathered in the backbone of a large ISP network are analyzed and some key issues and challenges in handling/controlling P2P traffic are described, and a potential solution approach is presented.
Abstract: There is considerable interest in peer-to-peer (P2P) traffic because of its remarkable increase over the last few years. By analyzing flow measurements at the border routers of a Tier-1 ISP backbone that carry broadband traffic, we are able to study its properties. P2P has become a large part of broadband traffic and its characteristics are different from older applications, such as the Web. It is stable, balanced traffic: the peak-to-valley ratio during a day is around two and the IN/OUT traffic balance is close to one. Although P2P protocols are based on a distributed architecture, they don’t show strong signs of geographical locality. A broadband subscriber is not much more likely to download a file from a close region than from a far region. It is clear that most of the traffic is generated by heavy hitters who “abuse” P2P (and other) applications, whereas most of the subscribers only use their broadband connections to browse the web, exchange emails or chat. However, it is not easy to directly block or limit P2P traffic, because these applications adapt themselves to their environment: the users develop ways of eluding the traffic blocks. The traffic that could historically be identified with five port numbers is now spread over thousands of TCP ports, pushing port-based identification to its limits. More complex methods to identify P2P traffic are not a long-term solution; the cable industry should opt for a “pay for what you use” model like the other utilities.
INTRODUCTION
P2P (peer-to-peer) file sharing applications have grown dramatically over the past few years and contribute a significant share of the total traffic in many networks. In this paper, we analyze flow-based measurements of broadband traffic spanning several months, gathered in the backbone of a large ISP network.
We first develop an understanding of P2P traffic behavior from the viewpoint of broadband provider networks (earlier studies were based on a Tier-1 ISP backbone viewpoint [1] and on a university edge-network viewpoint [2]). The study then describes some key issues and challenges in handling/controlling this traffic, and presents a potential solution approach. We begin with a description of these P2P systems.
File Sharing Applications
Many popular P2P applications such as KaZaA and Gnutella are organized as application-level overlay systems in which large numbers of computers (called peers) across the Internet link together in a decentralized manner via application-level connections. The predominant use of these systems is for sharing large data files (particularly music and video) among the connected users. The data files and associated metadata information (useful for searching content) are distributed across the different peers. A key difference from traditional client-server systems is that each host in a P2P system acts as both a client and a server of content. In contrast to the stable configurations of traditional distributed systems, the individual peers can frequently join and leave the P2P system. The process of obtaining a file can be broadly divided into two phases: query search followed by object retrieval. First, a user specifies a query (e.g., a combination of name, genre, artist name etc.), and the P2P protocol searches for the existence of file(s) that match the query. The requesting peer receives one or more responses, and if the search is successful, identifies one or more target peers from which to download each file. The search queries as well as the responses are transmitted via the overlay connections. The details of how the search is propagated through the overlay are protocol-dependent. In earlier P2P protocols exemplified by Gnutella version 0.4, a peer initiates a query by flooding it to all its neighbors in the overlay.
The neighboring peers, in turn, flood to their neighbors, using a scoping mechanism to control the flood. In contrast, for newer protocols like KaZaA, as well as for newer versions of Gnutella, queries are forwarded to and handled by only a subset of special peers (called SuperNodes in KaZaA, and UltraPeers in Gnutella). A peer transmits an index of its content to the “special peer” to which it is connected. The special peer then uses the corresponding P2P protocol to forward the query to other such peers in the system. Once the search results are in, the requesting peer directly contacts the target peer, typically using some variant of HTTP (the target peer has an HTTP server listening by default on a known protocol-specific port), to get the requested resource. Some new systems use swarming download: a file is downloaded in chunks from multiple peers. Although the earlier P2P systems mostly used default network ports for communication, there is strong evidence to suggest that substantial P2P traffic nowadays is transmitted over a large number of nonstandard ports. This seems to be primarily motivated by the desire to circumvent firewall restrictions as well as rate-limiting actions by ISPs targeted at such applications; we discuss this further later in the paper. Another recent occurrence has been the development of tools that allow an end-user to explicitly select the SuperNode it connects to [3]. This appears to be an attempt to improve the quality of the best-effort search process in the P2P system, for files that are not widely distributed, but are geographically localized. For instance, connecting to a SuperNode in Brazil may increase the chances of locating Samba-related content.
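The two query styles described above can be contrasted with a small sketch. It is an illustration under stated assumptions rather than either protocol's actual wire behavior: the overlay is a plain adjacency dictionary, the scoping mechanism is modeled as a hop-count TTL, and the special-peer variant simply consults every special peer's index. All function and variable names are hypothetical.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl):
    """Scoped flooding: forward the query to all neighbors for up to `ttl` hops.

    Returns (peers holding `wanted`, number of query messages sent).
    """
    seen = {start}
    hits = set()
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if wanted in files.get(peer, ()):
            hits.add(peer)
        if remaining == 0:
            continue                      # scope exhausted: do not forward
        for nb in neighbors[peer]:
            messages += 1                 # one query message per overlay link
            if nb not in seen:            # duplicates are received but dropped
                seen.add(nb)
                frontier.append((nb, remaining - 1))
    return hits, messages


def build_indexes(leaves_of, files):
    """Each special peer indexes the content of the leaf peers attached to it."""
    indexes = {}
    for special, leaves in leaves_of.items():
        index = {}
        for leaf in leaves:
            for name in files.get(leaf, ()):
                index.setdefault(name, set()).add(leaf)
        indexes[special] = index
    return indexes


def special_peer_search(indexes, wanted):
    """Answer a query from the special peers' indexes only.

    Returns (peers holding `wanted`, number of special peers consulted).
    """
    hits = set()
    for index in indexes.values():
        hits |= index.get(wanted, set())
    return hits, len(indexes)
```

In the flooding variant the number of messages grows with the scope and the overlay degree, while in the indexed variant ordinary peers never see queries for content held elsewhere.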
TL;DR: In this article, two local operations used by a peer are introduced: connect, in which the peer forms an ad hoc search or index link to another peer, and break, in which the peer breaks a link that is producing too much load.
Abstract: Peer-to-peer search networks are a popular and widely deployed means of searching massively distributed digital information repositories. Unfortunately, as such networks grow, peers may become overloaded processing messages from other peers. This article examines how to reduce the load on nodes in P2P networks by allowing them to self-organize into a relatively efficient network, and then self-tune to make the network even more efficient. Two local operations used by a peer are introduced: connect(), in which the peer forms an ad hoc search or index link to another peer, and break(), in which the peer breaks a link that is producing too much load. By replacing fixed rules with dynamic local decision-making, such “self-supervising” networks can better adjust to network conditions. Different ways to implement connect() and break() are described, and the network structures that form under different configurations are examined. Simulation results indicate that the ad hoc networks formed using the described techniques are more efficient than popular supernode topologies for several important scenarios. Results for the fault tolerance and search latency of such ad hoc networks are also presented.
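The abstract leaves the implementations of connect() and break() open. The following is one minimal sketch, assuming that each peer measures load as the number of messages received per link and drops its heaviest link whenever the total exceeds a capacity; the `Peer` class, `max_load`, `break_link`, and `self_tune` are hypothetical names, and `break_link` is used because `break` is a reserved word in Python.

```python
class Peer:
    """A peer that tracks the message load arriving over each of its links."""

    def __init__(self, name, max_load):
        self.name = name
        self.max_load = max_load      # assumed capacity, in messages per window
        self.links = {}               # neighbor name -> messages received

    def connect(self, other):
        """Form an ad hoc link to another peer (symmetric in this sketch)."""
        self.links.setdefault(other.name, 0)
        other.links.setdefault(self.name, 0)

    def record(self, neighbor, messages):
        """Account for messages received over the link to `neighbor`."""
        self.links[neighbor] += messages

    def total_load(self):
        return sum(self.links.values())


def break_link(a, b):
    """Remove the link between two peers."""
    a.links.pop(b.name, None)
    b.links.pop(a.name, None)


def self_tune(peer, peers):
    """Local decision: while overloaded, break the link producing the most load.

    `peers` maps peer names to Peer objects. Returns the names dropped.
    """
    dropped = []
    while peer.links and peer.total_load() > peer.max_load:
        heaviest = max(peer.links, key=peer.links.get)
        break_link(peer, peers[heaviest])
        dropped.append(heaviest)
    return dropped
```

Each decision uses only information the peer observes locally, which is the property the abstract contrasts with fixed topology rules.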