TL;DR: Preliminary performance results measuring software and network overhead are presented, and they point toward world-wide network computing becoming a practical reality.
TL;DR: The progress achieved to date in the development of the Globus toolkit, a set of core services for constructing grid tools and applications, is described, and the organization of the GUSTO testbed, which enables large-scale evaluation of Globus technologies, is discussed.
TL;DR: This work describes an architecture for data intensive applications where a high-speed distributed data cache is used as a common element for all of the sources and sinks of data, and provides standard interfaces to a large, application-oriented, distributed, on-line, transient storage system.
Abstract: Modern scientific computing involves organizing, moving, visualizing, and analyzing massive amounts of data at multiple sites around the world. The technologies, the middleware services, and the architectures that are used to build useful high-speed, wide area distributed systems constitute the field of data intensive computing. We describe an architecture for data intensive applications where we use a high-speed distributed data cache as a common element for all of the sources and sinks of data. This cache-based approach provides standard interfaces to a large, application-oriented, distributed, on-line, transient storage system. We describe our implementation of this cache, how we have made it "network aware", and how we do dynamic load balancing based on the current network conditions. We also show that access to knowledge of the network conditions yields large increases in application throughput.
TL;DR: This paper introduces a performance prediction method, AdRM (Adaptive Regression Modeling), to determine file transfer times for network-bound distributed data-intensive applications, and demonstrates the effectiveness of the method on two distributed data applications.
Abstract: The computational grid is becoming the platform of choice for large-scale distributed data-intensive applications. Accurately predicting the transfer times of remote data files, a fundamental component of such applications, is critical to achieving application performance. In this paper, we introduce a performance prediction method, AdRM (Adaptive Regression Modeling), to determine file transfer times for network-bound distributed data-intensive applications. We demonstrate the effectiveness of the AdRM method on two distributed data applications, SARA (Synthetic Aperture Radar Atlas) and SRB (Storage Resource Broker), and discuss how it can be used for application scheduling. Our experiments use the Network Weather Service [36, 37], a resource performance measurement and forecasting facility, as a basis for the performance prediction model. Our initial findings indicate that the AdRM method can be effective in accurately predicting data transfer times in wide-area multi-user grid environments.
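As a rough illustration of regression-based transfer-time prediction, the sketch below fits an ordinary least-squares line between the "ideal" transfer time (file size divided by forecast bandwidth) and the observed time, then uses the fit to predict a new transfer. This is not the AdRM method itself, whose details the abstract does not give; the function names, units, and sample history are hypothetical, and the bandwidth forecast is assumed to come from some external measurement facility.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_transfer_time(history, size_mb, forecast_mbps):
    """Predict seconds needed to move size_mb megabytes.

    history: list of (size_mb, bandwidth_mbps, observed_seconds) tuples
             (hypothetical past transfers).
    forecast_mbps: forecast bandwidth in megabits per second (assumed input).
    """
    # Regressor: ideal time in seconds = megabits / (megabits per second).
    xs = [size * 8 / bw for size, bw, _ in history]
    ys = [seconds for _, _, seconds in history]
    a, b = fit_line(xs, ys)
    return a + b * (size_mb * 8 / forecast_mbps)


# Hypothetical history generated by: observed = 2 + 1.5 * ideal
history = [(10, 8, 17.0), (20, 8, 32.0), (40, 16, 32.0), (80, 8, 122.0)]
estimate = predict_transfer_time(history, size_mb=50, forecast_mbps=10)
```

The intercept absorbs fixed per-transfer overhead and the slope absorbs the gap between forecast and achieved bandwidth; an adaptive scheme would presumably refit as new observations arrive.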
TL;DR: The resulting systems are popularly known as parallel computers, and they allow the sharing of a computational task among multiple processors.
Abstract: Very often applications need more computing power than a sequential computer can provide. One way of overcoming this limitation is to improve the operating speed of processors and other components so that they can offer the power required by computationally intensive applications. Even though this is currently possible to a certain extent, future improvements are constrained by the speed of light, thermodynamic laws, and the high financial costs for processor fabrication. A viable and cost-effective alternative solution is to connect multiple processors together and coordinate their computational efforts. The resulting systems are popularly known as parallel computers, and they allow the sharing of a computational task among multiple processors. As Pfister [1] points out, there are three ways to improve performance: Work harder, Work smarter, and Get help.
TL;DR: A unified framework for resource scheduling in metacomputing systems where tasks with various requirements are submitted from participant sites is developed and it accommodates emerging notions of quality of service (QoS) and advance resource reservations.
Abstract: A major challenge in metacomputing systems (computational grids) is to effectively use their shared resources, such as compute cycles, memory, communication network, and data repositories, to optimize desired global objectives. We develop a unified framework for resource scheduling in metacomputing systems where tasks with various requirements are submitted from participant sites. Our goal is to minimize the overall execution time of a collection of application tasks. In our model, each application task is represented by a directed acyclic graph (DAG). A task consists of several subtasks and the resource requirements are specified at subtask level. Our framework is general and it accommodates emerging notions of quality of service (QoS) and advance resource reservations. We present several scheduling algorithms which consider compute resources and data repositories that have advance reservations. As shown by our simulation results, it is advantageous to schedule the system resources in a unified manner rather than separately. Our algorithms achieve at least a 30% improvement over the separated approach with respect to completion time.
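To make the DAG task model concrete, the following minimal sketch computes the earliest finish time of each subtask by walking the graph in topological order. It assumes unlimited resources and no reservations, so the result is only a lower bound on completion time and not one of the scheduling algorithms the abstract refers to; the subtask names and durations are hypothetical.

```python
from collections import deque


def earliest_finish_times(durations, deps):
    """Earliest finish time per subtask, assuming unlimited resources.

    durations: {subtask: execution time}
    deps: {subtask: [predecessor subtasks]}
    """
    indegree = {t: len(deps.get(t, [])) for t in durations}
    successors = {t: [] for t in durations}
    for task, preds in deps.items():
        for pred in preds:
            successors[pred].append(task)

    ready = deque(t for t, d in indegree.items() if d == 0)
    finish = {}
    while ready:
        task = ready.popleft()
        # A subtask starts once all of its predecessors have finished.
        start = max((finish[p] for p in deps.get(task, [])), default=0)
        finish[task] = start + durations[task]
        for succ in successors[task]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    if len(finish) != len(durations):
        raise ValueError("dependency graph contains a cycle")
    return finish


# Hypothetical task with four subtasks: a -> (b, c) -> d
durations = {"a": 3, "b": 2, "c": 4, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
finish = earliest_finish_times(durations, deps)
makespan = max(finish.values())
```

A real scheduler would additionally have to respect resource capacities, data-repository access, and reservation windows, which can only push these finish times later.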
TL;DR: The purpose of this work is to develop and validate a general adaptive scheduling algorithm for task farming applications along with a user interface that makes the algorithm accessible to domain scientists.
Abstract: Scheduling in metacomputing environments is an active field of research as the vision of a Computational Grid becomes more concrete. An important class of Grid applications are long-running parallel computations with large numbers of somewhat independent tasks (Monte-Carlo simulations, parameter-space searches, etc.). A number of Grid middleware projects are available to implement such applications but scheduling strategies are still open research issues. This is mainly due to the diversity of both Grid resource types and of their availability patterns. The purpose of this work is to develop and validate a general adaptive scheduling algorithm for task farming applications along with a user interface that makes the algorithm accessible to domain scientists. Our algorithm is general in that it is not tailored to a particular Grid middleware and that it requires very few assumptions concerning the nature of the resources. Our first testbed is NetSolve as it allows quick and easy development of the algorithm by isolating the developer from issues such as process control, I/O, remote software access, or fault-tolerance.
TL;DR: The design and performance of T2, an infrastructure for building parallel database systems that integrates storage, retrieval and processing of multi-dimensional datasets, are discussed and preliminary performance results comparing the implementation of two applications using the T2 services with custom-built integrated implementations are presented.
Abstract: Our study of a large set of scientific applications over the past three years indicates that the processing for multidimensional datasets is often highly stylized. The basic processing step usually consists of mapping the individual input items to the output grid and computing output items by aggregating, in some way, all the input items mapped to the corresponding grid point. In this paper we discuss the design and performance of T2, an infrastructure for building parallel database systems that integrates storage, retrieval and processing of multi-dimensional datasets. It achieves its primary advantage from the ability to integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. We present preliminary performance results comparing the implementation of two applications using the T2 services with custom-built integrated implementations.
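The basic processing step described above (map each input item to an output grid point, then aggregate everything that landed on that point) can be sketched as follows. This is an illustrative single-process toy, not the T2 interface; the function name, the uniform 2-D grid, and the sample data are assumptions.

```python
from collections import defaultdict


def aggregate_to_grid(items, cell_size, combine=max):
    """Map (x, y, value) items to grid cells and aggregate per cell.

    combine: any function that reduces a list of values (e.g. max, sum).
    """
    buckets = defaultdict(list)
    for x, y, value in items:
        # Mapping step: locate the output grid cell for this input item.
        cell = (int(x // cell_size), int(y // cell_size))
        buckets[cell].append(value)
    # Aggregation step: reduce all values mapped to the same cell.
    return {cell: combine(values) for cell, values in buckets.items()}


# Hypothetical input items
items = [(0.5, 0.5, 1.0), (0.9, 0.1, 3.0), (1.5, 0.2, 2.0)]
by_max = aggregate_to_grid(items, cell_size=1.0, combine=max)
by_sum = aggregate_to_grid(items, cell_size=1.0, combine=sum)
```

Keeping the mapping and the aggregation function separate is what lets one framework serve applications with different output grids and different reductions.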
TL;DR: IPG will be able to support larger applications than ever before, such as multidisciplinary collaboration environments that couple geographically dispersed compute, data, scientific instruments, and people resources together using a suite of grid-wide services.
Abstract: NASA’s Information Power Grid is an example of an emerging, exciting concept that can potentially make high-performance computing power accessible to general users as easily and seamlessly as electricity from an electrical power grid. In the IPG system, high-performance computers located at geographically distributed sites will be connected via a high-speed interconnection network. Users will be able to submit computational jobs at any site, and the system will seek the best available computational resources, transfer the user’s input data sets to that system, access other needed data sets from remote sites, perform the specified computations and analysis, and then return the resulting data sets to the user. Systems such as the IPG will be able to support larger applications than ever before. New types of applications will also be enabled, such as multidisciplinary collaboration environments that couple geographically dispersed compute, data, scientific instruments, and people resources together using a suite of grid-wide services. IPG’s fundamental technology comes from current research results in the area of large-scale computational grids. Figure 1 provides an intuitive view of a wide-area computational grid.
TL;DR: The toolkit, called EveryWare, enables an application to draw computational power transparently from the Grid and provides the experiences gained while building the EveryWare toolkit prototype and the first true Grid application.
Abstract: The Computational Grid [10] has recently been proposed for the implementation of high-performance applications using widely dispersed computational resources. The goal of a Computational Grid is to aggregate ensembles of shared, heterogeneous, and distributed resources (potentially controlled by separate organizations) to provide computational "power" to an application program. In this paper, we provide a toolkit for the development of Grid applications. The toolkit, called EveryWare, enables an application to draw computational power transparently from the Grid. The toolkit consists of a portable set of processes and libraries that can be incorporated into an application so that a wide variety of dynamically changing distributed infrastructures and resources can be used together to achieve supercomputer-like performance. We provide our experiences gained while building the EveryWare toolkit prototype and the first true Grid application.
TL;DR: The evolution of heterogeneous concurrent computing, in the context of the parallel virtual machine (PVM) system, is discussed, which highlights the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments.
Abstract: Heterogeneous network-based distributed and parallel computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss the evolution of heterogeneous concurrent computing, in the context of the parallel virtual machine (PVM) system, a widely adopted software system for network computing. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the PVM project, and comment on ongoing and future work.
TL;DR: A new hierarchical stochastic model which benefits from both the spatial and hierarchical prior modeling is investigated, based on a tree which has been pollarded with nodes at the coarsest resolution exhibiting a grid-based interaction structure.
Abstract: This work is undertaken in the context of hierarchical stochastic models for the resolution of discrete inverse problems from low level vision. Some of these models lie on the nodes of a quadtree, which leads to non-iterative inference procedures. Nevertheless, while they circumvent the algorithmic drawbacks of grid-based models (computational load and/or great dependence on the initialization), they have modeling shortcomings (cumbersome and somewhat artificial). We investigate a new hierarchical stochastic model which benefits from both the spatial and hierarchical prior modeling. The independence graph is based on a tree which has been pollarded with nodes at the coarsest resolution exhibiting a grid-based interaction structure. For this class of model, we address the critical problem of parameter estimation. To this end, we derive an EM algorithm on the hybrid structure which mixes an exact EM algorithm on each subtree and a low cost Gibbs EM algorithm on the coarse spatial grid. Experiments on a synthetic image and multispectral satellite images are reported.
TL;DR: Two new algorithms for computing the reliability of a distributed computing system with imperfect nodes are proposed based on a symbolic approach and a general factoring technique on both nodes and edges.
Abstract: The reliability of a distributed computing system depends on the reliability of its communication links and nodes and on the distribution of its resources, such as programs and data files. Many algorithms have been proposed for computing the reliability of distributed computing systems, but they have been applied mostly to distributed computing systems with perfect nodes. However, in real problems, nodes as well as links may fail. This paper proposes two new algorithms for computing the reliability of a distributed computing system with imperfect nodes. Algorithm I is based on a symbolic approach that includes two passes of computation. Algorithm II employs a general factoring technique on both nodes and edges. Comparisons with existing methods show the usefulness of the proposed algorithms for computing the reliability of large distributed computing systems.
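To show what "imperfect nodes" adds to the problem, the sketch below computes the probability that two nodes can communicate when both nodes and links may fail independently. It uses brute-force enumeration of all up/down states, which is exponential and only usable on tiny networks; it is a reference baseline, not either of the two algorithms proposed in the paper, and it simplifies the problem to two-terminal connectivity. The network and the probabilities are hypothetical.

```python
from itertools import product


def _connected(node_up, edge_up, source, target):
    """Depth-first search over working nodes and working links only."""
    if not (node_up[source] and node_up[target]):
        return False
    seen = {source}
    stack = [source]
    while stack:
        cur = stack.pop()
        if cur == target:
            return True
        for (u, v), up in edge_up.items():
            if not up:
                continue
            for a, b in ((u, v), (v, u)):  # links are undirected
                if a == cur and b not in seen and node_up[b]:
                    seen.add(b)
                    stack.append(b)
    return False


def two_terminal_reliability(node_rel, edge_rel, source, target):
    """node_rel: {node: P(up)}; edge_rel: {(u, v): P(up)}."""
    nodes = list(node_rel)
    edges = list(edge_rel)
    total = 0.0
    for states in product([True, False], repeat=len(nodes) + len(edges)):
        node_up = dict(zip(nodes, states[:len(nodes)]))
        edge_up = dict(zip(edges, states[len(nodes):]))
        prob = 1.0
        for n in nodes:
            prob *= node_rel[n] if node_up[n] else 1 - node_rel[n]
        for e in edges:
            prob *= edge_rel[e] if edge_up[e] else 1 - edge_rel[e]
        if prob > 0 and _connected(node_up, edge_up, source, target):
            total += prob
    return total


# Hypothetical network: two disjoint two-hop paths from s to t
nodes = {"s": 1.0, "t": 1.0, "a": 0.9, "b": 1.0}
links = {("s", "a"): 0.9, ("a", "t"): 0.9, ("s", "b"): 0.9, ("b", "t"): 0.9}
r = two_terminal_reliability(nodes, links, "s", "t")
```

Setting every node probability to 1.0 recovers the perfect-node case, which makes it easy to see how much node failures lower the result.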
TL;DR: This work develops a QoS-enabled customizable middleware framework called CompOSE|Q that can safely and effectively manage change in large scale distributed systems and illustrates how to achieve flexible, safe and efficient composability of resource management services in the middleware layer while ensuring QoS to the application.
Abstract: Advances in networking, communication, storage and computing technologies, coupled with emerging novel application areas, are enabling the widespread use of large scale distributed computing systems. These systems exhibit constant evolution as new applications place specialized requirements on the computing and communication infrastructure. Many applications provide QoS (quality of service) parameters that define the extent to which performance specifications such as responsiveness, reliability, availability, security and cost-effectiveness may be violated. These requirements are often implemented via resource management mechanisms in the middleware. We develop a QoS-enabled customizable middleware framework called CompOSE|Q that can safely and effectively manage change in large scale distributed systems. We illustrate how to achieve flexible, safe and efficient composability of resource management services in the middleware layer while ensuring QoS to the application.
TL;DR: The timeliness issue of routing and multicast when handover occurs is focused on, along with several solution approaches based on different system architectures.
Abstract: With the rapid advancement and extensive deployment of cluster computing and mobile communication, the integration of these two technologies has become feasible and led to the emergence of a new paradigm called mobile cluster computing (MCC). Among the issues that need to be addressed before MCC can become a reality, the timeliness issue is an important one, especially when mobile nodes within a computing cluster migrate from one cell to another cell in a cellular wireless network. In this paper, we first define and analyze the potential application environment of mobile cluster computing. We also present a generic architecture of a mobile cluster computer and several potential research issues of mobile cluster computing. In the rest of this paper, we focus on the timeliness issue of routing and multicast when handover occurs, along with several solution approaches based on different system architectures.
TL;DR: This special issue of the journal contains many of the papers presented at the Workshop on Clusters and Computational Grids for Scientific Computing that was held at the Blackberry Farm Inn on September 2-4, 1998.
Abstract: This special issue of the journal contains many of the papers presented at the Workshop on Clusters and Computational Grids for Scientific Computing that was held at the Blackberry Farm Inn on September 2-4, 1998, and is a continuation of a series of workshops started in 1992 entitled Workshop on Environments and Tools for Parallel Scientific Computing. These workshops have been held every 2 years and alternate between the United States and France. The purpose of this fourth workshop, which is by invitation only, is to evaluate the state-of-the-art and future trends for cluster computing and the use of computational grids for scientific computing. This workshop addresses a number of themes for developing and using both cluster and computational grids. In particular, the talks covered the following:
TL;DR: Object-oriented programming is shown to provide support for constructing large scale systems that are cheaply built and with reusable components, adaptable to changing requirements and use efficient and cost-effective techniques.
Abstract: Description: This book delivers the latest developments in object technology and their impact in computing systems re-engineering. Object-oriented programming is here shown to provide support for constructing large scale systems that are cheaply built and with reusable components, adaptable to changing requirements and use efficient and cost-effective techniques. the UK and the USA here record their research and development work on the industrial techniques and structured object-oriented methodologies in forward and reverse engineering of computing systems. This book takes stock of progress of that work showing its promise and feasibility, and how its structured technology can overcome the limitations of forward engineering methods used in industry. Forward methods are focused in the domain of reverse engineering to implement a high level of specification for existing software. The book contains the content of the first UK Colloquium on Object Technology and Systems Re-Engineering held at Oxford University in 1998. Contents: Toward an object-oriented design methodology for hybrid systems; Design patterns and their role in formal object-oriented development; Devising coexistence strategies for objects with legacy systems; Object-oriented model for expert systems implementation; Re-engineering requirements specifications for re-use; Object-oriented development of X-ray spectrometer software; Pre-processing COBOL programs for reverse engineering; Agent oriented programming language; Fair objects; Systems of systems as communicating structures; Suitability of CORBA as a heterogeneous distributed platform; Using O-O design to enhance procedural software; Reengineering procedural software to object-oriented software using design transformations and resource usage matrix.
TL;DR: Essential and programming aspects of the extension of the DRAGON grid technology into three dimensions, and new challenges for the three-dimensional cases, are addressed.
Abstract: For a typical three dimensional flow in a practical engineering device, the time spent in grid generation can take 70 percent of the total analysis effort, resulting in a serious bottleneck in the design/analysis cycle. The present research attempts to develop a procedure that can considerably reduce the grid generation effort. The DRAGON grid, as a hybrid grid, is created by means of a Direct Replacement of Arbitrary Grid Overlapping by Nonstructured grid. The DRAGON grid scheme is an adaptation of the Chimera approach. The Chimera grid is a composite structured grid, composed of a set of overlapped structured grids, which are independently generated and body-fitted. The grid is of high quality and amenable to efficient solution schemes. However, the interpolation used in the overlapped region between grids introduces error, especially when a sharp-gradient region is encountered. The DRAGON grid scheme is capable of completely eliminating the interpolation and preserving the conservation property. It maximizes the advantages of the Chimera scheme and adapts the strengths of the unstructured grid while at the same time keeping its weaknesses minimal. In the present paper, we describe the progress towards extending the DRAGON grid technology into three dimensions. Essential and programming aspects of the extension, and new challenges for the three-dimensional cases, are addressed.
TL;DR: This work constructs optimal VLSI layouts for butterfly networks, generalized hypercubes, and star graphs that have areas within a factor of 1+o(1) from their lower bounds.
Abstract: We propose the recursive grid layout scheme for deriving efficient layouts of a variety of hierarchical networks and computing upper bounds on the VLSI area of general hierarchical networks. In particular we construct optimal VLSI layouts for butterfly networks, generalized hypercubes, and star graphs that have areas within a factor of 1+o(1) from their lower bounds. We also derive efficient layouts for a number of other important networks, such as cube-connected cycles (CCC) and hypernets, which are the best results reported for these networks thus far.
TL;DR: This paper identifies the common characteristics of a network-based distributed computing environment and various design and implementational issues, associated with such an environment, are analyzed and a few solutions are proposed.
Abstract: With the phenomenal growth of the Internet and other related technological advancements in both hardware and software, the field of computing has reached a state where we need to think ahead and plan for the future needs of computing for developing global-scale systems. Unfortunately, the present computing environments have many shortcomings that may act as bottlenecks for the development of the next generation of global information systems. Since the current scenario envisions a major shift in the way of computing and the way in which computers affect lives, we need to look ahead and define the characteristics of a new computing environment that will address the pitfalls of the present alternatives. In this paper, we identify the common characteristics of a network-based distributed computing environment. Various design and implementational issues associated with such an environment are analyzed and a few solutions are proposed.
TL;DR: The data structure, termed the Distribution Independent Adaptive Tree, efficiently supports both grid-based and particle-based methods and is useful in applications that involve simultaneous application of multiple methods.
Abstract: We present a data structure for supporting the access patterns required by most scientific applications that employ hierarchical methods. The data structure, termed the Distribution Independent Adaptive Tree, efficiently supports both grid-based and particle-based methods. We present efficient algorithms for most access patterns encountered in such applications: particle insertion/deletion/splitting, grid cell insertion/deletion, nearest neighbor queries, spherical region queries and computing long-range interactions. Apart from being an efficient data structure for an individual hierarchical method, the data structure is useful in applications that involve simultaneous application of multiple methods.
TL;DR: This work proposes an approach based on an object-oriented library that brings adaptive meshing capabilities to a wide user community without much loss of performance.
Abstract: Many numerical solutions of large scale simulation models require finer discretizations in some regions of the computational grid. When this region is not known in advance, adaptive meshing is the most convenient approach because it focuses the computational efforts on the most significant subdomain(s). However, leaving the task of implementing adaptive meshing capabilities to the programmer would make the parallelization far too complex. We propose an approach based on an object-oriented library that brings the adaptive meshing capabilities to a wide user community without deteriorating performance much. The software framework includes a runtime support that detects the region requiring a dynamic grid refinement, manages reconfigurable data structures and masks any dynamic reconfiguration from the high-level code.
TL;DR: This work proposes the creation of a national computing systems research grid dedicated specifically to the experimental investigation of future grid architectures, and argues that this scale is necessary—and sufficient—to allow for realistic applications-driven experimentation.
Abstract: In future information infrastructures, boundaries between computing, storage, and communication will blur as these three functions become increasingly intertwined. Networks will be more than dumb “bitways” that move bits among computers, storage, and people; they will incorporate substantial embedded computing and storage. This computing and storage will, when combined with appropriate middleware services (security, resource management, instrumentation, accounting, and billing, etc.), enable dramatically enhanced functionality when compared with the best-effort delivery provided by today’s Internet. Film distributors can use such a computationally enhanced network (or grid [4], as it is sometimes called) for the efficient and secure distribution of digital video, using embedded resources for the caching, compression, and encryption of video streams [1]. The climate change community can use a grid to deliver climate data products (“daily mean temperature,” “frost frequencies in Wisconsin,” “impacts on cranberry bog yields”) to scientists and policymakers; in this case the grid might not only cache datasets but also run the computations required to tailor simulation data for specific purposes. Common to these two different examples is a distributed infrastructure capable of sophisticated computational functions. The design and application of such grids raise numerous challenging research questions. Unfortunately, no infrastructure exists to support the computer systems research that would answer those questions. For example, a researcher interested in resource management techniques for the applications above cannot easily assemble the distributed collection of computers, archives, and networks required for realistic experimental evaluation of new mechanisms. Network testbeds such as CAIRN allow experimentation with network protocols but do not incorporate significant computing or storage resources, while existing grid testbeds such as GUSTO [3] connect large amounts of compute and storage resources but do not support the dedicated, on-demand access required for experimentation. Motivated by these concerns, we propose the creation of a national computing systems research grid dedicated specifically to the experimental investigation of future grid architectures. This infrastructure, which we term the Broadband Experimental Terascale Access (Beta) grid, will comprise some moderate number (20–100) of reasonably powerful compute/storage nodes distributed across the country and connected to each other and to the user community via high-speed networks. We argue that this scale is necessary, and sufficient, to allow for realistic applications-driven experimentation. While future grids will necessarily be heterogeneous in terms of architecture and operating system, a reasonable architecture for an individual Beta Grid node is a moderate-sized PC cluster. Given this basic node configuration, Linux becomes attractive as a base operating system. Apart from cost issues, the flexibility provided by access to source code facilitates certain types of experiment, while the growing high-performance Linux community suggests that it should be possible to configure Beta Grid clusters with the software required to support high-speed, reliable delivery of data and computing to applications.
TL;DR: An undergraduate distributed computing course that focuses on the fundamental principles common to multimedia, client-server, parallel, web and collaborative computing.
Abstract: This paper proposes an undergraduate distributed computing course that focuses on the fundamental principles common to multimedia, client-server, parallel, web and collaborative computing. This computer science course should actively engage the students in exploring the concepts of distributed computing. Several extended projects using the language Java
TL;DR: The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP, and a new coarse-grain parallel concept at the zonal and intra-zonal levels.
Abstract: The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable for distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. The programming effort for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel.
The total volume of grid points in each group is approximately balanced. A proper number of threads is initially allocated to each group, and in subsequent iterations during run time, the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in OVERFLOW.
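One simple way to distribute zones into groups with roughly equal grid-point totals is the greedy "largest first" heuristic sketched below: zones are sorted by size and each is assigned to the currently lightest group. The abstract does not say which grouping strategy the MLP code uses, so this is only an assumed stand-in; the zone names and sizes are hypothetical.

```python
import heapq


def group_zones(zone_points, num_groups):
    """Greedily assign zones to groups to balance total grid points.

    zone_points: {zone name: number of grid points}
    Returns a list of (total points, [zone names]), one entry per group.
    """
    # Heap entries are (current total, group index, member list); the unique
    # group index breaks ties so the member lists are never compared.
    heap = [(0, g, []) for g in range(num_groups)]
    heapq.heapify(heap)
    for zone, points in sorted(zone_points.items(), key=lambda kv: -kv[1]):
        total, g, members = heapq.heappop(heap)  # lightest group so far
        members.append(zone)
        heapq.heappush(heap, (total + points, g, members))
    ordered = sorted(heap, key=lambda entry: entry[1])
    return [(total, members) for total, _, members in ordered]


# Hypothetical zone sizes (in thousands of grid points)
zones = {"z1": 50, "z2": 30, "z3": 30, "z4": 20, "z5": 10}
groups = group_zones(zones, num_groups=2)
```

The heuristic gives a static starting point only; adjusting thread counts per group during the run, as the abstract describes, would correct whatever imbalance remains.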