About: SUNMOS is a research topic. Over its lifetime, 11 publications have appeared within this topic, receiving 148 citations. The topic is also known as: Sandia/UNM Operating System.
TL;DR: This document provides a quick overview of how to compile and run jobs using the SUNMOS environment on the Paragon.
Abstract: SUNMOS is an acronym for Sandia/UNM Operating System. It was originally developed for the nCUBE-2 MIMD supercomputer between January and December of 1991. Between April and August of 1993, SUNMOS was ported to the Intel Paragon. This document provides a quick overview of how to compile and run jobs using the SUNMOS environment on the Paragon. The primary goal of SUNMOS is to provide high performance message passing and process support. As an example of its capabilities, SUNMOS Release 1.4 occupies approximately 240K of memory on a Paragon node, and is able to send messages at bandwidths of 165 megabytes per second with latencies as low as 42 microseconds using Intel NX calls. By contrast, Release 1.2 of OSF/1 for the Paragon occupies approximately 7 megabytes of memory on a node, has a peak bandwidth of 65 megabytes per second, and latencies as low as 42 microseconds (the communication numbers are reported elsewhere in these proceedings).
TL;DR: This work discusses issues regarding the development of the Argonne National Laboratory/Mississippi State University implementation of the Message Passing Interface standard on top of portals, and describes the design and implementation for both MPI point-to-point and collective communications, and MPI-2 one-sided communications.
Abstract: As the successor to SUNMOS, the Puma operating system provides a flexible, lightweight, high performance message passing environment for massively parallel computers. Message passing in Puma is accomplished through the use of a new mechanism known as a portal. Puma is currently running on the Intel Paragon and is being developed for the Intel TeraFLOPS machine. We discuss issues regarding the development of the Argonne National Laboratory/Mississippi State University implementation of the Message Passing Interface standard on top of portals. Included is a description of the design and implementation for both MPI point-to-point and collective communications, and MPI-2 one-sided communications.
TL;DR: Under OSF/1 AD, performance does not scale as the number of nodes increases, whereas under SUNMOS it seems to scale because of higher communication bandwidth.
Abstract: On the Paragon, two operating systems are available: (a) OSF/1 AD and (b) SUNMOS. The chief drawbacks of OSF/1 AD are that (a) it takes about 8 MB of memory on each node of the Paragon, (b) messages can be sent only at a bandwidth of 30-35 MB per second, compared to the 200 MB per second peak advertised rate, and (c) latencies are on the order of 100 microseconds using Intel NX calls. All these drawbacks can be minimized by using SUNMOS. SUNMOS takes only 250 KB of memory on each node and can send messages at a bandwidth of 170 MB per second with latencies of 70 microseconds. We have measured the performance of applications under OSF/1 AD and SUNMOS and found that under OSF/1 AD, performance does not scale as the number of nodes increases, whereas under SUNMOS it seems to scale because of higher communication bandwidth.
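To see what the latency and bandwidth figures quoted in this abstract imply for individual messages, a minimal sketch follows. It assumes a simple linear cost model (time = latency + size / bandwidth), which the abstract does not state, and it takes 1 MB as 10^6 bytes and uses the upper end (35 MB per second) of the quoted OSF/1 AD range. The function and dictionary names are hypothetical.

```python
# Linear message-cost model: time = latency + size / bandwidth.
# The model itself is an assumption; only the latency and bandwidth
# figures come from the abstract above.

def transfer_time_us(message_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated transfer time in microseconds.

    With 1 MB = 10**6 bytes, a bandwidth of X MB/s equals X bytes per
    microsecond, so size / bandwidth is already in microseconds.
    """
    return latency_us + message_bytes / bandwidth_mb_per_s


# Figures quoted in the abstract (upper end of the 30-35 MB/s range for OSF/1 AD).
OSF1_AD = {"latency_us": 100.0, "bandwidth_mb_per_s": 35.0}
SUNMOS = {"latency_us": 70.0, "bandwidth_mb_per_s": 170.0}


def speedup(message_bytes):
    """Ratio of estimated OSF/1 AD time to estimated SUNMOS time."""
    return (transfer_time_us(message_bytes, **OSF1_AD)
            / transfer_time_us(message_bytes, **SUNMOS))


if __name__ == "__main__":
    for size in (0, 1_000, 100_000, 1_000_000):
        print(f"{size:>9} bytes: "
              f"OSF/1 AD {transfer_time_us(size, **OSF1_AD):10.1f} us, "
              f"SUNMOS {transfer_time_us(size, **SUNMOS):10.1f} us, "
              f"ratio {speedup(size):.2f}")
```

Under this model the advantage for short messages is bounded by the latency ratio (100/70), while for long messages it approaches the bandwidth ratio (170/35), which is consistent with the abstract attributing the better scaling to higher communication bandwidth.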
TL;DR: Experiences and performance figures are reported from early tests of the 512-node Intel Paragon XPS35 at Oak Ridge National Laboratory, along with early experiences with the OSF/Mach and SUNMOS operating systems and results from porting various distributed-memory applications.
Abstract: Experiences and performance figures are reported from early tests of the 512-node Intel Paragon XPS35 at Oak Ridge National Laboratory. Computation performance of the 50 MHz i860XP processor as well as communication performance of the 200 megabyte/second mesh are reported and compared with other multiprocessors. Single and multiple hop communication bandwidths and latencies are measured. Concurrent communication speeds and speed under network load are also measured. File I/O performance of the mesh-attached Parallel File System is measured. Early experiences with OSF/Mach and SUNMOS operating systems are reported, as well as results from porting various distributed-memory applications. This report also summarizes the second phase of a Cooperative Research and Development Agreement between Oak Ridge National Laboratory and Intel in evaluating a 66-node Intel Paragon XPS5.
TL;DR: This variant of the FMM is useful for computing radar cross sections and antenna radiation patterns and reduces the operation count of the matrix-vector multiplication in iterative solvers to $O(N^{3/2})$ (where $N$ is the number of unknowns).
Abstract: Presented is a parallel algorithm based on the fast multipole method (FMM) for the Helmholtz equation. This variant of the FMM is useful for computing radar cross sections and antenna radiation patterns. The FMM decomposes the impedance matrix into sparse components, reducing the operation count of the matrix-vector multiplication in iterative solvers to $O(N^{3/2})$ (where $N$ is the number of unknowns). The parallel algorithm divides the problem into groups and assigns the computation involved with each group to a processor node. Careful consideration is given to the communications costs. A time complexity analysis of the algorithm is presented and compared with empirical results from a Paragon XP/S running the lightweight Sandia/University of New Mexico operating system (SUNMOS). For a 90,000 unknown problem running on 60 nodes, the sparse representation fits in memory and the algorithm computes the matrix-vector product in 1.26 seconds. It sustains an aggregate rate of 1.4 Gflop/s. The corresponding dense matrix would occupy over 100 Gbytes and, assuming that I/O is free, would require on the order of 50 seconds to form the matrix-vector product.
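The arithmetic behind the figures in this abstract can be checked with a short sketch. It assumes each dense matrix entry is a double-precision complex number (16 bytes), which the abstract does not state, and it ignores the constant factors hidden by the big-O notation. The function names are hypothetical.

```python
import math

N = 90_000            # number of unknowns, from the abstract
BYTES_PER_ENTRY = 16  # assumption: double-precision complex entries


def dense_matrix_bytes(n, bytes_per_entry=BYTES_PER_ENTRY):
    """Storage for a dense n-by-n matrix, in bytes."""
    return n * n * bytes_per_entry


def op_count_ratio(n):
    """Dense O(N^2) over FMM O(N^(3/2)) operation counts.

    The ratio N^2 / N^(3/2) simplifies to sqrt(N); constant factors
    are ignored, so this is an order-of-magnitude guide only.
    """
    return math.sqrt(n)


def flops_per_product(rate_gflops, seconds):
    """Floating-point operations implied by a sustained rate and a run time."""
    return rate_gflops * 1e9 * seconds


if __name__ == "__main__":
    print(f"dense storage: {dense_matrix_bytes(N) / 1e9:.1f} Gbytes")
    print(f"asymptotic op-count ratio: {op_count_ratio(N):.0f}x")
    print(f"flops per FMM product: {flops_per_product(1.4, 1.26):.3e}")
```

Under the 16-byte assumption the dense matrix needs about 129.6 Gbytes, in line with the abstract's "over 100 Gbytes". The measured gap (roughly 50 seconds versus 1.26 seconds) is smaller than the asymptotic factor of 300, as expected once the constant factors and communication costs are taken into account.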