About: Single-core is a research topic. Over the lifetime, 524 publications have been published within this topic receiving 4874 citations. The topic is also known as: single-core & single-core CPU.
TL;DR: A new task decomposition method is proposed that decomposes each parallel task into a set of sequential tasks and achieves resource augmentation bounds of 2.62 and 3.42 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively.
Abstract: Multi-core processors offer a significant performance increase over single-core processors. Therefore, they have the potential to enable computation-intensive real-time applications with stringent timing constraints that cannot be met on traditional single-core processors. However, most results in traditional multiprocessor real-time scheduling are limited to sequential programming models and ignore intra-task parallelism. In this paper, we address the problem of scheduling periodic parallel tasks with implicit deadlines on multi-core processors. We first consider a synchronous task model where each task consists of segments, each segment having an arbitrary number of parallel threads that synchronize at the end of the segment. We propose a new task decomposition method that decomposes each parallel task into a set of sequential tasks. We prove that our task decomposition achieves resource augmentation bounds of 2.62 and 3.42 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively. Finally, we extend our analysis to directed acyclic graph tasks. We show how these tasks can be converted into synchronous tasks such that the same transformation can be applied and the same augmentation bounds hold.
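The synchronous task model described in this abstract can be illustrated with a minimal Python sketch. It does not implement the paper's decomposition method or its analysis; it only computes three standard quantities for one task (total work, critical-path length, and utilization). The class name `SynchronousTask`, the `(threads, wcet)` segment representation, and the assumption that all threads in a segment share the same worst-case execution time are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SynchronousTask:
    """One periodic parallel task with an implicit deadline (deadline == period).

    Hypothetical representation: each segment is (number_of_threads,
    per_thread_wcet), and every thread in a segment is assumed to have
    the same worst-case execution time.
    """
    segments: List[Tuple[int, float]]
    period: float

    def work(self) -> float:
        # Total execution time if every thread ran on a single core.
        return sum(threads * wcet for threads, wcet in self.segments)

    def critical_path(self) -> float:
        # Threads synchronize at the end of each segment, so with unlimited
        # cores the task still needs one thread's time per segment.
        return sum(wcet for _, wcet in self.segments)

    def utilization(self) -> float:
        return self.work() / self.period


# Example task: 1 thread of 2.0, then 4 threads of 3.0, then 2 threads of 1.0.
task = SynchronousTask(segments=[(1, 2.0), (4, 3.0), (2, 1.0)], period=20.0)
print(task.work(), task.critical_path(), task.utilization())
```

A task whose critical path exceeds its period cannot meet its deadline on any number of unit-speed cores, which is why both quantities matter before any scheduling analysis is attempted.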
TL;DR: In this paper, an innovative current doubler rectifier, which integrates all the magnetic components into a single core and minimizes the number of high current windings, is presented.
Abstract: This paper presents an innovative current doubler rectifier, which integrates all the magnetic components into a single core and minimizes the number of high current windings. Compared to the conventional approach, the proposed integrated magnetic structure features reduced core loss, smaller core size, and reduced AC conduction losses, all while still reducing winding losses. The new rectification circuit can be applied to many topologies. An asymmetrical half-bridge converter was used as one attractive example to demonstrate the operation and performance of the proposed structure. A prototype featuring 400 V input, 48 V output, 200 kHz switching frequency, and 1 kW output power was also developed based on this topology.
TL;DR: It is shown that an architecture where several slower cores are clustered together with a shared faster L1 cache is optimal for energy efficiency, because processor cores and memory operate best at different supply and threshold voltages.
Abstract: Subthreshold circuit design has become a popular approach for building energy efficient digital circuits. One drawback is performance degradation due to the exponentially reduced driving current. This has limited subthreshold circuits to relatively low performance applications such as sensor networks. To retain the excellent energy efficiency while reducing performance loss, we propose to apply subthreshold and near-threshold techniques to chip multi-processors. We show that an architecture where several slower cores are clustered together with a shared faster L1 cache is optimal for energy efficiency, because processor cores and memory operate best at different supply and threshold voltages. In particular, SPLASH2 benchmarks show about a 53% energy improvement over the traditional CMP approach (about 70% over a single core machine).
TL;DR: The low inter-processor communication latency between the cores in a CMP helps make a much wider range of applications viable candidates for parallel execution than was possible with conventional, multi-chip multiprocessors; nevertheless, limited parallelism in key applications is the main factor limiting acceptance of CMPs in some types of systems.
Abstract: Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no longer scaling in performance, because it is only possible to extract a limited amount of parallelism from a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become prohibitive in all but water-cooled systems. Compounding these problems is the simple fact that with the immense numbers of transistors available on today's microprocessor chips, it is too costly to design and debug ever-larger processors every year or two. CMPs avoid these problems by filling up a processor die with multiple, relatively simpler processor cores instead of just one huge core. The exact size of a CMP's cores can vary from very simple pipelines to moderately complex superscalar processors, but once a core has been selected the CMP's performance can easily scale across silicon process generations simply by stamping down more copies of the hard-to-design, high-speed processor core in each successive chip generation. In addition, parallel code execution, obtained by spreading multiple threads of execution across the various cores, can achieve significantly higher performance than would be possible using only a single core. While parallel threads are already common in many useful workloads, there are still important workloads that are hard to divide into parallel threads. The low inter-processor communication latency between the cores in a CMP helps make a much wider range of applications viable candidates for parallel execution than was possible with conventional, multi-chip multiprocessors; nevertheless, limited parallelism in key applications is the main factor limiting acceptance of CMPs in some types of systems.
TL;DR: Sferes v2 is a C++ framework designed to help researchers in evolutionary computation make their code run as fast as possible on a multi-core computer. It is based on three main concepts: (1) including multi-core optimizations from the start of the design process; (2) providing state-of-the-art implementations of well-selected current evolutionary algorithms (EAs), especially multiobjective EAs; (3) being based on modern (template-based) C++ techniques to be both abstract and efficient.
Abstract: This paper introduces and benchmarks Sferes v2, a C++ framework designed to help researchers in evolutionary computation to make their code run as fast as possible on a multi-core computer. It is based on three main concepts: (1) including multi-core optimizations from the start of the design process; (2) providing state-of-the-art implementations of well-selected current evolutionary algorithms (EAs), and especially multiobjective EAs; (3) being based on modern (template-based) C++ techniques to be both abstract and efficient. Benchmark results show that when a single core is used, the running times of the classic EAs included in Sferes v2 (NSGA-2 and CMA-ES) are of the same order of magnitude as those of specialized C code. When n cores are used, typical speed-ups range from 0.75n to 0.9n; however, parallelization efficiency critically depends on the time to evaluate the fitness function.
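The relationship between speed-up, core count, and fitness evaluation time mentioned in this abstract can be made concrete with a rough Python sketch. It is not taken from Sferes v2 or from the paper's benchmarks. The function names, the parameters `pop_size`, `t_eval`, and `t_serial`, and the simple cost model (a fixed serial overhead per generation plus evaluations spread evenly over the cores) are all assumptions made for illustration.

```python
import math


def parallel_efficiency(speedup: float, n_cores: int) -> float:
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / n_cores


def modeled_speedup(n_cores: int, pop_size: int,
                    t_eval: float, t_serial: float) -> float:
    """Speed-up of one generation under an assumed, simplified cost model.

    t_serial is the non-parallel cost per generation (selection, variation,
    bookkeeping); t_eval is the cost of one fitness evaluation. Evaluations
    are assumed to be independent and split evenly across the cores.
    """
    sequential = t_serial + pop_size * t_eval
    parallel = t_serial + math.ceil(pop_size / n_cores) * t_eval
    return sequential / parallel


# A speed-up of 0.75n on n = 8 cores corresponds to 75% efficiency.
print(parallel_efficiency(0.75 * 8, 8))

# Under this model, an expensive fitness function parallelizes far better
# than a cheap one, because the serial overhead is amortized.
expensive = modeled_speedup(n_cores=8, pop_size=80, t_eval=1.0, t_serial=0.5)
cheap = modeled_speedup(n_cores=8, pop_size=80, t_eval=0.001, t_serial=0.5)
print(expensive, cheap)
```

The model ignores scheduling and memory effects, so it only illustrates the qualitative trend: the cheaper the fitness evaluation relative to the serial part of each generation, the lower the achievable efficiency.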