Book Chapter, DOI: 10.1007/978-3-642-29740-3_42
Extending a Highly Parallel Data Mining Algorithm to the Intel® Many Integrated Core Architecture
Alexander Heinecke,Michael Klemm,Dirk Pflüger,Arndt Bode,Hans-Joachim Bungartz +4 more
- 29 Aug 2011
- pp 375-384
TL;DR: The SG++ algorithm is extended to the Intel Many Integrated Core (MIC) Architecture using a pragma-based offloading model that generates both the host and the coprocessor code; porting the existing SSE code is shown to be straightforward.
Abstract: Extracting knowledge from vast datasets is a major challenge in data-driven applications such as classification and regression, which are mostly compute bound. In this paper, we extend our SG++ algorithm to the Intel® Many Integrated Core Architecture (Intel® MIC Architecture). We show how easily an application can be ported to the Intel MIC Architecture: porting existing SSE code is straightforward. We evaluate the current prototype pre-release coprocessor board, codenamed Intel® "Knights Ferry". We use the pragma-based offloading programming model offered by the Intel® Composer XE for Intel MIC Architecture, which generates both the host and the coprocessor code. We compare the achieved performance with that of an NVIDIA C2050 accelerator and show that the pre-release Knights Ferry coprocessor both delivers better performance than the C2050 and surpasses it in productivity when implementing algorithms for the coprocessors.
Citations
OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison
Tim Cramer,Dirk Schmidl,Michael Klemm,Dieter an Mey +3 more
- 01 Jan 2012
TL;DR: This work focuses on OpenMP*-style programming and evaluates the overhead of a selected subset of the language extensions for Intel Xeon Phi coprocessors as well as some selected standardized OpenMP constructs to assess if the architecture can run standard applications efficiently.
117
Euro-Par 2011: Parallel Processing Workshops
Michael Alexander,Pasqua D'Ambra,Adam Belloum,George Bosilca,Mario Cannataro,Marco Danelutto,Beniamino Di Martino,Michael Gerndt,Emmanuel Jeannot,Raymond Namyst,Jean Roman,Stephen L. Scott,Jesper Larsson Träff,Geoffroy Vallée,Josef Weidendorfer +14 more
- 01 Jan 2012
TL;DR: This vision paper outlines the issues involved, and presents preliminary ideas for enhancing the executability of applications on different cloud platforms, and proposes the concept of dynamic adapters supported by runtime systems for environment preconditioning.
53
Scaling Support Vector Machines on modern HPC platforms
Yang You,Haohuan Fu,Shuaiwen Leon Song,Amanda Randles,Darren J. Kerbyson,Andres Marquez,Guangwen Yang,Adolfy Hoisie +7 more
TL;DR: In this paper, the authors design and implement MIC-SVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC).
36
Emerging Architectures Enable to Boost Massively Parallel Data Mining Using Adaptive Sparse Grids
Alexander Heinecke,Dirk Pflüger +1 more
TL;DR: This paper presents the parallelization on several current task- and data-parallel platforms, covering multi-core CPUs with vector units, GPUs, and hybrid systems, and analyzes the suitability of parallel programming languages for the implementation.
31
HyPHI - Task Based Hybrid Execution C++ Library for the Intel Xeon Phi Coprocessor
Jiri Dokulil,Enes Bajrovic,Siegfried Benkner,Martin Sandrieser,Beverly Bachmayer +4 more
- 01 Oct 2013
TL;DR: This paper presents HyPHI, a novel library for the Intel Xeon Phi coprocessor for building applications that execute using a hybrid parallel model, exploiting parallelism across host CPUs and Xeon Phi coprocessors simultaneously.