Top 1759 papers published in the topic of Central processing unit in 2013

Showing papers on "Central processing unit" published in 2013

Patent•

System and method for display of information using a vehicle-mount computer

24 May 2013

TL;DR: A system and method for displaying information using a vehicle-mount computer. The system includes: (i) a computer touch screen for inputting and displaying information; (ii) a motion detector for detecting vehicle motion; and (iii) a vehicle-mount computer in communication with the computer touch screen and the motion detector.

Abstract: A system and method display information using a vehicle-mount computer. The system includes: (i) a computer touch screen for inputting and displaying information; (ii) a motion detector for detecting vehicle motion; and (iii) a vehicle-mount computer in communication with the computer touch screen and the motion detector. The vehicle-mount computer includes a central processing unit and memory. The vehicle-mount computer's central processing unit is configured to store information associated with user-selected information from the computer touch screen. Further, the vehicle-mount computer's central processing unit is configured to receive vehicle-motion information from the motion detector. Moreover, the vehicle-mount computer's central processing unit is configured to control the display of a zoomed view of the user-selected information on the computer touch screen in response to the motion detector's detection of motion.

290 citations

Patent•

Embedded system for construction of small footprint speech recognition with user-definable constraints

Michael J. Newman, Robert Roth, William D. Alexander, Paul van Mulbregt

23 Apr 2013

TL;DR: This patent presents techniques and methods that enable a voice trigger that wakes up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality.

Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply.

210 citations

Journal Article•10.14778/2536360.2536370•

Hardware-oblivious parallelism for in-memory column-stores

Max Heimel¹, Michael Saecker, Holger Pirk, Stefan Manegold, Volker Markl¹•Institutions (1)

Technical University of Berlin¹

1 Jul 2013

TL;DR: This work proposes an alternative design for a parallel database engine based on a single set of hardware-oblivious operators that are compiled down to the actual hardware at runtime. The design reduces the development overhead for parallel database engines while achieving performance competitive with hand-tuned systems.

Abstract: The multi-core architectures of today's computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: Current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We propose an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime. This design reduces the development overhead for parallel database engines, while achieving competitive performance to hand-tuned systems. We provide a proof-of-concept for this design by integrating operators written using the parallel programming framework OpenCL into the open-source database MonetDB. Following this approach, we achieve efficient, yet highly portable parallel code without the need for optimization by hand. We evaluated our implementation against MonetDB using TPC-H derived queries and observed a performance that rivals that of MonetDB's query execution on the CPU and surpasses it on the GPU. In addition, we show that the same set of operators runs nearly unchanged on a GPU, demonstrating the feasibility of our approach.

147 citations

Proceedings Article•10.5555/2523721.2523756•

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems

Janghaeng Lee¹, Mehrzad Samadi¹, Yongjun Park¹, Scott Mahlke¹•Institutions (1)

University of Michigan¹

7 Oct 2013

TL;DR: The single kernel multiple devices (SKMD) system is presented, a framework that transparently orchestrates collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs.

Abstract: Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as the sequential code or data transfer management. Unfortunately, this work distribution can be a poor solution as it underutilizes the CPU, has difficulty generalizing beyond the single CPU-GPU combination, and may waste a large fraction of time transferring data. Further, CPUs are performance competitive with GPUs on many workloads, thus simply partitioning work based on the fixed roles may be a poor choice. In this paper, we present the single kernel multiple devices (SKMD) system, a framework that transparently orchestrates collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. The programmer is responsible for developing a single data-parallel kernel in OpenCL, while the system automatically partitions the workload across an arbitrary set of devices, generates kernels to execute the partial workloads, and efficiently merges the partial outputs together. The goal is performance improvement by maximally utilizing all available resources to execute the kernel. SKMD handles the difficult challenges of exposed data transfer costs and the performance variations GPUs have with respect to input size. On real hardware, SKMD achieves an average speedup of 29% on a system with one multicore CPU and two asymmetric GPUs compared to a fastest device execution strategy for a set of popular OpenCL kernels.
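
The partition-and-merge idea in this abstract can be illustrated with a minimal sketch. It is not the SKMD algorithm: it assumes a static split proportional to a hypothetical per-device throughput figure and ignores the transfer costs and input-size effects the paper says SKMD models. The names `partition_range` and `run_partitioned` are invented for this illustration.

```python
from typing import Callable, List, Sequence, Tuple


def partition_range(n_items: int, throughputs: Sequence[float]) -> List[Tuple[int, int]]:
    """Split [0, n_items) into contiguous chunks sized in proportion to throughput."""
    total = sum(throughputs)
    bounds = [0]
    cumulative = 0.0
    for t in throughputs:
        cumulative += t
        bounds.append(round(n_items * cumulative / total))
    bounds[-1] = n_items  # guard against rounding drift on the last chunk
    return list(zip(bounds[:-1], bounds[1:]))


def run_partitioned(kernel: Callable, data: list, throughputs: Sequence[float]) -> list:
    """Apply an element-wise kernel chunk by chunk and merge the partial outputs.

    Each chunk stands in for the work one device would receive; here the
    chunks simply run one after another on the host.
    """
    out = [None] * len(data)
    for start, end in partition_range(len(data), throughputs):
        out[start:end] = [kernel(x) for x in data[start:end]]
    return out


if __name__ == "__main__":
    # Hypothetical relative speeds for one CPU and two unequal GPUs.
    print(partition_range(100, [1, 3, 6]))
    print(run_partitioned(lambda x: x * x, list(range(10)), [1, 1]))
```

A static split like this is only a starting point; the abstract notes that GPU performance varies with input size, which is why a fixed ratio can be a poor choice.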

125 citations

Journal Article•10.1016/J.COMPFLUID.2013.09.018•

Towards a generalised GPU/CPU shallow-flow modelling tool

Luke Smith¹, Qiuhua Liang¹•Institutions (1)

Newcastle University¹

15 Dec 2013-Computers & Fluids

TL;DR: A second-order accurate Godunov-type MUSCL-Hancock scheme is used with an HLLC Riemann solver to create a robust framework suitable for different types of flood simulation, showing good agreement with a post-event survey.

120 citations

Posted Content•

Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture

Jiong He¹, Mian Lu², Bingsheng He¹•Institutions (2)

Nanyang Technological University¹, Agency for Science, Technology and Research²

08 Jul 2013-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper experimentally revisits hash joins, one of the most important join algorithms for main memory databases, on a coupled CPU-GPU architecture, and studies the fine-grained co-processing mechanisms on hash joins with and without partitioning.

Abstract: Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually bottleneck issues for co-processing. Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the GPU integrated into a single chip. That opens up new opportunities for optimizing query co-processing. In this paper, we experimentally revisit hash joins, one of the most important join algorithms for main memory databases, on a coupled CPU-GPU architecture. Particularly, we study the fine-grained co-processing mechanisms on hash joins with and without partitioning. The co-processing outlines an interesting design space. We extend existing cost models to automatically guide decisions on the design space. Our experimental results on a recent AMD APU show that (1) the coupled architecture enables fine-grained co-processing and cache reuses, which are inefficient on discrete CPU-GPU architectures; (2) the cost model can automatically guide the design and tuning knobs in the design space; (3) fine-grained co-processing achieves up to 53%, 35% and 28% performance improvement over CPU-only, GPU-only and conventional CPU-GPU co-processing, respectively. We believe that the insights and implications from this study are initial yet important for further research on query co-processing on coupled CPU-GPU architectures.
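
For readers unfamiliar with the two join variants the abstract compares, the following is a minimal single-threaded sketch of a hash join with and without partitioning. It does not reproduce the paper's co-processing scheme or cost model; the function names and the choice of four partitions are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Simple hash join: build a table on one input, probe it with the other.

    Both inputs are lists of (key, payload) pairs.
    """
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    # Emit one output row per matching (build, probe) pair.
    return [(key, b, p) for key, p in probe for b in table.get(key, [])]


def partitioned_hash_join(build, probe, n_partitions=4):
    """Partition both inputs by key hash, then join matching partitions.

    Rows with equal keys always land in the same partition, so each
    partition pair can be joined independently (and, in a co-processing
    setting, could be handed to different processors).
    """
    build_parts = [[] for _ in range(n_partitions)]
    probe_parts = [[] for _ in range(n_partitions)]
    for row in build:
        build_parts[hash(row[0]) % n_partitions].append(row)
    for row in probe:
        probe_parts[hash(row[0]) % n_partitions].append(row)
    result = []
    for b_part, p_part in zip(build_parts, probe_parts):
        result.extend(hash_join(b_part, p_part))
    return result


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (2, "c")]
    s = [(2, "x"), (3, "y"), (1, "z")]
    print(hash_join(r, s))
    print(sorted(partitioned_hash_join(r, s)))
```

Both variants produce the same set of rows; only the output order and the memory access pattern differ.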

99 citations

Proceedings Article•10.1145/2451116.2451162•

Portable performance on heterogeneous architectures

Phitchaya Mangpo Phothilimthana¹, Jason Ansel¹, Jonathan Ragan-Kelley¹, Saman Amarasinghe¹•Institutions (1)

Massachusetts Institute of Technology¹

16 Mar 2013

TL;DR: A programming model in which the best mapping of programs to processors and memories is determined empirically, and the rich choice space allows the autotuner to construct poly-algorithms that combine many different algorithmic techniques, using both the CPU and the GPU, to obtain better performance than any one technique alone.

Abstract: Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now their graphics coprocessors (GPUs), not just their primary CPUs. But GPU programming and memory models differ dramatically from conventional CPUs, and the relative performance characteristics of the different processors vary widely between machines. Different processors within a system often perform best with different algorithms and memory usage patterns, and achieving the best overall performance may require mapping portions of programs across all types of resources in the machine. To address the problem of efficiently programming machines with increasingly heterogeneous computational resources, we propose a programming model in which the best mapping of programs to processors and memories is determined empirically. Programs define choices in how their individual algorithms may work, and the compiler generates further choices in how they can map to CPU and GPU processors and memory systems. These choices are given to an empirical autotuning framework that allows the space of possible implementations to be searched at installation time. The rich choice space allows the autotuner to construct poly-algorithms that combine many different algorithmic techniques, using both the CPU and the GPU, to obtain better performance than any one technique alone. Experimental results show that algorithmic changes, and the varied use of both CPUs and GPUs, are necessary to obtain up to a 16.5x speedup over using a single program configuration for all architectures.
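
The core of empirical autotuning, as described here, is to measure candidate implementations on the target machine and keep the fastest. The sketch below shows only that selection step under simple assumptions: an exhaustive search over a small dictionary of candidates, timed with `time.perf_counter`. The names `autotune` and `time_once` are hypothetical, and the paper's framework searches a far richer choice space than this.

```python
import time
from typing import Any, Callable, Dict


def time_once(fn: Callable, arg: Any) -> float:
    """Return the wall-clock seconds taken by one call of fn(arg)."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def autotune(candidates: Dict[str, Callable], sample: Any,
             measure: Callable = time_once, repeats: int = 3) -> str:
    """Return the name of the candidate with the lowest measured cost.

    Each candidate is measured `repeats` times and its best run is kept,
    which damps noise from other activity on the machine.
    """
    best_name, best_cost = None, float("inf")
    for name, fn in candidates.items():
        cost = min(measure(fn, sample) for _ in range(repeats))
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


if __name__ == "__main__":
    candidates = {
        "builtin_sort": sorted,
        "reverse_then_sort": lambda xs: sorted(reversed(xs)),
    }
    print(autotune(candidates, list(range(10_000))))
```

Passing a different `measure` function makes the selection deterministic for testing, or lets the cost be something other than time, such as energy.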

94 citations

Patent•

Methods systems and apparatus for sharing information among a group of vehicles

Denis R. Burke¹, Keith A. Fry¹, Daniel E. Rudman¹, Shane M. McCutchen¹, Matthew N. Hovey¹•Institutions (1)

General Motors¹

4 Apr 2013

TL;DR: Methods, systems, and apparatus are described for sharing information among a group of vehicles, each of which includes an onboard computer system with a computer processor, a receiver, a transmitter, and a tangible, non-transitory computer-readable storage medium.

Abstract: Computer-implemented methods, systems and apparatus are provided for sharing information between a group of vehicles that includes a first vehicle and one or more other vehicles. Each of the vehicles includes an onboard computer system that includes a computer processor, a receiver, a transmitter, and a tangible, non-transitory computer-readable storage medium. The storage medium stores instructions that, when executed by the processor, cause the processor to perform various acts. In accordance with the method, an onboard computer at the first vehicle generates information and automatically communicates that information to the other vehicles of the group of vehicles. Upon receiving this information at each of the other vehicles, the information can be processed at the onboard computers of each of the other vehicles, and then presented via human machine interfaces that are disposed within or inside each of the other vehicles.

85 citations

Patent•

Electronic inhalation device

Christopher Lord

9 Oct 2013

TL;DR: An electronic inhalation device comprising a power cell and a computer is described, where the computer is configured in use to enter a menu mode when a user activates the menu mode.

Abstract: An electronic inhalation device comprising a power cell and a computer. The computer comprises a computer processor, a memory and an input-output means. The computer is configured in use to enter a menu mode when a user activates the menu mode.

79 citations

Proceedings Article•10.1145/2508859.2516704•

ShadowReplica: efficient parallelization of dynamic data flow tracking

Kangkook Jee¹, Vasileios P. Kemerlis¹, Angelos D. Keromytis¹, Georgios Portokalidis²•Institutions (2)

Columbia University¹, Stevens Institute of Technology²

4 Nov 2013

TL;DR: ShadowReplica is a new and efficient approach for accelerating DFT and other shadow memory-based analyses by decoupling analysis from execution and utilizing spare CPU cores to run them in parallel. It enables a heavyweight technique such as dynamic taint analysis (DTA) to run twice as fast, while concurrently consuming fewer CPU cycles than when applying it in-line.

Abstract: Dynamic data flow tracking (DFT) is a technique broadly used in a variety of security applications that, unfortunately, exhibits poor performance, preventing its adoption in production systems. We present ShadowReplica, a new and efficient approach for accelerating DFT and other shadow memory-based analyses, by decoupling analysis from execution and utilizing spare CPU cores to run them in parallel. Our approach enables us to run a heavyweight technique, like dynamic taint analysis (DTA), twice as fast, while concurrently consuming fewer CPU cycles than when applying it in-line. DFT is run in parallel by a second shadow thread that is spawned for each application thread, and the two communicate using a shared data structure. We avoid the problems suffered by previous approaches, by introducing an off-line application analysis phase that utilizes both static and dynamic analysis methodologies to generate optimized code for decoupling execution and implementing DFT, while it also minimizes the amount of information that needs to be communicated between the two threads. Furthermore, we use a lock-free ring buffer structure and an N-way buffering scheme to efficiently exchange data between threads and maintain high cache-hit rates on multi-core CPUs. Our evaluation shows that ShadowReplica is on average ~2.3× faster than in-line DFT (~2.75× slowdown over native execution) when running the SPEC CPU2006 benchmark, while similar speedups were observed with command-line utilities and popular server software. Astoundingly, ShadowReplica also reduces the CPU cycles used by up to 30%.
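
The ring buffer mentioned in the abstract connects exactly one producer (the application thread) to one consumer (the shadow thread). The sketch below shows only the index discipline that makes such a single-producer, single-consumer buffer work without locks: each index is advanced by one side only. It is an assumption-laden illustration, not ShadowReplica's structure; a real implementation would be written in a language with atomic loads and stores and explicit memory ordering, and the class name `SpscRing` is invented here.

```python
class SpscRing:
    """Bounded single-producer, single-consumer ring buffer (index logic only)."""

    def __init__(self, capacity: int):
        # One slot is always left empty so that "full" and "empty" differ.
        self._buf = [None] * (capacity + 1)
        self._head = 0  # next slot to write; advanced only by the producer
        self._tail = 0  # next slot to read; advanced only by the consumer

    def try_push(self, item) -> bool:
        """Producer side: store item, or return False if the buffer is full."""
        nxt = (self._head + 1) % len(self._buf)
        if nxt == self._tail:
            return False
        self._buf[self._head] = item
        self._head = nxt  # publish the slot only after it has been written
        return True

    def try_pop(self):
        """Consumer side: return (True, item), or (False, None) if empty."""
        if self._tail == self._head:
            return False, None
        item = self._buf[self._tail]
        self._tail = (self._tail + 1) % len(self._buf)
        return True, item


if __name__ == "__main__":
    ring = SpscRing(2)
    print(ring.try_push("a"), ring.try_push("b"), ring.try_push("c"))
    print(ring.try_pop(), ring.try_pop(), ring.try_pop())
```

Because neither side ever writes the other's index, no compare-and-swap loop is needed; the N-way buffering scheme from the abstract is not modeled here.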

68 citations

Patent•

Vehicle safety driving early warning method based on vehicle internet

Tian Daxin, Wang Yunpeng, Jian Wang, Jianshan Zhou, Guangquan Lu, Yu Guizhen, Zhu Zhifu, Hao Luo, Yong Yuan

17 Jul 2013

TL;DR: A vehicle safety driving early warning method based on vehicle internet is proposed, in which vehicle information data is sent to a network background server through a vehicle-mounted wireless communication unit.

Abstract: The invention discloses a vehicle safety driving early warning method based on vehicle internet. Vehicle information data is sent to a network background server through a vehicle-mounted wireless communication unit, the network background server acquires information data of dangerous points on a road section where a vehicle is located according to the vehicle information data and sends the information data at the dangerous points on the road section to the vehicle-mounted wireless communication unit, a vehicle-mounted central processing unit compares obtained dangerous point information with self-vehicle information, a safety distance model is called to generate early warning parameters, and a vehicle-mounted alarm unit, a vehicle-mounted display unit and an electronic control braking system are controlled to perform early warning response. The vehicle safety driving early warning method based on the vehicle internet has the advantage that a network background server terminal provides the dangerous point information for a vehicle-mounted terminal, the vehicle-mounted terminal acquires and analyzes vehicle geographical position information and the dangerous point information on the road section where the vehicle is located in real time, the vehicle-mounted alarm unit, the vehicle-mounted display unit and the electronic control braking system are controlled to perform the early warning response, and the vehicle safety driving early warning method based on the vehicle internet has reliability and high efficiency of vehicle network data communication.

Patent•

Storage system employing MRAM and array of solid state disks with integrated switch

Mehdi Asnaashari, Siamack Nemazie, Anilkumar Mandapuram

19 Aug 2013

TL;DR: A high-availability storage system includes a first storage system and a second storage system. The first includes a first central processing unit (CPU), a first physically-addressed solid state disk (SSD), and a first non-volatile memory module coupled to the first CPU; the second includes a second CPU and a second SSD.

Abstract: A high-availability storage system includes a first storage system and a second storage system. The first storage system includes a first Central Processing Unit (CPU), a first physically-addressed solid state disk (SSD) and a first non-volatile memory module that is coupled to the first CPU. Similarly, the second storage system includes a second CPU and a second SSD. Upon failure of one of the first or second CPUs, the storage system with the non-failing CPU continues to be operational, the storage system with the failed CPU is deemed inoperational, and the first and second SSDs remain accessible.

Patent•

Rendering to multi-resolution hierarchies

Eric B. Lum¹, Henry Packard Moreton¹•Institutions (1)

Nvidia¹

16 Aug 2013

TL;DR: This patent presents techniques for processing a multi-resolution hierarchy, where an application configures a ROP unit to render all the levels included in the multi-resolution hierarchy to a single composite render target.

Abstract: One embodiment of the present invention includes techniques for processing a multi-resolution hierarchy, where an application configures a ROP unit to render all the levels included in the multi-resolution hierarchy to a single composite render target. The ROP unit renders memory pages to the composite render target in pitch order. In contrast, the texture unit accesses the composite render target with memory pages in pitch order for each level of the hierarchy. The application configures the MMU to ensure that the composite render target is correctly interpreted by the texture unit. Notably, the MMU translates ROP unit virtual addresses and texture unit virtual addresses using different mapping strategies to the same physical address space. One advantage of the disclosed embodiments is that rendering to the multi-resolution hierarchy does not require the CPU to execute the state parameter changes that are associated with rendering the different hierarchical levels using prior-art techniques.

Journal Article•10.1002/NME.4452•

Generation of large finite-element matrices on multiple graphics processors

Adam Dziekonski¹, P. Sypek¹, Adam Lamecki¹, Michal Mrozowski¹•Institutions (1)

Gdańsk University of Technology¹

13 Apr 2013-International Journal for Numerical Methods in Engineering

TL;DR: This paper proposes to generate the large sparse linear systems arising in finite‐element analysis in an iterative manner on several GPUs and to use the graphics accelerators concurrently with CPUs performing collection and addition of the matrix fragments using a fast multithreaded procedure.

Abstract: This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the low memory size limitation of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis in an iterative manner on several GPUs and to use the graphics accelerators concurrently with CPUs performing collection and addition of the matrix fragments using a fast multithreaded procedure. The scheduling of the threads is organized in such a way that the CPU operations do not affect the performance of the process, and the GPUs are idle only when data are being transferred from GPU to CPU. This approach is verified on two workstations: the first consists of two 6-core Intel Xeon X5690 processors with two Fermi GPUs: each GPU is a GeForce GTX 590 with two graphics processors and 1.5 GB of fast RAM; the second workstation is equipped with two Tesla C2075 boards carrying 6 GB of RAM each and two 12-core Opteron 6174s. For the latter setup, we demonstrate the fast generation of sparse finite-element matrices as large as 10 million unknowns, with over 1 billion nonzero entries. Comparing with the single-threaded and multithreaded CPU implementations, the GPU-based version of the algorithm based on the ideas presented in this paper reduces the finite-element matrix-generation time in double precision by factors of 100 and 30, respectively. Copyright © 2012 John Wiley & Sons, Ltd.

Journal Article•10.1088/0957-4484/25/28/285201•

Dynamic Computing Random Access Memory

Fabio L. Traversa, Fabrizio Bonani, Yuriy V. Pershin, Massimiliano Di Ventra

26 Jun 2013-arXiv: Emerging Technologies

TL;DR: It is shown that DCRAM provides massively-parallel and polymorphic digital logic, namely it allows for different logic operations with the same architecture, by varying only the control signals, and therefore can really serve as an alternative to the present computing technology.

Abstract: The present von Neumann computing paradigm involves a significant amount of information transfer between a central processing unit (CPU) and memory, with concomitant limitations in the actual execution speed. However, it has been recently argued that a different form of computation, dubbed memcomputing [Nature Physics, 9, 200-202 (2013)] and inspired by the operation of our brain, can resolve the intrinsic limitations of present day architectures by allowing for computing and storing of information on the same physical platform. Here we show a simple and practical realization of memcomputing that utilizes easy-to-build memcapacitive systems. We name this architecture Dynamic Computing Random Access Memory (DCRAM). We show that DCRAM provides massively-parallel and polymorphic digital logic, namely it allows for different logic operations with the same architecture, by varying only the control signals. In addition, by taking into account realistic parameters, its energy expenditures can be as low as a few fJ per operation. DCRAM is fully compatible with CMOS technology, can be realized with current fabrication facilities, and therefore can really serve as an alternative to the present computing technology.

Journal Article•10.1016/J.CAGEO.2013.05.009•

3D seismic reverse time migration on GPGPU

Guofeng Liu¹, Yaning Liu¹, Li Ren¹, Xiaohong Meng¹•Institutions (1)

China University of Geosciences (Beijing)¹

01 Sep 2013-Computers & Geosciences

TL;DR: This paper presents a fast and computationally inexpensive implementation of RTM using an NVIDIA general-purpose graphics processing unit (GPGPU) powered with Compute Unified Device Architecture (CUDA) and introduces a random velocity boundary in the source propagation kernel.

Journal Article•10.1088/0031-9155/58/11/3705•

Performance-optimized clinical IMRT planning on modern CPUs.

P Ziegenhein¹, C P Kamerling¹, Mark Bangert¹, Julian M. Kunkel, Uwe Oelfke¹•Institutions (1)

German Cancer Research Center¹

08 May 2013-Physics in Medicine and Biology

TL;DR: This work presents an ultra-fast, high-precision implementation of the inverse plan optimization problem using a quasi-Newton method on pre-calculated dose influence data sets, and redefines the classical optimization algorithm to achieve a minimal runtime and high scalability on CPUs.

Abstract: Intensity modulated treatment plan optimization is a computationally expensive task. The feasibility of advanced applications in intensity modulated radiation therapy, such as everyday treatment planning, frequent re-planning for adaptive radiation therapy, and large-scale planning research, severely depends on the runtime of the plan optimization implementation. Modern computational systems are built as parallel architectures to yield high performance. The use of GPUs, as one class of parallel systems, has become very popular in the field of medical physics. In contrast, we utilize the multi-core central processing unit (CPU), which is the heart of every modern computer and does not have to be purchased additionally. In this work we present an ultra-fast, high-precision implementation of the inverse plan optimization problem using a quasi-Newton method on pre-calculated dose influence data sets. We redefined the classical optimization algorithm to achieve a minimal runtime and high scalability on CPUs. Using the proposed methods in this work, a total plan optimization process can be carried out in only a few seconds on a low-cost CPU-based desktop computer at clinical resolution and quality. We have shown that our implementation uses the CPU hardware resources efficiently with runtimes comparable to GPU implementations, at lower costs.

Patent•

Dynamic voltage frequency scaling method and apparatus.

Hwang Sub Lee¹, Jong Lae Park¹•Institutions (1)

Samsung¹

2 Aug 2013

TL;DR: A DVFS table may be selectively used according to any of various scenarios or modes of operation, thereby performing DVFS control in a system-on-a-chip (SOC).

Abstract: A system-on-a-chip (SOC) includes a dynamic voltage frequency scaling (DVFS) control unit and a central processing unit (CPU) to operate the DVFS control unit. By using the DVFS control unit, a DVFS table may be selectively used according to any of various scenarios or modes of operation, thereby performing DVFS control.

Patent•

Distributed, real-time online analytical processing (OLAP)

Damon Lanphear, Prabuddha Biswas

26 Jun 2013

TL;DR: An event stream is parsed and supplemented with additional data from reference data sources, producing an enriched event stream. Its data records are partitioned into a dimension partition and a metric partition, and the resulting storage keys are aggregated such that, if the computer processor identifies a permuted storage key for which a corresponding storage key already exists in the database, it aggregates the metric values of the identified storage key with the metric values of the corresponding storage key.

Abstract: Source data of an event stream is parsed and supplemented with additional data from reference data sources, producing an enriched event stream from the parsed event stream data. The data records of the enriched event stream are partitioned into data fields designated as a dimension partition and a metric partition, which are partitioned into sub-dimension projections mapped to a plurality of storage keys, such that each of the storage keys includes one or more placeholder wildcard values and each of the storage keys is stored into a database of the computer system by the computer processor. The stored storage keys are then aggregated onto a two-dimensional coordinate vector such that, if the computer processor identifies a permuted storage key having metric values for which a corresponding storage key already exists in the database, then the computer processor aggregates the metric values of the identified storage key with the metric values of the corresponding storage key, and if the computer processor does not identify the permuted storage key as having a corresponding storage key that already exists in the database, then the computer processor writes the metric values of the permuted storage key into the database, comprising initial values for the key combination of dimension values.
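
One way to read the wildcard storage keys and the aggregate-or-write rule in this abstract is sketched below. This is an interpretation, not the patented layout: it assumes every subset of dimensions is projected, uses `"*"` as the placeholder, keeps a single numeric metric, and stores everything in an in-memory dictionary. The names `storage_keys` and `aggregate` are hypothetical.

```python
from itertools import product

WILDCARD = "*"  # assumed placeholder; the source does not name the symbol


def storage_keys(dims):
    """Return every key obtained by replacing any subset of dimensions with a wildcard."""
    options = [(value, WILDCARD) for value in dims]
    return [tuple(choice) for choice in product(*options)]


def aggregate(store, dims, metric):
    """Fold one enriched record into the store.

    For each permuted key: if the key already exists, add the metric to the
    stored value; otherwise write the metric as the key's initial value.
    """
    for key in storage_keys(dims):
        if key in store:
            store[key] += metric
        else:
            store[key] = metric


if __name__ == "__main__":
    store = {}
    aggregate(store, ("US", "mobile"), 3)
    aggregate(store, ("US", "web"), 4)
    print(store[("US", WILDCARD)])        # total for country "US" across all channels
    print(store[(WILDCARD, WILDCARD)])    # grand total
```

Precomputing every projection trades storage for query speed: a roll-up over any combination of dimensions becomes a single key lookup.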

Book Chapter•10.1007/978-3-642-38980-1_18•

Hardware architectures for MSP430-based wireless sensor nodes performing elliptic curve cryptography

Erich Wenger¹•Institutions (1)

Graz University of Technology¹

25 Jun 2013

TL;DR: This paper presents an architecture that drops a hardware accelerator between CPU and RAM, so neither the CPU nor the data memory need to be modified, and shows that the drop-in concept is smaller than the dedicated hardware module, while achieving similarly fast runtimes.

Abstract: Maximizing the battery lifetime of wireless sensor nodes and equipping them with elliptic curve cryptography is a challenge that requires new energy-saving architectures. In this paper, we present an architecture that drops a hardware accelerator between CPU and RAM. Thus neither the CPU nor the data memory need to be modified. In a detailed comparison with a software-only and a dedicated hardware architecture, we show that the drop-in concept is smaller than the dedicated hardware module, while achieving similarly fast runtimes. Most interesting for micro-chip manufacturers is that only 4 kGE of chip area need to be committed for the dedicated drop-in accelerator.

Patent•

Multifunctional entrance guard device

Tang Chunhua

4 Sep 2013

TL;DR: In this article, a multifunctional entrance guard device consisting of a main controller and a cell phone remote terminal is presented, wherein the main controller is provided with an indoor part, an outdoor part, and a remote control part; the indoor part comprises a central processing unit, a doorbell, an indoor voice module, a video display, an indoor MIC (microphone), an electronic door lock, a door opening press button, and a data storage device.

Abstract: The invention discloses a multifunctional entrance guard device. The multifunctional entrance guard device comprises a main controller and a cell phone remote terminal, wherein the main controller is provided with an indoor part, an outdoor part and a remote control part; the indoor part comprises a central processing unit, a doorbell, an indoor voice module, a video display, an indoor MIC (Microphone), an electronic door lock, a door opening press button and a data storage device; the outdoor part comprises a doorbell/password key, an indoor voice module, a camera, an infrared inductive probe and an outdoor MIC; the remote control part comprises a wireless receiving-transmitting module and a remote control press key; and the cell phone remote terminal is provided with a remote control end central processing unit, a remote control end data storage device, a remote control end wireless receiving-transmitting module and a remote control end control output module. The entrance guard device disclosed by the invention can realize intelligent human face identification and decoding prevention based on a human face identification system and a remote wireless control system of the central processing unit; and meanwhile, the entrance guard device can also be applied to a cell phone to carry out remote control.

Proceedings Article•10.1145/2465351.2465366•

Whose cache line is it anyway?: operating system support for live detection and repair of false sharing

Mihir Nanavati¹, Mark Spear¹, Nathan G. Taylor¹, Shriram Rajagopalan¹, Dutch T. Meyer¹, William Aiello¹, Andrew Warfield¹•Institutions (1)

University of British Columbia¹

15 Apr 2013

TL;DR: Plastic, a software-based system that detects, diagnoses, and transparently repairs false sharing as it occurs in running applications, is described, capable of rapid, low-overhead detection and diagnosis of false sharing in unmodified, running applications.

Abstract: As hardware parallelism continues to increase, CPU caches can no longer be considered as a transparent, hardware-level performance optimization. Cache impact on performance, in particular in the face of false sharing, is completely dependent on the software that is executing. To effectively support parallel workloads on cache coherent hardware, the operating system must begin to treat the CPU cache like other shared hardware resources, and manage it appropriately. We demonstrate a prototype example of such support by describing Plastic, a software-based system that detects, diagnoses, and transparently repairs false sharing as it occurs in running applications. Plastic solves two challenging problems. First, it is capable of rapid, low-overhead detection and diagnosis of false sharing in unmodified, running applications. Second, it resolves identified instances of false sharing by providing a sub-page granularity memory remapping facility within the system. Our implementation is capable of identifying and repairing pathological false sharing in under one second of execution and achieves speedups of 3-6x on known examples of false sharing in parallel benchmarks.

Patent•

Software performance optimization method based on central processing unit (CPU) multi-core platform

Wu Qing, Zhang Qing

17 Apr 2013

TL;DR: In this paper, a software performance optimization method based on a CPU multi-core platform is presented. The method is widely applicable to applications with multi-thread parallel processing requirements and guides software developers in performing multi-thread parallel optimization of existing software rapidly and efficiently, with short development periods and low development costs.

Abstract: The invention provides a software performance optimization method based on a CPU multi-core platform. The method comprises software characteristic analysis, parallel optimization scheme formulation, and parallel optimization scheme implementation and iterative tuning. In particular, the method comprises application software characteristic analysis, serial algorithm analysis, CPU multi-in/thread parallel algorithm design, multi-buffer design, design of communication modes among threads, memory access optimization, cache optimization, processor vectorization optimization, mathematical function library optimization and the like. The method is widely applicable to applications with multi-thread parallel processing requirements. It guides software developers in performing multi-thread parallel optimization of existing software rapidly and efficiently, with short development periods and low development costs. The software's use of system resources is optimized; mutual masking of data reading, computing, and data write-back is achieved; the software running time is shortened as far as possible; the hardware resource utilization rate is markedly improved; and the software's computing efficiency and overall performance are enhanced.

Patent•

System and methods for cpu copy protection of a computing device

Michael Kiperberg, Amit Resh, Nezer Jacob Zaidenberg

24 Nov 2013

TL;DR: In this paper, the authors present systems and methods for software-based management of the insertion of protected data blocks into the memory cache mechanism of a computerized device. In particular, the disclosure relates to preventing protected data blocks from being altered and evicted from the CPU cache, coupled with buffered software execution.

Abstract: The present disclosure relates to systems and methods for software-based management of the insertion of protected data blocks into the memory cache mechanism of a computerized device. In particular, the disclosure relates to preventing protected data blocks from being altered and evicted from the CPU cache, coupled with buffered software execution. The technique is based upon identifying at least one conflicting data block having a memory mapping indication to a designated memory cache line and preventing the conflicting data block from being cached. Functional characteristics of a vendor's software product, such as gaming or video, may be partially encrypted to allow protected and functional operation and to prevent hacking and malicious use by non-licensed users.

Patent•

Intelligent power supply socket and control system

Wang Donglin, Xu Jianming

20 Nov 2013

TL;DR: In this paper, the authors propose an intelligent power supply socket and a control system. The socket comprises a central processing unit, an electric energy metering module, a WiFi (wireless fidelity) module, a control switch, a display module, a wheel display button, and a sampling circuit.

Abstract: The invention relates to an intelligent power supply socket and a control system. The intelligent power supply socket comprises a central processing unit, an electric energy metering module, a WiFi (wireless fidelity) module, a control switch, a display module, a wheel display button and a sampling circuit; the central processing unit is respectively connected with the electric energy metering module, the WiFi module, the control switch, the display module and the wheel display button; the sampling circuit is connected with the electric energy metering module and is used for sampling current and voltage and inputting the sampled current and voltage into the electric energy metering module; the electric energy metering module obtains the electric energy according to the current and the voltage; the display module is used for displaying the electric energy; and the central processing unit controls the on/off state of the control switch according to the input signal of the WiFi module. The intelligent power supply socket and the control system can compute the electric energy with the electric energy metering module, control the on/off state of the control switch according to the input signal of the WiFi module, and display the electric energy through the display module, making the socket multifunctional and convenient to control.
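
The metering and switching behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the samples are treated as instantaneous (voltage, current) pairs taken at a fixed interval, and the `"on"`/`"off"` command strings and both function names are hypothetical, since the abstract does not specify a sampling scheme or a command format.

```python
def energy_wh(samples, interval_s):
    """Integrate instantaneous power (V * I) over fixed-interval samples.

    `samples` is an iterable of (voltage_volts, current_amps) pairs and
    `interval_s` is the assumed sampling interval in seconds.
    """
    joules = sum(voltage * current for voltage, current in samples) * interval_s
    return joules / 3600.0  # 1 Wh = 3600 J


def next_switch_state(command, current_state):
    """Return the new switch state for a hypothetical WiFi command string."""
    if command == "on":
        return True
    if command == "off":
        return False
    return current_state  # unknown commands leave the switch unchanged
```

For example, one hour of one-second samples at 230 V and 0.5 A corresponds to 115 W sustained for an hour, so `energy_wh` reports about 115 Wh.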

Patent•

Systems and methods for high fidelity multi-modal out-of-band biometric authentication through vector-based multi-profile storage

Eren Kursun¹, Gene Fernandez, Alex Berson, Brian Goodman•Institutions (1)

JPMorgan Chase¹

12 Jul 2013

TL;DR: In this paper, a method for generating multiple biometric profiles for a user is disclosed, which may include (1) receiving data from a user, the data comprising biometric data for the user and device specifications for the electronic device; (2) at least one computer processor retrieving and updating existing user profiles; and (3) determining whether the data is consistent with one of the existing profiles.

Abstract: A method for generating multiple biometric profiles for a user is disclosed. According to one embodiment, the method may include (1) receiving data from a user, the data comprising biometric data for a user and device specifications for the electronic device; (2) at least one computer processor retrieving at least one existing user profile; (3) the at least one computer processor determining whether the data is consistent with at least one of the existing profiles; and (4) the at least one computer processor updating at least one existing profile if the data is consistent with the existing profile.

Patent•

Control method and electronic device

Qian Zhao, Guowen Zhang, Yue Liu

24 Jul 2013

TL;DR: In this paper, a control method is applied to an electronic device including a CPU to regulate the maximum operating frequency of the CPU adaptively based on the current state of the electronic device.

Abstract: A control method and an electronic device are disclosed in the application. The control method is applied to an electronic device including a CPU. The method includes: obtaining a current state of the electronic device; judging whether the current state is a first or a second state; generating a first control instruction in the case that the current state is the first state, or generating a second control instruction in the case that the current state is the second state; and performing the first control instruction to control the operating frequency of the CPU within the first maximum operating frequency, or performing the second control instruction to control the operating frequency of the CPU within the second maximum operating frequency. This enables the maximum operating frequency of the CPU to be regulated adaptively based on the current state of the electronic device.
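
The state-dependent frequency cap described above can be sketched as follows. This is a minimal illustration, not the claimed method: the two state names, the frequency values, the dictionary-shaped "control instruction", and the function names are all hypothetical, since the abstract does not say what the states are or how instructions are encoded.

```python
from enum import Enum


class DeviceState(Enum):
    FIRST = "first"    # hypothetical label for the first state
    SECOND = "second"  # hypothetical label for the second state


# Assumed per-state maximum operating frequencies, in kHz.
MAX_FREQ_KHZ = {
    DeviceState.FIRST: 1_500_000,
    DeviceState.SECOND: 800_000,
}


def control_instruction(state):
    """Generate the control instruction that matches the current state."""
    return {"max_khz": MAX_FREQ_KHZ[state]}


def apply_instruction(requested_khz, instruction):
    """Keep the operating frequency within the instruction's maximum."""
    return min(requested_khz, instruction["max_khz"])
```

A request above the cap is clamped to the maximum for the current state, while a request below it passes through unchanged.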

Patent•

Application program background process management method and device

Liu Yudong, Huang Yonghua, Zeng Hongyan, Zhang Yinxiang, Chen Peisi, Lin Zhiyong, Guan Mingchi

20 Mar 2013

TL;DR: In this paper, an application program background process management method for mobile terminals is presented, which comprises the steps of switching the process of an application program running in the system foreground to the system background after receiving a return instruction of the application program, suspending the process running in the system background, and recovering the process of the application program corresponding to a process recovery instruction.

Abstract: The invention relates to the technical field of mobile terminals and provides an application program background process management method and device. The method comprises the steps of: switching the process of an application program running in the system foreground to the system background after receiving a return instruction of the application program; suspending the process running in the system background; and, after receiving a process recovery instruction, recovering the process of the application program corresponding to the process recovery instruction and switching that process to run in the foreground. The method and device are capable of suspending the background process of the application program and limiting its permissions so that it cannot be called and executed by the central processing unit (CPU). Therefore, CPU resources and battery power are saved, and in some cases additional data traffic is avoided.

Patent•

System and method of electromigration mitigation in stacked ic designs

Chi-Yeh Yu¹, Chung-min Fu¹, Ping-Heng Yeh¹•Institutions (1)

TSMC¹

10 Dec 2013

TL;DR: In this article, a computer implemented method comprises accessing a 3D-IC model stored in a tangible, non-transitory machine readable medium, processing the model in a computer processor to generate a temperature map containing temperatures at a plurality of points of the 3D IC under the operating condition; identifying an electromigration (EM) rating factor, and calculating and outputting from the processor data representing a temperature-dependent EM current constraint at each point.

Abstract: A computer implemented method comprises accessing a 3D-IC model stored in a tangible, non-transitory machine readable medium, processing the model in a computer processor to generate a temperature map containing temperatures at a plurality of points of the 3D-IC under the operating condition; identifying an electromigration (EM) rating factor, and calculating and outputting from the processor data representing a temperature-dependent EM current constraint at each point.
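
One way to picture a temperature-dependent EM current constraint is sketched below. The abstract does not say how the EM rating factor is computed, so this sketch swaps in a standard stand-in: Black's equation, MTTF = A · J^(-n) · exp(Ea / kT), solved for the current limit that keeps lifetime equal to that at a reference temperature. The reference temperature (105 °C), activation energy, current exponent, and all function names are assumptions for illustration only.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_rating_factor(temp_k, ref_temp_k=378.15,
                     activation_energy_ev=0.9, current_exponent=2.0):
    """Scale factor on the allowed current at temp_k relative to ref_temp_k.

    Derived from Black's equation at constant lifetime:
    J(T) / J(T_ref) = exp(Ea / (n * k) * (1/T - 1/T_ref)).
    """
    scale = activation_energy_ev / (current_exponent * BOLTZMANN_EV_PER_K)
    return math.exp(scale * (1.0 / temp_k - 1.0 / ref_temp_k))


def em_current_constraints(temperature_map, ref_limit_ma):
    """Map each point of a temperature map (in kelvin) to a current limit in mA."""
    return {
        point: ref_limit_ma * em_rating_factor(temp_k)
        for point, temp_k in temperature_map.items()
    }
```

Points hotter than the reference temperature receive a tighter current limit and cooler points a looser one, which is the qualitative behavior a per-point constraint over a 3D-IC temperature map would need.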

Proceedings Article•

Fabrication of a 99%-energy-less nonvolatile multi-functional CAM chip using hierarchical power gating for a massively-parallel full-text-search engine

Shoun Matsunaga¹, Noboru Sakimura², Ryusuke Nebashi², Yukihide Tsuji², A. Morioka², Tadahiko Sugibayashi², Sadahiko Miura², Hiroaki Honjo², Keizo Kinoshita¹, Hiroki Sato¹, Shunsuke Fukami¹, Masanori Natsui¹, Akira Mochizuki¹, Shoji Ikeda¹, Tetsuo Endoh¹, Hideo Ohno¹, Takahiro Hanyu¹•Institutions (2)

Tohoku University¹, NEC²

11 Jun 2013

TL;DR: Through massively parallel comparison with hierarchical power gating, the energy consumption of the proposed search engine is reduced to within 1% in comparison with the conventional CPU-based full-text search system, where repetitive comparisons between the CPU and a memory consume much energy.

Abstract: A ternary content-addressable memory (TCAM)-based hardware called nonvolatile “multi-functional CAM (MF-CAM)” is proposed for an ultra-low-energy “full-text search” system in recent data centers. The proposed nonvolatile MF-CAM-based full-text search engine can perform parallel comparison while eliminating leakage energy by hierarchical power gating. By the massively parallel comparison with the hierarchical power gating, energy consumption of the proposed search engine is reduced within 1% in comparison with the conventional CPU-based full-text search system, where repetitive comparisons between the CPU and a memory consume much energy.
