Scispace (Formerly Typeset)
Showing papers on "Central processing unit" published in 2006
GPU-based Video Feature Tracking And Matching

Sudipta N. Sinha, Jan-Michael Frahm, Marc Pollefeys, Yakup Genc
1 May 2006
TL;DR: Novel implementations of the KLT feature tracking and SIFT feature extraction algorithms that run on the graphics processing unit (GPU) and are suitable for video analysis in real-time vision systems.
Abstract: This paper describes novel implementations of the KLT feature tracking and SIFT feature extraction algorithms that run on the graphics processing unit (GPU) and are suitable for video analysis in real-time vision systems. While significant acceleration over standard CPU implementations is obtained by exploiting parallelism provided by modern programmable graphics hardware, the CPU is freed up to run other computations in parallel. Our GPU-based KLT implementation tracks about a thousand features in real-time at 30 Hz on 1024 × 768 resolution video, which is a 20 times improvement over the CPU. It works on both ATI and NVIDIA graphics cards. The GPU-based SIFT implementation works on NVIDIA cards and extracts about 800 features from 640 × 480 video at 10 Hz, which is approximately 10 times faster than an optimized CPU implementation.

382 citations

Proceedings Article•10.1145/1188455.1188567•
Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs

Min Yeol Lim1, Vincent W. Freeh, David K. Lowenthal2•
North Carolina State University1, University of Georgia2
11 Nov 2006
TL;DR: An MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product is presented.
Abstract: Although users of high-performance computing are most interested in raw performance, both energy and power consumption have become critical concerns. Some microprocessors allow frequency and voltage scaling, which enables a system to reduce CPU performance and power when the CPU is not on the critical path. When properly directed, such dynamic frequency and voltage scaling can produce significant energy savings with little performance penalty. This paper presents an MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs. It dynamically identifies such phases and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product. All analysis and subsequent frequency and voltage scaling is within MPI and so is entirely transparent to the application. This means that the large number of existing MPI programs, as well as new ones being developed, can use our system without modification. Results show that the average reduction in energy-delay product over the NAS benchmark suite is 10%: the average energy reduction is 12%, while the average execution time increase is only 2.1%.
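As a rough illustration of the selection criterion named in this abstract, the sketch below computes the energy-delay product (energy multiplied by execution time) for a few candidate CPU frequencies and picks the minimum. The function names and the sample measurements are hypothetical and do not come from the paper. The last lines only check that the reported averages (12% less energy, 2.1% more time) are consistent with a roughly 10% lower energy-delay product, under the simplifying assumption that the two averages can be multiplied directly.

```python
def energy_delay_product(energy_joules, time_seconds):
    """Energy-delay product (EDP): lower is better."""
    return energy_joules * time_seconds


def pick_frequency(measurements):
    """Return the frequency whose (energy, time) pair minimizes EDP.

    `measurements` maps frequency in MHz -> (energy in J, time in s).
    """
    return min(measurements, key=lambda f: energy_delay_product(*measurements[f]))


# Made-up measurements for one communication phase at three settings.
samples = {
    2000: (100.0, 1.00),
    1600: (85.0, 1.05),
    1200: (80.0, 1.30),
}
best = pick_frequency(samples)

# Relative EDP implied by the averages quoted in the abstract.
relative_edp = (1 - 0.12) * (1 + 0.021)
edp_reduction_percent = (1 - relative_edp) * 100
```

With these made-up numbers the middle setting wins: the lowest frequency saves the most energy but stretches the phase enough to raise the product again.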

202 citations

Patent•
Method and apparatus for parallel data preparation and processing of integrated circuit graphical design data

Daria R. Dooling1, Kenneth T. Settlemyer1, Jacek G. Smolinski1, Stephen D. Thomas1, Ralph J. Williams1 •
IBM1
27 Sep 2006
TL;DR: A method for implementing an ORC process to facilitate physical verification of an integrated circuit (IC) graphical design, in which a host machine partitions the design data into files that correspond to regions of interest or partitions with defined margins and disperses the partitioned files to available CPUs within the network.
Abstract: A method for implementing an ORC process to facilitate physical verification of an integrated circuit (IC) graphical design. The method includes partitioning the IC graphical design data into files by a host machine such that the files correspond to regions of interest or partitions with defined margins; dispersing the partitioned data files to available CPUs within the network; processing of each job by the CPU receiving the file, wherein artifacts arising from bisection of partitioning margins during the partitioning, including cut-induced false errors, are detected and removed, and the shape-altering effects of such artifact errors are minimized; and transmitting the results of processing at each CPU to the host machine for aggregate processing.

155 citations

Proceedings Article•10.1109/RTSS.2006.48•
System-Level Energy Management for Periodic Real-Time Tasks

Hakan Aydin1, Vinay Devadas1, Dakai Zhu2•
George Mason University1, University of Texas at San Antonio2
5 Dec 2006
TL;DR: This paper formulates the system-wide energy management problem as a non-linear optimization problem and provides a polynomial-time solution that yields significant gains over previous solutions that focused on dynamic CPU power at the expense of ignoring other power components.
Abstract: In this paper, we consider the system-wide energy management problem for a set of periodic real-time tasks running on a DVS-enabled processor. Our solution uses a generalized power model, in which frequency-dependent and frequency-independent power components are explicitly considered. Further, variations in power dissipations and on-chip/off-chip access patterns of different tasks are encoded in the problem formulation. Using this generalized power model, we show that it is possible to obtain analytically the task-level energy-efficient speed below which DVS starts to affect overall energy consumption negatively. Then, we formulate the system-wide energy management problem as a non-linear optimization problem and provide a polynomial-time solution. We also provide a dynamic slack reclaiming extension which considers the effects of slow-down on the system-wide energy consumption. Our experimental evaluation shows that the optimal solution provides significant (up to 50%) gains over the previous solutions that focused on dynamic CPU power at the expense of ignoring other power components.
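The idea of an energy-efficient speed can be sketched with a deliberately simplified power model. The code below assumes power of the form `p_ind + c_eff * freq**3`, that is, a constant frequency-independent part plus a cubic frequency-dependent part. That form, the parameter names, and the numbers are assumptions chosen for illustration and are not the paper's generalized model. Under this assumption, the energy for a fixed number of cycles is minimized where `freq**3 = p_ind / (2 * c_eff)`, and running slower than that costs more energy because the frequency-independent power is paid for longer.

```python
def task_energy(freq, cycles, p_ind, c_eff):
    """Energy to execute `cycles` at normalized speed `freq` (0 < freq <= 1)."""
    exec_time = cycles / freq
    return (p_ind + c_eff * freq ** 3) * exec_time


def energy_efficient_speed(p_ind, c_eff):
    """Speed minimizing task_energy under the assumed cubic model.

    Setting d/df [(p_ind + c_eff * f**3) / f] = 0 gives f**3 = p_ind / (2 * c_eff).
    """
    return (p_ind / (2.0 * c_eff)) ** (1.0 / 3.0)


# Hypothetical parameters: the minimizing speed is approximately 0.5.
p_ind, c_eff, cycles = 0.25, 1.0, 1000.0
f_ee = energy_efficient_speed(p_ind, c_eff)

energy_at_f_ee = task_energy(f_ee, cycles, p_ind, c_eff)
energy_slower = task_energy(0.3, cycles, p_ind, c_eff)   # below f_ee: more energy
energy_faster = task_energy(0.8, cycles, p_ind, c_eff)   # above f_ee: more energy
```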

154 citations

Proceedings Article•10.1109/RT.2006.280210•
Ray Tracing on the Cell Processor

Carsten Benthin, Ingo Wald1, Michael Scherbaum, Heiko Friedrich2•
University of Utah1, Saarland University2
1 Sep 2006
TL;DR: Using a combination of low-level optimized kernel routines, a streaming software architecture, explicit caching, and a virtual software-hyperthreading approach to hide DMA latencies, a single Cell achieves a pure ray tracing performance of nearly one order of magnitude over that achieved by a commodity CPU.
Abstract: Over the last three decades, higher CPU performance has been achieved almost exclusively by raising the CPU's clock rate. Today, the resulting power consumption and heat dissipation threaten to end this trend, and CPU designers are looking for alternative ways of providing more compute power. In particular, they are looking towards three concepts: a streaming compute model, vector-like SIMD units, and multi-core architectures. One particular example of such an architecture is the Cell Broadband Engine Architecture (CBEA), a multi-core processor that offers a raw compute power of up to 200 GFlops per 3.2 GHz chip. The Cell bears a huge potential for compute-intensive applications like ray tracing, but also requires addressing the challenges caused by this processor's unconventional architecture. In this paper, we describe an implementation of realtime ray tracing on a Cell. Using a combination of low-level optimized kernel routines, a streaming software architecture, explicit caching, and a virtual software-hyperthreading approach to hide DMA latencies, we achieve for a single Cell a pure ray tracing performance of nearly one order of magnitude over that achieved by a commodity CPU.

149 citations

Book Chapter•10.1016/B978-075068112-4/50019-X•
1 – Programmable logic controllers

W. Bolton
1 Jan 2006
TL;DR: This chapter presents an introduction to the programmable logic controller, its general function, hardware forms, and internal architecture.
Abstract: This chapter presents an introduction to the programmable logic controller, its general function, hardware forms, and internal architecture. A programmable logic controller (PLC) is a special form of microprocessor-based controller that uses a programmable memory to store instructions and to implement functions such as logic, sequencing, timing, counting, and arithmetic to control machines and processes. It is designed to be operated by engineers with perhaps a limited knowledge of computers and computing languages. Input devices and output devices in a system being controlled are connected to the PLC. The operator enters a sequence of instructions, that is, a program, into the memory of the PLC. The controller then monitors the inputs and outputs according to this program and carries out the control rules for which it has been programmed. PLCs are now widely used and extend from small self-contained units for use with perhaps 20 digital inputs/outputs to modular systems, which can be used for large numbers of inputs/outputs, handle digital or analogue inputs/outputs, and also carry out proportional-integral-derivative control modes. A PLC system has the basic functional components of a processor unit, memory, power supply unit, input/output interface section, communications interface, and programming device. It consists of a central processing unit (CPU) containing the system microprocessor, memory, and input/output circuitry. The CPU controls and processes all the operations within the PLC.

145 citations

Proceedings Article•10.1145/1182807.1182809•
t-kernel: providing reliable OS support to wireless sensor networks

Lin Gu1, John A. Stankovic1•
University of Virginia1
31 Oct 2006
TL;DR: The t-kernel significantly enhances developers' ability to design reliable and sophisticated sensor networks, and includes several new design techniques, such as efficient binary translation on highly constrained sensor nodes, differentiated virtual memory without repeatedly writable swapping devices, and the protection of the OS from application errors without privileged execution hardware.
Abstract: The development of a reliable large-scale wireless sensor network (WSN) is very difficult because of resource constraints, energy budget, and demanding application requirements. Three OS features (OS protection, virtual memory, and preemptive scheduling) can significantly improve the reliability of WSN systems and facilitate developing complex WSN software. However, due to the lack of hardware support for privileged execution and address translation, it is impossible to implement these features with traditional OS design techniques. To solve this problem, we design a new OS kernel, the t-kernel, to perform extensive code modification at load time. The modified code and the OS work in a collaborative way supporting the aforementioned features. Having implemented the t-kernel on MICA2 motes, we evaluate its performance by measuring the overhead and execution speed. We analyze the CPU utilization of sensor network applications, and verify that, though CPU-bound tasks execute 1.5-3 times as long as in native mode, application performance under typical workloads does not noticeably degrade. The t-kernel significantly enhances developers' ability to design reliable and sophisticated sensor networks, and includes several new design techniques, such as efficient binary translation on highly constrained sensor nodes, differentiated virtual memory without repeatedly writable swapping devices, and the protection of the OS from application errors without privileged execution hardware.

140 citations

Patent•
Hardware implementation of network testing and performance monitoring in a network device

Nir Arad, Tsahi Daniel, Maxim Mondaeev
22 Mar 2006
TL;DR: The generation and monitoring of test packets are offloaded from a central processing unit (CPU) to a dedicated network integrated circuit, such as a router, bridge or switch chip associated with the CPU.
Abstract: An embodiment of the present invention offloads the generation and monitoring of test packets from a central processing unit (CPU) to a dedicated network integrated circuit (IC), such as a router, bridge or switch chip associated with the CPU. The CPU may download test routines and test data to the network IC, which then generates the test packets, identifies and handles received test packets, collects test statistics, and performs other test functions, all without loading the CPU. The CPU may be notified when certain events occur, such as when throughput or jitter thresholds for the network are exceeded.

120 citations

Journal Article•10.1109/MM.2006.45•
Xbox 360 System Architecture

Jeffrey A. Andrews1, Nicholas R. Baker1•
Microsoft1
01 Mar 2006-IEEE Micro
TL;DR: The Xbox 360 contains an aggressive hardware architecture and implementation targeted at game console workloads; its core silicon implements the product designers' goal of providing game developers a hardware platform to implement their next-generation game ambitions.
Abstract: This article covers the Xbox 360's high-level technical requirements, a short system overview, and details of the CPU and the GPU. The Xbox 360 contains an aggressive hardware architecture and implementation targeted at game console workloads. The core silicon implements the product designers' goal of providing game developers a hardware platform to implement their next-generation game ambitions. The core chips include the standard conceptual blocks of CPU, graphics processing unit (GPU), memory, and I/O. Each of these components and their interconnections are customized to provide a user-friendly game console product. The authors describe their architectural trade-offs and summarize the system's software programming support.

96 citations

Patent•
Computer system and i/o bridge

Toshiaki Tarui1, Yoshiko Yasuda1•
Hitachi1
21 Jul 2006
TL;DR: An AS bridge containing a virtual switch reduces the overhead of sharing I/O between virtual machines (logical partitions) through a versatile I/O switch.
Abstract: PROBLEM TO BE SOLVED: To reduce overhead in sharing I/O between virtual machines using a versatile I/O switch. SOLUTION: The system comprises a CPU module #0 including a plurality of CPU cores, an AS bridge 15 connected to the CPU cores, and a main storage accessible from the CPU cores or the AS bridge 15, and AS switches SW0 and SW1 connecting the AS bridge 15 of the CPU module #0 to an I/O blade #5. The CPU module #0 has a hypervisor dividing the plurality of CPU cores and the main storage into a plurality of logical partitions. The AS bridge 15 includes a virtual switch SWv1 that, when AS packets to be transmitted and received between the logical partitions and the I/O blade #5 are relayed, adds virtual route information set for each logical partition and route information from the AS bridge 15 to the I/O blade #5 to the route information of the AS packets, and switches the AS packets with the I/O blade #5 for each logical partition. COPYRIGHT: (C)2007,JPO&INPIT

84 citations

Patent•
System and method for authenticating an operating system to a central processing unit, providing the CPU/OS with secure storage, and authenticating the CPU/OS to a third party

Paul England1, John D. DeTreville1, Butler W. Lampson1•
Microsoft1
22 Dec 2006
TL;DR: A computer system has a central processing unit (CPU) and an operating system (OS), the CPU having a pair of private and public keys and a software identity register that holds an identity of the operating system.
Abstract: In accordance with certain aspects, a computer system has a central processing unit (CPU) and an operating system (OS), the CPU having a pair of private and public keys and a software identity register that holds an identity of the operating system. An OS certificate is created including the identity from the software identity register, information describing the operating system, and the CPU public key. The created OS certificate is signed using the CPU private key.
Temperature measurement in the Intel® Core™ Duo Processor

Efraim Rotem, Jim San Jose Hermerding, Aviad Cohen, H. Cain
1 Jan 2006
TL;DR: The temperature sensing capability of the new Intel Core™ Duo processor is introduced, and measurements of its performance benefits are presented.
Abstract: Modern CPUs with increasing core frequency and power are rapidly reaching a point where the CPU frequency and performance are limited by the amount of heat that can be extracted by the cooling technology. In mobile environments, this issue is becoming more apparent as form factors become thinner and lighter. Often, mobile platforms trade CPU performance in order to reduce power and manage thermals. This enables the delivery of high-performance computing together with improved ergonomics by lowering skin temperature and reducing fan acoustic noise. Most available high-performance CPUs provide a thermal sensor on the die to allow thermal management, typically in the form of an analog thermal diode. Operating system algorithms and platform embedded controllers read the temperature and control the processor power. Improved thermal sensors directly translate into better system performance, reliability and ergonomics. In this paper, we introduce the new Intel Core Duo processor temperature sensing capability and present measurements of its performance benefits.
Patent•
Semiconductor Integrated circuit

Yutaka Shinagawa1, Takeshi Kataoka2, Eiichi Ishikawa1, Toshihiro Tanaka1, Kazumasa Yanagisawa1, Kazufumi Suzukawa1 •
Renesas Electronics1, NEC2
17 Oct 2006
TL;DR: A semiconductor integrated circuit has a central processing unit and a rewritable nonvolatile memory area disposed in an address space of the central processing unit.
Abstract: A semiconductor integrated circuit has a central processing unit and a rewritable nonvolatile memory area disposed in an address space of the central processing unit. The nonvolatile memory area has a first nonvolatile memory area and a second nonvolatile memory area, which store information according to differences in threshold voltage. The maximum variation width of the threshold voltage used for storing information is set larger in the first nonvolatile memory area than in the second. A larger maximum variation width puts more stress on a memory cell during rewrite operations, so the guaranteed number of rewrite operations is lower; however, the read current becomes larger, so stored information can be read faster. The first nonvolatile memory area can therefore be prioritized for read speed, and the second nonvolatile memory area for the guaranteed number of rewrite operations.
Proceedings Article•10.1109/PDCAT.2006.77•
Load Balancing in a Cluster Computer

P. Werstein1, Hailing Situ1, Zhiyi Huang1•
University of Otago1
4 Dec 2006
TL;DR: A load balancing algorithm for distributed use of a cluster computer is proposed that uses load information including CPU queue length, CPU utilisation, memory utilisation and network traffic to decide the load of each node.
Abstract: This paper proposes a load balancing algorithm for distributed use of a cluster computer. It uses load information including CPU queue length, CPU utilisation, memory utilisation and network traffic to decide the load of each node. This algorithm is compared to an algorithm using only the CPU queue length. The performance evaluation results show that the proposed algorithm performs well.
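A minimal sketch of how several load indicators might be combined into one per-node score is given below. The abstract does not state the paper's actual formula, so the weighted sum, the weights, the queue-length cap, and all names here are hypothetical. The example also contrasts the combined score with a selection based on CPU queue length alone, which is the baseline the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    name: str
    cpu_queue_length: float     # runnable processes waiting for the CPU
    cpu_utilisation: float      # fraction in [0, 1]
    memory_utilisation: float   # fraction in [0, 1]
    network_traffic: float      # fraction of link capacity in [0, 1]


# Assumed weights and queue cap, chosen only for illustration.
WEIGHTS = {"queue": 0.4, "cpu": 0.3, "mem": 0.2, "net": 0.1}
MAX_QUEUE = 10.0


def load_score(node):
    """Weighted sum of normalized indicators; higher means more loaded."""
    queue = min(node.cpu_queue_length / MAX_QUEUE, 1.0)
    return (WEIGHTS["queue"] * queue
            + WEIGHTS["cpu"] * node.cpu_utilisation
            + WEIGHTS["mem"] * node.memory_utilisation
            + WEIGHTS["net"] * node.network_traffic)


def least_loaded(nodes):
    """Pick the node with the lowest combined score."""
    return min(nodes, key=load_score)


def shortest_queue(nodes):
    """Baseline: pick the node with the shortest CPU queue only."""
    return min(nodes, key=lambda n: n.cpu_queue_length)


nodes = [
    NodeLoad("node-a", 1, 0.20, 0.95, 0.90),  # short queue, but memory and network are busy
    NodeLoad("node-b", 2, 0.30, 0.30, 0.10),
]
```

In this made-up case the queue-only baseline chooses `node-a`, while the combined score prefers `node-b` because `node-a` is short on memory and network capacity.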
Patent•
DMA engine for protocol processing

Thomas Alexander1, Marc Quattromani1, Alexander David Rekow1•
PMC-Sierra1
10 Mar 2006
TL;DR: A DMA engine in which, when a request to fetch data from external memory is processed, the DMA controller allocates a block within an associative buffer and loads the data into the allocated block.
Abstract: A DMA engine includes, in part, a DMA controller, an associative memory buffer, a request FIFO accepting data transfer requests from a programmable engine, such as a CPU, and a response FIFO that returns the completion status of the transfer requests to the CPU. Each request includes, in part, a target external memory address from which data is to be loaded or to which data is to be stored; a block size, specifying the amount of data to be transferred; and context information. The associative buffer holds data fetched from the external memory and provides the data to the CPU for processing. Loading into and storing from the associative buffer is done under the control of the DMA controller. When a request to fetch data from the external memory is processed, the DMA controller allocates a block within the associative buffer and loads the data into the allocated block.
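The data flow described in this abstract can be modelled in a few lines of software. The toy class below is only an illustration: the class name, method names, status strings, and the use of a dictionary keyed by address as the "associative buffer" are all assumptions, and a real engine would be hardware operating concurrently with the CPU.

```python
from collections import deque


class DmaEngine:
    """Toy model: request FIFO in, response FIFO out, buffer keyed by address."""

    def __init__(self, external_memory):
        self.external_memory = external_memory  # bytearray standing in for external memory
        self.request_fifo = deque()
        self.response_fifo = deque()
        self.assoc_buffer = {}                  # address -> block of bytes

    def submit(self, op, address, block_size, context):
        """CPU side: queue a transfer request with its context information."""
        self.request_fifo.append((op, address, block_size, context))

    def step(self):
        """Controller side: process one request and post its completion status."""
        op, address, size, context = self.request_fifo.popleft()
        if op == "load":
            # Allocate a block in the buffer and fill it from external memory.
            self.assoc_buffer[address] = bytes(self.external_memory[address:address + size])
        elif op == "store":
            data = self.assoc_buffer[address][:size]
            self.external_memory[address:address + len(data)] = data
        else:
            self.response_fifo.append((context, "error"))
            return
        self.response_fifo.append((context, "ok"))


memory = bytearray(range(16))
engine = DmaEngine(memory)
engine.submit("load", address=4, block_size=4, context="pkt-1")
engine.step()
```

After the single `step()` call, the block for address 4 is available in `engine.assoc_buffer` and the completion status for context `"pkt-1"` is waiting in `engine.response_fifo`.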
Patent•
Processor and information processing method

Akihiko Tamura1, Katsuya Tanaka1•
Epson1
13 Jan 2006
TL;DR: An external interrupt control section causes a unit processor that is not executing a task, or the unit processor executing the lowest priority task, to execute an incoming interrupt.
Abstract: A multiprocessor is provided that efficiently handles high priority processing. A mobile telephone 1 comprises a CPU 10 having therein an external interrupt control section 11 that causes a unit processor that is not executing a task, or the unit processor executing the lowest priority task, to execute interrupt processing that was input. Thus, interrupt processing that occurs can be executed within the CPU 10 while reducing the capacity to process tasks as little as possible. Accordingly, interrupt processing can be handled efficiently within the CPU 10 as a multiprocessor.
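The selection rule in this abstract can be sketched as a small function: prefer an idle unit processor, and otherwise take the one running the lowest priority task. The function name, the use of `None` for an idle processor, the convention that a larger number means higher priority, and the tie-breaking by lowest index are assumptions made for this illustration.

```python
def pick_unit_processor(running_priorities):
    """Return the index of the unit processor that should take an interrupt.

    `running_priorities[i]` is None when processor i is idle, otherwise the
    priority of its current task (assumed: larger number = higher priority).
    """
    # First choice: any processor that is not executing a task.
    for index, priority in enumerate(running_priorities):
        if priority is None:
            return index
    # Otherwise: the processor executing the lowest priority task.
    return min(range(len(running_priorities)), key=lambda i: running_priorities[i])
```

For example, `pick_unit_processor([5, None, 2])` selects the idle processor at index 1, while `pick_unit_processor([5, 3, 7])` selects index 1 because its task has the lowest priority.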
Journal Article•10.1109/TEMC.2006.882844•
Susceptibility of Personal Computer Systems to Fast Transient Electromagnetic Pulses

M. Camp, Heyno Garbe1•
Information Technology University1
20 Nov 2006-IEEE Transactions on Electromagnetic Compatibility
TL;DR: The major result is that susceptibility increases significantly with each computer generation.
Abstract: In this paper, the susceptibility of personal computer systems (mainboard classes ranging from 8088 processor-based systems up to Pentium III systems) to fast transient electromagnetic pulses (EMP) with double exponential pulse shapes [EMP, ultra wideband (UWB)] is determined. The influence of computer generation, random access memory (RAM) values, program states, and pulse shapes, as well as the destruction thresholds of single personal computer (PC) components [central processing unit (CPU), RAM, basic input/output system (BIOS), mainboard], have been investigated. The major result is that susceptibility increases significantly with each computer generation.
Proceedings Article•10.1109/FCCM.2006.40•
Enabling a Uniform Programming Model Across the Software/Hardware Boundary

Erik K. Anderson1, Jason Agron1, W. Peck1, Jim Stevens1, Fabrice Baijot1, Ed Komp1, Ron Sass1, David L. Andrews1 •
University of Kansas1
24 Apr 2006
TL;DR: The hardware thread interface (HWTI) component provides an abstract, platform independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.
Abstract: In this paper, we present hthreads, a unifying programming model for specifying application threads running within a hybrid CPU/FPGA system. Threads are specified from a single pthreads multithreaded application program and compiled to run on the CPU or synthesized to run on the FPGA. The hthreads system, in general, is unique within the reconfigurable computing community as it abstracts the CPU/FPGA components into a unified custom threaded multiprocessor architecture platform. To support the abstraction of the CPU/FPGA component boundary, we have created the hardware thread interface (HWTI) component that frees the designer from having to specify and embed platform-specific instructions to form customized hardware/software interactions. Instead, the hardware thread interface supports the generalized pthreads API semantics, and allows passing of abstract data types between hardware and software threads. Thus, the hardware thread interface provides an abstract, platform-independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.
Patent•
Wireless mesh networking in wagering game environments

Mark B. Gagner, Daniel Norman St. John, Dale R. Buchholz
14 Jul 2006
TL;DR: Systems and methods for wireless mesh networking in a gaming environment, including a network interface unit that wirelessly receives gaming data from some components of a wireless mesh network and wirelessly transmits it to others.
Abstract: Systems and methods for wireless mesh networking in a gaming environment are described herein. In one embodiment, the system includes a network interface unit to wirelessly receive gaming data from ones of a plurality of components of a wireless mesh network, the network interface unit to wirelessly transmit the gaming data to others of the plurality of components of the wireless mesh network. The system also includes a memory unit to store certain of the gaming data and to store instructions for conducting wagering games and a central processing unit to perform operations based in part on the certain of the gaming data and to perform operations based on the instructions.
Proceedings Article•10.1109/HPCA.2006.1598112•
A decoupled KILO-instruction processor

Miquel Pericas, A. Cristal, Ruben Gonzalez, Daniel A. Jimenez, Mateo Valero 
27 Feb 2006
TL;DR: It is demonstrated that a decoupled microarchitecture, using small structures and many in-order components, can achieve the same performance as much more aggressive proposals while minimizing design complexity.
Abstract: Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are composed of structures that do not scale to large instruction windows because of timing and power constraints. However, the behavior of programs executed with large instruction windows gives rise to a natural and simple alternative to scaling. We characterize this phenomenon of execution locality and propose a microarchitecture to exploit it to achieve the benefit of a large instruction window processor with low implementation cost. Execution locality is the tendency of instructions to exhibit high or low latency based on their dependence on memory operations. In this paper we propose a decoupled microarchitecture that executes low latency instructions on a cache processor and high latency instructions on a memory processor. We demonstrate that such a design, using small structures and many in-order components, can achieve the same performance as much more aggressive proposals while minimizing design complexity.
Patent•
Method and apparatus for determining

David Holmes
16 Mar 2006
TL;DR: A method for image recognition of a material object that utilizes graphical modeling of the corner points of a vertex, projecting those corner points and replacing the bisecting points of edge features detected in a digital display scaled at an increasing rate of congruency to the dimensions of an object.
Abstract: A method for image recognition of a material object that utilizes graphical modeling of the corner points of a vertex which includes projecting a point on a digital display to an inward depth, a one half pixel distance in the plane of the display, with a conic to a digital display, and a square block containing one half size child blocks that are scaled to depth, projecting the corner points of a vertex and replacing the bisecting points of edge features detected in a digital display scaled at an increasing rate of congruency to the dimensions of an object. The method may further include producing a digital image of the material object, providing a central processing unit, providing memory associated with a central processing unit; providing a display associated with a central processing unit; loading the digital image into the memory; defining the edges of features within the digital image; and a finding fight crucial points from registrations projected on to an edge feature display.
Patent•
GPU pipeline multiple level synchronization controller processor and method

Timour Paltashev1, Hsilin Huang1, Boris Prokopenko1, Qunfeng (Fred) Liao1•
VIA Technologies1
25 Oct 2006
TL;DR: A method for high level synchronization between an application and a graphics pipeline, which comprises receiving an application instruction in an input stream at a predetermined component, such as a command stream processor (CSP), as sent by a central processing unit.
Abstract: A method for high level synchronization between an application and a graphics pipeline comprises receiving an application instruction in an input stream at a predetermined component, such as a command stream processor (CSP), as sent by a central processing unit. The CSP may have a first portion coupled to a next component in the graphics pipeline and a second portion coupled to a plurality of components of the graphics pipeline. A command associated with the application instruction may be forwarded from the first portion to the next component in the graphics pipeline or some other component coupled thereto. The command may be received and thereafter executed. A response may be communicated on a feedback path to the second portion of the CSP. Nonlimiting exemplary application instructions that may be received and executed by the CSP include check surface fault, trap, wait, signal, stall, flip, and trigger.
Patent•
Parallel processing method and system, for instance for supporting embedded cluster platforms, computer program product therefor

Diego Melpignano1, David Siorpaes1, Paolo Zambotti1, Antonio Maria Borneo1•
STMicroelectronics1
18 Apr 2006
TL;DR: A multi-processing system-on-chip including a cluster of processors with respective CPUs is operated by defining a master CPU to coordinate operation of the system and running on it a cluster manager agent that dynamically migrates software processes between the CPUs and changes their power settings.
Abstract: A multi-processing system-on-chip including a cluster of processors having respective CPUs is operated by defining a master CPU within the respective CPUs to coordinate operation of said multi-processing system and running a cluster manager agent on that CPU. The cluster manager agent is adapted to dynamically migrate software processes between the CPUs of the cluster and change power settings therein.
Patent•
C/C++ language extensions for general-purpose graphics processing unit

Ian Buck1, Bastiaan Aarts1•
Nvidia1
2 Nov 2006
TL;DR: A general-purpose programming environment that lets users program a GPU as a general-purpose computation engine with familiar C/C++ programming constructs, using declaration specifiers to identify which portions of a program are to be compiled for a CPU or a GPU.
Abstract: A general-purpose programming environment allows users to program a GPU as a general-purpose computation engine using familiar C/C++ programming constructs. Users may use declaration specifiers to identify which portions of a program are to be compiled for a CPU or a GPU. Specifically, functions, objects and variables may be specified for GPU binary compilation using declaration specifiers. A compiler separates the GPU binary code and the CPU binary code in a source file using the declaration specifiers. The location of objects and variables in different memory locations in the system may be identified using the declaration specifiers. CTA threading information is also provided for the GPU to support parallel processing.
Patent•
Dynamic enablement and customization of tracing information in a data processing system

[...]

Janice M. Girouard1, James K. Lewis, Michael Thomas Strosaker, Wendel Glenn Voigt•
IBM1
7 Jun 2006
TL;DR: A staged tracing approach in which an initial high-level trace detects potential problems or issues at a sub-system level, followed by a dynamic tracing state with a more detailed level of tracing for the identified problematic sub-system.
Abstract: A computer implemented method, system, and computer usable program code for staged tracing, where an initial high-level trace is performed to detect potential problems or issues at a sub-system level, followed by a dynamic tracing state, with a more detailed level of tracing for an identified problematic sub-system. During such dynamic tracing, the CPU consumption or processing time is monitored and if such consumption remains below a given threshold, additional trace points may be added. If such CPU consumption exceeds the given threshold, existing trace-points are selectively backed-out or removed. The dynamic adding and removing of trace-points allows for the CPU to perform in a desired window of execution performance such that the overall system performance is not adversely affected when tracing is enabled.
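The add/remove behavior described in this abstract can be sketched as a single monitoring step. This is a minimal illustration, not the patented method: the function name `adjust_trace_points`, the trace-point names, the "remove the most recently added point" choice, and CPU usage expressed as a fraction are all assumptions.

```python
def adjust_trace_points(active, available, cpu_usage, threshold):
    """Return the active trace points after one monitoring step.

    Above the threshold, one trace point is backed out; below it, one
    not-yet-active trace point is added. At the threshold, nothing changes.
    """
    active = list(active)  # work on a copy, leave the caller's list alone
    if cpu_usage > threshold:
        if active:
            active.pop()  # assumed policy: remove the most recent addition
    elif cpu_usage < threshold:
        for point in available:
            if point not in active:
                active.append(point)
                break
    return active


available = ["io.read", "io.write", "io.flush"]   # hypothetical trace points
step1 = adjust_trace_points(["io.read"], available, cpu_usage=0.20, threshold=0.50)
step2 = adjust_trace_points(step1, available, cpu_usage=0.80, threshold=0.50)
```

Calling the function once per sampling interval keeps tracing overhead near the chosen threshold; a real implementation would also need a way to rank which trace points matter most.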
Patent•
Power reduction for processor front-end by caching decoded instructions

[...]

Baruch Solomon1, Ronny Ronen1, Doron Orenstien1•
Intel1
31 Oct 2006
TL;DR: A power-aware front-end unit for a processor may include a UOP cache that disables instruction synchronization circuitry, instruction decode circuitry and, optionally, instruction fetch circuitry while instruction look-ups are underway in both a block cache and an instruction cache.
Abstract: A power aware front-end unit for a processor may include a UOP cache that disables other circuitry within the front-end unit. In an embodiment, a front-end unit may disable instruction synchronization circuitry, instruction decode circuitry and, optionally, instruction fetch circuitry while instruction look-ups are underway in both a block cache and an instruction cache. If the instruction look-up indicates a miss, the disabled circuitry thereafter may be enabled.
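The power-saving idea here, reusing cached decoded instructions so the decode circuitry stays idle, can be illustrated with a toy software model. It is a rough sketch only: the `FrontEnd` class, the `decode` callback, and the activation counter used as a stand-in for decode power are hypothetical, and the model collapses the block cache and instruction cache of the abstract into one unbounded dictionary.

```python
class FrontEnd:
    """Toy front end that caches decoded micro-operations (uops) by address."""

    def __init__(self, decode):
        self.decode = decode            # hypothetical decoder: instruction -> uops
        self.uop_cache = {}             # address -> previously decoded uops
        self.decoder_activations = 0    # crude proxy for decode-stage power

    def fetch_uops(self, address, instruction):
        uops = self.uop_cache.get(address)
        if uops is None:
            # Look-up missed: the decoder has to be enabled for this instruction.
            self.decoder_activations += 1
            uops = self.decode(instruction)
            self.uop_cache[address] = uops
        # On a hit the decoder is never touched.
        return uops


front_end = FrontEnd(decode=lambda instruction: instruction.split("+"))
program = {0: "load+add", 4: "store"}   # made-up instruction encoding
for _ in range(3):                      # a loop body executed three times
    for address, instruction in program.items():
        front_end.fetch_uops(address, instruction)
```

In this run the decoder is activated only on the first pass through the loop body; later iterations are served entirely from the cache.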
Patent•
Electronic apparatus, communication system, and program

[...]

Yasuhiko Watanabe1•
Hitachi1
21 Nov 2006
TL;DR: When the main power is switched off while an emergency mode is set, the CPU stops power supply to the display sections, etc., so that the apparatus appears to be completely switched off, but continues powering the GPS processing unit and communication unit to keep notifying the current position.
Abstract: When an emergency mode is set, a CPU outputs an alert from an alerting unit and notifies the current position obtained by a GPS processing unit to a predetermined destination by a communication unit. When the main power is switched off, the CPU determines whether or not the emergency mode is set, and when determined that the emergency mode is set, stops power supply to display sections, etc. to pretend that the apparatus is completely switched off, but continues power supply to the GPS processing unit and communication unit to keep notifying the current position.
Patent•
Geological response data imaging with stream processors

[...]

Tor Dokken, Martin Ofstad Henriksen, Jørg E. Aarnes, Knut-Andreas Lie
18 Oct 2006
TL;DR: A method to convert geological response data to graphical raw data using at least one stream processor: a CPU pre-processes the geological response data and feeds it into one or more stream processors, which do the calculation-intensive work and return the results to the CPU for post-processing.
Abstract: The invention describes a method to convert geological response data to graphical raw data by using at least one stream processor for this purpose. The geological response data is pre-processed by a CPU and the preprocessed geological response data is fed into one or more stream processors. The stream processor then does the calculation-intensive work on the preprocessed geological response data and returns the processing results back to the CPU, which does some post-processing on the results coming from the stream processor. Stream processors comprise single or multiple programmable GPUs; clusters/networks of nodes with one or several GPUs; cell processors (or processors derived from them) or a cluster of cell processor nodes; game computers (in the spirit of Sony's PlayStation, Nintendo's GameCube, etc.) or clusters of game computers.
Patent•
System for improving overall battery life of a gsm communication device

[...]

Hc Sandip1•
Samsung1
27 Dec 2006
TL;DR: A system for improving the overall battery life of a GSM device through an optimization mechanism for suspending neighbor-cell scanning in a GSM wireless communication system.
Abstract: Disclosed is a system for improving the overall battery life of a GSM device according to an optimization mechanism for suspending neighbor-cell scanning in a GSM wireless communication system, the system having a wireless device including: (a) a Central Processing Unit (CPU) executing software programs intended to comply with GSM protocol specifications; (b) an RF transmission unit and an RF reception unit functioning either independently or as a single unit; (c) a specialized Digital Signal Processor being able to process received signal at a corresponding receiving antenna and offering estimates of the received signal level and quality; (d) a logic process by which the mobile terminal powers off an RF module thereof for a definite period of time and wakes up at a pre-determined interval to listen to paging messages transmitted thereto; and (e) firmware/software performing neighbor cell monitoring in compliance with a protocol mandated by GSM standards.
Patent•
Method, apparatus and system for enhanced CPU frequency governors

[...]

Steven L. Grobman1•
Intel1
7 Sep 2006
TL;DR: A method, apparatus and system enable enhanced processor frequency governors to comprehend virtualized platforms and utilize predictive information to enhance performance in virtualized platforms; an enhanced frequency governor may run within a virtual machine on the host and interact with a virtual machine manager to collect predictive information from application(s) running within each virtual machine.
Abstract: A method, apparatus and system enable enhanced processor frequency governors to comprehend virtualized platforms and utilize predictive information to enhance performance in virtualized platforms. Specifically, in one embodiment, an enhanced frequency governor in a virtual host may run within a virtual machine on the host and interact with a virtual machine manager to collect predictive information from application(s) running within each virtual machine on the host. The enhanced frequency governor may then utilize the predictive information to determine future CPU frequency requirements and raise or lower the CPU frequency and/or voltage in anticipation of the needs of the various applications.
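The final step of this abstract, turning collected predictions into a frequency decision, might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the function name `choose_frequency`, expressing each virtual machine's predicted demand in MHz, summing the demands, and picking the lowest supported frequency that covers the total.

```python
def choose_frequency(predicted_demand_mhz, available_mhz):
    """Pick the lowest available frequency that covers the predicted demand.

    predicted_demand_mhz: mapping of virtual machine name -> predicted need
    available_mhz: frequencies the (hypothetical) platform supports
    """
    required = sum(predicted_demand_mhz.values())
    for frequency in sorted(available_mhz):
        if frequency >= required:
            return frequency
    # Demand exceeds every supported step: run at the maximum.
    return max(available_mhz)


# Hypothetical predictions gathered from applications in two virtual machines.
predictions = {"vm1": 400, "vm2": 500}
target = choose_frequency(predictions, available_mhz=[800, 1600, 2400])
```

Because the predictions arrive before the load does, a governor built this way could raise or lower the frequency ahead of demand instead of reacting to past utilization.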