Top 738 papers published in the topic of Central processing unit in 2006

Showing papers on "Central processing unit" published in 2006

GPU-based Video Feature Tracking And Matching

Sudipta N. Sinha, Jan-Michael Frahm, Marc Pollefeys, Yakup Genc

1 May 2006

TL;DR: Novel implementations of the KLT feature tracking and SIFT feature extraction algorithms that run on the graphics processing unit (GPU) and are suitable for video analysis in real-time vision systems.

Abstract: This paper describes novel implementations of the KLT feature tracking and SIFT feature extraction algorithms that run on the graphics processing unit (GPU) and are suitable for video analysis in real-time vision systems. While significant acceleration over standard CPU implementations is obtained by exploiting parallelism provided by modern programmable graphics hardware, the CPU is freed up to run other computations in parallel. Our GPU-based KLT implementation tracks about a thousand features in real-time at 30 Hz on 1024 × 768 resolution video, which is a 20 times improvement over the CPU. It works on both ATI and NVIDIA graphics cards. The GPU-based SIFT implementation works on NVIDIA cards and extracts about 800 features from 640 × 480 video at 10 Hz, which is approximately 10 times faster than an optimized CPU implementation.

382 citations

Proceedings Article•10.1145/1188455.1188567•

Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs

Min Yeol Lim¹, Vincent W. Freeh, David K. Lowenthal²•Institutions (2)

North Carolina State University¹, University of Georgia²

11 Nov 2006

TL;DR: An MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product is presented.

Abstract: Although users of high-performance computing are most interested in raw performance, both energy and power consumption have become critical concerns. Some microprocessors allow frequency and voltage scaling, which enables a system to reduce CPU performance and power when the CPU is not on the critical path. When properly directed, such dynamic frequency and voltage scaling can produce significant energy savings with little performance penalty. This paper presents an MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs. It dynamically identifies such phases and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product. All analysis and subsequent frequency and voltage scaling is within MPI and so is entirely transparent to the application. This means that the large number of existing MPI programs, as well as new ones being developed, can use our system without modification. Results show that the average reduction in energy-delay product over the NAS benchmark suite is 10%: the average energy reduction is 12%, while the average execution time increase is only 2.1%.
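
The three percentages in this abstract are linked by the definition of energy-delay product (energy multiplied by execution time). The short check below is only an illustration: it multiplies the reported averages, which is an approximation, since the paper averages per-benchmark results and the product of averages need not equal the average of products. The variable names are chosen for this sketch.

```python
# Relative changes reported in the abstract (averages over the NAS suite).
energy_ratio = 1.0 - 0.12   # energy falls by 12%
time_ratio = 1.0 + 0.021    # execution time rises by 2.1%

# Energy-delay product (EDP) is energy multiplied by execution time,
# so the relative EDP is the product of the two ratios.
edp_ratio = energy_ratio * time_ratio
edp_reduction = 1.0 - edp_ratio

print(f"relative EDP:  {edp_ratio:.3f}")
print(f"EDP reduction: {edp_reduction:.1%}")
```

The result is roughly a 10% reduction, in line with the figure quoted in the abstract.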

202 citations

Patent•

Method and apparatus for parallel data preparation and processing of integrated circuit graphical design data

Daria R. Dooling¹, Kenneth T. Settlemyer¹, Jacek G. Smolinski¹, Stephen D. Thomas¹, Ralph J. Williams¹•Institutions (1)

IBM¹

27 Sep 2006

TL;DR: A method for implementing an ORC process to facilitate physical verification of an integrated circuit (IC) graphical design, which includes partitioning the IC graphical design data into files by a host machine such that the files correspond to regions of interest or partitions with defined margins, and dispersing the partitioned data files to available CPUs within the network.

Abstract: A method for implementing an ORC process to facilitate physical verification of an integrated circuit (IC) graphical design. The method includes partitioning the IC graphical design data into files by a host machine such that the files correspond to regions of interest or partitions with defined margins; dispersing the partitioned data files to available CPUs within the network; processing of each job by the CPU receiving the file, wherein artifacts arising from bisection of partitioning margins during the partitioning, including cut-induced false errors, are detected and removed, and the shape-altering effects of such artifact errors are minimized; and transmitting the results of processing at each CPU to the host machine for aggregate processing.
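
One way to picture "partitions with defined margins" is a grid of tiles, each padded by a halo that overlaps its neighbours. The sketch below is an assumption-heavy illustration, not the patented method: the rectangular tiling, the function names `partition_with_margins` and `is_reportable`, and the rule of keeping only errors located inside a tile's unpadded core are all hypothetical.

```python
def partition_with_margins(width, height, tile, margin):
    """Split a width x height layout into tiles with overlapping margins.

    Returns a list of (core, padded) pairs, each an (x0, y0, x1, y1) tuple.
    The padded rectangle is the core grown by `margin` and clipped to the
    layout bounds.
    """
    parts = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            core = (x, y, min(x + tile, width), min(y + tile, height))
            padded = (
                max(core[0] - margin, 0),
                max(core[1] - margin, 0),
                min(core[2] + margin, width),
                min(core[3] + margin, height),
            )
            parts.append((core, padded))
    return parts


def is_reportable(core, point):
    """Assumed filter: keep an error only if it lies inside the core.

    Errors found only in the margin are treated as possible cut artifacts
    and left to the neighbouring partition whose core contains them.
    """
    x, y = point
    return core[0] <= x < core[2] and core[1] <= y < core[3]


parts = partition_with_margins(100, 100, 50, 5)
print(len(parts), parts[0], parts[-1])
```

Each (core, padded) pair would correspond to one file sent to a worker CPU; the host would then merge the reportable results.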

155 citations

Proceedings Article•10.1109/RTSS.2006.48•

System-Level Energy Management for Periodic Real-Time Tasks

Hakan Aydin¹, Vinay Devadas¹, Dakai Zhu²•Institutions (2)

George Mason University¹, University of Texas at San Antonio²

5 Dec 2006

TL;DR: This paper formulates the system-wide energy management problem as a non-linear optimization problem and provides a polynomial-time solution that yields significant gains over previous solutions that focused on dynamic CPU power at the expense of ignoring other power components.

Abstract: In this paper, we consider the system-wide energy management problem for a set of periodic real-time tasks running on a DVS-enabled processor. Our solution uses a generalized power model, in which frequency-dependent and frequency-independent power components are explicitly considered. Further, variations in power dissipations and on-chip/off-chip access patterns of different tasks are encoded in the problem formulation. Using this generalized power model, we show that it is possible to obtain analytically the task-level energy-efficient speed below which DVS starts to affect overall energy consumption negatively. Then, we formulate the system-wide energy management problem as a non-linear optimization problem and provide a polynomial-time solution. We also provide a dynamic slack reclaiming extension which considers the effects of slow-down on the system-wide energy consumption. Our experimental evaluation shows that the optimal solution provides significant (up to 50%) gains over the previous solutions that focused on dynamic CPU power at the expense of ignoring other power components.
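
The idea of an energy-efficient speed can be illustrated with a deliberately simplified power model. Everything below is an assumption made for this sketch and is not taken from the paper: power is modelled as `p_ind + c_ef * f**m` (a frequency-independent term plus a frequency-dependent term), the task needs a fixed number of cycles, and off-chip access time is ignored. Under that model the energy per cycle is minimised at f = (p_ind / (c_ef * (m - 1))) ** (1 / m); running slower than this costs more energy, not less.

```python
def task_energy(cycles, f, p_ind, c_ef, m=3.0):
    """Energy of a task under the assumed model: power times run time."""
    return (p_ind + c_ef * f ** m) * (cycles / f)


def energy_efficient_speed(p_ind, c_ef, m=3.0):
    """Speed that minimises energy per cycle under the assumed model.

    Setting d/df [(p_ind + c_ef * f**m) / f] = 0 gives
    f**m = p_ind / (c_ef * (m - 1)).
    """
    if p_ind < 0 or c_ef <= 0 or m <= 1:
        raise ValueError("need p_ind >= 0, c_ef > 0 and m > 1")
    return (p_ind / (c_ef * (m - 1.0))) ** (1.0 / m)


# Hypothetical normalised parameters, chosen only for illustration.
f_ee = energy_efficient_speed(p_ind=0.25, c_ef=1.0)
print(f"energy-efficient speed: {f_ee:.3f}")
for f in (0.4, f_ee, 0.6):
    print(f"f = {f:.3f}  energy = {task_energy(1.0, f, 0.25, 1.0):.4f}")
```

With these made-up parameters the minimum falls at half the normalised maximum speed, and both a slower and a faster setting consume more energy for the same work.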

154 citations

Proceedings Article•10.1109/RT.2006.280210•

Ray Tracing on the Cell Processor

Carsten Benthin, Ingo Wald¹, Michael Scherbaum, Heiko Friedrich²•Institutions (2)

University of Utah¹, Saarland University²

1 Sep 2006

TL;DR: Using a combination of low-level optimized kernel routines, a streaming software architecture, explicit caching, and a virtual software-hyperthreading approach to hide DMA latencies, a single Cell achieves a pure ray tracing performance nearly one order of magnitude higher than that of a commodity CPU.

Abstract: Over the last three decades, higher CPU performance has been achieved almost exclusively by raising the CPU's clock rate. Today, the resulting power consumption and heat dissipation threaten to end this trend, and CPU designers are looking for alternative ways of providing more compute power. In particular, they are looking towards three concepts: a streaming compute model, vector-like SIMD units, and multi-core architectures. One particular example of such an architecture is the Cell Broadband Engine Architecture (CBEA), a multi-core processor that offers a raw compute power of up to 200 GFlops per 3.2 GHz chip. The Cell bears a huge potential for compute-intensive applications like ray tracing, but also requires addressing the challenges caused by this processor's unconventional architecture. In this paper, we describe an implementation of realtime ray tracing on a Cell. Using a combination of low-level optimized kernel routines, a streaming software architecture, explicit caching, and a virtual software-hyperthreading approach to hide DMA latencies, we achieve for a single Cell a pure ray tracing performance of nearly one order of magnitude over that achieved by a commodity CPU.

149 citations

Book Chapter•10.1016/B978-075068112-4/50019-X•

1 – Programmable logic controllers

W. Bolton

1 Jan 2006

TL;DR: This chapter presents an introduction to the programmable logic controller, its general function, hardware forms, and internal architecture.

Abstract: This chapter presents an introduction to the programmable logic controller, its general function, hardware forms, and internal architecture. A programmable logic controller (PLC) is a special form of microprocessor-based controller that uses a programmable memory to store instructions and to implement functions such as logic, sequencing, timing, counting, and arithmetic to control machines and processes. It is designed to be operated by engineers with perhaps a limited knowledge of computers and computing languages. Input devices and output devices in a system being controlled are connected to the PLC. The operator enters a sequence of instructions, that is, a program, into the memory of the PLC. The controller then monitors the inputs and outputs according to this program and carries out the control rules for which it has been programmed. PLCs are now widely used and extend from small self-contained units for use with perhaps 20 digital inputs/outputs to modular systems, which can be used for large numbers of inputs/outputs, handle digital or analogue inputs/outputs, and also carry out proportional-integral-derivative control modes. A PLC system has the basic functional components of a processor unit, memory, power supply unit, input/output interface section, communications interface, and programming device. It consists of a central processing unit (CPU) containing the system microprocessor, memory, and input/output circuitry. The CPU controls and processes all the operations within the PLC.

145 citations

Proceedings Article•10.1145/1182807.1182809•

t-kernel: providing reliable OS support to wireless sensor networks

Lin Gu¹, John A. Stankovic¹•Institutions (1)

University of Virginia¹

31 Oct 2006

TL;DR: The t-kernel significantly enhances developers' ability to design reliable and sophisticated sensor networks, and includes several new design techniques, such as efficient binary translation on highly constrained sensor nodes, differentiated virtual memory without repeatedly writable swapping devices, and the protection of the OS from application errors without privileged execution hardware.

Abstract: The development of a reliable large-scale wireless sensor network (WSN) is very difficult because of resource constraints, energy budget, and demanding application requirements. Three OS features (OS protection, virtual memory, and preemptive scheduling) can significantly improve the reliability of WSN systems and facilitate developing complex WSN software. However, due to the lack of hardware support for privileged execution and address translation, it is impossible to implement these features with traditional OS design techniques. To solve this problem, we design a new OS kernel, the t-kernel, to perform extensive code modification at load time. The modified code and the OS work in a collaborative way supporting the aforementioned features. Having implemented the t-kernel on MICA2 motes, we evaluate its performance by measuring the overhead and execution speed. We analyze the CPU utilization of sensor network applications, and verify that, though CPU-bound tasks execute 1.5-3 times as long as in native mode, application performance under typical workloads does not noticeably degrade. The t-kernel significantly enhances developers' ability to design reliable and sophisticated sensor networks, and includes several new design techniques, such as efficient binary translation on highly constrained sensor nodes, differentiated virtual memory without repeatedly writable swapping devices, and the protection of the OS from application errors without privileged execution hardware.

140 citations

Patent•

Hardware implementation of network testing and performance monitoring in a network device

Nir Arad, Tsahi Daniel, Maxim Mondaeev

22 Mar 2006

TL;DR: The generation and monitoring of test packets are offloaded from a central processing unit (CPU) to a dedicated network integrated circuit, such as a router, bridge, or switch chip associated with the CPU.

Abstract: An embodiment of the present invention offloads the generation and monitoring of test packets from a central processing unit (CPU) to a dedicated network integrated circuit (IC), such as a router, bridge, or switch chip associated with the CPU. The CPU may download test routines and test data to the network IC, which then generates the test packets, identifies and handles received test packets, collects test statistics, and performs other test functions, all without loading the CPU. The CPU may be notified when certain events occur, such as when throughput or jitter thresholds for the network are exceeded.

120 citations

Journal Article•10.1109/MM.2006.45•

Xbox 360 System Architecture

Jeffrey A. Andrews¹, Nicholas R. Baker¹•Institutions (1)

Microsoft¹

1 Mar 2006 - IEEE Micro

TL;DR: The Xbox 360 contains an aggressive hardware architecture and implementation targeted at game console workloads that implements the product designers' goal of providing game developers a hardware platform to implement their next-generation game ambitions.

Abstract: This article covers the Xbox 360's high-level technical requirements, a short system overview, and details of the CPU and the GPU. The Xbox 360 contains an aggressive hardware architecture and implementation targeted at game console workloads. The core silicon implements the product designers' goal of providing game developers a hardware platform to implement their next-generation game ambitions. The core chips include the standard conceptual blocks of CPU, graphics processing unit (GPU), memory, and I/O. Each of these components and their interconnections are customized to provide a user-friendly game console product. The authors describe their architectural trade-offs and summarize the system's software programming support.

96 citations

Patent•

Computer system and I/O bridge

Toshiaki Tarui¹, Yoshiko Yasuda¹•Institutions (1)

Hitachi¹

21 Jul 2006

TL;DR: An I/O bridge containing a virtual switch for each logical partition reduces the overhead of sharing I/O between virtual machines through a general-purpose I/O switch.

Abstract: PROBLEM TO BE SOLVED: To reduce overhead in achievement of sharing of I/O between virtual machines using a versatile I/O switch. SOLUTION: The system comprises a CPU module #0 including a plurality of CPU cores, an AS bridge 15 connected to the CPU cores, and a main storage accessible from the CPU cores or AS bridge 15, and AS switches SW0 and SW1 connecting the AS bridge 15 of the CPU module #0 to an I/O blade #5. The CPU module #0 has a hypervisor dividing the plurality of CPU cores and the main storage to a plurality of logical partitions, and the AS bridge 15 includes a virtual switch SWv1 a, when AS packets to be transmitted and received between the logical partitions and the I/O blade #5 are relayed, virtual route information set for each logical partition and route information from the AS bridge 15 to the I/O blade #5 to route information of the AS packets and switching the AS packets with the I/O blade #5 for each logical partition.

84 citations

Patent•

System and method for authenticating an operating system to a central processing unit, providing the CPU/OS with secure storage, and authenticating the CPU/OS to a third party

Paul England¹, John D. DeTreville¹, Butler W. Lampson¹•Institutions (1)

Microsoft¹

22 Dec 2006

TL;DR: A computer system has a central processing unit (CPU) and an operating system (OS); the CPU has a pair of private and public keys and a software identity register that holds an identity of the operating system.

Abstract: In accordance with certain aspects, a computer system has a central processing unit (CPU) and an operating system (OS), the CPU having a pair of private and public keys and a software identity register that holds an identity of the operating system. An OS certificate is created including the identity from the software identity register, information describing the operating system, and the CPU public key. The created OS certificate is signed using the CPU private key.

Temperature measurement in the Intel® Core™ Duo Processor

Efraim Rotem, Jim San Jose Hermerding, Aviad Cohen, H. Cain

1 Jan 2006

TL;DR: The new Intel Core™ Duo processor temperature sensing capability is introduced and performance benefits measurements and results are presented.

Abstract: Modern CPUs with increasing core frequency and power are rapidly reaching a point where the CPU frequency and performance are limited by the amount of heat that can be extracted by the cooling technology. In mobile environments, this issue is becoming more apparent, as form factors become thinner and lighter. Often, mobile platforms trade CPU performance in order to reduce power and manage thermals. This enables the delivery of high performance computing together with improved ergonomics by lowering skin temperature and reducing fan acoustic noise. Most available high performance CPUs provide a thermal sensor on the die to allow thermal management, typically in the form of an analog thermal diode. Operating system algorithms and platform embedded controllers read the temperature and control the processor power. Improved thermal sensors directly translate into better system performance, reliability and ergonomics. In this paper we will introduce the new Intel Core Duo processor temperature sensing capability and present performance benefits measurements and results.

Patent•

Semiconductor integrated circuit

Yutaka Shinagawa¹, Takeshi Kataoka², Eiichi Ishikawa¹, Toshihiro Tanaka¹, Kazumasa Yanagisawa¹, Kazufumi Suzukawa¹•Institutions (2)

Renesas Electronics¹, NEC²

17 Oct 2006

TL;DR: A semiconductor integrated circuit has a central processing unit and a rewritable nonvolatile memory area disposed in an address space of the central processing unit.

Abstract: A semiconductor integrated circuit has a central processing unit and a rewritable nonvolatile memory area disposed in an address space of the central processing unit. The nonvolatile memory area has a first nonvolatile memory area and a second nonvolatile memory area, which memorize information depending on the difference of threshold voltages. The first nonvolatile memory area has the maximum variation width of a threshold voltage for memorizing information set larger than that of the second nonvolatile memory area. When the maximum variation width of the threshold voltage for memorizing information is larger, since stress to a memory cell owing to a rewrite operation of memory information becomes larger, it is inferior in a point of guaranteeing the number of times of rewrite operation; however, since a read current becomes larger, a read speed of memory information can be expedited. The first nonvolatile memory area can be prioritized to expedite a read speed of the memory information and the second nonvolatile memory area can be prioritized in guaranteeing the number of times of rewrite operation of memory information more.

Proceedings Article•10.1109/PDCAT.2006.77•

Load Balancing in a Cluster Computer

P. Werstein¹, Hailing Situ¹, Zhiyi Huang¹•Institutions (1)

University of Otago¹

4 Dec 2006

TL;DR: A load balancing algorithm for distributed use of a cluster computer that uses load information including CPU queue length, CPU utilisation, memory utilisation and network traffic to decide the load of each node is proposed.

Abstract: This paper proposes a load balancing algorithm for distributed use of a cluster computer. It uses load information including CPU queue length, CPU utilisation, memory utilisation and network traffic to decide the load of each node. This algorithm is compared to an algorithm using only the CPU queue length. The performance evaluation results show that the proposed algorithm performs well.
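
The contrast between a queue-length-only policy and a multi-metric policy can be sketched as follows. This is a hypothetical illustration and not the paper's algorithm: the equal weights, the `MAX_QUEUE` normalisation constant, and the names `NodeLoad`, `load_index`, and `least_loaded` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    name: str
    cpu_queue_length: float    # processes waiting for the CPU
    cpu_utilisation: float     # fraction in [0, 1]
    memory_utilisation: float  # fraction in [0, 1]
    network_traffic: float     # fraction of link capacity in [0, 1]


# Assumed weights and queue normalisation; the abstract gives no values.
WEIGHTS = {"queue": 0.25, "cpu": 0.25, "memory": 0.25, "network": 0.25}
MAX_QUEUE = 10.0


def load_index(node):
    """Combine the four metrics into one score (lower means less loaded)."""
    queue = min(node.cpu_queue_length / MAX_QUEUE, 1.0)
    return (
        WEIGHTS["queue"] * queue
        + WEIGHTS["cpu"] * node.cpu_utilisation
        + WEIGHTS["memory"] * node.memory_utilisation
        + WEIGHTS["network"] * node.network_traffic
    )


def least_loaded(nodes):
    """Pick the node with the smallest combined load index."""
    return min(nodes, key=load_index)


nodes = [
    NodeLoad("a", 1, 0.9, 0.8, 0.7),  # short queue, but busy otherwise
    NodeLoad("b", 2, 0.2, 0.3, 0.1),  # longer queue, otherwise mostly idle
]
by_queue = min(nodes, key=lambda n: n.cpu_queue_length)
print(by_queue.name, least_loaded(nodes).name)
```

In this made-up case the queue-only policy picks node "a", while the combined index picks node "b", which shows how the extra metrics can change the placement decision.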

Patent•

DMA engine for protocol processing

Thomas Alexander¹, Marc Quattromani¹, Alexander David Rekow¹•Institutions (1)

PMC-Sierra¹

10 Mar 2006

TL;DR: A DMA engine whose DMA controller, when processing a request to fetch data from external memory, allocates a block within an associative buffer and loads the data into the allocated block.

Abstract: A DMA engine includes, in part, a DMA controller, an associative memory buffer, a request FIFO accepting data transfer requests from a programmable engine, such as a CPU, and a response FIFO that returns the completion status of the transfer requests to the CPU. Each request includes, in part, a target external memory address from which data is to be loaded or to which data is to be stored; a block size, specifying the amount of data to be transferred; and context information. The associative buffer holds data fetched from the external memory and provides the data to the CPUs for processing. Loading into and storing from the associative buffer is done under the control of the DMA controller. When a request to fetch data from the external memory is processed, the DMA controller allocates a block within the associative buffer and loads the data into the allocated block.
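
The fetch path described in this abstract can be modelled in a few lines of software. The toy model below is a sketch under stated assumptions, not the patented hardware: it covers only load requests, uses a dictionary keyed by address as a stand-in for the associative buffer, and the names `Request`, `DmaEngine`, `submit`, and `step` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    address: int     # target external memory address
    block_size: int  # amount of data to transfer, in bytes
    context: str     # context information echoed back with the status


class DmaEngine:
    """Toy model: request FIFO -> controller -> buffer + response FIFO."""

    def __init__(self, external_memory):
        self.memory = external_memory
        self.request_fifo = deque()
        self.response_fifo = deque()
        self.assoc_buffer = {}  # address -> fetched block (assumed keying)

    def submit(self, request):
        """Called by the CPU to queue a fetch request."""
        self.request_fifo.append(request)

    def step(self):
        """Process one queued request, if any."""
        if not self.request_fifo:
            return
        req = self.request_fifo.popleft()
        end = req.address + req.block_size
        # Allocate a block in the buffer and fill it from external memory.
        self.assoc_buffer[req.address] = bytes(self.memory[req.address:end])
        # Report completion status, tagged with the request's context.
        self.response_fifo.append((req.context, "done"))


engine = DmaEngine(bytearray(range(16)))
engine.submit(Request(address=4, block_size=4, context="packet-0"))
engine.step()
status = engine.response_fifo.popleft()
print(status, engine.assoc_buffer[4])
```

The CPU side only touches the two FIFOs and the buffer, which mirrors the division of labour the abstract describes.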

Patent•

Processor and information processing method

Akihiko Tamura¹, Katsuya Tanaka¹•Institutions (1)

Epson¹

13 Jan 2006

TL;DR: An external interrupt control section causes a unit processor that is not executing a task, or the unit processor executing the lowest priority task, to execute an interrupt processing that was input.

Abstract: A multiprocessor is provided that efficiently processes high priority processing. A mobile telephone 1 comprises a CPU 10 having therein an external interrupt control section 11 that causes a unit processor that is not executing a task or the unit processor executing the lowest priority task to execute an interrupt processing that was input. Thus, interrupt processing that occurred can be executed within the CPU 10 without, as far as possible, reducing the capacity to process tasks. Accordingly, interrupt processing can be efficiently processed within the CPU 10 as a multiprocessor.
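
The selection rule in this abstract (prefer an idle unit processor, otherwise the one running the lowest priority task) can be written as a small function. This is an illustrative sketch only: the function name `pick_unit_for_interrupt`, the use of `None` to mark an idle unit, and the convention that larger numbers mean higher priority are assumptions, not details from the patent.

```python
from typing import Optional, Sequence


def pick_unit_for_interrupt(priorities: Sequence[Optional[int]]) -> int:
    """Return the index of the unit processor that should take the interrupt.

    priorities[i] is None when unit i is idle; otherwise it is the priority
    of the task running on unit i (larger means more important, by
    assumption).
    """
    if not priorities:
        raise ValueError("at least one unit processor is required")
    # First choice: any unit that is not executing a task.
    for index, priority in enumerate(priorities):
        if priority is None:
            return index
    # Otherwise: the unit executing the lowest priority task.
    return min(range(len(priorities)), key=lambda i: priorities[i])


print(pick_unit_for_interrupt([5, None, 2]))  # an idle unit exists
print(pick_unit_for_interrupt([5, 3, 7]))     # all busy
```

In both sample calls unit 1 is chosen: in the first because it is idle, in the second because it runs the lowest priority task.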

Journal Article•10.1109/TEMC.2006.882844•

Susceptibility of Personal Computer Systems to Fast Transient Electromagnetic Pulses

M. Camp, Heyno Garbe¹•Institutions (1)

Information Technology University¹

20 Nov 2006 - IEEE Transactions on Electromagnetic Compatibility

TL;DR: The major result is that susceptibility increases significantly with each computer generation.

Abstract: In this paper, the susceptibility of personal computer systems (mainboard classes ranging from 8088 processor based systems up to Pentium III systems) to fast transient electromagnetic pulses (EMP) with double exponential pulse shapes [EMP, ultra wideband (UWB)] is determined. The influence of computer generation, random access memory (RAM)-values, program states, and pulse shapes, as well as the destruction thresholds of single personal computer (PC)-components [central processing unit (CPU), RAM, basic input/output system (BIOS), mainboard] have been investigated. The major result is that susceptibility increases significantly with each computer generation.

Proceedings Article•10.1109/FCCM.2006.40•

Enabling a Uniform Programming Model Across the Software/Hardware Boundary

Erik K. Anderson¹, Jason Agron¹, W. Peck¹, Jim Stevens¹, Fabrice Baijot¹, Ed Komp¹, Ron Sass¹, David L. Andrews¹•Institutions (1)

University of Kansas¹

24 Apr 2006

TL;DR: The hardware thread interface (HWTI) component provides an abstract, platform independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.

Abstract: In this paper, we present hthreads, a unifying programming model for specifying application threads running within a hybrid CPU/FPGA system. Threads are specified from a single pthreads multithreaded application program and compiled to run on the CPU or synthesized to run on the FPGA. The hthreads system, in general, is unique within the reconfigurable computing community as it abstracts the CPU/FPGA components into a unified custom threaded multiprocessor architecture platform. To support the abstraction of the CPU/FPGA component boundary, we have created the hardware thread interface (HWTI) component that frees the designer from having to specify and embed platform specific instructions to form customized hardware/software interactions. Instead, the hardware thread interface supports the generalized pthreads API semantics, and allows passing of abstract data types between hardware and software threads. Thus the hardware thread interface provides an abstract, platform independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.

Patent•

Wireless mesh networking in wagering game environments

Mark B. Gagner, Daniel Norman St. John, Dale R. Buchholz

14 Jul 2006

TL;DR: Systems and methods for wireless mesh networking in a gaming environment, including a network interface unit that wirelessly receives gaming data from some of a plurality of components of a wireless mesh network and wirelessly transmits the gaming data to others.

Abstract: Systems and methods for wireless mesh networking in a gaming environment are described herein. In one embodiment, the system includes a network interface unit to wirelessly receive gaming data from ones of a plurality of components of a wireless mesh network, the network interface unit to wirelessly transmit the gaming data to others of the plurality of components of the wireless mesh network. The system also includes a memory unit to store certain of the gaming data and to store instructions for conducting wagering games and a central processing unit to perform operations based in part on the certain of the gaming data and to perform operations based on the instructions.

Proceedings Article•10.1109/HPCA.2006.1598112•

A decoupled KILO-instruction processor

Miquel Pericas, A. Cristal, Ruben Gonzalez, Daniel A. Jimenez, Mateo Valero

27 Feb 2006

TL;DR: It is demonstrated that a decoupled microarchitecture, using small structures and many in-order components, can achieve the same performance as much more aggressive proposals while minimizing design complexity.

Abstract: Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are composed of structures that do not scale to large instruction windows because of timing and power constraints. However, the behavior of programs executed with large instruction windows gives rise to a natural and simple alternative to scaling. We characterize this phenomenon of execution locality and propose a microarchitecture to exploit it to achieve the benefit of a large instruction window processor with low implementation cost. Execution locality is the tendency of instructions to exhibit high or low latency based on their dependence on memory operations. In this paper we propose a decoupled microarchitecture that executes low latency instructions on a cache processor and high latency instructions on a memory processor. We demonstrate that such a design, using small structures and many in-order components, can achieve the same performance as much more aggressive proposals while minimizing design complexity.

Patent•

Method and apparatus for determining

David Holmes

16 Mar 2006

TL;DR: A method for image recognition of a material object that uses graphical modeling of the corner points of a vertex and of edge features detected in a digital display, carried out with a central processing unit and its associated memory and display.

Abstract: A method for image recognition of a material object that utilizes graphical modeling of the corner points of a vertex which includes projecting a point on a digital display to an inward depth, a one half pixel distance in the plane of the display, with a conic to a digital display, and a square block containing one half size child blocks that are scaled to depth, projecting the corner points of a vertex and replacing the bisecting points of edge features detected in a digital display scaled at an increasing rate of congruency to the dimensions of an object. The method may further include producing a digital image of the material object, providing a central processing unit, providing memory associated with a central processing unit; providing a display associated with a central processing unit; loading the digital image into the memory; defining the edges of features within the digital image; and a finding fight crucial points from registrations projected on to an edge feature display.

Patent•

GPU pipeline multiple level synchronization controller processor and method

Timour Paltashev¹, Hsilin Huang¹, Boris Prokopenko¹, Qunfeng (Fred) Liao¹•Institutions (1)

VIA Technologies¹

25 Oct 2006

TL;DR: A method for high level synchronization between an application and a graphics pipeline comprises receiving an application instruction in an input stream at a predetermined component, such as a command stream processor (CSP), as sent by a central processing unit.

Abstract: A method for high level synchronization between an application and a graphics pipeline comprises receiving an application instruction in an input stream at a predetermined component, such as a command stream processor (CSP), as sent by a central processing unit. The CSP may have a first portion coupled to a next component in the graphics pipeline and a second portion coupled to a plurality of components of the graphics pipeline. A command associated with the application instruction may be forwarded from the first portion to the next component in the graphics pipeline or some other component coupled thereto. The command may be received and thereafter executed. A response may be communicated on a feedback path to the second portion of the CSP. Nonlimiting exemplary application instructions that may be received and executed by the CSP include check surface fault, trap, wait, signal, stall, flip, and trigger.

Patent•

Parallel processing method and system, for instance for supporting embedded cluster platforms, computer program product therefor

Diego Melpignano¹, David Siorpaes¹, Paolo Zambotti¹, Antonio Maria Borneo¹•Institutions (1)

STMicroelectronics¹

18 Apr 2006

TL;DR: A multi-processing system-on-chip with a cluster of processors is operated by defining a master CPU to coordinate the system and running on that CPU a cluster manager agent, which dynamically migrates software processes between the CPUs and changes their power settings.

Abstract: A multi-processing system-on-chip including a cluster of processors having respective CPUs is operated by: defining a master CPU within the respective CPUs to coordinate operation of said multi-processing system, running on the CPU a cluster manager agent. The cluster manager agent is adapted to dynamically migrate software processes between the CPUs of said plurality and change power settings therein.

Patent•

C/C++ language extensions for general-purpose graphics processing unit

Ian Buck¹, Bastiaan Aarts¹•Institutions (1)

Nvidia¹

2 Nov 2006

TL;DR: A general-purpose programming environment lets users program a GPU as a general-purpose computation engine with familiar C/C++ programming constructs, using declaration specifiers to identify which portions of a program are to be compiled for a CPU or a GPU.

Abstract: A general-purpose programming environment allows users to program a GPU as a general-purpose computation engine using familiar C/C++ programming constructs. Users may use declaration specifiers to identify which portions of a program are to be compiled for a CPU or a GPU. Specifically, functions, objects and variables may be specified for GPU binary compilation using declaration specifiers. A compiler separates the GPU binary code and the CPU binary code in a source file using the declaration specifiers. The location of objects and variables in different memory locations in the system may be identified using the declaration specifiers. CTA threading information is also provided for the GPU to support parallel processing.

Patent•

Dynamic enablement and customization of tracing information in a data processing system

Janice M. Girouard¹, James K. Lewis, Michael Thomas Strosaker, Wendel Glenn Voigt•Institutions (1)

IBM¹

7 Jun 2006

TL;DR: A staged tracing approach in which an initial high-level trace detects potential problems at a sub-system level and is followed by a dynamic tracing state with more detailed tracing for an identified problematic sub-system.

Abstract: A computer implemented method, system, and computer usable program code for staged tracing, where an initial high-level trace is performed to detect potential problems or issues at a sub-system level, followed by a dynamic tracing state, with a more detailed level of tracing for an identified problematic sub-system. During such dynamic tracing, the CPU consumption or processing time is monitored and if such consumption remains below a given threshold, additional trace points may be added. If such CPU consumption exceeds the given threshold, existing trace-points are selectively backed-out or removed. The dynamic adding and removing of trace-points allows for the CPU to perform in a desired window of execution performance such that the overall system performance is not adversely affected when tracing is enabled.
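
The add-and-remove behaviour described in this abstract can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the class name `TraceController`, the one-trace-point-per-step policy, and the choice to back out the most recently added trace point first are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TraceController:
    """Hypothetical controller that keeps tracing overhead near a CPU budget."""

    cpu_threshold: float                              # e.g. 0.05 means 5% of CPU time
    candidates: list = field(default_factory=list)    # trace points not yet enabled
    active: list = field(default_factory=list)        # enabled trace points, oldest first

    def adjust(self, cpu_consumption: float) -> None:
        """Add or back out one trace point based on the measured consumption."""
        if cpu_consumption > self.cpu_threshold:
            # Over budget: back out the most recently added trace point (assumed policy).
            if self.active:
                self.candidates.insert(0, self.active.pop())
        elif cpu_consumption < self.cpu_threshold:
            # Under budget: enable one more trace point, if any remain.
            if self.candidates:
                self.active.append(self.candidates.pop(0))
```

Calling `adjust` once per monitoring interval keeps the number of active trace points drifting toward whatever the threshold allows, which is the "window of execution performance" idea in the abstract.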

Patent•

Power reduction for processor front-end by caching decoded instructions

Baruch Solomon¹, Ronny Ronen¹, Doron Orenstien¹•Institutions (1)

Intel¹

31 Oct 2006

TL;DR: A power-aware front-end unit for a processor may include a UOP cache and may disable instruction synchronization circuitry, instruction decode circuitry and, optionally, instruction fetch circuitry while instruction look-ups are underway in both a block cache and an instruction cache.

Abstract: A power aware front-end unit for a processor may include a UOP cache that disables other circuitry within the front-end unit. In an embodiment, a front-end unit may disable instruction synchronization circuitry, instruction decode circuitry and, optionally, instruction fetch circuitry while instruction look-ups are underway in both a block cache and an instruction cache. If the instruction look-up indicates a miss, the disabled circuitry thereafter may be enabled.

Patent•

Electronic apparatus, communication system, and program

Yasuhiko Watanabe¹•Institutions (1)

Hitachi¹

21 Nov 2006

TL;DR: When the main power is switched off while an emergency mode is set, the CPU stops power supply to the display sections, etc. to pretend that the apparatus is completely switched off, but keeps the GPS processing unit and communication unit powered so the current position continues to be notified.

Abstract: When an emergency mode is set, a CPU outputs an alert from an alerting unit and notifies the current position obtained by a GPS processing unit to a predetermined destination by a communication unit. When the main power is switched off, the CPU determines whether or not the emergency mode is set, and when determined that the emergency mode is set, stops power supply to display sections, etc. to pretend that the apparatus is completely switched off, but continues power supply to the GPS processing unit and communication unit to keep notifying the current position.
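
The power-off decision in this abstract reduces to a small branch, sketched below in Python. The function name `on_main_power_off` and the dictionary of unit names are hypothetical and stand in for whatever power-control interface the apparatus actually exposes.

```python
def on_main_power_off(emergency_mode: bool) -> dict:
    """Return which units stay powered after the main power switch is turned off.

    Hypothetical sketch: True means the unit keeps receiving power.
    """
    if emergency_mode:
        # Look switched off, but keep reporting the current position.
        return {"display": False, "gps": True, "communication": True}
    # Normal case: everything is powered down.
    return {"display": False, "gps": False, "communication": False}
```

In both branches the display is dark, so an observer cannot tell the two states apart, which is the point of the emergency mode.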

Patent•

Geological response data imaging with stream processors

Tor Dokken, Martin Ofstad Henriksen, Jørg E. Aarnes, Knut-Andreas Lie

18 Oct 2006

TL;DR: A method to convert geological response data to graphical raw data using at least one stream processor: a CPU pre-processes the response data and feeds it into one or more stream processors, which do the calculation-intensive work and return the results to the CPU for post-processing.

Abstract: The invention describes a method to convert geological response data to graphical raw data by using at least one stream processor for this purpose. The geological response data is pre-processed by a CPU and the pre-processed geological response data is fed into one or more stream processors. The stream processor then does the calculation-intensive work on the pre-processed geological response data and returns the processing results back to the CPU, which does some post-processing on the results coming from the stream processor. Stream processors comprise single or multiple programmable GPUs; clusters/networks of nodes with one or several GPUs; cell processors (or processors derived from them) or a cluster of cell processor nodes; and game computers (in the spirit of Sony's PlayStation, Nintendo's GameCube, etc.) or clusters of game computers.

Patent•

System for improving overall battery life of a GSM communication device

Hc Sandip¹•Institutions (1)

Samsung¹

27 Dec 2006

TL;DR: A system for improving the overall battery life of a GSM device according to an optimization mechanism for suspending neighbor-cell scanning in a GSM wireless communication system.

Abstract: Disclosed is a system for improving the overall battery life of a GSM device according to an optimization mechanism for suspending neighbor-cell scanning in a GSM wireless communication system, the system having a wireless device including: (a) a Central Processing Unit (CPU) executing software programs intended to comply with GSM protocol specifications; (b) an RF transmission unit and an RF reception unit functioning either independently or as a single unit; (c) a specialized Digital Signal Processor being able to process received signal at a corresponding receiving antenna and offering estimates of the received signal level and quality; (d) a logic process by which the mobile terminal powers off an RF module thereof for a definite period of time and wakes up at a pre-determined interval to listen to paging messages transmitted thereto; and (e) firmware/software performing neighbor cell monitoring in compliance with a protocol mandated by GSM standards.

Patent•

Method, apparatus and system for enhanced CPU frequency governers

Steven L. Grobman¹•Institutions (1)

Intel¹

7 Sep 2006

TL;DR: A method, apparatus and system enable enhanced processor frequency governors to comprehend virtualized platforms and utilize predictive information to enhance performance; an enhanced frequency governor may run within a virtual machine on the host and interact with a virtual machine manager to collect predictive information from application(s) running within each virtual machine.

Abstract: A method, apparatus and system enable enhanced processor frequency governors to comprehend virtualized platforms and utilize predictive information to enhance performance in virtualized platforms. Specifically, in one embodiment, an enhanced frequency governor in a virtual host may run within a virtual machine on the host and interact with a virtual machine manager to collect predictive information from application(s) running within each virtual machine on the host. The enhanced frequency governor may then utilize the predictive information to determine future CPU frequency requirements and raise or lower the CPU frequency and/or voltage in anticipation of the needs of the various applications.
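
As a rough illustration of how predictive information might drive the frequency choice, the Python sketch below picks a frequency step from per-VM demand forecasts. Everything in it is assumed for the example: the function name `choose_frequency`, expressing each VM's predicted need in MHz, and summing those needs into a single demand figure. The abstract does not specify how the governor combines the predictions.

```python
def choose_frequency(predicted_loads_mhz: dict, available_mhz: list) -> int:
    """Pick the lowest available frequency that covers the summed prediction.

    predicted_loads_mhz: hypothetical map of VM name -> predicted demand in MHz.
    available_mhz: frequency steps the processor supports.
    """
    demand = sum(predicted_loads_mhz.values())
    for freq in sorted(available_mhz):
        if freq >= demand:
            return freq
    # Demand exceeds every step: run at the highest available frequency.
    return max(available_mhz)
```

For instance, `choose_frequency({"vm1": 400, "vm2": 700}, [800, 1200, 1600, 2000])` selects 1200, the lowest step that covers the combined forecast of 1100 MHz; a governor could apply that setting ahead of the predicted load instead of reacting to it.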
