TL;DR: This work presents hardware-accelerated genomics pipelines that use either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely used algorithm for genomic short-read mapping, and introduces methods to ameliorate the impact of longer read lengths.
TL;DR: The sensing mechanism, design and operation of MEMS metal oxide gas sensors are reviewed, with a focus on approaches towards performance improvement and CMOS compatibility.
Abstract: The recent development of the Internet of Things (IoT) in healthcare and indoor air quality monitoring expands the market for miniaturized gas sensors. Metal oxide gas sensors based on microhotplates fabricated with micro-electro-mechanical system (MEMS) technology dominate the market due to their balance of performance and cost. Integrating sensors with signal conditioning circuits on a single chip can significantly reduce noise and package size. However, the fabrication process of MEMS sensors must be compatible with complementary metal oxide semiconductor (CMOS) circuits, which imposes restrictions on materials and design. In this paper, the sensing mechanism, design and operation of these sensors are reviewed, with a focus on approaches towards performance improvement and CMOS compatibility.
TL;DR: PHub, a high-performance multi-tenant, rack-scale parameter server (PS) design, is proposed; it co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many distributed deep neural network (DDNN) training frameworks.
Abstract: Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high performance multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art cloud-based distributed training techniques for image classification workloads, with 25% better throughput per dollar.
TL;DR: In this paper, a holistic performance analysis of waveguide-based electro-absorption modulators is performed using switching energy per unit bandwidth (speed) as the performance metric, which is shown to be determined by the ratio of the differential absorption cross-section of the switching material's broadening to the waveguide effective mode area.
Abstract: Electro-optic modulators perform a key function in data processing and communication. Rapid growth in data volume and increasing bit rates demand higher transmitter, and thus modulator, performance. Recent years have seen the introduction of new materials and modulator designs, including designs based on polaritonic optical modes, aimed at advancing performance in terms of speed, energy efficiency, and footprint. Such ad hoc modulator designs, however, leave these novel material classes of devices without a universal design framework. Here we perform a holistic performance analysis of waveguide-based electro-absorption modulators, using switching energy per unit bandwidth (speed) as the performance metric. We show that the performance is fundamentally determined by the ratio of the differential absorption cross-section of the switching material's broadening to the waveguide effective mode area. We find that the former is highest for a broad class of materials relying on Pauli blocking (absorption saturation), such as semiconductor quantum wells, quantum dots, graphene, and other 2D materials, but is quite similar amongst these classes. In this respect, these materials are clearly superior to those relying on free-carrier absorption, such as Si and ITO. The performance improvement on the material side is fundamentally limited by the oscillator sum rule and the thermal broadening of the Fermi-Dirac distribution. We also find that performance scales with modal waveguide confinement. Thus, we find that the best-performing designs in terms of the energy-bandwidth ratio are graphene-, QD-, QW-, or 2D-material-based plasmonic slot waveguides in which the electric field is in-plane with the switching material dimension. We show that this improvement always comes at the expense of increased insertion loss. Incorporating fundamental device physics, design trade-offs, and resulting performance, this analysis aims to guide future experimental modulator explorations.
TL;DR: The results show that the performance of knowledge transfer (KT) varies with student architecture and transfer technique, and a good performance improvement is obtained by transferring knowledge from both the intermediate layers and the last layer of the teacher to a shallower student.
Abstract: With the emergence of the edge computing paradigm, many applications, such as image recognition and augmented reality, require performing machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computationally heavy, whereas edge devices are usually equipped with limited computational and storage resources. Such models can be compressed and reduced for deployment on edge devices, but they may then lose capability and perform poorly. Recent works used knowledge transfer techniques to transfer information from a large network (termed teacher) to a small one (termed student) in order to improve the performance of the latter. This approach seems promising for learning on edge devices, but a thorough investigation of its effectiveness is lacking. This paper provides an extensive study of the performance (in both accuracy and convergence speed) of knowledge transfer (KT), considering different student architectures and different techniques for transferring knowledge from teacher to student. The results show that the performance of KT does vary by architecture and transfer technique. A good performance improvement is obtained by transferring knowledge from both the intermediate layers and the last layer of the teacher to a shallower student. Other architectures and transfer techniques, however, do not fare as well, and some even have a negative impact on performance.
TL;DR: A new vectorial approach is proposed to minimize pulsating torque and improve dynamic performance in a five-phase PM motor with a short-circuit fault; it requires minimal reconfiguration of the control structure when moving from healthy to fault-tolerant operation and exhibits improved dynamic performance.
Abstract: Multiphase permanent-magnet (PM) brushless motors are widely adopted for their high efficiency and high power density. However, a short-circuit phase fault results in serious problems, such as increased torque fluctuations and deteriorated dynamic performance. This paper proposes a new vectorial approach to minimize pulsating torque and improve dynamic performance in a five-phase PM motor with a short-circuit fault. The novelty of the proposed strategy is voltage feedforward compensation based on the relation between the short-circuit current and the back electromotive force of the faulty phase. First, compensatory voltages are used to eliminate the impact of the short-circuit current. Then, combining this compensation with the orthogonal reduced-order transformation matrices derived from fault-tolerant current references can improve the dynamic performance of the faulty PM motor. The effect of the short-circuit phase fault on the PM motor model in the synchronous rotating frame is also discussed. This control strategy requires minimal reconfiguration of the control structure when moving from healthy to fault-tolerant operation and exhibits improved dynamic performance. Simulation and experimental results are presented to validate the proposed strategy.
TL;DR: An architecture-aware graph clustering algorithm is developed that exploits the FPGA-HMC platform's capability to improve data locality and memory access efficiency, and the FPGA-HMC graph processor is further improved with a memory request merging unit that takes advantage of the increased data locality resulting from graph clustering.
Abstract: Graph analytics, which explores the relationships among interconnected entities, is becoming increasingly important due to its broad applicability, from machine learning to the social sciences. However, due to the irregular data access patterns in graph computations, performance is a major challenge for graph processing systems. The algorithms, software, and hardware that have been tailored for mainstream parallel applications are generally not effective for the massive, sparse graphs arising from real-world problems, due to their complex and irregular structures. To address the performance issues in large-scale graph analytics, we leverage the exceptional random access performance of the emerging Hybrid Memory Cube (HMC) combined with the flexibility and efficiency of modern FPGAs. In particular, we develop a collaborative software/hardware technique to perform a level-synchronized Breadth First Search (BFS) on an FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that exploits the FPGA-HMC platform's capability to improve data locality and memory access efficiency. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by designing a memory request merging unit to take advantage of the increased data locality resulting from graph clustering. We evaluate the performance of our BFS implementation using the AC-510 development kit from Micron and achieve a $2.8\times$ average performance improvement compared to the latest FPGA-HMC based graph processing system over a set of benchmarks from a wide range of applications.
TL;DR: In this paper, the authors investigate the effect of some important Lean-related management actions on the relationship between Lean and the level of process improvement: envisioning and communicating the meaning of Lean, setting goals and actively steering on improvement performance metrics, and encouraging continuous improvement.
Abstract: It is commonly agreed that the success of Lean management is determined not only by its technical practices, but also by so-called soft practices, such as the behavior and actions of employees and management. Lean management behavior is in itself paradoxical in nature, as it incorporates technical aspects (e.g., fact-based management, analysis, and adherence to standard operating procedures for the sake of efficiency) and social, follower-related aspects (e.g., promotion of employees' responsibility to continuously improve their work processes). In this paper, we investigate the (moderating) effect of some important Lean-related management actions on the relationship between Lean and the level of process improvement: i) envisioning and communicating the meaning of Lean, ii) setting goals and actively steering on improvement performance metrics, and iii) encouraging continuous improvement. Survey data from 178 responses from Dutch organizations show that these management actions have a positive effect on both Lean and the level of process improvement. In addition, active steering on performance improvement has a reinforcing effect on the relationship between Lean and process improvement. For respondents with a low level of steering on performance improvement, Lean does not lead to process improvement, while it does for respondents with average and high levels of steering on performance improvement. The more management steers on performance improvement, the more Lean results in a higher level of process improvement.
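A level-synchronized BFS expands the graph one frontier at a time, and request merging pays off when the edge lists read at the same level sit in the same memory block. The Python sketch below illustrates both ideas in software under stated assumptions: the graph is stored in CSR form (`offsets`, `edges`), and `block_size` is a hypothetical number of edge entries served by one memory request. It only counts merged block requests per level; it is not the authors' FPGA-HMC design, and the clustering step (relabelling vertices so that frontier neighbours are stored contiguously) is not shown.

```python
def level_sync_bfs(offsets, edges, source, block_size=16):
    """Level-synchronized BFS over a graph stored in CSR form.

    Returns (levels, requests): the BFS level of every vertex (-1 if
    unreachable) and the number of merged memory-block requests issued
    to read edge lists. block_size is a hypothetical number of edge
    entries served by one memory request.
    """
    n = len(offsets) - 1
    levels = [-1] * n
    levels[source] = 0
    frontier = [source]
    depth = 0
    requests = 0
    while frontier:
        # Gather every edge-array block this level needs; reads that hit
        # the same block are merged into a single request.
        blocks = set()
        for v in frontier:
            for i in range(offsets[v], offsets[v + 1]):
                blocks.add(i // block_size)
        requests += len(blocks)
        # Expand the whole frontier before moving to the next level.
        next_frontier = []
        for v in frontier:
            for u in edges[offsets[v]:offsets[v + 1]]:
                if levels[u] == -1:
                    levels[u] = depth + 1
                    next_frontier.append(u)
        frontier = next_frontier
        depth += 1
    return levels, requests


# Directed toy graph: 0->1, 0->2, 1->3, 2->3, 3->4 (hypothetical example).
offsets = [0, 2, 3, 4, 5, 5]
edges = [1, 2, 3, 3, 4]
levels, merged = level_sync_bfs(offsets, edges, 0, block_size=2)
_, unmerged = level_sync_bfs(offsets, edges, 0, block_size=1)
print(levels, merged, unmerged)  # [0, 1, 1, 2, 3] 3 5
```

On this toy graph, merging reads within two-entry blocks issues 3 requests instead of 5; a clustering pass that stores related vertices next to each other would, in principle, raise the chance of such merges on real graphs.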
TL;DR: The simulations are used to analyze the influence of train light-weighting, train control, and the load ratio on the energy efficiency of train operation, which can help to improve the system performance.
TL;DR: In this paper, a Geographical Scheduling Algorithm (GSA) is proposed to improve the performance of the precoding in multi-beam SatCom systems by considering multiple channel matrices.
Abstract: Current state-of-the-art High Throughput Satellite systems provide wide-area connectivity through multi-beam architectures. Due to the tremendous system throughput requirements of next-generation Satellite Communications (SatCom), traditional 4-colour frequency reuse schemes are no longer sufficient, and more aggressive solutions such as full frequency reuse are being considered for multi-beam SatCom. These approaches require advanced interference management techniques to cope with the significantly increased inter-beam interference, both at the transmitter, e.g., precoding, and at the receiver, e.g., Multi User Detection (MUD). With respect to the former, several peculiar challenges arise when precoding is designed for SatCom systems. In particular, multiple users are multiplexed in the same transmission radio frame, which requires considering multiple channel matrices when computing the precoding coefficients. Previous works have mainly focused on user clustering and precoding design. However, although they achieved significant throughput gains, no analysis has been performed on the impact of the system scheduling algorithm, which is typically assumed to be random, on multicast precoding. In this paper, we focus on this aspect by showing that, although the overall system performance is improved, a random scheduler does not properly handle specific scenarios in which the precoding algorithm may perform poorly. Based on these considerations, we design a Geographical Scheduling Algorithm (GSA) aimed at improving the precoding performance in these critical scenarios and, consequently, the performance at system level as well. Through extensive numerical simulations, we show that the proposed GSA provides a significant performance improvement with respect to legacy random scheduling.
TL;DR: This study proposes a feature selection-based approach to identify reasonable spatial-temporal traffic patterns related to the target link in order to improve online prediction performance; the approach is a promising methodology for short-term traffic prediction.
Abstract: This study proposes a feature selection-based approach to identify reasonable spatial-temporal traffic patterns related to the target link, in order to improve online prediction performance. The prediction task consists of two steps: a hybrid intelligent algorithm-based feature selector (FS) is proposed to optimise the original state vectors, which are designed empirically, during the offline process; the optimised state vectors are then employed to carry out online prediction. Numerical experiments with three non-parametric algorithms are conducted using taxis' global positioning system data from an urban road network in Changsha, China. It is concluded that: (i) under optimised state vectors, prediction accuracy improves or remains almost the same; (ii) K-nearest neighbour (KNN) with the simplest state vectors obtains the greatest improvement in prediction performance; (iii) although the performance improvement of ε-support vector regression is limited with optimised state vectors, it always outperforms the back-propagation neural network and KNN; and (iv) the three non-parametric approaches with optimised state vectors outperform auto-regressive integrated moving average at relatively long prediction horizons. In conclusion, the FS-based approach is able to improve or maintain prediction performance with remarkably reduced model complexity, and is a promising methodology for short-term traffic prediction.
TL;DR: To the best of the authors' knowledge, PShifter is the first approach to transparently and automatically apply power capping non-uniformly across processors of a job in a dynamic manner that adapts to phase changes.
Abstract: The US Department of Energy (DOE) has set a power target of 20–30 MW for the first exascale machines. To achieve one exaFLOPS under this power constraint, it is necessary to manage power intelligently while maximizing performance. Most production-level parallel applications suffer from computational load imbalance across distributed processes due to non-uniform work decomposition. Other factors, such as manufacturing variation and thermal variation in the machine room, may amplify this imbalance. As a result, some processes of a job reach blocking calls, collectives, or barriers earlier and wait for others to reach the same point. This waiting wastes energy and CPU cycles, which degrades application efficiency and performance. We address this problem for power-limited jobs via Power Shifter (PShifter), a dual-level, feedback-based mechanism that intelligently and automatically detects such imbalance and reduces it by dynamically redistributing a job's power budget across processors, improving the overall performance of the job compared to a naive uniform power distribution across nodes. In contrast to prior work, PShifter ensures that a given power budget is not violated. At the bottom level of PShifter, local agents monitor and control the performance of processors by actuating different power levels; they reduce power from the processors that incur substantial wait times. At the top level, the cluster agent, which has a global view of the system, monitors the job's power consumption and provides feedback on the unused power, which is then distributed across the processors of the same job. Our evaluation on an Intel cluster shows that PShifter achieves performance improvement of up to 21% and energy savings of up to 23% compared to uniform power allocation, outperforms static approaches by up to 40% and 22% for codes with and without phase changes, respectively, and outperforms dynamic schemes by up to 19%.
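The offline/online split can be illustrated with KNN, one of the three predictors named above. The Python sketch below is a minimal illustration under stated assumptions: it swaps in plain greedy forward selection, scored by leave-one-out mean absolute error, for the paper's hybrid intelligent feature selector, whose details the abstract does not give. The selected feature indices then define the distance used for online KNN prediction. All function names and the toy data are hypothetical.

```python
import math


def knn_predict(train_X, train_y, x, k, features):
    """Average the targets of the k training vectors nearest to x,
    measuring Euclidean distance only over the selected feature indices."""
    dists = []
    for row, target in zip(train_X, train_y):
        d = math.sqrt(sum((row[j] - x[j]) ** 2 for j in features))
        dists.append((d, target))
    dists.sort(key=lambda p: p[0])  # stable sort: ties keep training order
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)


def loo_error(X, y, k, features):
    """Leave-one-out mean absolute error of KNN on the given features."""
    total = 0.0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        total += abs(knn_predict(rest_X, rest_y, X[i], k, features) - y[i])
    return total / len(X)


def greedy_select(X, y, k):
    """Offline step: greedy forward selection that keeps adding the feature
    that most lowers leave-one-out error, stopping when none helps."""
    selected, best = [], math.inf
    remaining = list(range(len(X[0])))
    while remaining:
        err, j = min((loo_error(X, y, k, selected + [j]), j) for j in remaining)
        if err >= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = err
    return selected, best


X = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 3]]  # feature 0 tracks y; feature 1 is noise
y = [1, 2, 3, 4, 5]
features, err = greedy_select(X, y, k=1)
print(features, err)                                # [0] 1.0
print(knn_predict(X, y, [2.9, 0.0], 1, features))  # 3.0
```

Dropping the noisy feature both shrinks the state vector and lowers the validation error, which mirrors, on a toy scale, the abstract's point about reduced model complexity.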
To the best of our knowledge, PShifter is the first approach to transparently and automatically apply power capping non-uniformly across processors of a job in a dynamic manner adapting to phase changes.
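The two-level feedback loop described above can be sketched as a single control step. The Python sketch below is a simplified illustration, not PShifter itself: the step size, the cap limits, the wait-time threshold, and the rule of granting spare power to the least-waiting processors first are all hypothetical choices. It does preserve the property stated in the abstract that the per-processor caps never sum above the job's power budget, assuming the starting caps already respect it.

```python
def shift_power(caps, wait_fracs, budget, step=5.0, min_cap=50.0,
                max_cap=150.0, wait_threshold=0.1):
    """One control step of a hypothetical two-level power-shifting loop.

    caps: current per-processor power caps in watts
    wait_fracs: fraction of the last interval each processor spent waiting
    budget: job-level power budget in watts; the returned caps never sum above it
    """
    if sum(caps) > budget:
        raise ValueError("starting caps already exceed the job budget")
    new_caps = list(caps)
    # Bottom level (local agents): processors that wait a lot give up power,
    # never dropping below min_cap and never increasing their cap.
    for i, w in enumerate(wait_fracs):
        if w > wait_threshold and new_caps[i] > min_cap:
            new_caps[i] = max(min_cap, new_caps[i] - step)
    # Top level (cluster agent): pool the unused budget and grant it to the
    # processors that waited least, never exceeding max_cap per processor.
    spare = budget - sum(new_caps)
    for i in sorted(range(len(new_caps)), key=lambda j: wait_fracs[j]):
        if spare <= 0:
            break
        if wait_fracs[i] > wait_threshold:
            continue  # do not hand power back to processors that just shed it
        grant = min(spare, max(0.0, max_cap - new_caps[i]))
        new_caps[i] += grant
        spare -= grant
    return new_caps


caps = shift_power([100.0, 100.0, 100.0, 100.0], [0.0, 0.3, 0.05, 0.4], budget=400.0)
print(caps)  # [110.0, 95.0, 100.0, 95.0]
```

Repeating this step each control interval would move power from processors that idle at barriers toward the critical-path processor, which is the intuition behind the reported speedups.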
TL;DR: DLV improves flash access speeds based on process variations and differences in data retention time across flash blocks, and integrates access speed optimization with access scheduling such that the average access response time on flash memory storage systems is effectively reduced.
Abstract: NAND flash has been widely adopted in storage systems due to its better read and write performance and lower power consumption compared with traditional mechanical hard drives. To meet the increasing performance demands of modern applications, recent studies speed up flash accesses by exploiting access latency variations at the device level. Unfortunately, existing flash access schedulers are still oblivious to such variations, leading to suboptimal I/O performance improvements. In this paper, we propose DLV, a novel flash access scheduler that explores the scheduling opportunities arising from device-level access latency variations. DLV improves flash access speeds based on process variations and differences in data retention time across flash blocks. More importantly, DLV integrates access speed optimization with access scheduling such that the average access response time on flash memory storage systems is effectively reduced. Our experimental results show that DLV achieves an average performance improvement of 41.5% over the state of the art.
TL;DR: This paper proposes a novel architecture, called SmarCo, which allows high-throughput applications to be processed more efficiently in datacenters; it implements a large-scale many-core architecture with in-pair threads to support high-concurrency processing and introduces a hierarchical ring topology and a laxity-aware task scheduler to guarantee hard real-time response.
Abstract: Fast-growing high-throughput applications, such as web services, are characterized by high-concurrency processing, hard real-time response, and high-bandwidth memory access. These emerging applications pose severe challenges to processors in datacenters, in both concurrent processing performance and energy efficiency. To offer satisfactory quality of service, it is critical that future datacenters meet these emerging demands of high-throughput applications more efficiently. In this paper, we propose a novel architecture, called SmarCo, which allows high-throughput applications to be processed more efficiently in datacenters. Based on the dominant characteristics of high-throughput applications, we implement a large-scale many-core architecture with in-pair threads to support high-concurrency processing; we also introduce a hierarchical ring topology and a laxity-aware task scheduler to guarantee hard real-time response; furthermore, we propose a high-throughput datapath to improve memory access efficiency. We verify the efficiency of SmarCo using simulators, a large-scale FPGA, and a prototype in the TSMC 40-nm technology node. The experimental results show that, compared to the Intel Xeon E7-8890V4, SmarCo achieves a 10.11× performance improvement and a 6.95× energy-efficiency improvement with higher throughput and a better guarantee of real-time response.
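The benefit of making a scheduler aware of per-block latency differences can be shown with a small calculation. The Python sketch below compares first-in-first-out service with serving the fastest blocks first (a shortest-job-first rule) when all requests are queued at once on one chip. The latency table and both policies are hypothetical stand-ins; the abstract does not describe DLV's actual scheduling policy.

```python
def avg_response_time(requests, block_latency, policy="fifo"):
    """Average response time when all requests are queued at time 0 on one chip.

    requests: list of (request_id, block_id)
    block_latency: hypothetical per-block access latency estimates (us),
        e.g. derived from process variation and data retention time
    policy: "fifo" serves in arrival order; "fastest_first" serves the
        lowest-latency blocks first (shortest-job-first).
    """
    if policy == "fifo":
        order = list(requests)
    elif policy == "fastest_first":
        order = sorted(requests, key=lambda r: block_latency[r[1]])
    else:
        raise ValueError(f"unknown policy: {policy}")
    clock = 0.0
    total = 0.0
    for _, block in order:
        clock += block_latency[block]
        total += clock  # all arrive at t=0, so response time = completion time
    return total / len(order)


latency = {0: 100.0, 1: 50.0, 2: 60.0}
reqs = [("r1", 0), ("r2", 1), ("r3", 2)]
print(avg_response_time(reqs, latency, "fifo"))           # about 153.33
print(avg_response_time(reqs, latency, "fastest_first"))  # about 123.33
```

The total work is identical under both policies; only the order changes, which is why a latency-aware scheduler can lower average response time without making any single access faster.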
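The laxity-aware scheduler mentioned above can be illustrated with least-laxity-first (LLF) scheduling, where a task's laxity is the time remaining until its deadline minus the execution time it still needs. The Python sketch below simulates LLF on a single core in unit time steps. It is a textbook policy swapped in for illustration, since the abstract does not describe SmarCo's scheduler in detail, and the task set is hypothetical.

```python
def llf_schedule(tasks, horizon):
    """Simulate least-laxity-first (LLF) scheduling on one core in unit time steps.

    tasks: dict name -> (release_time, exec_time, deadline)
    Returns (schedule, missed): the task run in each slot (None if idle)
    and the set of tasks that did not finish by their deadline.
    """
    remaining = {name: exec_time for name, (_, exec_time, _) in tasks.items()}
    finish = {}
    schedule = []
    for now in range(horizon):
        ready = [name for name, (release, _, _) in tasks.items()
                 if release <= now and remaining[name] > 0]
        if not ready:
            schedule.append(None)
            continue
        # Laxity = deadline - now - remaining work; smallest laxity runs first,
        # ties broken by task name for determinism.
        pick = min(ready, key=lambda n: (tasks[n][2] - now - remaining[n], n))
        remaining[pick] -= 1
        schedule.append(pick)
        if remaining[pick] == 0:
            finish[pick] = now + 1  # completes at the end of this slot
    missed = {name for name, (_, _, deadline) in tasks.items()
              if name not in finish or finish[name] > deadline}
    return schedule, missed


tasks = {"A": (0, 3, 7), "B": (0, 2, 3), "C": (1, 1, 4)}
schedule, missed = llf_schedule(tasks, horizon=7)
print(schedule)  # ['B', 'B', 'C', 'A', 'A', 'A', None]
print(missed)    # set()
```

Task A has the most slack, so it is deferred until B and C, whose deadlines are tight, have finished; all three tasks still meet their deadlines.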
TL;DR: In this paper, an online distributed reinforcement learning (OD-RL)-based DVFS control algorithm for many-core system performance improvement under both power and performance constraints is presented, where a per-core RL method is used to learn the optimal control policy of the voltage/frequency (VF) levels in a model-free manner.
Abstract: As power density emerges as the main constraint for many-core systems, controlling power consumption under the thermal design power while maximizing performance becomes increasingly critical. To dynamically save power, dynamic voltage frequency scaling (DVFS) techniques have proved to be effective and are widely available commercially. Meanwhile, systems have certain performance constraints that the applications should satisfy to ensure quality of service. In this paper, we present an online distributed reinforcement learning (OD-RL)-based DVFS control algorithm for many-core system performance improvement under both power and performance constraints. At the finer grain, a per-core RL method is used to learn the optimal control policy of the voltage/frequency (VF) levels in a model-free manner. At the coarser grain, an efficient global power budget reallocation algorithm is used to maximize the overall performance. The experiments show that, compared to state-of-the-art algorithms, OD-RL produces 1) up to 98% less budget overshoot, 2) up to 23% higher energy efficiency, and 3) a two-orders-of-magnitude speedup for systems with hundreds of cores. Furthermore, priority-aware OD-RL satisfies performance constraints better than OD-RL, with 1) $178\times$ more epochs satisfying the performance constraints, 2) $56\times$ better performance gain, and 3) $200\times$ better performance-power tradeoffs under similar efficiency and scalability.
TL;DR: The aim of this work is to assess the performance improvement offered by several driving strategies of a dual-input digital Doherty power amplifier (DPA) with respect to the equivalent single-input topology; the comparison shows superior performance of the digital DPA over the analog one, thus justifying the additional.
Abstract: The aim of this work is to assess the performance improvement offered by several driving strategies of a dual-input digital Doherty power amplifier (DPA) with respect to the equivalent single-input topology. To offer a fair comparison, an analog amplifier and the equivalent digital version, which is identical in all parts except for the absence of the input power divider, are designed at 3.5 GHz. The flexibility of dual-input control allows power-dependent input signal splitting and phase alignment between the main and auxiliary branches to be implemented, thus overcoming several shortcomings of traditional analog Doherty amplifiers. The proposed analysis focuses on gain and efficiency performance over a 6 dB back-off range. The comparison over the 3.1–3.7 GHz range shows superior performance of the digital DPA over the analog one, thus justifying the additional.
TL;DR: In this article, the authors examined the mediating effect of organizational commitment on the relationships of strategic orientation, organizational culture, and organizational IMO with organizational performance, and found a significant positive direct relationship between organizational culture and organizational commitment.
Abstract: In recent times, there has been increasing interest in strategic attributes that aim to achieve superior organizational performance, allowing organizations, including banks, to remain competitive over time. Therefore, to achieve superior organizational performance and successful growth, banks need to focus on their strategic attributes. The key strategic attributes include strategic orientation, organizational culture, organizational IMO, and organizational commitment. Drawing upon the resource-based view theory (RBV) and the social exchange theory (SET), this study examined the influence of these strategic attributes on organizational performance. Moreover, this study also examined the mediating effect of organizational commitment on the relationships of strategic orientation, organizational culture, and organizational IMO with organizational performance. Data were collected from 260 bank managers working in the branches of six large banks in Pakistan. The results of PLS path modeling revealed significant positive direct relationships of strategic orientation, organizational culture, organizational IMO, and organizational commitment with organizational performance. Similarly, the study found significant positive direct relationships of strategic orientation and organizational culture with organizational commitment. However, no significant relationship existed between organizational IMO and organizational commitment. Furthermore, the bootstrapping results revealed that organizational commitment mediated the relationships of strategic orientation and organizational culture with organizational performance. In contrast, the study did not find any mediation by organizational commitment in the relationship between organizational IMO and organizational performance. In general, the findings showed that organizational performance can be enhanced through the key strategic attributes examined in the study.
Accordingly, the study puts forward noteworthy claims regarding the mediating effect of organizational commitment on these variables. The study offers theoretical and practical contributions and highlights the crucial role of these strategic attributes in performance improvement in the banking sector. Lastly, limitations and directions for further study are provided.
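The fine-grained, per-core part of OD-RL can be illustrated with plain tabular Q-learning, a standard model-free RL method. In the minimal Python sketch below, the state encoding, the reward, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical and chosen only for illustration; the abstract does not specify them, and the coarse-grained global power budget reallocation is not shown.

```python
import random


class PerCoreVFAgent:
    """Tabular Q-learning agent that picks a voltage/frequency (VF) level for one core.

    States and rewards come from the caller; they are hypothetical here.
    """

    def __init__(self, n_levels, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.n_levels = n_levels
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = {}             # (state, level) -> estimated long-run value
        self.rng = random.Random(seed)

    def value(self, state, level):
        return self.q.get((state, level), 0.0)

    def choose(self, state):
        # Epsilon-greedy: occasionally try a random level, else take the best-known one.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_levels)
        return max(range(self.n_levels), key=lambda a: self.value(state, a))

    def update(self, state, level, reward, next_state):
        # One-step Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = max(self.value(next_state, a) for a in range(self.n_levels))
        old = self.value(state, level)
        self.q[(state, level)] = old + self.alpha * (reward + self.gamma * best_next - old)


agent = PerCoreVFAgent(n_levels=3, epsilon=0.0)
agent.update("low_load", 1, reward=10.0, next_state="low_load")
print(agent.value("low_load", 1))  # 5.0
print(agent.choose("low_load"))    # 1
```

In a full controller each core would run its own agent; a reward that combines measured performance with a penalty for exceeding the core's share of the power budget is one plausible choice, but it is an assumption that would need experimental validation.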
TL;DR: A new control approach based on type-2 fuzzy neural controller (T2FNC) is employed in order to improve the dynamic response of an ultra-lift Luo DC–DC converter under different operational conditions.
TL;DR: A new algorithm is proposed for data flow management in optical label switched networks that provides intelligent scheduling and quality control functionality by using machine learning techniques.
Abstract: Modern optical transport networks are facing unprecedented traffic growth driven by the rapid development of cloud technologies, the Internet of Things, and ubiquitous computing. The global data volume doubles every two years, requiring urgent improvement of the transport infrastructure around the world. In this paper, we propose a new algorithm for data flow management in optical label switched networks. Unlike existing solutions, our algorithm provides intelligent scheduling and quality control functionality by using machine learning techniques. The intelligence introduced into the network improves the accuracy of scheduling and overall performance. Although our algorithm initially does not provide near-optimal performance, as many other approaches do, it is able to improve over time by learning from previous experience.
TL;DR: The Smoothed MFIIC (SMFIIC) method is developed, which avoids the undesirable learning transient behavior of Model-Free Inversion-based Iterative Control (MFIIC) by adaptively regulating the learning speed to ensure smooth convergence.
Abstract: Model-Free Inversion-based Iterative Control (MFIIC) enables tracking performance improvement of systems that perform repeating tasks without using a model of the system. The aim of this paper is (i) to show that MFIIC can result in a severe loss of performance if the Signal-to-Disturbance-Ratio (SDR) approaches 1, and (ii) to propose a solution to this problem. The Smoothed MFIIC (SMFIIC) method is developed, which does not suffer from the undesirable learning transient behavior. This is achieved by adaptively regulating the learning speed to ensure smooth convergence. The existence of bad learning transients in MFIIC and the efficacy of SMFIIC are illustrated on an experimental desktop printer.
TL;DR: A survey of the most important and state of the art approaches and models to be used for performance measurement and evaluation of different operating systems using multiple metrics is presented.
Abstract: Through the huge growth of heavy computing applications which require a high level of performance, it is observed that the interest of monitoring operating system performance has also demanded to be grown widely. In the past several years since OS performance has become a critical issue, many research studies have been produced to investigate and evaluate the stability status of OSs performance. This paper presents a survey of the most important and state of the art approaches and models to be used for performance measurement and evaluation. Furthermore, the research marks the capabilities of the performance-improvement of different operating systems using multiple metrics. The selection of metrics which will be used for monitoring the performance depends on monitoring goals and performance requirements. Many previous works related to this subject have been addressed, explained in details, and compared to highlight the top important features that will very beneficial to be depended for the best approach selection.
TL;DR: This work evaluates the performance of file operations on OverlayFS and discusses a method for improving the performance by disabling this synchronization, and shows that the method can significantly improve the writing performance with copy_up by 680 times at most.
Abstract: Server consolidation with virtualization is a popular method to address the issue of a large amount of power consumption of inter-connected computers in data centers. The more computers are consolidated, the more energy is saved. However, highly consolidating, wherein many servers are consolidated into one physical computer, results in large performance decline. Especially, I/O performance is severely decreased as reported. In this work, we focus on Docker, a popular container-based virtualizing system, and OverlayFS. OverlayFS is widely recognized method for improving I/O performance in Docker. First, we evaluate the performance of file operations on OverlayFS. In particular, we focus on the performance of file writing involving copy_up and show that the performance is severely low. Second, we investigate the performance and behavior the filesystem during copy_up and demonstrate that synchronization is the most important issue. Third, we discuss a method for improving the performance by disabling this synchronization. Fourth, we evaluate the improving method and show that the method can significantly improve the writing performance with copy_up by 680 times at most.
TL;DR: Direct and surrogate-based optimization methods, including space mapping, are proposed, based on suitable objective functions, to efficiently tune the transmitter and receiver equalizers in the physical layer (PHY) tuning process; lab measurements confirm a dramatic speedup in PHY tuning and a substantial performance improvement.
Abstract: As microprocessor design scales to nanometric technology, traditional post-silicon validation techniques are inadequate for achieving full functional coverage of the system. Physical complexity and extreme process variations make it challenging to guarantee performance across process, voltage, and temperature conditions. In addition, microprocessors contain an increasing number of mixed-signal circuits, many of which are high-speed input/output (HSIO) links. Improvements in signaling methods, circuits, and process technology have allowed HSIO data rates to scale beyond 10 Gb/s, where undesired effects can cause multiple signal integrity problems. Together, these factors make post-silicon validation of HSIO links difficult and time-consuming. One of the major challenges in the electrical validation of HSIO links lies in the physical layer (PHY) tuning process, where equalization techniques are used to cancel these undesired effects. Current industrial practices for PHY tuning typically require extensive lab measurements because they rely on exhaustive enumeration. In this work, direct and surrogate-based optimization methods, including space mapping, are proposed, based on suitable objective functions, to efficiently tune the transmitter and receiver equalizers. The proposed methodologies are evaluated by lab measurements on realistic industrial post-silicon validation platforms, confirming a dramatic speedup in PHY tuning and a substantial performance improvement.
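The difference in measurement count between exhaustive enumeration and a direct search can be sketched as follows. The `eye_height` function is a hypothetical quadratic stand-in for one lab measurement of link margin, the 16 x 16 grid of TX/RX equalizer codes is assumed, and the compass (pattern) search shown is a generic direct method, not the paper's specific algorithm and not space mapping. Each distinct setting evaluated counts as one measurement.

```python
from itertools import product

TX_CODES = range(16)  # hypothetical transmitter equalizer settings
RX_CODES = range(16)  # hypothetical receiver equalizer settings


def eye_height(tx: int, rx: int) -> float:
    """Hypothetical stand-in for one lab measurement of link margin (larger is better)."""
    a, b = tx - 11, rx - 4
    return 100.0 - (2.0 * a * a + 1.5 * b * b + 0.5 * a * b)


class CountingObjective:
    """Caches measurements and counts how many distinct settings were measured."""

    def __init__(self, f):
        self.f = f
        self.cache = {}

    def __call__(self, tx: int, rx: int) -> float:
        if (tx, rx) not in self.cache:
            self.cache[(tx, rx)] = self.f(tx, rx)
        return self.cache[(tx, rx)]

    @property
    def evaluations(self) -> int:
        return len(self.cache)


def exhaustive(obj) -> tuple:
    """Baseline: measure every (tx, rx) combination and keep the best."""
    return max(product(TX_CODES, RX_CODES), key=lambda s: obj(*s))


def pattern_search(obj, start=(8, 8)) -> tuple:
    """Direct search: move to the best improving axis neighbour until none improves."""
    current = start
    while True:
        tx, rx = current
        neighbours = [(tx + dt, rx + dr)
                      for dt, dr in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if tx + dt in TX_CODES and rx + dr in RX_CODES]
        best = max(neighbours, key=lambda s: obj(*s))
        if obj(*best) <= obj(*current):
            return current  # no neighbour improves the margin: stop
        current = best
```

On this toy surface both searches return `(11, 4)`, but the pattern search measures 20 settings instead of 256. On real, noisy, multimodal margins a local direct search can stall in a local optimum.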
TL;DR: An Overall System Efficiency (OSE) decision support model is described for analysing and predicting customer satisfaction goals; the model uses the customer service level, expressed as stockout frequency, as a trade-off parameter when optimising the overall performance achievable from the production line.
TL;DR: A performance-improvement measure for continuous-state systems is introduced, which quantifies how much system performance improves from before to after maintenance.
Abstract: A continuous-state system is a special kind of system in which the states of the system and its components take continuous values, ranging from perfect functioning to complete failure. This paper introduces a performance-improvement measure for continuous-state systems, which quantifies the improvement in system performance between the pre- and post-maintenance times. The probabilistic characteristics of this performance improvement are discussed in detail. Then, the performance improvement for multicomponent maintenance and a corresponding calculation method are put forward to establish the objective function for maintenance optimization. Next, a maintenance optimization model for such a system is studied, and a corresponding genetic algorithm based on performance improvement is provided to search for a near-globally optimal solution. Finally, two numerical examples and an oil transportation system case study verify the effectiveness of the proposed method.
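The idea of using performance improvement as the objective of a genetic algorithm can be sketched under strong simplifying assumptions. Here component states lie in [0, 1], system performance is assumed to be a weighted average of component states, maintenance is assumed to restore a component to 1.0, and plans must fit a maintenance budget. All data, the performance model, and the names `performance_improvement` and `genetic_search` are hypothetical; the paper's probabilistic treatment and its actual objective are not reproduced.

```python
import random

# Hypothetical data: pre-maintenance states (1 = perfect functioning),
# importance weights, and maintenance costs for six components.
STATES = [0.55, 0.80, 0.30, 0.90, 0.45, 0.65]
WEIGHTS = [0.25, 0.15, 0.20, 0.10, 0.20, 0.10]
COSTS = [4.0, 2.0, 5.0, 1.0, 3.0, 2.5]
BUDGET = 9.0


def system_performance(states) -> float:
    """Assumed system performance: weighted average of component states."""
    return sum(w * s for w, s in zip(WEIGHTS, states))


def performance_improvement(plan) -> float:
    """Post-maintenance minus pre-maintenance performance; maintained components return to 1.0."""
    post = [1.0 if m else s for m, s in zip(plan, STATES)]
    return system_performance(post) - system_performance(STATES)


def cost(plan) -> float:
    return sum(c for m, c in zip(plan, COSTS) if m)


def fitness(plan) -> float:
    """Improvement for plans within budget; negative budget overrun otherwise."""
    overrun = cost(plan) - BUDGET
    return performance_improvement(plan) if overrun <= 0 else -overrun


def genetic_search(pop_size: int = 30, generations: int = 60, p_mut: float = 0.1, seed: int = 1):
    rng = random.Random(seed)
    n = len(STATES)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([0] * n)  # the do-nothing plan guarantees one feasible individual
    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]  # elitism: the best plan always survives
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each maintenance decision with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

Because of elitism and the do-nothing seed, the returned plan always respects the budget; with only six components the result can be checked against brute-force enumeration of all 64 plans.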
TL;DR: This book discusses power quality improvement and enhancement techniques using intelligent controllers, supported by experimental results, helping readers understand power quality from its fundamentals through to experimental implementation.
Abstract: This book focusses on power quality improvement and enhancement techniques with the aid of intelligent controllers and experimental results. It covers topics ranging from the fundamentals of power quality indices, mitigation methods, and advanced controller design with a step-by-step approach, to simulation of the proposed controllers for real-time applications with the corresponding experimental results, and performance improvement paradigms with their overall analysis, helping readers understand power quality from its fundamentals through to experimental implementation. The book also covers the implementation of power quality improvement practices.
Key Features
Provides solutions for power quality improvement using intelligent techniques
Illustrated with simulation and experimental results
Discusses renewable energy integration and multiple case studies pertaining to various loads
Combines the power quality literature with power electronics based solutions
Includes implementation examples, datasets, experimental and simulation procedures
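As an illustration of a power quality index, total harmonic distortion (THD) compares the RMS of the harmonic content with the RMS of the fundamental. The sketch below estimates it from one period of samples using single DFT bins. It is a generic textbook calculation rather than a method from the book; the waveform (a fundamental plus 10% third and 5% fifth harmonic) is synthetic, and the calculation assumes exactly one fundamental period is sampled.

```python
import cmath
import math


def harmonic_amplitude(samples, k: int) -> float:
    """Peak amplitude of harmonic k from one DFT bin (samples span exactly one fundamental period)."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n


def thd(samples, max_harmonic: int = 25) -> float:
    """Total harmonic distortion: harmonics 2..max_harmonic relative to the fundamental."""
    v1 = harmonic_amplitude(samples, 1)
    distortion = math.sqrt(sum(harmonic_amplitude(samples, h) ** 2
                               for h in range(2, max_harmonic + 1)))
    return distortion / v1  # peak-to-peak ratio equals the RMS ratio for sinusoids


def example_waveform(n: int = 256):
    """Synthetic waveform: unit fundamental with 10% third and 5% fifth harmonics."""
    return [math.sin(2 * math.pi * i / n)
            + 0.10 * math.sin(3 * 2 * math.pi * i / n)
            + 0.05 * math.sin(5 * 2 * math.pi * i / n) for i in range(n)]
```

For this waveform `thd(example_waveform())` is about 0.1118, i.e. sqrt(0.10**2 + 0.05**2).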
TL;DR: Using Compressive Sensing (CS), a lossy data compression method, shifts the bottleneck away from storage and increases memory bandwidth utilization, so that high-end memory yields further performance gains.
Abstract: The gap between computation speed and I/O access on modern computing systems limits the processing of data-intensive applications. As highlighted in recent studies, employing high-end memory has been shown not to improve the performance of I/O-bound applications, given their low utilization of memory bandwidth. Despite several solutions for improving storage performance, none of them shifts the bottleneck from I/O access to the memory subsystem for I/O-bound applications. In this paper, we show that for data-intensive multimedia applications, using Compressive Sensing (CS), a lossy data compression method, lifts the bottleneck from storage and increases memory bandwidth utilization, so that high-end memory yields further performance gains. However, reconstructing the compressed data is time- and memory-consuming. To address this challenge, we employ and compare hardware and software acceleration of Orthogonal Matching Pursuit (OMP), a greedy algorithm that solves the reconstruction problem by repeatedly choosing the most significant variable to reduce the least-squares error. Our implementation results show that CS increases memory bandwidth utilization by 1.4x and that using high-bandwidth memory results in a 24% performance improvement. Overall, the proposed solution, CS of storage data with an FPGA accelerator, achieves up to a 45% speedup in an end-to-end implementation with only 4.6% accuracy degradation.
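The greedy OMP reconstruction described above can be sketched as a plain NumPy reference. Each iteration picks the column most correlated with the current residual, then re-solves a least-squares fit on all selected columns. This is the standard textbook form of OMP, not the authors' FPGA or software accelerator; the Gaussian measurement matrix in the usage block and the function name `omp` are assumptions for illustration.

```python
import numpy as np


def omp(A: np.ndarray, y: np.ndarray, sparsity: int, tol: float = 1e-10) -> np.ndarray:
    """Orthogonal Matching Pursuit: greedily recover a sparse x with y ≈ A @ x.

    Assumes the columns of A are normalized to unit length.
    """
    n = A.shape[1]
    residual = y.astype(float)
    support: list = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Greedy step: pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Orthogonal step: least-squares fit of y on every column selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(n)
    x[support] = coef
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, k = 64, 256, 5  # 64 measurements of a 256-sample, 5-sparse signal
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    x_hat = omp(A, A @ x_true, k)
    print("max abs error:", np.max(np.abs(x_hat - x_true)))
```

The least-squares re-fit over the whole support is what distinguishes OMP from plain matching pursuit, and it is also the costly part that grows with the number of selected columns.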
TL;DR: An efficient performance optimization engine called Hedgehog is proposed that evaluates performance based on the "Law of Diminishing Marginal Utility" and recommends an optimal configuration setting; experiments show a 19.6% performance improvement over the naive configuration.
Abstract: Along with the explosive growth of data, there is great demand to speed up data processing. Although platforms such as Spark have made analysis easier for developers, performance tuning for such platforms has become complex. In this paper, we propose an efficient performance optimization engine called Hedgehog that evaluates performance based on the "Law of Diminishing Marginal Utility" and recommends an optimal configuration setting. Initial experiments show that, by tuning only 3 parameters, our optimization achieves a 19.6% performance improvement over the naive configuration.
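One way to read the "Law of Diminishing Marginal Utility" in configuration tuning is to keep raising a parameter only while each increment still buys a worthwhile reduction in runtime. The sketch below assumes that interpretation and is not Hedgehog's actual algorithm. The runtime model, the three parameter names (`executors`, `memory_gb`, `partitions`), the step sizes and the 2% threshold are all hypothetical.

```python
# Hypothetical "naive" starting configuration and per-trial increments.
NAIVE = {"executors": 2, "memory_gb": 2, "partitions": 1}
STEPS = {"executors": 2, "memory_gb": 2, "partitions": 1}


def modelled_runtime(cfg: dict) -> float:
    """Hypothetical job runtime in seconds; every resource shows diminishing returns."""
    return (30.0
            + 480.0 / cfg["executors"]
            + 120.0 / cfg["memory_gb"]
            + 60.0 / cfg["partitions"])


def tune(runtime, start: dict, steps: dict, min_gain: float = 0.02, max_rounds: int = 50) -> dict:
    """Raise one parameter at a time until its marginal gain drops below min_gain.

    runtime:  callable returning the (measured or modelled) runtime of a config
    start:    initial configuration
    steps:    increment applied to each parameter per trial
    min_gain: minimum relative runtime reduction worth another increment
    """
    cfg = dict(start)
    for name, step in steps.items():
        current = runtime(cfg)
        for _ in range(max_rounds):
            trial = {**cfg, name: cfg[name] + step}
            new = runtime(trial)
            if (current - new) / current < min_gain:
                break  # diminishing marginal utility: stop raising this parameter
            cfg, current = trial, new
    return cfg
```

With these made-up numbers, `tune(modelled_runtime, NAIVE, STEPS)` settles on 16 executors, 10 GB of memory and 6 partitions, reducing the modelled runtime from 390 s to 82 s. Because the parameters are tuned one after another, the result can depend on their order, and a real tuner would measure runtimes on the cluster rather than evaluate a formula.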