TL;DR: A reinforcement learning approach is proposed to achieve the maximum long-term overall network utility while guaranteeing the quality of service requirements of user equipments (UEs) in the downlink of heterogeneous cellular networks.
Abstract: Heterogeneous cellular networks can offload mobile traffic and reduce deployment costs, and they have been considered a promising technique for next-generation wireless networks. Because the joint user association and resource allocation problem is non-convex and combinatorial, it is challenging to obtain an optimal strategy for it. In this paper, a reinforcement learning (RL) approach is proposed to achieve the maximum long-term overall network utility while guaranteeing the quality of service requirements of user equipments (UEs) in the downlink of heterogeneous cellular networks. A distributed optimization method based on multi-agent RL is developed. Moreover, to handle the computational cost of the large action space, a multi-agent deep RL method is proposed. Specifically, the state, action and reward function are defined for UEs, and a dueling double deep Q-network (D3QN) strategy is introduced to obtain a nearly optimal policy. Through message passing, the distributed UEs can obtain the global state space with a small communication overhead. With the double-Q strategy and dueling architecture, D3QN can rapidly converge to a subgame perfect Nash equilibrium. Simulation results demonstrate that D3QN achieves better performance than other RL approaches in solving large-scale learning problems.
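The two ingredients of D3QN named above can be illustrated without a deep-learning framework. The sketch below is not the paper's code: it assumes the networks have already produced a state value and per-action advantages (the numbers are made up for one hypothetical UE agent with three actions) and shows (i) the dueling aggregation Q = V + A - mean(A) and (ii) the double-Q target, in which the online network selects the next action and the target network evaluates it.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float, done: bool,
                      q_online_next: List[float],
                      q_target_next: List[float]) -> float:
    """Double-Q target: online net picks argmax action, target net scores it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical network outputs for the next state s' (3 candidate actions).
q_online = dueling_q(1.0, [0.5, 2.0, -0.5])  # online net prefers action 1
q_target = dueling_q(0.8, [1.5, 0.2, 0.1])   # target net rates action 1 at 0.4
y = double_dqn_target(reward=1.0, gamma=0.9, done=False,
                      q_online_next=q_online, q_target_next=q_target)
print(round(y, 2))  # 1.36
```

A vanilla DQN target would take the maximum of the target values (1.7 here) and yield 2.53; decoupling action selection from evaluation is what the double-Q strategy uses to curb this overestimation.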
TL;DR: In this article, an experience-driven approach is proposed that can learn to control a communication network well from its own experience rather than from an accurate mathematical model. Two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, are also proposed to optimize the general DRL framework specifically for TE.
Abstract: Modern communication networks have become very complicated and highly dynamic, which makes them hard to model, predict and control. In this paper, we develop a novel experience-driven approach that can learn to control a communication network well from its own experience rather than from an accurate mathematical model, just as a human learns a new skill (such as driving or swimming). Specifically, we, for the first time, propose to leverage emerging Deep Reinforcement Learning (DRL) for enabling model-free control in communication networks, and present a novel and highly effective DRL-based control framework, DRL-TE, for a fundamental networking problem: Traffic Engineering (TE). The proposed framework maximizes a widely-used utility function by jointly learning the network environment and its dynamics, and by making decisions under the guidance of powerful Deep Neural Networks (DNNs). We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to optimize the general DRL framework specifically for TE. To validate and evaluate the proposed framework, we implemented it in ns-3 and tested it comprehensively with both representative and randomly generated network topologies. Extensive packet-level simulation results show that 1) compared to several widely-used baseline methods, DRL-TE significantly reduces end-to-end delay and consistently improves the network utility, while offering better or comparable throughput; 2) DRL-TE is robust to network changes; and 3) DRL-TE consistently outperforms a state-of-the-art DRL method for continuous control, Deep Deterministic Policy Gradient (DDPG), which does not offer satisfactory performance.
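The abstract does not spell out how its actor-critic-based variant assigns priorities, so the following is only the generic proportional prioritized experience replay scheme: a transition is sampled with probability proportional to (|δ| + ε)^α, and its gradient is scaled by the importance-sampling weight (N·P(i))^(-β), normalized here by the batch maximum. It assumes δ is the critic's TD error; the class name, parameters and defaults are illustrative, not DRL-TE's.

```python
import random
from typing import List, Optional, Tuple


class PrioritizedReplay:
    """Generic proportional prioritized replay (assumes priority = |critic TD error|)."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps        # keeps zero-error transitions sampleable
        self.data: List[object] = []
        self.priorities: List[float] = []
        self.next_idx = 0     # ring-buffer write position

    def _priority(self, td_error: float) -> float:
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition: object, td_error: float) -> None:
        p = self._priority(td_error)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:  # buffer full: overwrite the oldest entry
            self.data[self.next_idx] = transition
            self.priorities[self.next_idx] = p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
        rng = rng if rng is not None else random.Random()
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        return idxs, [w / max_w for w in weights]

    def update(self, idxs: List[int], td_errors: List[float]) -> None:
        """Refresh priorities with the critic's TD errors after a learning step."""
        for i, d in zip(idxs, td_errors):
            self.priorities[i] = self._priority(d)


buf = PrioritizedReplay(capacity=1000)
buf.add(("s0", "a0", 0.1, "s1"), td_error=0.0)   # well-predicted transition
buf.add(("s1", "a1", -1.0, "s2"), td_error=10.0)  # surprising transition
idxs, weights = buf.sample(batch_size=8, rng=random.Random(0))
buf.update(idxs, [0.5] * len(idxs))  # hypothetical new TD errors
```

Surprising transitions are replayed far more often, while the importance-sampling weights keep the expected update from drifting too far from uniform replay.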
TL;DR: It is proved that the optimal network utility obtained from the fluid-based optimization is an upper bound on the utility in the finite car system for any routing policy, both static and dynamic, under which the closed queueing network has a stationary distribution.
Abstract: This paper considers a closed queueing network model of ridesharing systems, such as Didi Chuxing, Lyft, and Uber. We focus on empty-car routing, a mechanism by which we control car flow in the network to optimize system-wide utility functions, for example, the availability of empty cars when a passenger arrives. We establish both process-level and steady-state convergence of the queueing network to a fluid limit in a large market regime where demand for rides and supply of cars tend to infinity, and we use this limit to study a fluid-based optimization problem. We prove that the optimal network utility obtained from the fluid-based optimization is an upper bound on the utility in the finite car system for any routing policy, both static and dynamic, under which the closed queueing network has a stationary distribution. This upper bound is achieved asymptotically under the fluid-based optimal routing policy. Simulation results with real-world data released by Didi Chuxing demonstrate the benefit of using the fluid-based optimal routing policy compared with various other policies.
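One standard building block behind closed queueing network analysis is the traffic equation θ = θP, whose solution gives the relative rate at which cars pass through each region under a routing matrix P. The sketch below is not the paper's fluid optimization; it assumes a hypothetical two-region routing matrix that mixes passenger trips and empty-car moves, and shows how changing the routing (here, sending some cars empty from region 1 back to region 0) shifts the flow balance.

```python
from typing import List


def routing_stationary(P: List[List[float]], iters: int = 10_000,
                       tol: float = 1e-12) -> List[float]:
    """Solve theta = theta P with sum(theta) = 1 by power iteration.

    Iterates the lazy chain (I + P) / 2, which has the same solution as P
    but cannot be periodic, so the iteration converges for irreducible P.
    """
    n = len(P)
    theta = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.5 * theta[j] + 0.5 * sum(theta[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, theta)) < tol:
            return nxt
        theta = nxt
    return theta


# Hypothetical routing: row i = where a car leaving region i goes next.
no_rebalancing = [[0.2, 0.8], [0.1, 0.9]]    # cars pile up in region 1
with_rebalancing = [[0.2, 0.8], [0.4, 0.6]]  # some cars routed empty to region 0
print([round(t, 3) for t in routing_stationary(no_rebalancing)])    # [0.111, 0.889]
print([round(t, 3) for t in routing_stationary(with_rebalancing)])  # [0.333, 0.667]
```

In a closed network the per-region throughput is proportional to θ, so the routing policy, including empty-car routing, acts on system utility through P.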
TL;DR: This paper first explores the performance of QuickFix, an efficient algorithm based on dual decomposition and the subgradient method, for computing the data sampling rate and routes; working in tandem with a local algorithm, SnapIt, the resulting solution improves the total data rate while significantly improving the network utility.
Abstract: Energy harvesting sensor platforms have opened up a new dimension in the design of network protocols. To sustain network operation, the energy consumption rate cannot exceed the energy harvesting rate; otherwise, sensor nodes will eventually deplete their batteries. In contrast to traditional network resource allocation problems, where the resources are static, the time-varying recharging rate presents a new challenge. In this paper, we first explore the performance of QuickFix, an efficient algorithm based on dual decomposition and the subgradient method, for computing the data sampling rate and routes. However, fluctuations in recharging can happen on a faster time-scale than the convergence time of the traditional approach. This leads to battery outage and overflow scenarios, which are both undesirable because of missed samples and lost energy harvesting opportunities, respectively. To address such dynamics, a local algorithm, called SnapIt, is designed to adapt the sampling rate with the objective of maintaining the battery at a target level. Our evaluations using the TOSSIM simulator show that QuickFix and SnapIt working in tandem can track the instantaneous optimum network utility while maintaining the battery at a target level. Compared with IFRC, a backpressure-based approach, our solution improves the total data rate by 42% on average while significantly improving the network utility.
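To make "dual decomposition and the subgradient method" concrete, the sketch below applies the textbook recipe to a generic network utility maximization problem: maximize Σ log(x_s) subject to per-link capacity constraints. It is not QuickFix; in the energy harvesting setting the constraint would bound energy consumption by the harvesting rate, which this sketch does not model. The topology, step size and rate cap `x_max` are hypothetical.

```python
from typing import List


def num_dual_subgradient(routes: List[List[int]], capacity: List[float],
                         step: float = 0.05, iters: int = 2000,
                         x_max: float = 10.0) -> List[float]:
    """Maximize sum_s log(x_s) s.t. sum of x_s over flows on link l <= c_l.

    Each link keeps a price (dual variable); each source solves its own
    subproblem given the summed price along its route.
    """
    prices = [1.0] * len(capacity)
    x = [0.0] * len(routes)
    for _ in range(iters):
        # Source subproblem: maximize log(x) - x * q  ->  x = 1 / q (capped).
        for s, links in enumerate(routes):
            q = sum(prices[l] for l in links)
            x[s] = x_max if q <= 1.0 / x_max else 1.0 / q
        # Dual update: projected subgradient step, raise price if overloaded.
        for l, c in enumerate(capacity):
            load = sum(x[s] for s, links in enumerate(routes) if l in links)
            prices[l] = max(0.0, prices[l] + step * (load - c))
    return x


# Hypothetical topology: flow 0 crosses links 0 and 1; flows 1 and 2 use one link.
rates = num_dual_subgradient(routes=[[0, 1], [0], [1]], capacity=[1.0, 1.0])
print([round(r, 3) for r in rates])  # [0.333, 0.667, 0.667]
```

The iterations needed before the prices settle illustrate the abstract's point: if the recharging rate (the constraint) changes faster than this convergence time, a purely dual-based method lags behind, which is the gap SnapIt's local adaptation is meant to cover.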
TL;DR: The contributions of this paper are employing dynamic spectrum access to mitigate channel impairments, defining multi-attribute priority classes, and designing a distributed control algorithm for data delivery that maximizes the network utility under QoS constraints.
Abstract: Electromagnetic interference, equipment noise, multi-path effects and obstructions in harsh smart grid environments make quality-of-service (QoS) communication a challenging task for WSN-based smart grid applications. To address these challenges, a cognitive communication based cross-layer framework has been proposed. The proposed framework exploits the emerging cognitive radio technology to mitigate the effects of noisy and congested spectrum bands, yielding reliable and high-capacity links for wireless communication in smart grids. To meet the QoS requirements of diverse smart grid applications, it differentiates the traffic flows into different priority classes according to their QoS needs and maintains three-dimensional service queues that capture the delay, bandwidth and reliability attributes of data. The problem is formulated as a Lyapunov drift optimization with the objective of maximizing the weighted service of the traffic flows belonging to different classes. A suboptimal distributed control algorithm (DCA) is presented to efficiently support QoS through channel control, flow control, scheduling and routing decisions. In particular, the contributions of this paper are threefold: employing dynamic spectrum access to mitigate channel impairments, defining multi-attribute priority classes, and designing a distributed control algorithm for data delivery that maximizes the network utility under QoS constraints. Performance evaluations in ns-2 reveal that the proposed framework achieves the required QoS communication in smart grids.
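The Lyapunov drift formulation can be illustrated with the standard drift-plus-penalty recipe. The sketch below is not the paper's DCA: it assumes a single server, one queue per priority class, a linear reward w_c per admitted unit of class-c traffic, and a max-weight scheduler; V, `a_max` and `mu` are hypothetical parameters. Each slot, flow control admits traffic only while the class backlog is below V·w_c, which is the choice that minimizes the per-slot bound Q_c·a_c - V·w_c·a_c.

```python
from typing import List


def drift_plus_penalty(weights: List[float], V: float, a_max: float,
                       mu: float, slots: int) -> List[List[float]]:
    """Drift-plus-penalty flow control with max-weight scheduling (one server)."""
    n = len(weights)
    Q = [0.0] * n
    history = []
    for _ in range(slots):
        # Flow control: admit a_max iff Q_c < V * w_c (minimizes Q_c*a - V*w_c*a).
        admit = [a_max if Q[c] < V * weights[c] else 0.0 for c in range(n)]
        # Scheduling: serve the class with the largest backlog (max-weight).
        served = max(range(n), key=lambda c: Q[c])
        service = [mu if c == served else 0.0 for c in range(n)]
        # Queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t).
        Q = [max(Q[c] - service[c], 0.0) + admit[c] for c in range(n)]
        history.append(Q[:])
    return history


# Hypothetical priority weights for three classes; higher weight = higher priority.
hist = drift_plus_penalty([3.0, 2.0, 1.0], V=10.0, a_max=1.0, mu=2.0, slots=500)
print(hist[-1])  # final backlogs, each at most V * w_c + a_max
```

Because a class admits traffic only while its backlog is below V·w_c, every backlog stays at or below V·w_c + a_max; raising V favors the weighted-service objective at the cost of longer queues, the usual utility/delay trade-off of Lyapunov optimization.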