A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem
Nouredine Melab, Imen Chakroun, Mohand Mezmaz, Daniel Tuyttens +3 more
- 24 Sep 2012
- pp 10-17
TL;DR: In this article, a parallel branch-and-bound algorithm based on a GPU-accelerated bounding model is proposed to improve the performance of the bounding mechanism by optimizing data access management.
Abstract: Branch-and-Bound (BaB) algorithms are time-intensive tree-based exploration methods for solving combinatorial optimization problems to optimality. In this paper, we investigate the use of GPU computing as a major complementary way to speed up those methods. The focus is put on the bounding mechanism of BaB algorithms, which is the most time-consuming part of their exploration process. We propose a parallel BaB algorithm based on a GPU-accelerated bounding model. The proposed approach concentrates on optimizing data access management to further improve the performance of the bounding mechanism, which uses large and intermediate data sets that do not completely fit in GPU memory. Extensive experiments have been carried out on well-known flow-shop scheduling problem (FSP) benchmarks using an Nvidia Tesla C2050 GPU card. We compared the obtained performance to single-threaded and multithreaded CPU-based executions. Accelerations of up to ×100 are achieved for large problem instances.
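To make the idea of bounding a pool of sub-problems concrete, here is a minimal CPU sketch in Python. It is not the paper's implementation: the names `completion_times`, `lower_bound` and `bound_pool` are hypothetical, `ptm` is assumed to be a job-by-machine processing-time matrix, and the bound is a simple one-machine bound swapped in for illustration rather than the lower bound used by the authors. Each sub-problem in the pool is bounded independently of the others, which is the property that would let a GPU assign one thread per sub-problem.

```python
from typing import List, Sequence


def completion_times(ptm: Sequence[Sequence[int]], prefix: Sequence[int]) -> List[int]:
    """Completion time of the scheduled prefix on every machine."""
    m = len(ptm[0])
    finish = [0] * m
    for job in prefix:
        finish[0] += ptm[job][0]
        for k in range(1, m):
            # A job starts on machine k once machine k is free and the job
            # has left machine k-1.
            finish[k] = max(finish[k], finish[k - 1]) + ptm[job][k]
    return finish


def lower_bound(ptm: Sequence[Sequence[int]], prefix: Sequence[int]) -> int:
    """One-machine lower bound on the makespan (illustrative assumption)."""
    n, m = len(ptm), len(ptm[0])
    finish = completion_times(ptm, prefix)
    remaining = [j for j in range(n) if j not in prefix]
    if not remaining:
        return finish[-1]  # complete schedule: the bound is the makespan
    best = 0
    for k in range(m):
        load = sum(ptm[j][k] for j in remaining)            # work left on machine k
        tail = min(sum(ptm[j][k + 1:]) for j in remaining)  # shortest exit after k
        best = max(best, finish[k] + load + tail)
    return best


def bound_pool(ptm: Sequence[Sequence[int]], pool: Sequence[Sequence[int]]) -> List[int]:
    """Bound every sub-problem of the pool; iterations are independent."""
    return [lower_bound(ptm, prefix) for prefix in pool]


# Hypothetical 3-job, 2-machine instance (rows are jobs, columns are machines).
PTM_EXAMPLE = [[3, 2], [1, 4], [2, 1]]
POOL_EXAMPLE = [[0], [1], [2]]  # the three children of the root node
BOUNDS_EXAMPLE = bound_pool(PTM_EXAMPLE, POOL_EXAMPLE)
```

In this sketch the pool is a plain list of partial schedules; a larger pool simply means more independent bound evaluations per batch.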
Figures

Fig. 4. Average parallel efficiency for different problem instances: PTM and JM are put together in the shared memory, the pool size is fixed to 1024 × 256. 
TABLE II PARALLEL EFFICIENCY FOR DIFFERENT PROBLEM INSTANCES AND POOL SIZES. ALL THE MATRICES JM , PTM , LM , RM , QM AND MM ARE LOCATED IN THE GPU GLOBAL MEMORY. 
Fig. 1. The search tree generated and explored by a B&B algorithm for solving an FSP with 3 jobs. Nodes with a lower bound (LB) greater than (resp. lower than or equal to) the upper bound (UB) are pruned (resp. decomposed or branched).
Fig. 5. Comparison between the average parallel efficiency for different problem instances obtained with a GPU-based and a multithreaded B&B for the same computational power (500 GFLOPs).
TABLE III PARALLEL EFFICIENCY FOR DIFFERENT FSP INSTANCES AND POOL SIZES OBTAINED WITH DATA ACCESS OPTIMIZATION. PTM AND JM ARE PLACED TOGETHER IN SHARED MEMORY AND ALL OTHERS ARE PLACED IN GLOBAL MEMORY.
TABLE I THE DIFFERENT DATA STRUCTURES OF THE LB ALGORITHM AND THEIR ASSOCIATED COMPLEXITIES IN MEMORY SIZE AND NUMBERS OF ACCESSES. THE PARAMETERS n, m AND n′ DESIGNATE RESP. THE TOTAL NUMBER OF JOBS, THE TOTAL NUMBER OF MACHINES AND THE NUMBER OF REMAINING JOBS TO BE SCHEDULED FOR THE SUB-PROBLEM WHOSE LOWER BOUND IS BEING COMPUTED.
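The pruning rule stated in the Fig. 1 caption can be sketched as a small sequential depth-first B&B in Python. This is an illustration under assumptions, not the authors' code: `branch_and_bound` and its helpers are hypothetical names, the instance is made up, and the lower bound is a simple one-machine bound (repeated here so the block runs on its own). A node is pruned only when its LB is strictly greater than the current UB; otherwise it is branched, as in the caption.

```python
from math import inf


def completion_times(ptm, prefix):
    """Completion time of the scheduled prefix on every machine."""
    finish = [0] * len(ptm[0])
    for job in prefix:
        finish[0] += ptm[job][0]
        for k in range(1, len(finish)):
            finish[k] = max(finish[k], finish[k - 1]) + ptm[job][k]
    return finish


def lower_bound(ptm, prefix):
    """One-machine lower bound on the makespan (illustrative assumption)."""
    finish = completion_times(ptm, prefix)
    remaining = [j for j in range(len(ptm)) if j not in prefix]
    if not remaining:
        return finish[-1]
    return max(
        finish[k]
        + sum(ptm[j][k] for j in remaining)
        + min(sum(ptm[j][k + 1:]) for j in remaining)
        for k in range(len(finish))
    )


def branch_and_bound(ptm):
    """Depth-first B&B; returns (best order, best makespan, pruned node count)."""
    n = len(ptm)
    state = {"ub": inf, "best": None, "pruned": 0}

    def explore(prefix):
        if len(prefix) == n:
            # Leaf: a complete schedule may improve the upper bound.
            value = completion_times(ptm, prefix)[-1]
            if value < state["ub"]:
                state["ub"], state["best"] = value, list(prefix)
            return
        for job in range(n):
            if job in prefix:
                continue
            child = prefix + [job]
            if lower_bound(ptm, child) > state["ub"]:
                state["pruned"] += 1  # LB > UB: prune
                continue
            explore(child)            # LB <= UB: branch

    explore([])
    return state["best"], state["ub"], state["pruned"]


# Hypothetical 3-job, 2-machine instance (rows are jobs, columns are machines).
PTM_EXAMPLE = [[3, 2], [1, 4], [2, 1]]
BEST_ORDER, BEST_MAKESPAN, PRUNED = branch_and_bound(PTM_EXAMPLE)
```

Starting with an infinite UB means nothing is pruned until the first complete schedule is found; seeding the UB with a heuristic solution would let pruning start earlier.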
Citations
GPU based parallel genetic algorithm for solving an energy efficient dynamic flexible flow shop scheduling problem
TL;DR: An energy-efficient dynamic flexible flow shop scheduling model is developed that uses the peak power value and takes newly arriving jobs into account, together with a priority-based hybrid parallel genetic algorithm with a predictive-reactive complete rescheduling strategy.
53
Benefits of in-Vehicle Consolidation in Less than Truckload Freight Transportation Operations
TL;DR: In this paper, a multi-commodity one-to-one pickup-and-delivery vehicle routing problem is solved using a branch-and-price algorithm.
30
Benefits of in-vehicle consolidation in less than truckload freight transportation operations
TL;DR: In this paper, a multi-commodity one-to-one pickup-and-delivery vehicle routing problem is solved using a branch-and-price algorithm.
26
An Extended Flexible Job Shop Scheduling Model for Flight Deck Scheduling with Priority, Parallel Operations, and Sequence Flexibility
TL;DR: A novel mixed integer linear programming (MILP) formulation for the flight deck scheduling problem is proposed, and an improved differential evolution algorithm combined with typical local search strategies is designed to improve computational efficiency.
Towards a heterogeneous and adaptive parallel Branch-and-Bound algorithm
Imen Chakroun, Nouredine Melab +1 more
TL;DR: This work revisits the design and implementation of Branch-and-Bound (B&B) algorithms and proposes new patterns for combining multi-core and GPU computing for B&B.
14
References
•Book
Computers and Intractability: A Guide to the Theory of NP-Completeness
Michael Randolph Garey, David S. Johnson +1 more
- 01 Jan 1979
TL;DR: The second edition of a quarterly column as discussed by the authors provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Optimal two- and three-stage production schedules with setup times included
TL;DR: A simple decision rule is obtained in this paper for the optimal scheduling of the production so that the total elapsed time is a minimum.
•Book
Using OpenMP: Portable Shared Memory Parallel Programming
Barbara Chapman, Gabriele Jost, Ruud van der Pas +2 more
- 12 Oct 2007
TL;DR: Using OpenMP describes how to use OpenMP in full-scale applications to achieve high performance on large-scale architectures, and explains how OpenMP is translated into explicitly multithreaded code, providing a valuable behind-the-scenes account of OpenMP program performance.
1.3K
Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network
S. Tschöke, R. Lüling, B. Monien +2 more
- 25 Apr 1995
TL;DR: This paper is the first to present a parallelization of a highly efficient best-first branch-and-bound algorithm to solve large symmetric traveling salesman problems on a massively parallel computer containing 1024 processors.
65