TL;DR: This paper reports the experience of building an edge computing platform called DNR, which uses a distributed data flow programming model based on the popular open source Node-RED tool, and presents a new approach to applying the concept of exogenous coordination.
Abstract: Technology advancement has pushed computation to the network edge, paving the way for a class of IoT applications that leverage CPU, storage and communications in edge devices. Building these new IoT applications is not an easy task, however. Two key challenges are supporting the dynamic nature of the edge network and the context-dependent characteristics of application logic. In this paper we report our experience in building an edge computing platform called Distributed Node-RED (DNR) that uses a distributed data flow programming model based on the popular open source Node-RED tool. We describe some of the challenges we faced as well as some novel solutions that were implemented in our platform. A new approach to applying the concept of exogenous coordination is also presented and shown to be necessary in building large-scale IoT applications across the edge, fog and cloud.
TL;DR: A code-generation-based optimization approach to bringing performance and scalability to distributed stream processing applications using an operator-based, stream-centric language called SPADE, which supports composing distributed data flow graphs out of toolkits of type-generic operators.
Abstract: We present a code-generation-based optimization approach to bringing performance and scalability to distributed stream processing applications. We express stream processing applications using an operator-based, stream-centric language called SPADE, which supports composing distributed data flow graphs out of toolkits of type-generic operators. A major challenge in building such applications is to find an effective and flexible way of mapping the logical graph of operators into a physical one that can be deployed on a set of distributed nodes. This involves finding how best operators map to processes and how best processes map to computing nodes. In this paper, we take a two-stage optimization approach, where an instrumented version of the application is first generated by the SPADE compiler to profile and collect statistics about the processing and communication characteristics of the operators within the application. In the second stage, the profiling information is fed to an optimizer to come up with a physical data flow graph that is deployable across nodes in a computing cluster. This approach not only creates highly optimized applications that are tailored to the underlying computing and networking infrastructure, but also makes it possible to re-target the application to a different hardware setup by simply repeating the optimization step and re-compiling the application to match the physical flow graph produced by the optimizer. Using real-world applications, from diverse domains such as finance and radio-astronomy, we demonstrate the effectiveness of our approach on System S -- a large-scale, distributed stream processing platform.
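The two-stage idea in this abstract (profile first, then derive a physical mapping from the statistics) can be illustrated with a small sketch. This is not the SPADE optimizer: the greedy fusion rule, the function name `fuse_operators`, the `capacity` parameter, and the sample numbers are all assumptions made for illustration. It shows only the first mapping, from operators to processes, by fusing operators joined by the busiest streams as long as the fused process stays within a CPU budget.

```python
def fuse_operators(cpu_cost, edge_rate, capacity):
    """Greedily fuse operators connected by the busiest streams.

    cpu_cost:  {operator: profiled CPU load}            (hypothetical profile data)
    edge_rate: {(src, dst): profiled tuples per second} (hypothetical profile data)
    capacity:  maximum total CPU load allowed in one process
    Returns a list of sets; each set holds the operators fused into one process.
    """
    group_of = {op: op for op in cpu_cost}   # operator -> representative of its group
    members = {op: {op} for op in cpu_cost}  # representative -> operators in the group
    load = dict(cpu_cost)                    # representative -> total CPU load

    # Visit streams from busiest to quietest, so the heaviest traffic
    # is the first to be kept inside a single process.
    for (src, dst), _rate in sorted(edge_rate.items(), key=lambda kv: kv[1], reverse=True):
        a, b = group_of[src], group_of[dst]
        if a == b or load[a] + load[b] > capacity:
            continue  # already fused, or the fused process would be overloaded
        members[a] |= members.pop(b)
        load[a] += load.pop(b)
        for op in members[a]:
            group_of[op] = a
    return list(members.values())


# Stage 1 (profiling) is represented here by made-up statistics.
profile_cpu = {"src": 0.2, "parse": 0.3, "agg": 0.6, "sink": 0.1}
profile_edges = {("src", "parse"): 1000, ("parse", "agg"): 800, ("agg", "sink"): 10}

# Stage 2: derive the operator-to-process mapping from the profile.
processes = fuse_operators(profile_cpu, profile_edges, capacity=1.0)
```

Re-targeting to different hardware would, under this sketch's assumptions, amount to re-running the second stage with a different `capacity` or a fresh profile.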
TL;DR: This paper presents a lightweight hybrid workflow architecture and concrete API, based on a centralised control flow, distributed data flow model, that maintains the robustness and simplicity of centralised orchestration, but facilitates choreography by allowing services to exchange data directly with one another, reducing data that needs to be transferred through a centralised server.
Abstract: When orchestrating data-centric workflows as are commonly found in the sciences, centralised servers can become a bottleneck to the performance of a workflow; output from service invocations is normally transferred via a centralised orchestration engine, when it should be passed directly to where it is needed at the next service in the workflow. To address this performance bottleneck, this paper presents a lightweight hybrid workflow architecture and concrete API, based on a centralised control flow, distributed data flow model. Our architecture maintains the robustness and simplicity of centralised orchestration, but facilitates choreography by allowing services to exchange data directly with one another, reducing data that needs to be transferred through a centralised server. Furthermore, our architecture is standards compliant, flexible and non-disruptive; service definitions do not have to be altered prior to enactment.
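A minimal sketch of the centralised control flow, distributed data flow idea follows. It is not the paper's API: the `Service` class, the `orchestrate` function, and the reference format `(service_name, key)` are hypothetical. The point it illustrates is that the orchestrator decides the order of invocations but only ever handles small references, while each service pulls its input directly from the peer that produced it.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical service that stores its outputs locally and returns references."""
    name: str
    store: dict = field(default_factory=dict)

    def invoke(self, func, input_ref=None, peers=None):
        # Data flow: fetch the input straight from the peer that holds it,
        # bypassing the orchestrator.
        data = None
        if input_ref is not None:
            owner, key = input_ref
            data = peers[owner].store[key]
        result = func(data)
        key = f"out-{len(self.store)}"
        self.store[key] = result
        # Only this small reference travels back to the orchestrator.
        return (self.name, key)


def orchestrate(services, steps):
    """Control flow: sequence the calls centrally, passing references only."""
    ref = None
    for service_name, func in steps:
        ref = services[service_name].invoke(func, ref, services)
    return ref


services = {name: Service(name) for name in ("a", "b")}
final_ref = orchestrate(
    services,
    [("a", lambda _: list(range(5))),  # service "a" produces a list
     ("b", lambda xs: sum(xs))],       # service "b" reads it directly from "a"
)
owner, key = final_ref
final_value = services[owner].store[key]
```

In a real deployment the direct fetch would be a network call between services; here it is an in-process dictionary lookup to keep the sketch self-contained.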
TL;DR: This paper studies service integration infrastructures that support the execution of megaservices --- large-scale applications that are composed of autonomous service modules and concludes that the distributed data-flow model is in general superior in performance.
Abstract: This paper studies service integration infrastructures that support the execution of megaservices --- large-scale applications that are composed of autonomous service modules. Integration infrastructures are classified according to their control-flow and data-flow structures. We analyze the effects of data-flows on the performances of the centralized and distributed data-flow models. A mathematical model is built to compare the performances of megaservices. Particularly, aggregated cost and response time metrics are defined and evaluated. We arrive at the conclusion that the distributed data-flow model is in general superior in performance. We also identify the key system parameters as well as system bottlenecks. The analysis provides recommendations for a few techniques to build high-performance and scalable service integration infrastructures based on the distribution of data-flows.
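The abstract does not state the mathematical model itself, so the following is only a toy cost comparison under stated assumptions, not the paper's model. It assumes a linear pipeline of services, counts only the bytes moved for service outputs, and assumes the final output is always returned to the coordinator. The function name `bytes_moved` and the sample sizes are hypothetical.

```python
def bytes_moved(output_sizes, centralised):
    """Total bytes transferred for a linear pipeline of services.

    output_sizes: size of each service's output, in pipeline order.
    centralised:  True if every output is relayed through a central hub,
                  False if services send outputs directly to the next service.
    """
    intermediate = sum(output_sizes[:-1])
    final = output_sizes[-1] if output_sizes else 0
    if centralised:
        # Each intermediate output travels twice: service -> hub -> next service.
        return 2 * intermediate + final
    # Each intermediate output travels once: service -> next service.
    return intermediate + final


sizes = [100, 50, 10]  # made-up output sizes
central_cost = bytes_moved(sizes, centralised=True)
distributed_cost = bytes_moved(sizes, centralised=False)
```

Under these assumptions the distributed variant never moves more data than the centralised one, and the gap grows with the size of the intermediate results.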
TL;DR: The different characteristics of cloud and fog computing platforms are explained in this chapter and the detailed architecture of both platforms is introduced with a comparative analysis.
Abstract: Integrating cloud computing and Internet-of-Things (IoT) platforms has had a great impact on day-to-day life, but some limitations remain. Although various cloud services are freely available and comparatively cheap, they consume a large amount of network bandwidth. The main disadvantage of cloud computing is the distance between the data center and the data source. Fog computing offers a solution to these kinds of problems. It is a distributed service computing model that makes full use of the computing functions of terminal devices and exhibits a para-virtualized architecture. The different characteristics of cloud and fog computing platforms are explained in this chapter, and the detailed architecture of both platforms is introduced with a comparative analysis. On the fog server, a fog analytics tool performs data localization. Methods of application management, such as resource coordination, distributed application deployment, and the distributed data flow method, are discussed. Further, research directions in applying deep learning to big data, such as improved formulation of data abstractions and dimensionality reduction, are detailed. Possible solutions are also presented.