Data migration

Topic Tools

Papers published on a yearly basis

Papers

Journal Article • DOI 10.1109/ACCESS.2014.2332453

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial


Han Hu¹, Yonggang Wen², Tat-Seng Chua¹, Xuelong Li³

National University of Singapore¹, Nanyang Technological University², Chinese Academy of Sciences³

24 Jun 2014 • IEEE Access

TL;DR: This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.


Abstract: Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The term big data was coined to capture the meaning of this emerging trend. In addition to its sheer volume, big data also exhibits other unique characteristics as compared with traditional data. For instance, big data is commonly unstructured and requires more real-time analysis. This development calls for new system architectures for data acquisition, transmission, storage, and large-scale data processing mechanisms. In this paper, we present a literature survey and system tutorial for big data analytics platforms, aiming to provide an overall picture for nonexpert readers and instill a do-it-yourself spirit for advanced audiences to customize their own big-data solutions. First, we present the definition of big data and discuss big data challenges. Next, we present a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics. These four modules form a big data value chain. Following that, we present a detailed survey of numerous approaches and mechanisms from research and industry communities. In addition, we present the prevalent Hadoop framework for addressing big data challenges. Finally, we outline several evaluation benchmarks and potential research directions for big data systems.
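The four-module value chain described in this abstract can be pictured as a chain of composed stages, each consuming the previous stage's output. The sketch below is a toy illustration under our own assumptions (synthetic sensor readings, an in-memory store, a per-sensor mean standing in for "analytics"); the function names are hypothetical and nothing here is taken from the paper or from Hadoop.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Iterator


@dataclass
class Reading:
    sensor_id: str
    value: float


def generate(n: int) -> Iterator[Reading]:
    """Data generation: emit synthetic readings alternating between two sensors."""
    for i in range(n):
        yield Reading(sensor_id=f"s{i % 2}", value=float(i))


def acquire(records: Iterable[Reading]) -> Iterator[Reading]:
    """Data acquisition: pass records on, dropping ones with invalid (negative) values."""
    for r in records:
        if r.value >= 0:
            yield r


def store(records: Iterable[Reading]) -> dict[str, list[float]]:
    """Data storage: an in-memory table partitioned by sensor id."""
    table: dict[str, list[float]] = {}
    for r in records:
        table.setdefault(r.sensor_id, []).append(r.value)
    return table


def analyze(table: dict[str, list[float]]) -> dict[str, float]:
    """Data analytics: the mean value per sensor."""
    return {sensor: mean(values) for sensor, values in table.items()}


# Each module consumes the previous module's output, forming the value chain.
result = analyze(store(acquire(generate(6))))
print(result)  # {'s0': 2.0, 's1': 3.0}
```

In a real platform each stage would be a separate system (collectors, a distributed file system, a batch or stream engine); keeping them as plain functions only makes the sequential dependency visible.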


1,195 citations

Book

Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation


Jez Humble, David Farley

27 Jul 2010

TL;DR: This groundbreaking new book sets out the principles and technical practices that enable rapid, incremental delivery of high-quality, valuable new functionality to users, and introduces state-of-the-art techniques, including automated infrastructure management and data migration, and the use of virtualization.


Abstract: Getting software released to users is often a painful, risky, and time-consuming process. This groundbreaking new book sets out the principles and technical practices that enable rapid, incremental delivery of high-quality, valuable new functionality to users. Through automation of the build, deployment, and testing process, and improved collaboration between developers, testers, and operations, delivery teams can get changes released in a matter of hours, sometimes even minutes, no matter what the size of a project or the complexity of its code base. Jez Humble and David Farley begin by presenting the foundations of a rapid, reliable, low-risk delivery process. Next, they introduce the deployment pipeline, an automated process for managing all changes, from check-in to release. Finally, they discuss the ecosystem needed to support continuous delivery, from infrastructure, data, and configuration management to governance. The authors introduce state-of-the-art techniques, including automated infrastructure management and data migration, and the use of virtualization. For each, they review key issues, identify best practices, and demonstrate how to mitigate risks.
Coverage includes:
- Automating all facets of building, integrating, testing, and deploying software
- Implementing deployment pipelines at team and organizational levels
- Improving collaboration between developers, testers, and operations
- Developing features incrementally on large and distributed teams
- Implementing an effective configuration management strategy
- Automating acceptance testing, from analysis to implementation
- Testing capacity and other non-functional requirements
- Implementing continuous deployment and zero-downtime releases
- Managing infrastructure, data, components, and dependencies
- Navigating risk management, compliance, and auditing

Whether you're a developer, systems administrator, tester, or manager, this book will help your organization move from idea to release faster than ever, so you can deliver value to your business rapidly and reliably.
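The abstract lists automated data migration among the book's techniques. One common way to automate schema and data migration is to keep numbered migration scripts in version control and record the applied version in the database itself. The sketch below shows that idea with Python's standard-library `sqlite3` module; the table names, the `MIGRATIONS` list, and the `current_version` and `migrate` helpers are hypothetical and are not taken from the book.

```python
import sqlite3

# Hypothetical ordered migration scripts: (version, SQL). In practice these
# would be versioned alongside the application code.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customer_email ON customer (email)"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied migration version (0 for a fresh database)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the recorded version, one transaction each."""
    version = current_version(conn)
    for target, sql in MIGRATIONS:
        if target <= version:
            continue  # already applied on this database
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (target,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # undo the script and its version record together
            raise
        version = target
    return version


# isolation_level=None disables implicit transactions so BEGIN/COMMIT above are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
print(migrate(conn))  # 3
print(migrate(conn))  # 3 (nothing left to apply)
```

Because a second run applies nothing, the same migration step can run in every pipeline environment. SQLite supports transactional DDL, so a failing script leaves both the schema and the recorded version unchanged.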


1,167 citations

Journal Article • DOI 10.1023/A:1015398403337

Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms


Chenyang Lu¹, John A. Stankovic¹, Gang Tao¹, Sang H. Son¹

University of Virginia¹

01 Jan 2001 • Real-Time Systems

TL;DR: Performance evaluation results demonstrate that the analytically tuned FCS algorithms provide robust transient and steady state performance guarantees for periodic and aperiodic tasks even when the task execution times vary by as much as 100% from the initial estimate.


Abstract: We develop Feedback Control real-time Scheduling (FCS) as a unified framework to provide Quality of Service (QoS) guarantees in unpredictable environments (such as e-business servers on the Internet). FCS includes four major components. First, novel scheduling architectures provide performance control to a new category of QoS-critical systems that cannot be addressed by traditional open-loop scheduling paradigms. Second, we derive dynamic models for computing systems for the purpose of performance control. These models provide a theoretical foundation for adaptive performance control. Third, we apply established control methodology to design scheduling algorithms with proven performance guarantees, in contrast with existing heuristics-based solutions that rely on laborious design/tuning/testing iterations. Fourth, a set of control-based performance specifications characterizes the efficiency, accuracy, and robustness of QoS guarantees. The generality and strength of FCS are demonstrated by its instantiations in three important applications with significantly different characteristics. First, we develop real-time CPU scheduling algorithms that guarantee low deadline miss ratios in systems where task execution times may deviate from estimations at run time. We solve the saturation problems of real-time CPU scheduling systems with a novel integrated control structure. Second, we develop an adaptive web server architecture to provide relative and absolute delay guarantees to different service classes with unpredictable workloads. The adaptive architecture has been implemented by modifying an Apache web server. Evaluation experiments on a testbed of networked Linux PCs demonstrate that our server provides robust relative/absolute delay guarantees despite instantaneous changes in the user population.
Third, we develop a data migration executor for networked storage systems that migrates data online while guaranteeing the specified I/O throughput of concurrent applications.
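The third FCS application, throttling an online migration so that concurrent applications keep their I/O throughput, is a closed feedback loop. The following sketch is not the paper's executor or controller design: it assumes a toy plant model in which application throughput equals disk capacity minus migration rate, and uses a plain integral controller with an arbitrarily chosen gain. `MigrationThrottle` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MigrationThrottle:
    """Integral controller that sets the migration rate (MB/s) so that the
    application's measured I/O throughput tracks a target."""
    target_app_mbps: float
    gain: float = 0.5           # assumed gain; see the note below on stability
    max_rate_mbps: float = 100.0
    rate_mbps: float = 0.0

    def update(self, measured_app_mbps: float) -> float:
        # Positive error: the application has headroom, so migrate faster.
        # Negative error: the application is below target, so back off.
        error = measured_app_mbps - self.target_app_mbps
        new_rate = self.rate_mbps + self.gain * error
        self.rate_mbps = min(max(new_rate, 0.0), self.max_rate_mbps)
        return self.rate_mbps


def simulate(capacities: list[float], target: float = 70.0) -> list[float]:
    """Run one control period per capacity value under the toy plant model
    (assumption): application throughput = disk capacity - migration rate."""
    ctl = MigrationThrottle(target_app_mbps=target)
    history = []
    for capacity in capacities:
        measured = capacity - ctl.rate_mbps  # sensor reading for this period
        history.append(ctl.update(measured))
    return history


# Disk capacity drops from 100 to 90 MB/s halfway through the run.
rates = simulate([100.0] * 30 + [90.0] * 30)
print(round(rates[29], 3), round(rates[-1], 3))  # 30.0 20.0
```

Under this plant model the tracking error shrinks by a factor of (1 - gain) per control period, so any gain strictly between 0 and 2 converges. When capacity drops, the controller cuts the migration rate from about 30 to about 20 MB/s, keeping the application at its 70 MB/s target.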


662 citations

Patent

Systems and methods for classifying and transferring information in a storage network


Anand Prahlad, Jeremy A. Schwartz, David Ngo, Brian Brockway, Marcus S. Muller, et al.

28 Nov 2006

TL;DR: In this article, the authors describe systems and methods for data classification to facilitate and improve data management within an enterprise and present methods for generating a data structure of metadata that describes system data and storage operations.


Abstract: Systems and methods for data classification to facilitate and improve data management within an enterprise are described. The disclosed systems and methods evaluate and define data management operations based on data characteristics rather than data location, among other things. Also provided are methods for generating a data structure of metadata that describes system data and storage operations. This data structure may be consulted to determine changes in system data rather than scanning the data files themselves.
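The patent abstract describes consulting a metadata structure, rather than scanning the data files, to find changes, and classifying data by its characteristics rather than its location. The minimal sketch below assumes that every write passes through a catalog that records metadata and labels as it happens. The `MetadataCatalog` class, its logical clock, and the labeling rules in `classify` are hypothetical illustrations, not the patented system.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    path: str
    size: int
    owner: str
    modified: int                         # logical timestamp of the last write
    labels: set[str] = field(default_factory=set)


def classify(path: str, size: int, owner: str) -> set[str]:
    """Hypothetical rules: labels depend on type, size, and owner, not directory."""
    labels = set()
    if path.endswith((".csv", ".xls", ".xlsx")):
        labels.add("spreadsheet")
    if size > 1_000_000:
        labels.add("large")
    if owner == "legal":
        labels.add("retain")
    return labels


class MetadataCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, FileMeta] = {}
        self._clock = 0

    def record_write(self, path: str, size: int, owner: str) -> int:
        """Called on each storage operation; returns the write's timestamp."""
        self._clock += 1
        labels = classify(path, size, owner)
        self._entries[path] = FileMeta(path, size, owner, self._clock, labels)
        return self._clock

    def changed_since(self, t: int) -> list[str]:
        """Answer 'what changed?' from metadata alone, without reading any file."""
        return sorted(m.path for m in self._entries.values() if m.modified > t)

    def with_label(self, label: str) -> list[str]:
        return sorted(m.path for m in self._entries.values() if label in m.labels)


catalog = MetadataCatalog()
catalog.record_write("/home/alice/report.csv", 2_000_000, "finance")
t = catalog.record_write("/srv/legal/contract.pdf", 50_000, "legal")
catalog.record_write("/home/alice/notes.txt", 120, "alice")
print(catalog.changed_since(t))  # ['/home/alice/notes.txt']
```

A management policy such as "copy everything labeled `retain`" can then select data by label, regardless of which directory or volume holds it.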


633 citations

Journal Article • DOI 10.1109/TPDS.2012.66

Cooperative Provable Data Possession for Integrity Verification in Multicloud Storage


Yan Zhu¹, Hongxin Hu², Gail-Joon Ahn², Mengyang Yu¹

Peking University¹, Arizona State University²

01 Dec 2012 • IEEE Transactions on Parallel and Distributed Systems

TL;DR: This paper addresses the construction of an efficient PDP scheme for distributed cloud storage to support the scalability of service and data migration, in which it considers the existence of multiple cloud service providers to cooperatively store and maintain the clients' data.


Abstract: Provable data possession (PDP) is a technique for ensuring the integrity of data in storage outsourcing. In this paper, we address the construction of an efficient PDP scheme for distributed cloud storage to support the scalability of service and data migration, in which we consider the existence of multiple cloud service providers to cooperatively store and maintain the clients' data. We present a cooperative PDP (CPDP) scheme based on homomorphic verifiable response and hash index hierarchy. We prove the security of our scheme based on a multiprover zero-knowledge proof system, which satisfies completeness, knowledge soundness, and zero-knowledge properties. In addition, we articulate performance optimization mechanisms for our scheme, and in particular present an efficient method for selecting optimal parameter values to minimize the computation costs of clients and storage service providers. Our experiments show that our solution introduces lower computation and communication overheads than noncooperative approaches.
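PDP lets a client check that a provider still holds its data without downloading all of it. The sketch below is a deliberately simpler MAC-based spot check, not PDP and not the paper's CPDP scheme: the client tags each block with an HMAC bound to the block index, later challenges a random sample of indices, and verifies the returned blocks against their tags. The block size and all function names are assumptions.

```python
import hashlib
import hmac
import secrets

BLOCK = 4096  # assumed block size in bytes


def split_blocks(data: bytes, size: int = BLOCK) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def tag(key: bytes, index: int, block: bytes) -> bytes:
    # Binding the index into the MAC stops a server from answering with another block.
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def outsource(key: bytes, data: bytes) -> tuple[list[bytes], list[bytes]]:
    """Client: compute one tag per block; blocks and tags are handed to the server."""
    blocks = split_blocks(data)
    return blocks, [tag(key, i, b) for i, b in enumerate(blocks)]


def challenge(n_blocks: int, sample: int) -> list[int]:
    """Client: choose a random subset of block indices to audit."""
    return secrets.SystemRandom().sample(range(n_blocks), k=min(sample, n_blocks))


def respond(blocks: list[bytes], tags: list[bytes], indices: list[int]):
    """Server: return each challenged block with its stored tag."""
    return [(i, blocks[i], tags[i]) for i in indices]


def verify(key: bytes, response) -> bool:
    """Client: recompute each tag and compare in constant time."""
    return all(hmac.compare_digest(tag(key, i, b), t) for i, b, t in response)


key = secrets.token_bytes(32)
data = secrets.token_bytes(10 * BLOCK)
blocks, tags = outsource(key, data)  # afterwards the client needs to keep only `key`
indices = challenge(len(blocks), sample=4)
print(verify(key, respond(blocks, tags, indices)))  # True
```

In this baseline the challenged blocks travel back to the client. PDP schemes built on homomorphic tags avoid that by letting the server combine the challenged blocks into a short proof, and the cooperative scheme in the paper extends the idea to data spread across multiple providers.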


583 citations


Year	Papers
2025	16
2024	23
2023	19
2022	31
2021	92
2020	173


Related Topics (5)

Performance Metrics