TL;DR: In this paper, content indexing, containerized deduplication, and policy-driven storage are performed within a cloud environment, and methods for providing a cloud gateway and a scalable data object store within a Cloud environment are disclosed, along with other features.
Abstract: Data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, are performed within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
TL;DR: New public-key cryptosystems that produce constant-size ciphertexts such that efficient delegation of decryption rights for any set of ciphertextS are possible are described, giving the first public-keys patient-controlled encryption for flexible hierarchy.
Abstract: Data sharing is an important functionality in cloud storage. In this paper, we show how to securely, efficiently, and flexibly share data with others in cloud storage. We describe new public-key cryptosystems that produce constant-size ciphertexts such that efficient delegation of decryption rights for any set of ciphertexts are possible. The novelty is that one can aggregate any set of secret keys and make them as compact as a single key, but encompassing the power of all the keys being aggregated. In other words, the secret key holder can release a constant-size aggregate key for flexible choices of ciphertext set in cloud storage, but the other encrypted files outside the set remain confidential. This compact aggregate key can be conveniently sent to others or be stored in a smart card with very limited secure storage. We provide formal security analysis of our schemes in the standard model. We also describe other application of our schemes. In particular, our schemes give the first public-key patient-controlled encryption for flexible hierarchy, which was yet to be known.
TL;DR: In this paper, a new dynamic searchable symmetric encryption (DSE) scheme is presented, which is simpler and more efficient than existing schemes while revealing less information to the server than prior schemes, achieving fully adaptive security against honest but curious servers.
Abstract: Dynamic Searchable Symmetric Encryption allows a client to store a dynamic collection of encrypted documents with a server, and later quickly carry out keyword searches on these encrypted documents, while revealing minimal information to the server. In this paper we present a new dynamic SSE scheme that is simpler and more efficient than existing schemes while revealing less information to the server than prior schemes, achieving fully adaptive security against honest-but-curious servers. We implemented a prototype of our scheme and demonstrated its efficiency on datasets from prior work. Apart from its concrete efficiency, our scheme is also simpler: in particular, it does not require the server to support any operation other than upload and download of data. Thus the server in our scheme can be based solely on a cloud storage service, rather than a cloud computation service as well, as in prior work. In building our dynamic SSE scheme, we introduce a new primitive called Blind Storage, which allows a client to store a set of files on a remote server in such a way that the server does not learn how many files are stored, or the lengths of the individual files, as each file is retrieved, the server learns about its existence (and can notice the same file being downloaded subsequently), but the file's name and contents are not revealed. This is a primitive with several applications other than SSE, and is of independent interest.
TL;DR: The proposed system is capable of supporting public auditability, data dynamics and Multiple TPA are used for the auditing process and extends the concept to ring signatures in which HARS scheme is used.
Abstract: By using Cloud storage, users can access applications, services, software whenever they requires over the internet. Users can put their data remotely to cloud storage and get benefit of on-demand services and application from the resources. The cloud must have to ensure data integrity and security of data of user. The issue about cloud storage is integrity and privacy of data of user can arise. To maintain to overkill this issue here, we are giving public auditing process for cloud storage that users can make use of a third-party auditor (TPA) to check the integrity of data. Not only verification of data integrity, the proposed system also supports data dynamics. The work that has been done in this line lacks data dynamics and true public auditability. The auditing task monitors data modifications, insertions and deletions. The proposed system is capable of supporting public auditability, data dynamics and Multiple TPA are used for the auditing process. We also extend our concept to ring signatures in which HARS scheme is used. Merkle Hash Tree is used to improve block level authentication. Further we extend our result to enable the TPA to perform audits for multiple users simultaneously through Batch auditing. Index Terms: Cloud Storage, Data Dynamics, Public Auditing, Privacy Preserving, Ring Signatures.
TL;DR: This paper designs an expressive, efficient and revocable data access control scheme for multi-authority cloud storage systems, where there are multiple authorities co-exist and each authority is able to issue attributes independently.
Abstract: Data access control is an effective way to ensure the data security in the cloud Due to data outsourcing and untrusted cloud servers, the data access control becomes a challenging issue in cloud storage systems Ciphertext-Policy Attribute-based Encryption (CP-ABE) is regarded as one of the most suitable technologies for data access control in cloud storage, because it gives data owners more direct control on access policies However, it is difficult to directly apply existing CP-ABE schemes to data access control for cloud storage systems because of the attribute revocation problem In this paper, we design an expressive, efficient and revocable data access control scheme for multi-authority cloud storage systems, where there are multiple authorities co-exist and each authority is able to issue attributes independently Specifically, we propose a revocable multi-authority CP-ABE scheme, and apply it as the underlying techniques to design the data access control scheme Our attribute revocation method can efficiently achieve both forward security and backward security The analysis and simulation results show that our proposed data access control scheme is secure in the random oracle model and is more efficient than previous works
TL;DR: A new decentralized access control scheme for secure data storage in clouds that supports anonymous authentication and access control and has the added feature of access control in which only valid users are able to decrypt the stored information.
Abstract: We propose a new decentralized access control scheme for secure data storage in clouds that supports anonymous authentication. In the proposed scheme, the cloud verifies the authenticity of the series without knowing the user's identity before storing data. Our scheme also has the added feature of access control in which only valid users are able to decrypt the stored information. The scheme prevents replay attacks and supports creation, modification, and reading data stored in the cloud. We also address user revocation. Moreover, our authentication and access control scheme is decentralized and robust, unlike other access control schemes designed for clouds which are centralized. The communication, computation, and storage overheads are comparable to centralized approaches.
TL;DR: It is concluded that there remains a need for further research with a focus on real world applicability of a method or methods to address the digital forensic data volume challenge.
TL;DR: This work designs an encryption scheme that guarantees semantic security for unpopular data and provides weaker security and better storage and bandwidth benefits for popular data, and shows that the scheme is secure under the Symmetric External Decisional Diffie-Hellman Assumption in the random oracle model.
Abstract: As more corporate and private users outsource their data to cloud storage providers, recent data breach incidents make end-to-end encryption an increasingly prominent requirement. Unfortunately, semantically secure encryption schemes render various cost-effective storage optimization techniques, such as data deduplication, ineffective. We present a novel idea that differentiates data according to their popularity. Based on this idea, we design an encryption scheme that guarantees semantic security for unpopular data and provides weaker security and better storage and bandwidth benefits for popular data. This way, data deduplication can be effective for popular data, whilst semantically secure encryption protects unpopular content. We show that our scheme is secure under the Symmetric External Decisional Diffie-Hellman Assumption in the random oracle model.
TL;DR: A artefacts were identified that are likely to remain after the use of cloud storage, in the context of the experiments, on a computer hard drive and Apple iPhone3G, and the potential access point(s) for digital forensics examiners to secure evidence.
TL;DR: In this article, an improved approach for using advanced metadata to implement an architecture for managing I/O operations and storage devices for a virtualization environment is presented, where the advanced metadata is used to track data within the storage devices.
Abstract: Disclosed is an improved approach for using advanced metadata to implement an architecture for managing I/O operations and storage devices for a virtualization environment. According to some embodiments, a Service VM is employed to control and manage any type of storage device, including directly attached storage in addition to networked and cloud storage. The advanced metadata is used to track data within the storage devices. A lock-free approach is implemented in some embodiments to access and modify the metadata.
TL;DR: The data vaporizer as mentioned in this paper provides secure online distributed data storage services that securely store and retrieve data in a public distributed storage substrate such as public cloud, and is configurable for different domain requirements including data privacy and anonymization requirements, encryption mechanisms, regulatory compliance of storage locations and backup and recovery constraints.
Abstract: The data vaporizer provides secure online distributed data storage services that securely store and retrieve data in a public distributed storage substrate such as public cloud. The data vaporizer vaporizes (e.g., fragmented into tiny chunks of configurable sizes) data and distributes the fragments to multiple storage nodes so that the data is not vulnerable to local disk failures, secures data so that even if some of the storage nodes are compromised, the data is undecipherable to the attacker, stores data across multiple cloud storage providers and/or parties using keys (e.g., tokens) provided by multiple parties (including the owners of the data) and maintains data confidentiality and integrity even where one or more data storage provider is compromised. The data vaporizer is configurable for different domain requirements including data privacy and anonymization requirements, encryption mechanisms, regulatory compliance of storage locations, and backup and recovery constraints.
TL;DR: This paper presents the first searchable encryption scheme whose updates leak no more information than the access pattern, that still has asymptotically optimal search time, linear, very small and asymptonically optimal index size and can be implemented without storage on the client (except the key).
Abstract: Searchable (symmetric) encryption allows encryption while still enabling search for keywords. Its immediate application is cloud storage where a client outsources its files while the (cloud) service provider should search and selectively retrieve those. Searchable encryption is an active area of research and a number of schemes with different efficiency and security characteristics have been proposed in the literature. Any scheme for practical adoption should be efficient -- i.e. have sub-linear search time --, dynamic -- i.e. allow updates -- and semantically secure to the most possible extent. Unfortunately, efficient, dynamic searchable encryption schemes suffer from various drawbacks. Either they deteriorate from semantic security to the security of deterministic encryption under updates, they require to store information on the client and for deleted files and keywords or they have very large index sizes. All of this is a problem, since we can expect the majority of data to be later added or changed. Since these schemes are also less efficient than deterministic encryption, they are currently an unfavorable choice for encryption in the cloud. In this paper we present the first searchable encryption scheme whose updates leak no more information than the access pattern, that still has asymptotically optimal search time, linear, very small and asymptotically optimal index size and can be implemented without storage on the client (except the key). Our construction is based on the novel idea of learning the index for efficient access from the access pattern itself. Furthermore, we implement our system and show that it is highly efficient for cloud storage.
TL;DR: In this paper, a new dynamic searchable symmetric encryption (DSE) scheme is presented, which is simpler and more efficient than existing schemes while revealing less information to the server than prior schemes, achieving fully adaptive security against honest but curious servers.
Abstract: Dynamic Searchable Symmetric Encryption allows a client to store a dynamic collection of encrypted documents with a server, and later quickly carry out keyword searches on these encrypted documents, while revealing minimal information to the server. In this paper we present a new dynamic SSE scheme that is simpler and more efficient than existing schemes while revealing less information to the server than prior schemes, achieving fully adaptive security against honest-but-curious servers. We implemented a prototype of our scheme and demonstrated its efficiency on datasets from prior work. Apart from its concrete efficiency, our scheme is also simpler: in particular, it does not require the server to support any operation other than upload and download of data. Thus the server in our scheme can be based solely on a cloud storage service, rather than a cloud computation service as well, as in prior work. In building our dynamic SSE scheme, we introduce a new primitive called Blind Storage, which allows a client to store a set of files on a remote server in such a way that the server does not learn how many files are stored, or the lengths of the individual files; as each file is retrieved, the server learns about its existence (and can notice the same file being downloaded subsequently), but the file’s name and contents are not revealed. This is a primitive with several applications other than SSE, and is of independent interest.
TL;DR: The authors present the first ID-RDPC protocol proven to be secure assuming the hardness of the standard computational Diffie-Hellman problem, which outperforms the existing RDPC protocols in the PKI setting in terms of computation and communication.
Abstract: Checking remote data possession is of crucial importance in public cloud storage It enables the users to check whether their outsourced data have been kept intact without downloading the original data The existing remote data possession checking (RDPC) protocols have been designed in the PKI (public key infrastructure) setting The cloud server has to validate the users' certificates before storing the data uploaded by the users in order to prevent spam This incurs considerable costs since numerous users may frequently upload data to the cloud server This study addresses this problem with a new model of identity-based RDPC (ID-RDPC) protocols The authors present the first ID-RDPC protocol proven to be secure assuming the hardness of the standard computational Diffie-Hellman problem In addition to the structural advantage of elimination of certificate management and verification, the authors ID-RDPC protocol also outperforms the existing RDPC protocols in the PKI setting in terms of computation and communication
TL;DR: A public auditing scheme with third party auditor (TPA), who performs data auditing on behalf of user(s) is proposed, which is proved secure in the random oracle model and the performance analysis shows the scheme is efficient.
TL;DR: In this article, the authors propose a storage provisioning for a virtual machine (VM) in a hybrid cloud environment that includes an enterprise network in communication with a cloud and a nested virtual machine container (NVC) in the cloud.
Abstract: A system and a method implement a cloud storage gateway configured to provide secure storage services in a cloud environment. A method can include implementing storage provisioning for a virtual machine (VM) in a hybrid cloud environment that includes an enterprise network in communication with a cloud. Enterprise network includes enterprise storage, and cloud includes cloud storage. The storage provisioning is implemented by deploying a cloud storage gateway in the cloud that facilitates secure migration of data associated with the VM between enterprise storage and cloud storage. A nested virtual machine container (NVC) is also deployed in the cloud, where NVC abstracts an interface that is transparent to a cloud infrastructure of the cloud. Cloud storage gateway can then be executed as a virtual machine within NVC. Such storage provisioning is further implemented by deploying the VM in a NVC in the cloud and directly attaching storage to the VM.
TL;DR: This work presents a proxy-based storage system for fault-tolerant multiple-cloud storage called NCCloud, which achieves cost-effective repair for a permanent single-cloud failure and validate that FMSR codes provide significant monetary cost savings in repair over RAID-6 codes, while having comparable response time performance in normal cloud storage operations such as upload/download.
Abstract: To provide fault tolerance for cloud storage, recent studies propose to stripe data across multiple cloud vendors. However, if a cloud suffers from a permanent failure and loses all its data, we need to repair the lost data with the help of the other surviving clouds to preserve data redundancy. We present a proxy-based storage system for fault-tolerant multiple-cloud storage called NCCloud, which achieves cost-effective repair for a permanent single-cloud failure. NCCloud is built on top of a network-coding-based storage scheme called the functional minimum-storage regenerating (FMSR) codes, which maintain the same fault tolerance and data redundancy as in traditional erasure codes (e.g., RAID-6), but use less repair traffic and, hence, incur less monetary cost due to data transfer. One key design feature of our FMSR codes is that we relax the encoding requirement of storage nodes during repair, while preserving the benefits of network coding in repair. We implement a proof-of-concept prototype of NCCloud and deploy it atop both local and commercial clouds. We validate that FMSR codes provide significant monetary cost savings in repair over RAID-6 codes, while having comparable response time performance in normal cloud storage operations such as upload/download.
TL;DR: A better understanding of the security challenges of cloud computing is provided to identify approaches and solutions which have been proposed and adopted by the cloud service industry.
TL;DR: Experimental results conclusively demonstrate that the proposed Multi-objective Optimized Replication Management (MORM) is energy effective and outperforms default replication management of HDFS and MOE (Multi-objectives Evolutionary) algorithm in terms of performance and load balancing for large-scale cloud storage cluster.
TL;DR: The authors describe the approaches taken by Globus to create standard data interfaces and common security models for performing these actions on large quantities of data, allowing users to access different types of cloud storage with the same ease with which they access local storage.
Abstract: Cloud computing provides a scalable computing platform through which large datasets can be stored and analyzed. However, because of the number of storage models used and rapidly increasing data sizes, it is often difficult to efficiently and securely access, transfer, synchronize, and share data. The authors describe the approaches taken by Globus to create standard data interfaces and common security models for performing these actions on large quantities of data. These approaches are general, allowing users to access different types of cloud storage with the same ease with which they access local storage. Through an existing network of more than 8,000 active storage endpoints and support for direct access to cloud storage, Globus has demonstrated both the effectiveness and scalability of the approaches presented.
TL;DR: This paper introduces a new security notion appropriate for the setting of deduplication and shows that it is strictly stronger than all relevant notions, and provides a rigorous proof of security against this notion, in the random oracle model, for the DupLESS architecture which is lacking in the original paper.
Abstract: Large-scale cloud storage systems often attempt to achieve two seemingly conflicting goals: (1) the systems need to reduce the copies of redundant data to save space, a process called deduplication; and (2) users demand encryption of their data to ensure privacy. Conventional encryption makes deduplication on ciphertexts ineffective, as it destroys data redundancy. A line of work, originated from Convergent Encryption [27], and evolved into Message Locked Encryption [13] and the latest DupLESS architecture [12], strives to solve this problem. DupLESS relies on a key server to help the clients generate encryption keys that result in convergent ciphertexts. In this paper, we first introduce a new security notion appropriate for the setting of deduplication and show that it is strictly stronger than all relevant notions. We then provide a rigorous proof of security against this notion, in the random oracle model, for the DupLESS architecture which is lacking in the original paper. Our proof shows that using additional secret, other than the data itself, for generating encryption keys achieves the best possible security under current deduplication paradigm. We also introduce a distributed protocol that eliminates the need for the key server. This not only provides better protection but also allows less managed systems such as P2P systems to enjoy the high security level. Implementation and evaluation show that the scheme is both robust and practical.
TL;DR: SAR, an SSD (solid-state drive)-Assisted Read scheme, is proposed, that effectively exploits the high random-read performance properties of SSDs and the unique data-sharing characteristic of deduplication-based storage systems by storing in SSDs theunique data chunks with high reference count, small size, and nonsequential characteristics.
Abstract: Data deduplication has been demonstrated to be an effective technique in reducing the total data transferred over the network and the storage space in cloud backup, archiving, and primary storage systems, such as VM (virtual machine) platforms. However, the performance of restore operations from a deduplicated backup can be significantly lower than that without deduplication. The main reason lies in the fact that a file or block is split into multiple small data chunks that are often located in different disks after deduplication, which can cause a subsequent read operation to invoke many disk IOs involving multiple disks and thus degrade the read performance significantly. While this problem has been by and large ignored in the literature thus far, we argue that the time is ripe for us to pay significant attention to it in light of the emerging cloud storage applications and the increasing popularity of the VM platform in the cloud. This is because, in a cloud storage or VM environment, a simple read request on the client side may translate into a restore operation if the data to be read or a VM suspended by the user was previously deduplicated when written to the cloud or the VM storage server, a likely scenario considering the network bandwidth and storage capacity concerns in such an environment.To address this problem, in this article, we propose SAR, an SSD (solid-state drive)-Assisted Read scheme, that effectively exploits the high random-read performance properties of SSDs and the unique data-sharing characteristic of deduplication-based storage systems by storing in SSDs the unique data chunks with high reference count, small size, and nonsequential characteristics. In this way, many read requests to HDDs are replaced by read requests to SSDs, thus significantly improving the read performance of the deduplication-based storage systems in the cloud. The extensive trace-driven and VM restore evaluations on the prototype implementation of SAR show that SAR outperforms the traditional deduplication-based and flash-based cache schemes significantly, in terms of the average response times.
TL;DR: The results of several Cloud QoE studies performed for different Cloud-based services, ranging from services with low requirements in terms of latency and interactivity to highly interactive services, provide a ground truth basis for developing future Cloud services withQoE requirements, as well as for dimensioning the underlying network provisioning infrastructures, particularly with regard to mobile access technologies.
TL;DR: A novel metric named TUE is defined to quantify the Traffic Usage Efficiency} of data synchronization of cloud storage services and demonstrates that a considerable portion of the data sync traffic is in a sense wasteful, and can be effectively avoided or significantly reduced via carefully designed data sync mechanisms.
Abstract: Cloud storage services such as Dropbox, Google Drive, and Microsoft OneDrive provide users with a convenient and reliable way to store and share data from anywhere, on any device, and at any time. The cornerstone of these services is the data synchronization (sync) operation which automatically maps the changes in users' local filesystems to the cloud via a series of network communications in a timely manner. If not designed properly, however, the tremendous amount of data sync traffic can potentially cause (financial) pains to both service providers and users. This paper addresses a simple yet critical question: Is the current data sync traffic of cloud storage services efficiently used? We first define a novel metric named TUE to quantify the Traffic Usage Efficiency} of data synchronization. Based on both real-world traces and comprehensive experiments, we study and characterize the TUE of six widely used cloud storage services. Our results demonstrate that a considerable portion of the data sync traffic is in a sense wasteful, and can be effectively avoided or significantly reduced via carefully designed data sync mechanisms. All in all, our study of TUE of cloud storage services not only provides guidance for service providers to develop more efficient, traffic-economic services, but also helps users pick appropriate services that best fit their needs and budgets.
TL;DR: An in-depth survey on recent research activities of cloud storage security in association with cloud computing, focusing on the key security requirement triad, i.e., data integrity, data confidentiality, and availability.
Abstract: Cloud Computing has become a well-known primitive nowadays; many researchers and companies are embracing this fascinating technology with feverish haste. In the meantime, security and privacy challenges are brought forward while the number of cloud storage user increases expeditiously. In this work, we conduct an in-depth survey on recent research activities of cloud storage security in association with cloud computing. After an overview of the cloud storage system and its security problem, we focus on the key security requirement triad, i.e., data integrity, data confidentiality, and availability. For each of the three security objectives, we discuss the new unique challenges faced by the cloud storage services, summarize key issues discussed in the current literature, examine, and compare the existing and emerging approaches proposed to meet those new challenges, and point out possible extensions and futuristic research opportunities. The goal of our paper is to provide a state-of-the-art knowledge to new researchers who would like to join this exciting new field.
TL;DR: This is the first secure data-oblivious shuffle that is not based on sorting, and can be used to improve previous oblivious storage solutions for network-based outsourcing of data.
Abstract: We present a simple, efficient, and secure data-oblivious randomized shuffle algorithm. This is the first secure data-oblivious shuffle that is not based on sorting. Our method can be used to improve previous oblivious storage solutions for network-based outsourcing of data.
TL;DR: Hybris key-value store is presented, the first robust hybrid cloud storage system, aiming at addressing security, reliability, and consistency concerns leveraging both private and public cloud resources, and significantly outperforms comparable multi-cloud storage systems.
Abstract: Besides well-known benefits, commodity cloud storage also raises concerns that include security, reliability, and consistency. We present Hybris key-value store, the first robust hybrid cloud storage system, aiming at addressing these concerns leveraging both private and public cloud resources.Hybris robustly replicates metadata on trusted private premises (private cloud), separately from data which is dispersed (using replication or erasure coding) across multiple untrusted public clouds. Hybris maintains metadata stored on private premises at the order of few dozens of bytes per key, avoiding the scalability bottleneck at the private cloud. In turn, the hybrid design allows Hybris to efficiently and robustly tolerate cloud outages, but also potential malice in clouds without overhead. Namely, to tolerate up to f malicious clouds, in the common case of the Hybris variant with data replication, writes replicate data across f + 1 clouds, whereas reads involve a single cloud. In the worst case, only up to f additional clouds are used. This is considerably better than earlier multi-cloud storage systems that required costly 3f + 1 clouds to mask f potentially malicious clouds. Finally, Hybris leverages strong metadata consistency to guarantee to Hybris applications strong data consistency without any modifications to the eventually consistent public clouds.We implemented Hybris in Java and evaluated it using a series of micro and macrobenchmarks. Our results show that Hybris significantly outperforms comparable multi-cloud storage systems and approaches the performance of bare-bone commodity public cloud storage.
TL;DR: It is demonstrated that the protocol is insecure when an active adversary is involved in the cloud environment and the adversary is able to arbitrarily modify the cloud data without being detected by the auditor in the auditing process.
Abstract: Using cloud storage, data owners can remotely store their data and enjoy the on-demand high quality cloud services without the burden of local data storage and maintenance. However, this new paradigm does trigger many security concerns. A major concern is how to ensure the integrity of the outsourced data. To address this issue, recently, a highly efficient dynamic auditing protocol (IEEE Transactions on Parallel and Distributed Systems, doi:10.1109/TPDS.2013.199) for cloud storage was proposed which enjoys many desirable features. Unfortunately, in this letter, we demonstrate that the protocol is insecure when an active adversary is involved in the cloud environment. We show that the adversary is able to arbitrarily modify the cloud data without being detected by the auditor in the auditing process. We also suggest a solution to fix the problem while preserving all the properties of the original protocol.
TL;DR: To have efficient cloud storage confidentiality, this paper uses encryption and obfuscation as two different techniques to protect the data in the cloud storage.
Abstract: Cloud computing provides an enormous amount of virtual storage to the users. Cloud storage mainly helps to small and medium scale industries to reduce their investments and maintenance of storage servers. Cloud storage is efficient for data storage. Users' data are sent to the cloud is to be stored in the public cloud environment. Data stored in the cloud storage might mingle with other users' data. This will lead to the data protection issue in cloud storage. If the confidentiality of cloud data is broken, then it will cause loss of data to the industry. Security of cloud storage is ensured through confidentiality parameter. To ensure the confidentiality, the most common used technique is encryption. But encryption alone doesn't give maximum protection to the data in the cloud storage. To have efficient cloud storage confidentiality, this paper uses encryption and obfuscation as two different techniques to protect the data in the cloud storage. Encryption is the process of converting the readable text into unreadable form using an algorithm and a key. Obfuscation is same like encryption. Obfuscation is a process which disguises illegal users by implementing a particular mathematical function or using programming techniques. Based on the type of data, encryption and obfuscation can be applied. Encryption can be applied to alphabets and alphanumeric type of data and obfuscation can be applied to a numeric type of data. Applying encryption and obfuscation techniques on the cloud data will provide more protection against unauthorized usage. Confidentiality could be achieved with a combination of encryption and obfuscation.
TL;DR: ALG-Dedupe is presented, an Application-aware Local-Global source deduplication scheme that improves data dedUplication efficiency by exploiting application awareness, and further combines local and global duplicate detection to strike a good balance between cloud storage capacity saving and deduPLication time reduction.
Abstract: In personal computing devices that rely on a cloud storage environment for data backup, an imminent challenge facing source deduplication for cloud backup services is the low deduplication efficiency due to a combination of the resource-intensive nature of deduplication and the limited system resources. In this paper, we present ALG-Dedupe, an Application-aware Local-Global source deduplication scheme that improves data deduplication efficiency by exploiting application awareness, and further combines local and global duplicate detection to strike a good balance between cloud storage capacity saving and deduplication time reduction. We perform experiments via prototype implementation to demonstrate that our scheme can significantly improve deduplication efficiency over the state-of-the-art methods with low system overhead, resulting in shortened backup window, increased power efficiency and reduced cost for cloud backup services of personal storage.