TL;DR: In this paper, the authors propose a system for backing up files from disk volumes on multiple nodes of a computer network to a common random-access backup storage means, where duplicate files (or portions of files) may be identified across nodes, so that only a single copy of the contents of the duplicate files (or portions thereof) is stored in the backup storage means.
Abstract: A system for backing up files from disk volumes on multiple nodes of a computer network to a common random-access backup storage means. As part of the backup process, duplicate files (or portions of files) may be identified across nodes, so that only a single copy of the contents of the duplicate files (or portions thereof) is stored in the backup storage means. For each backup operation after the initial backup on a particular volume, only those files which have changed since the previous backup are actually read from the volume and stored on the backup storage means. In addition, differences between a file and its version in the previous backup may be computed so that only the changes to the file need to be written on the backup storage means. All of these enhancements significantly reduce both the amount of storage and the amount of network bandwidth required for performing the backup. Even when the backup data is stored on a shared-file server, data privacy can be maintained by encrypting each file using a key generated from a fingerprint of the file contents, so that only users who have a copy of the file are able to produce the encryption key and access the file contents. To view or restore files from a backup, a user may mount the backup set as a disk volume with a directory structure identical to that of the entire original disk volume at the time of the backup.
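The cross-node duplicate elimination and content-derived key described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the choice of SHA-256, the `BackupStore` class, and the `content_key` derivation are all assumptions made for the example, and the actual encryption step is omitted.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint. SHA-256 is an assumption; the abstract names no hash."""
    return hashlib.sha256(data).hexdigest()


def content_key(data: bytes) -> bytes:
    """Hypothetical key derivation: only someone holding the file contents
    can recompute this key, which is the privacy property the abstract describes."""
    return hashlib.sha256(b"key:" + data).digest()


class BackupStore:
    """Toy shared store: one blob per distinct content, many catalog entries."""

    def __init__(self):
        self.blobs = {}    # fingerprint -> contents (would be encrypted in practice)
        self.catalog = {}  # (node, path) -> fingerprint

    def backup(self, node: str, path: str, data: bytes) -> bool:
        """Record the file; return True only if new contents had to be stored."""
        fp = fingerprint(data)
        is_new = fp not in self.blobs
        if is_new:
            self.blobs[fp] = data
        self.catalog[(node, path)] = fp
        return is_new


store = BackupStore()
first = store.backup("nodeA", "/etc/motd", b"hello")      # contents stored
second = store.backup("nodeB", "/home/x/motd", b"hello")  # duplicate, catalog entry only
```

In this sketch both nodes get a catalog entry, but the shared store holds a single blob, so the second backup costs neither storage nor a transfer of the file contents.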
TL;DR: In this article, an improved backup storage system and method for use in conjunction with hierarchical or mass storage servers and networks is described, where baseline, full and incremental backup procedures are used to save file copies.
Abstract: An improved backup storage system and method for use in conjunction with hierarchical or mass storage servers and networks is disclosed. Baseline, full and incremental backup procedures are used to save file copies. In one preferred embodiment, the baseline backup procedure is used to store copies of stable files, i.e. files that are modified less frequently, if at all. With a hierarchical storage server, such files are typically those stored on tertiary storage media, e.g. erasable optical disks, WORMs or magnetic tape. The full backup procedure stores, as full backup copies, copies of all files not in the baseline backup and files that have been changed since the time of their baseline backup. The full backup procedure also stores file identifiers and signals representative of storage locations of baseline backup copies for files which have not been changed since the time of the baseline backup. The incremental backup procedure stores, as incremental backup copies, copies of files not in the baseline or full backups, e.g. new files, and files that have changed since the time of their last backup (baseline, full or incremental). The incremental backup procedure also stores file identifiers and signals representative of storage locations of baseline backup copies for files which have not been changed since the time of the baseline backup, and also stores file identifiers and signals representative of storage locations of full and incremental backup copies for files which have not been changed since the time of their full or incremental backup.
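The copy-or-reference decision shared by the full and incremental procedures can be sketched as below. This is an illustrative sketch only: the `plan_backup` function, the use of modification times to detect change, and the location strings are assumptions, not details taken from the abstract.

```python
def plan_backup(current, previous):
    """Decide, per file, whether to store a new copy or a reference.

    current:  path -> mtime of the file as it exists now
    previous: path -> (mtime, location) of the file's most recent backup copy
              (baseline, full or incremental)
    Returns:  path -> ("copy", None) or ("ref", location)
    """
    plan = {}
    for path, mtime in current.items():
        prior = previous.get(path)
        if prior is not None and prior[0] == mtime:
            # Unchanged since its last backup: store only the identifier
            # and the location of the existing copy.
            plan[path] = ("ref", prior[1])
        else:
            # New file, or changed since its last backup: store a fresh copy.
            plan[path] = ("copy", None)
    return plan


previous = {
    "a.txt": (100, "baseline:tape1/17"),
    "b.txt": (100, "full:disk2/3"),
}
current = {"a.txt": 100, "b.txt": 150, "c.txt": 120}
plan = plan_backup(current, previous)
```

Here `a.txt` is unchanged and becomes a reference to its baseline copy, while the modified `b.txt` and the new `c.txt` are stored as copies.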
TL;DR: In this article, the file selection processes are distributed throughout the network: users of remote workstations preselect the files to be backed up, which considerably increases the speed of backing up over the network.
Abstract: A computer network for backing up data and program files located on networked workstations onto centralized backup media of a backup device of the network. The backup computer network allows users of workstations at remote nodes to preselect files (by name and by file selection criteria) on their workstations which are to be backed up onto the centralized backup media. The file selection processes are therefore distributed throughout the network, which considerably increases the speed of backing up over the network.
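Selection by name and by criteria, run on each workstation rather than on the backup device, might look like the following sketch. The `FileInfo` record, the `select_files` function, and the specific criteria (glob patterns plus a modified-since cutoff) are hypothetical; the abstract does not specify which criteria are supported.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int
    mtime: float


def select_files(files, names=(), patterns=(), modified_since=None):
    """Return paths chosen explicitly by name, or matching a pattern
    and (if a cutoff is given) modified at or after that cutoff."""
    selected = []
    for f in files:
        by_name = f.path in names
        by_pattern = any(fnmatch.fnmatchcase(f.path, p) for p in patterns)
        recent = modified_since is None or f.mtime >= modified_since
        if by_name or (by_pattern and recent):
            selected.append(f.path)
    return selected


files = [
    FileInfo("/docs/a.doc", 10, 100.0),
    FileInfo("/docs/b.doc", 10, 50.0),
    FileInfo("/tmp/x.tmp", 5, 200.0),
    FileInfo("/etc/hosts", 1, 10.0),
]
chosen = select_files(files, names={"/etc/hosts"}, patterns=["*.doc"], modified_since=60.0)
```

Because each workstation evaluates its own selection, only the resulting list of paths needs to reach the backup device, which does not have to scan every remote volume itself.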
TL;DR: In this article, a file server maintains files in a shared name space; a first backup client program forwards a backup request for a file in that name space to a second backup client program, which obtains the file from the file server and transmits it to a backup server program for storage.
Abstract: Disclosed is a system for backing up files in a distributed computing system. A file server maintains files in a shared name space. The file server provides a first backup client program and a second backup client program with access to the files in the shared name space. The first backup client program initiates a backup request to back up a requested file. A determination is made as to whether the requested file is maintained in the shared name space. The backup request is transmitted to the second backup client program upon determining that the requested file is maintained in the shared name space. The second backup client program transmits a message to the file server to provide the requested file. The file server transmits the requested file to the second backup client program. The second backup client program then transmits the requested file to a backup server program. The backup server program stores the requested file in a storage device.
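The request routing in this abstract can be sketched as a small in-process model. All class and method names here are hypothetical. The abstract also does not say what happens to a file outside the shared name space; the sketch assumes the first client backs it up itself from local storage.

```python
class FileServer:
    """Holds the files in the shared name space."""

    def __init__(self, shared_files):
        self.shared_files = dict(shared_files)  # path -> contents

    def in_shared_namespace(self, path):
        return path in self.shared_files

    def read(self, path):
        return self.shared_files[path]


class BackupServer:
    def __init__(self):
        self.storage = {}  # path -> contents

    def store(self, path, data):
        self.storage[path] = data


class BackupClient:
    def __init__(self, name, file_server, backup_server, delegate=None, local_files=None):
        self.name = name
        self.file_server = file_server
        self.backup_server = backup_server
        self.delegate = delegate              # second backup client, if any
        self.local_files = local_files or {}  # assumption: non-shared files live here

    def request_backup(self, path):
        """Back up `path`; return the name of the client that handled it."""
        shared = self.file_server.in_shared_namespace(path)
        if shared and self.delegate is not None:
            # Shared file: hand the request to the second client.
            return self.delegate.request_backup(path)
        data = self.file_server.read(path) if shared else self.local_files[path]
        self.backup_server.store(path, data)
        return self.name


fs = FileServer({"/shared/report.txt": b"report"})
bs = BackupServer()
second = BackupClient("second", fs, bs)
first = BackupClient("first", fs, bs, delegate=second,
                     local_files={"/local/notes.txt": b"notes"})
handled_shared = first.request_backup("/shared/report.txt")
handled_local = first.request_backup("/local/notes.txt")
```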
TL;DR: In this paper, the backup server (240) is provided with a mechanism to receive backup requests from the client agents (215) and accept or reject backup requests on the basis of backup server loading, network loading, or the like.
Abstract: In a computer network environment (200), multiple clients (210) and multiple servers (230) are connected via a local area network (LAN) (220) to a backup file server (240). Each client (210) and each server is provided with backup agent software (215), which schedules backup operations on the basis of time since the last backup, the amount of information generated since the last backup, or the like. An agent (215) also sends a request to the backup server (240), prior to an actual backup, including information representative of the files that it intends to back up. The backup server (240) is provided with a mechanism to receive backup requests from the client agents (215) and accept or reject backup requests on the basis of backup server loading, network loading, or the like. The backup server (240) is further provided with mechanisms to enact redundant file elimination (RFE), whereby the server indicates to the client agents, prior to files being backed up, that certain of the files to be backed up are already stored by the backup server. Thus, the clients do not need to send the redundant files to be backed up.
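The request-then-send exchange with redundant file elimination can be sketched as follows. This is a simplified model under stated assumptions: load is approximated by a count of concurrent backups (`max_active`), the "information representative of the files" is taken to be a SHA-256 fingerprint per path, and `BackupServer`, `run_backup`, and `finish` are hypothetical names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Assumed file representation sent ahead of the backup."""
    return hashlib.sha256(data).hexdigest()


class BackupServer:
    def __init__(self, max_active=2):
        self.max_active = max_active  # stand-in for server/network loading
        self.active = 0
        self.stored = set()           # fingerprints already held

    def request(self, manifest):
        """manifest: path -> fingerprint. Returns (accepted, paths_to_send)."""
        if self.active >= self.max_active:
            return False, []          # reject: too heavily loaded
        self.active += 1
        needed = [p for p, fp in manifest.items() if fp not in self.stored]
        return True, needed

    def finish(self, manifest):
        """Mark the backup complete and remember what is now stored."""
        self.stored.update(manifest.values())
        self.active -= 1


def run_backup(server, files):
    """Agent side: announce intent, then send only what the server lacks.
    Returns the sorted list of paths sent, or None if the request was rejected."""
    manifest = {path: fingerprint(data) for path, data in files.items()}
    accepted, needed = server.request(manifest)
    if not accepted:
        return None                   # the agent would reschedule
    server.finish(manifest)
    return sorted(needed)


server = BackupServer()
sent_first = run_backup(server, {"/a": b"x", "/b": b"y"})
sent_second = run_backup(server, {"/c": b"x", "/d": b"z"})
```

In the second run, `/c` has the same contents as the already stored `/a`, so only `/d` crosses the network.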