Christophe Bertrand - Vice President, Product Marketing, Arcserve
Data&StorageAsean: What is the difference between deduplication and other Data Reduction technologies such as compression?
Christophe Bertrand: Compression is the process of reducing the size of a data file. It limits the amount of storage a file consumes, and the smaller file size typically speeds up transfer times over networks.
In contrast, data deduplication looks for identical blocks across a set of data or backup nodes at the block level and only “saves” each unique block once. The technology relies on hashing and comparison algorithms to ensure that data is compared efficiently and coherently. There are different types of data deduplication technologies; in the backup space, the most efficient type is source-side deduplication, which ideally applies globally to all protected systems (global source-side deduplication). Benefits include lower bandwidth utilization, faster backups, significant storage savings, and overall improved disaster recovery capabilities.
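The core mechanism can be sketched in a few lines of Python. This is an illustration of block-level deduplication in general, not any vendor's implementation; the fixed block size, SHA-256 hashing, and in-memory store are all assumptions made for the example:

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks and store each unique block once.

    Returns a 'recipe' of block hashes that can rebuild the original data.
    """
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:       # only a new, unseen block is saved
            store[digest] = block
        recipe.append(digest)
    return recipe


def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its recipe of block hashes."""
    return b"".join(store[digest] for digest in recipe)


store = {}
# Two identical 4 KB blocks plus one unique block: 3 logical blocks,
# but only 2 unique blocks need to be stored.
data = b"A" * 8192 + b"B" * 4096
recipe = deduplicate(data, store)
```

Because identical blocks hash to the same digest, the second 4 KB run of `A` bytes costs only a recipe entry, not another stored copy.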
Data&StorageAsean: Why do the use cases we see for deduplication seem to be limited to backup appliances and all flash arrays?
Christophe Bertrand: The benefits of deduplication lie primarily in the reduction of the storage footprint. Storage arrays are a natural place where an end user would want deduplication in order to optimize their investment. That said, there are drawbacks in terms of performance and sometimes licensing when using deduplication on an array. More importantly, the deduplication benefits will only apply to the array in question. There is nothing global about it; the savings cannot be shared across systems.
There is also a misconception that deduplication only applies to backup appliances, typically target appliances, meaning that deduplication happens after the backup stream arrives (post-processing) or partially during the backup. These appliances typically require a “landing zone” to stage the data and are ultimately an additional cost to the backup infrastructure. Deduplication is actually software that can be deployed in many different ways depending on the vendor. Modern backup software should be able to deduplicate at the source.
Data&StorageAsean: Are there different approaches to deduplication and if so what are the benefits and downsides of each?
Christophe Bertrand: There are multiple approaches to deduplication:
- Target deduplication: typically delivered as an appliance that deduplicates the backup stream from backup software. This means end users must already have a backup solution and add on a deduplication appliance, an add-on cost that can be substantial.
- Source-side deduplication: the deduplication takes place at the server or VM being backed up (the source), and only blocks that are not already in the central data store are sent over the network. The process compares incremental changes against what is already in the data store. This is the better way of leveraging deduplication technology.
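The source-side flow described above can be sketched as follows. The in-memory server index and the byte counter standing in for network transfer are hypothetical simplifications for illustration:

```python
import hashlib


def backup_source_side(blocks, server_index, network_log):
    """Source-side dedup sketch: hash each block locally, check whether the
    central data store already holds that hash, and transfer only new blocks.
    """
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest in server_index:       # already in the data store:
            continue                     # nothing is sent over the network
        network_log.append(len(block))   # only unique blocks travel
        server_index[digest] = block


server_index = {}   # hypothetical central data store's hash index
log = []            # bytes actually transferred per block

# First backup: both blocks are new, so both cross the network.
backup_source_side([b"config" * 100, b"logs" * 100], server_index, log)

# Second backup of unchanged data: every hash is already known,
# so zero additional bytes are transferred.
backup_source_side([b"config" * 100, b"logs" * 100], server_index, log)
```

In a real deployment the hash comparison happens over the wire (the source sends digests, not data), which is where the bandwidth savings come from.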
Data&StorageAsean: Is deduplication technology relevant as companies virtualise and cloud enable?
Christophe Bertrand: Deduplication optimizes not only the use of storage but also the speed at which incremental backups can be performed. In a highly virtualized context, the auto-discovery and protection of new VMs, for example, is a great way to keep pace with the explosion of virtual machines, whether on-premises or in the cloud. Deduplication is particularly critical in highly virtualized environments.
Deduplication is useful on-premises or in the cloud. Combining deduplication with advanced disaster recovery techniques that leverage virtualization in the cloud optimizes costs and recovery service levels for end users who want to include public cloud in their BC/DR architecture.
Data&StorageAsean: Are there any unique features you would like to share about your own deduplication offerings?
Christophe Bertrand: Arcserve Unified Data Protection (UDP) is based on a unique, global source-side deduplication architecture, which leverages a 4k-32k variable-length block size to deliver industry-leading backup capacity and network bandwidth savings. The ability to deduplicate across all clients in the infrastructure is critical to limiting the unnecessary storage and transfer of duplicate backup data; this is what makes it “global.” Data is deduplicated across nodes, jobs, and sites.
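Variable-length blocking is generally implemented with content-defined chunking: a rolling checksum over the data decides where chunk boundaries fall, so an insertion early in a file only reshapes nearby chunks instead of shifting every fixed-size block. The toy rolling hash and the 4 KB/32 KB bounds below illustrate the general technique only; they are not Arcserve's algorithm:

```python
MIN_CHUNK, MAX_CHUNK = 4096, 32768   # mirrors a 4k-32k variable block range
MASK = 0x1FFF                        # boundary test: low 13 bits of hash == 0


def chunk_boundaries(data: bytes):
    """Content-defined chunking sketch using a toy rolling checksum.

    A chunk ends when the rolling hash matches the boundary mask (after the
    minimum size) or when the maximum chunk size is reached.
    """
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF   # toy rolling hash, illustrative
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):                    # flush the final partial chunk
        yield data[start:]


# Deterministic sample data for the demonstration.
data = bytes((i * 7919) % 256 for i in range(100000))
chunks = list(chunk_boundaries(data))
```

Each variable-length chunk would then be hashed and deduplicated exactly as in the fixed-block case, but with boundaries that survive insertions and deletions.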
Arcserve UDP global deduplication goes beyond the limitations of legacy backup vendors whose deduplication technology only applies to a single logical volume or a single backup appliance target, and/or does not cover data transferred over the network, which reduces bandwidth and replication savings. With Arcserve UDP global deduplication, the database index is replicated and distributed so that all source and target data is deduplicated across all backup servers. Since backup data is globally deduplicated before it is transferred to the target backup server, only changes are sent over the network, which improves performance and reduces bandwidth usage.
Arcserve UDP performs inline deduplication using a combination of variable block-level deduplication and adaptive compression, both at the source and globally at the target, meaning no extra space is required on the target. Combined with Arcserve’s patented infinite incremental (I2) built-in changed block tracking technology, it reduces the amount of backup data that is transmitted over the network and stored on back-end disk.