PK Gupta, Senior Director, Specialty Presales & Presales Strategy, EMC Asia Pacific and Japan
DataStorageAsean: Archive is an old technology. Why is it becoming hot once more?
PK Gupta: Backup and archive both play a role in protecting data, though their purposes differ. Backup should be treated as short-term data protection for operational or disaster recovery, since backup data is stored in a proprietary format. Archiving, on the other hand, preserves both unstructured data and exported structured data at a granular level in its native format, so it can provide continuous protection for years.
Archive is becoming hot for many reasons. First, the world is becoming more and more virtualized and is moving to the cloud. For most virtual servers it is best to keep image-level backups, so that a server can be brought up quickly from a purpose-built backup appliance (PBBA) and restored in a fraction of the time. Second, organizations are becoming increasingly digital; data growing 10x to petabyte scale by 2020 is making backups very difficult. Third, thousands of new regulations around the access to and protection of data are being created every year, requiring data to be retained longer and longer. Archiving data before backup is the only way to cope, and for these reasons we see it becoming a more talked-about topic.
DataStorageAsean: Is archive just about file data or is it much more than that?
PK Gupta: Archive is not just about file data anymore. With rich media data growing much faster, archiving object data is now a much bigger use case. This includes structured, semi-structured, and unstructured data such as emails, chat messages, RDBMS data, flat files, zip files, documents, PDFs and others, including data from legacy solutions. Even for legacy applications that have been moved from a physical host to a virtual one, the data extracted from them also needs to be archived alongside the rest of the data to fulfill regulatory and compliance requirements.
The enterprise data footprint is expanding so fast that it is no longer just about ERP, CRM, supply chain and other traditional systems of record. Data is being streamed from social media such as Twitter, Facebook, Blogger and other platforms, and is being generated by a myriad of devices, from mobile devices to Fitbits to IoT sensors. New kinds of data from new sources are being created every second, resulting in the need for real-time analysis to discover both potential problems and opportunities. In a world of buyer protection, where more than the return policy matters, it takes capturing feedback, monitoring, and real-time analysis to take the business to the next level. Extracting knowledge from all this data requires capturing a variety of data types in huge amounts and at speed, then searching and retrieving the data for analysis at the lowest cost. This is next to impossible with traditional backup approaches. Archive is no longer an optional solution. With service-oriented architecture becoming essential, it is now a must-have.
DataStorageAsean: How about the archive medium? Does tape still have a place or is it all about cloud?
PK Gupta: With the massive growth of unstructured, semi-structured and even structured data, continuous archiving in native format with indexing is a must to adhere to business regulatory and compliance requirements. The challenge is that the archive storage platform must be able to handle data growth from the smallest to the largest volumes with ease. At the same time, it should not be so expensive or complex that it cancels out the savings made on primary storage or backup. That leaves either cold storage such as tape, or high-capacity, high-density disk solutions.
From a cost perspective, tape is the cheapest medium for archiving huge amounts of data, but it brings with it complexity of management and access. There is no way to push or pull archive data to tape directly; it takes third-party software that crawls, pulls and writes the data to tape in a proprietary format. That software must be intelligent enough not only to handle hardware such as tape drives and robotic libraries, but also to cope with tape wear and tear by copying data onto multiple tapes. This adds one more layer of complexity to the archive software itself instead of simplifying the overall archive architecture.
While disk-based archive cannot compete with tape on cost, it has many advantages that can easily justify archiving to disk storage overall. First, it is much faster to archive and restore data using disk than tape. Second, it is easier to index and search archives on disk.
In summary, though tape is the cheapest medium, that alone does not justify it as archive storage when compared with the various disk-based storage technologies. In particular, object storage that uses erasure coding for RAIN/RAID-style protection wastes the least capacity, protects data for the longest period of time, and offers effectively unlimited scale plus analytics, so that massive numbers of objects can be stored and searched with ease.
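The point about erasure coding keeping protection overhead low can be checked with simple arithmetic. The sketch below compares the raw capacity needed to protect the same usable capacity under plain replication and under a k+m erasure code. The specific layouts (three copies, 4+2, 12+4) and the 1,000 TB figure are illustrative assumptions, not numbers from the interview or from any EMC product.

```python
def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw capacity consumed per unit of usable capacity for a k+m erasure code."""
    if data_fragments <= 0 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and no negative parity")
    return (data_fragments + parity_fragments) / data_fragments


def replication_overhead(copies: int) -> float:
    """Raw capacity consumed per unit of usable capacity when keeping full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies)


def raw_capacity_needed(usable_tb: float, overhead: float) -> float:
    """Raw terabytes that must be provisioned to store usable_tb of archive data."""
    return usable_tb * overhead


if __name__ == "__main__":
    usable_tb = 1000.0  # assumed archive size, for illustration only
    schemes = {
        "3 full copies": replication_overhead(3),
        "erasure code 4+2": erasure_overhead(4, 2),
        "erasure code 12+4": erasure_overhead(12, 4),
    }
    for name, overhead in schemes.items():
        raw = raw_capacity_needed(usable_tb, overhead)
        print(f"{name}: overhead {overhead:.2f}x, raw capacity {raw:.0f} TB")
```

Under these assumptions, a 12+4 layout survives the loss of any four fragments while provisioning far less raw capacity than three full copies, which survive the loss of two.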
DataStorageAsean: How about archive as a service? Is that a viable model?
PK Gupta: As object storage can span a single global namespace, it is easy to deploy archive as a service (AaaS) in either a private or a public cloud. Object data can even spill over from private to public cloud in a hybrid model, based on information life-cycle management, for longer retention periods. As cloud moves to a more mature phase, most customers have already started evaluating user-level cloud services such as email and document management. Some examples are Microsoft Office 365, Google's document service and Dropbox. As most of these charge based on capacity utilization, it makes sense to archive that data to a lower storage tier, say object storage, using an HTTP-based REST API. This lets businesses extract data from the expensive primary cloud service and save it with an independent archive cloud provider that focuses on better security, analysis and indexing for compliance or retrieval.
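The idea of pushing data to a lower storage tier over an HTTP-based REST API can be sketched roughly as follows. This is a minimal illustration only: the endpoint `archive.example.com`, the bucket/key URL layout and the `X-Archive-Meta-` header prefix are hypothetical and do not describe any particular vendor's API. The sketch builds the PUT request but deliberately does not send it.

```python
import urllib.parse
import urllib.request

# Hypothetical archive endpoint, used purely for illustration.
ARCHIVE_ENDPOINT = "https://archive.example.com"


def build_archive_request(
    bucket: str, key: str, payload: bytes, metadata: dict
) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that would store one object."""
    # Percent-encode the key while keeping "/" so keys can look like paths.
    url = f"{ARCHIVE_ENDPOINT}/{urllib.parse.quote(bucket)}/{urllib.parse.quote(key)}"
    headers = {"Content-Type": "application/octet-stream"}
    # Carry descriptive metadata as headers so the archive could index it.
    for name, value in metadata.items():
        headers[f"X-Archive-Meta-{name}"] = str(value)
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


if __name__ == "__main__":
    request = build_archive_request(
        bucket="mail-archive",
        key="2016/inbox message.eml",
        payload=b"From: alice@example.com\r\n\r\nQuarterly report attached.",
        metadata={"owner": "alice", "retention-years": 7},
    )
    print(request.get_method(), request.full_url)
    # Sending would be urllib.request.urlopen(request); omitted because the
    # endpoint above is not a real service.
```

Carrying metadata alongside the object at upload time is one way to support the indexing and compliance search the answer describes; a real service would define its own metadata and authentication scheme.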
Cloud-based archive as a service benefits from a massive scale-to-serve ratio, which reduces the cost of deployment and management. As most archive data is generated over a period of time, it is easy for thousands of users to upload unstructured data day after day. Once in the cloud archive, data must be easily searchable via metadata and protected from overwrites or tampering, and the service must provide SLA-defined data retention policies that are specified by the client or by law and applied automatically. There should be a strong trust relationship, so that compliance data is in safe hands and protected by sound disaster recovery systems.
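The three requirements listed above (metadata search, protection from overwrites, automatically applied retention) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how any real archive service is implemented: the `ArchiveStore` class, its method names and the 365-day year are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivedObject:
    payload: bytes
    metadata: dict
    retain_until: date


class ArchiveStore:
    """Toy write-once archive with retention enforcement and metadata search."""

    def __init__(self) -> None:
        self._objects: dict[str, ArchivedObject] = {}

    def put(self, key: str, payload: bytes, metadata: dict,
            retention_years: int, today: date) -> None:
        # Write-once: an existing key can never be overwritten.
        if key in self._objects:
            raise PermissionError(f"{key} is already archived and cannot be overwritten")
        # Simplification: one year is treated as 365 days.
        retain_until = today + timedelta(days=365 * retention_years)
        self._objects[key] = ArchivedObject(payload, dict(metadata), retain_until)

    def delete(self, key: str, today: date) -> None:
        # Retention is applied automatically: no deletion before expiry.
        if today < self._objects[key].retain_until:
            raise PermissionError(f"{key} is under retention and cannot be deleted yet")
        del self._objects[key]

    def search(self, **criteria: str) -> list[str]:
        # Return the keys whose metadata matches every given criterion.
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(name) == value for name, value in criteria.items())
        )


if __name__ == "__main__":
    store = ArchiveStore()
    store.put("mail/001", b"hello", {"owner": "alice", "type": "email"}, 7, date(2016, 1, 1))
    store.put("docs/001", b"report", {"owner": "alice", "type": "document"}, 3, date(2016, 1, 1))
    print(store.search(owner="alice", type="email"))
```

A production service would enforce the same rules at the storage layer rather than in application code, so that they cannot be bypassed by a client.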
Archive as a service is no longer in the early stages of adoption. It has long been offered by various public service providers such as Google, which stores photos, emails and documents at zero cost to individual users. Social media sites such as Facebook and YouTube also serve as archiving platforms for photos and videos.
DataStorageAsean: Does the ASEAN region have any uniqueness which lends itself to needing Archive technology?
PK Gupta: ASEAN, with 10 different countries, is in a unique position. Though Sri Lanka is not part of the ASEAN group, it is following the same path of digital transformation. Each country has its own data retention and data governance requirements, ranging from 3 years to 30 years.
With its growing economy, ASEAN as a single entity is the world's 7th largest economy and 3rd largest market base. As a result, there is huge pressure on the banking, finance and insurance industries to keep most transactions for as long as possible, which is normally 7 years or more. This is not limited to financial institutions; most enterprises and SMEs also need to keep their transactions on record for future analysis.
As part of their commitment to establishing an integrated ASEAN Economic Community (AEC), the ASEAN countries have all agreed to develop best practices and guidelines on data protection, and the indicators are clear: data protection regulation in the region will increase in the coming years. The ability to keep up with these changes may make or break business enterprises with regional ambitions.
The Personal Data Protection Act (PDPA) is already in place in Singapore, Malaysia and the Philippines, with markets like Indonesia seeking to deploy the same soon. Though the rest of the ASEAN members may not be ready yet, we expect them to follow suit in the near future. The onus is therefore on cloud and Internet service providers (CSPs and ISPs), and on the banking and finance industry, to protect both user and business data more securely, so that they do not breach the PDPA while ensuring that data remains easily accessible to government agencies for regulatory or compliance requirements.
DataStorageAsean: What is unique about your own offering?
PK Gupta: EMC provides high-performance archival solutions that are scalable, secure, robust and cost-effective, especially for big data archiving. Archived data can be kept “warm” for easy access and analytics using both software and hardware, under various scenarios such as pull, push, FTP, SFTP, indexing and searching, and across a tiered architecture that runs from production storage to archival storage to cloud-based archive storage. There are archival solutions in the market today, such as EMC’s, that help process structured, semi-structured, and unstructured data through easy archival of RDBMS data, flat files, zip files, MS Office Word documents, PDFs, and others, including data from legacy solutions.