Tiered and Policy Based Data Backup
The New Paradigm (Part 2)
Tributary Systems, Inc
In the April issue of NonStop Insider we began with Part 1 of this article, introduced the concept of a tiered, policy-based approach to data backup, and covered three all-important steps to making the right decision about the best fit for a business. In this second part we look at those three steps in more detail.
Introduction
The traditional approach to data backup is to develop and implement a strategy that typically leverages one or more techniques intended to safeguard data against corruption, loss, unauthorized disclosure, or denial of access. Ideally, protection techniques are assigned to specific data sets (or pools) supporting specific business processes following careful analysis of the relative criticality of, and requirements associated with, various data pools.
By itself, data is just an anonymous set of 1s and 0s; data inherits its importance from the business process that it supports. In business-critical environments, companies invest significantly in systems that provide high-availability components, total redundancy of hardware and software, and real-time replication and backup to prevent data loss. However, these precautions may not preclude a serious failure, and recovery may not seem business critical until actual data loss occurs.
However, as noted in Part 1 of this article, the approach taken in too many circumstances is unfortunately to “just make a copy” and declare mission accomplished. No real consideration is given to the criticality of the data, appropriate backup locations for various data pools, required data retention periods, the cost per TB of backed-up data, or backup and restore windows. With the vast amount of data being backed up every second, this “one-size-fits-all” paradigm needs to change.
The Tiered Policy-Based Approach
A thoughtful data center manager, or indeed any individual charged with building a data backup solution, will determine the appropriate storage location based on the rarity, value, replacement cost and condition of the items being retained. So why should business data be any different? It shouldn’t! The tiered, policy-based approach proposed by TSI simply reflects the fact that there are choices when it comes to where to store data, and that the optimal solution is often best served by utilizing multiple storage options, tiered to reflect the trade-off between accessibility and cost that CIOs and data center managers of any business must almost always weigh.
What follows here, in considerably more detail than in Part 1 of this article, are the three steps businesses need to take in order to arrive at the best approach to data backup.
Step 1: Intelligent Backup Data Analysis
It’s a given: not all data is the same. Data derives its importance – its criticality to the business and its priority for access following an interruption event – from the business process it supports. That’s why data management planning must be preceded by the due diligence of business process analysis. Determination of which business processes are mission critical and which are merely important is mandatory. A physical mapping of business processes to their data (i.e., the data produced and used by each process) must be performed, and subsequently of that data to its actual storage location within the infrastructure.
This is the crux of data management, and it needs to be done for several reasons:
- Because all data isn’t the same in terms of criticality or priority to restore, there’s no one-size-fits-all data retention and protection strategy. In most organizations, data retention and protection involves a mixture of services used in appropriate combinations to meet the recovery requirements of different business processes and their data.
- The diversity of threats to data – including bit errors occurring close to the data itself, a storage hardware failure, or a power outage with a broad geographical footprint occurring outside the storage device – might require different protective services. “Defense in depth” is a term often associated with this reimagining of policy-based data protection.
- The practical issue of cost is a huge factor: one-size-fits-all strategies for data management are generally not very cost effective. Backing up mission-critical data to tape might conflict with the recovery requirements of always-on applications, because the data restore might exceed the allowable recovery window. Conversely, using an expensive deduplication device to replicate the data of a low-priority application or business process would be overkill and a waste of money.
So, getting more granular in the assignment of management, protection and recovery services to data assets – creating policies for tiered data sets based on a thorough analysis of requirements – is the best way to rationalize expense while streamlining and optimizing data backup.
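To make the outcome of Step 1 concrete, a minimal sketch follows; the process names, data pools, tiers and storage locations are hypothetical examples for illustration, not a TSI taxonomy:

```python
from dataclasses import dataclass

@dataclass
class DataPool:
    name: str              # logical set of data produced/used by a business process
    business_process: str  # the process from which the data inherits its importance
    criticality: str       # e.g. "mission-critical", "important", "low-priority"
    current_location: str  # where the data actually lives in the infrastructure

# Hypothetical result of mapping business processes to their data,
# and that data to its current storage location.
data_inventory = [
    DataPool("card-transactions",   "payment processing", "mission-critical", "NonStop SQL volumes"),
    DataPool("customer-statements", "monthly billing",    "important",        "open-systems SAN"),
    DataPool("marketing-archive",   "campaign reporting", "low-priority",     "NAS share"),
]

# The inventory itself shows why one retention/protection policy cannot fit all pools.
for pool in data_inventory:
    print(f"{pool.name}: {pool.criticality} ({pool.business_process}) @ {pool.current_location}")
```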
Step 2: Building an Integrated Data Management Strategy
Once the business needs of data are understood and mapped, the challenge is building and maintaining a data management strategy that includes multiple services targeted at multiple data sets.
An analytical effort will be necessary to determine the recovery priorities of business processes (with their associated infrastructure and data), and this effort must be undertaken before any data protection strategy is developed. A business impact analysis should be performed to identify the current infrastructure and data associated with a given business process, and the impact of an interruption in the services and access to data associated with that process.
The results of the impact analysis will naturally drive the setting of retention and recovery objectives that define the criticality and restoration priority of the subject process and its “time-to-data” requirement (sometimes called a recovery time objective). This in turn provides guidance for the definition of a strategy that applies policy-based data backup and recovery services and techniques to facilitate renewed access to data within the required timeframe and in a manner that fits budgetary realities and minimizes overall costs.
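As a hedged illustration of how those objectives can drive service selection, the sketch below maps a hypothetical time-to-data requirement and retention period onto a protection technique; the thresholds and service names are assumptions for illustration only:

```python
def choose_protection(rto_minutes: int, retention_days: int) -> str:
    """Pick a backup/recovery service from hypothetical policy thresholds."""
    if rto_minutes <= 15:
        return "synchronous replication plus local VTL disk copy"
    if rto_minutes <= 240:
        return "VTL disk backup with replication to a remote VTL"
    if retention_days >= 7 * 365:
        return "encrypted vault to cloud object storage"
    return "physical tape, vaulted off-site"

# Example: a mission-critical process with a 10-minute time-to-data requirement.
print(choose_protection(rto_minutes=10, retention_days=90))
# Example: a long-retention, low-priority archive.
print(choose_protection(rto_minutes=1440, retention_days=3650))
```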
While this analysis will require commitment of usually scarce resources, the results should answer the three questions that must be addressed by any data protection plan:
- What data needs to be protected?
- Where is that data currently archived?
- And finally, what is the best way to protect the data?
Of course, there is truth to the assertion that protecting data is a simple matter: make a copy of the data and move the copy off-site so it is not consumed by a disaster that befalls the original. Almost every data protection strategy provides protection as a function of redundancy, as replacing data following a disaster event is not a feasible strategy.
While there is universal agreement on this concept, different vendors seek to promote different technologies for making the safety copies of the data, each promoting their wares as the one true path to data continuity. We find ourselves locked in a perpetual battle over what is the best technology for the protection of our data.
One possible approach would be to centralize service delivery using storage virtualization, which abstracts value-add services (like mirroring, replication, snapshots, etc.) away from storage arrays and centralizes them in the storage virtualization software “uber-controller.” But to the best of our knowledge no universal storage virtualization product works with the full range of enterprise host servers, local/remote storage sub-systems or Cloud Object storage.
In the absence of such a centralized strategy, another location is needed where data tends to aggregate, hence providing the opportunity to apply tiered, policy-based data management services. One idea is to employ a Virtual Tape Library (VTL), which has come into widespread use over the past decade.
VTL technology evolved from a location where backup jobs were stacked until sufficient data existed to fully fill a tape cartridge (first generation) to an appliance offering virtual tape devices to expedite backup (second generation). Modern VTLs have advanced to a location where 30, 60, or 90 days’ worth of data is stored for fast restore in the case of an individual file corruption event (one of the more common types of data disasters). VTLs have also been enhanced with additional storage services, including VTL-to-remote VTL replication across WANs, de-duplication or compression services, data vaulting to Cloud Object storage and even as the place to apply encryption to data.
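A minimal sketch of the kind of age-based placement rule such a VTL might apply is shown below; the 90-day restore window and the cloud/tape targets are assumptions for illustration, not a description of any particular product:

```python
from datetime import datetime, timedelta

RESTORE_WINDOW = timedelta(days=90)  # recent backups stay on fast VTL disk

def placement_for(backup_time: datetime, now: datetime) -> str:
    """Decide where a virtual cartridge should reside, based on its age."""
    if now - backup_time <= RESTORE_WINDOW:
        return "VTL disk (fast restore of individual files)"
    return "vault to cloud object storage or physical tape"

now = datetime.now()
print(placement_for(now - timedelta(days=10), now))   # recent: stays on VTL disk
print(placement_for(now - timedelta(days=200), now))  # aged: moves to a cheaper tier
```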
The VTL is already a location where much of the corporate data is retained and where tiered policy-based data management services can be readily applied. It makes sense that these platforms become a “storage services director,” a location where data protection, data management and tiered archiving could be applied to data assets per pre-established policies.
With tiered, policy-based data management, individual pools of data can be retained in an appropriate, cost-effective location based on the criticality of the data to its associated business process. In an ideal, service-oriented data protection scheme, data from specific business processes/applications is assigned retention policies drawn from a menu of available services, delivered via a variety of hardware and software tools, all in a highly manageable way. This concept is illustrated below.
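One way to picture that menu, and the assignment of data pools to it, is the sketch below; the policy names, retention periods and targets are hypothetical and do not describe TSI’s actual policy schema:

```python
# Hypothetical policy "menu": each policy bundles retention, a target location
# and replication behaviour; data pools are then assigned a policy by name.
policies = {
    "tier-1-critical":  {"retention_days": 30,   "target": "VTL disk",             "replicate_to": "remote VTL"},
    "tier-2-important": {"retention_days": 180,  "target": "deduplication device", "replicate_to": "cloud object storage"},
    "tier-3-archive":   {"retention_days": 2555, "target": "cloud object storage", "replicate_to": None},
}

pool_assignments = {
    "card-transactions":   "tier-1-critical",
    "customer-statements": "tier-2-important",
    "marketing-archive":   "tier-3-archive",
}

for pool, policy_name in pool_assignments.items():
    policy = policies[policy_name]
    print(f"{pool} -> {policy_name}: keep {policy['retention_days']} days on {policy['target']}")
```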
Step 3: Solution Selection
This cutting-edge approach, in which a “storage director” serves as the cornerstone of efficient, tiered, policy-based data management, is conceptually simple, almost common sense. However, there are several important considerations that must be taken into account before selecting infrastructure.
- A storage director must facilitate clustering to allow the handling of large quantities of data via clustered nodes, ensuring adequate connectivity for both data sources and converged data storage infrastructure while streamlining the management of both the directors themselves and the tiered policies they implement.
- The storage director device must natively connect to all servers and storage in the enterprise, ideally with no middleware, agents or third-party software. This is a key criterion: the storage director must be capable of being dropped into any complex of servers and storage devices, supporting all connectivity protocols, to minimize disruption and simplify implementation.
- Being policy-based, the solution must enable users to place data from different applications and host servers into data pools with different retention, location and replication policies, so that data can be retained in the most cost-effective manner. The storage director must be storage-device and location “agnostic”.
- Secure vaulting (preferably employing built-in encryption) of data to any cloud object storage is crucial. A tiered, policy-based device must allow customers to vault appropriate data to the cloud for a second copy or disaster recovery while sending other, perhaps more essential, data to local disk or tape, all concurrently, all securely, with replication rates that meet the requirements of the enterprise. A generic sketch of this encrypt-then-vault pattern follows this list.
- The storage director device must be specifically designed for fault-tolerant, high availability computing environments, given the centralized location of such a device in the data stream.
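As a hedged sketch only, the snippet below shows the general encrypt-then-vault pattern using the widely available cryptography and boto3 packages; the bucket and key names are hypothetical, and this is not a description of how Storage Director itself performs vaulting:

```python
import boto3
from cryptography.fernet import Fernet

# In practice the key would live in a key manager, not in code.
key = Fernet.generate_key()
cipher = Fernet(key)

backup_image = b"virtual tape cartridge contents"   # placeholder payload
encrypted = cipher.encrypt(backup_image)            # encrypt before the data leaves the site

# Upload the encrypted copy to an S3-compatible object store (hypothetical bucket/key).
s3 = boto3.client("s3")
s3.put_object(Bucket="dr-vault", Key="tier-2/cartridge-000123", Body=encrypted)
```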
Is this “Storage Director” functionality available today?
Yes, as a matter of fact it is, with Tributary Systems’ aptly named Storage Director!
With Storage Director, enterprises can tier stored data and data policies down to individual data volumes on multiple host platforms based on business criteria and importance to business resiliency and restoration. This is intelligent data management!
In enterprises with multiple host platforms – HPE NonStop NB, NS and now NonStop X servers, HPE OpenVMS, Windows and VMware running HPE Data Protector, IBM z/OS mainframes, and IBM AS/400 (now IBM Power Systems running IBM i), among others – Storage Director enables the sharing of storage technologies otherwise dedicated to each host platform. Such storage technologies can include existing enterprise storage disk, HPE StoreOnce, EMC Data Domain and Quantum DXi data de-duplication devices, physical tape, cloud object storage, or any combination of storage technologies concurrently, as dictated by individual data management needs. Such a converged approach improves storage performance, enables consolidation, and can lead to measurable savings per TB of retained data.
In conclusion, there is no one-size-fits-all solution for data protection. A successful strategy typically involves the assignment of a combination of data protection services to the data assets of a given business process based on recovery objectives, technology availability, and budget.
For further information about the Storage Director Appliance, contact me, Glenn Garrahan, Director of HPE Business for Tributary Systems, Inc. at ggarrahan@tributary.com or visit our website: www.tributary.com