Archival data has a new mission: Critical; it's not what it used to be

Fred Moore

Having chaired two panels at storage conferences in January, I found it clear in both that the category of archival data is quickly demanding a new wave of focus. For years, the term archival data described data in the long-term, decreasing-value stage of its life. As customers and vendors now identify their more pressing storage needs going forward, data in the category of "long-term retention" is being viewed differently than in the past. Historically, when data reached archival status, it had reached its final state before deletion ended the data lifecycle. Archiving almost always assumed that the value of data decreased as it aged. This is no longer the case. Recently, data lifecycle management has taken on renewed emphasis, and at times it seems as though all data is critical. The second wave of archival storage management is underway.

Many organizations are now facing increasing regulatory pressure to comply with federal mandates for email, medical, insurance, legal, financial and government classified data. In addition, over half of the digital data being generated annually today (approximately one exabyte, or 1 x 10^18 bytes) now falls into the category of "fixed content," meaning that the data doesn't change after it is initially created. Fixed content is sometimes referred to as "reference data," "rich media," or archival data. Fixed content comes from storage-intensive applications and includes critical business application data, complex legal and reference documents, medical data, email attachments, blueprints, satellite imagery, security surveillance, check images, and broadcast content, among others; such content is seldom, if ever, altered.

The assumption that older data has lost its value no longer holds for several specific vertical markets. New applications and a variety of legal and business requirements are driving many businesses to re-examine their archival policies. One of the most visible examples of the increasingly critical value of archival data lies in the requirements of HIPAA (the Health Insurance Portability and Accountability Act). Not only does HIPAA require health providers to preserve data for a yet-to-be-determined period, but failure to protect critical patient data presently carries civil penalties of up to $25,000 per year for violations of an identical requirement. The threat of fines and other consequences of noncompliance alone is encouraging storage administrators to keep the data from an increasing number of archival applications indefinitely for future reference. The Picture Archiving and Communications System (PACS) application, which captures and stores radiology information and other types of medical images, is a primary component of the HIPAA requirement. Email archives also fall into this category and face increasing pressure to be retained indefinitely for legal reasons. As a general rule in common email retention policies, 80 percent of email can be archived immediately. Email will soon require HSM on steroids to meet the archival demands! Given today's legal, economic and political climate, the value of archival data has never been higher.

The increased emphasis on preserving critical archive data requires a different set of storage attributes than previous archival management schemes did.

Archival Storage and Data Characteristics

* Large-scale storage capacity needed, scalable to petabytes (1 x 10^15 bytes)

* Indefinite data retention periods (measured in years) are required; the data must be preserved, but not necessarily the media it resides on

* Archive data normally has low access and reference requirements but relatively high data transfer rate (bandwidth) requirements

* Much archival data is static ("fixed content"), unstructured and stored in a variety of formats

* WORM (Write-Once-Read-Many) capability is increasingly desirable for legal reasons

* Random or sequential access is required, depending on the application

* Delayed initial access time is acceptable (from seconds up to a few minutes)

* Archive data can involve local and remote access (location independent) with many users in many locations

* Needs a data classification taxonomy to enable unique content search and access, as some archival searches can cost six figures

* Multiple copies of archival data are needed given the criticality and increasing value of data

* Device security and data security (intrusion protection, authenticity) are required for archival data management

* Archive data requires its own policies consistent with regulatory practices for each industry category
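Several of the characteristics above (WORM capability, fixed content, and data authenticity) can be illustrated together. The following is a minimal Python sketch, not a description of any product: a hypothetical in-memory `WormArchive` class refuses to overwrite an object once it has been written, and it records a SHA-256 digest at write time so every read can be checked for tampering or corruption. Real WORM enforcement lives in the media or the storage system itself; the class, method and object names here are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


class WormViolation(Exception):
    """Raised when a caller tries to rewrite an object that is already stored."""


@dataclass
class WormArchive:
    """Hypothetical in-memory WORM store for fixed content (illustration only)."""

    # name -> (stored bytes, SHA-256 hex digest recorded at write time)
    _objects: Dict[str, Tuple[bytes, str]] = field(default_factory=dict)

    def write(self, name: str, data: bytes) -> str:
        """Store data once under name and return its SHA-256 digest."""
        if name in self._objects:
            raise WormViolation(f"{name!r} is already written; WORM forbids overwrite")
        digest = hashlib.sha256(data).hexdigest()
        self._objects[name] = (bytes(data), digest)
        return digest

    def read(self, name: str) -> bytes:
        """Return the stored bytes after verifying they still match the digest."""
        data, digest = self._objects[name]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"authenticity check failed for {name!r}")
        return data
```

For example, after `archive.write("scan-001", image_bytes)`, a second `write("scan-001", ...)` raises `WormViolation`, while `read("scan-001")` returns the original bytes only if they still hash to the recorded digest.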

The data lifecycle is traditionally described as having four distinct stages. We continue to observe that the probability of reuse decreases as data ages. In the past, the value of data also most often decreased as data aged.

1) The active cycle -- this period often lasts for 30 days; data typically resides on disk storage (P >= 0.5)

2) The reference cycle -- this period usually lasts for 60 days; data typically resides on disk and automated tape storage (P >= 0.1)

3) The archive cycle -- this period often lasts up to seven years; data typically resides on automated tape, though a new class of archival disks, such as 160GB and 320GB ATA disks for fixed content storage, is gaining momentum (P <= 0.01)

4) Destroy/delete cycle -- historically at the end of seven or more years (P <= 0.001)

Note: P is the probability that the data, file or object will be accessed during the various lifecycle stages.

Though the first two stages of the data lifecycle remain much as they were, the last two are changing. The third stage, the archive cycle, is now extending indefinitely, often well past the traditional seven-year window. Less data is being deleted, and more data is being kept for longer periods of time.
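The four-stage lifecycle, and the way the archive stage is now stretching out, can be expressed as a simple age-based classifier. This is a minimal Python sketch under stated assumptions: stage boundaries are measured from the data's creation date, they use the typical durations listed above (30 days active, the next 60 days reference, archive until seven years of age), and a year is approximated as 365 days. The hypothetical `indefinite_archive` flag models the newer practice of never leaving the archive stage. Real policies vary by industry and regulation.

```python
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"        # typically disk (P >= 0.5)
    REFERENCE = "reference"  # typically disk and automated tape (P >= 0.1)
    ARCHIVE = "archive"      # typically automated tape or ATA disk (P <= 0.01)
    DESTROY = "destroy"      # end of the traditional lifecycle (P <= 0.001)


# Typical durations from the lifecycle list; assumed, not mandated.
ACTIVE_DAYS = 30
REFERENCE_DAYS = 60
ARCHIVE_YEARS = 7


def lifecycle_stage(age_days: int, indefinite_archive: bool = False) -> Stage:
    """Map a data object's age (days since creation) to a lifecycle stage."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < ACTIVE_DAYS:
        return Stage.ACTIVE
    if age_days < ACTIVE_DAYS + REFERENCE_DAYS:
        return Stage.REFERENCE
    # Under the newer model, archived data may never reach the destroy stage.
    if indefinite_archive or age_days < ARCHIVE_YEARS * 365:  # ignores leap days
        return Stage.ARCHIVE
    return Stage.DESTROY
```

With the default arguments, eight-year-old data falls into `Stage.DESTROY`; passing `indefinite_archive=True` keeps the same data in `Stage.ARCHIVE`, which is the shift the paragraph above describes.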

What does this mean to the storage industry? Digital archives are quickly defining new requirements for storage and its management.

Key data requirements for digital archive management:

* Retention/destruction management

* Audit provisions for tracking and reporting

* Long-term data preservation

* Compliance management for legal issues

* Authentication

* High availability for data and devices

* Advanced search and access capability with unique naming conventions (taxonomy)

* An industrial-class HSM (Hierarchical Storage Management) for tiered SLAs

* Renewed use of WORM (Write-Once-Read-Many) functionality as certain data must never be changed

* Large-scale storage requirements mandate low-cost storage; TCO (total cost of ownership) becomes a key consideration as data lifecycles lengthen
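Retention/destruction management and audit provisions go hand in hand: every decision to keep or destroy data should leave a record. The Python sketch below is a hypothetical illustration, not a compliance tool. The `RetentionPolicy` and `RetentionManager` names, the policy categories and the periods are placeholder assumptions (the seven-year figure echoes the traditional archive window described earlier), and a legal hold always blocks destruction.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical per-category policy; real periods come from each regulation."""
    category: str
    retain_days: Optional[int]  # None means retain indefinitely


@dataclass
class RetentionManager:
    audit_log: List[str] = field(default_factory=list)

    def may_destroy(self, policy: RetentionPolicy, created: date, today: date,
                    legal_hold: bool = False) -> bool:
        """Decide whether an object may be destroyed and record the decision."""
        if legal_hold:
            allowed, reason = False, "legal hold"
        elif policy.retain_days is None:
            allowed, reason = False, "indefinite retention"
        else:
            expires = created + timedelta(days=policy.retain_days)
            allowed = today >= expires
            reason = f"expires {expires.isoformat()}"
        # Every decision is logged so it can be reported to auditors later.
        verdict = "DESTROY" if allowed else "RETAIN"
        self.audit_log.append(f"{today.isoformat()} {policy.category}: {verdict} ({reason})")
        return allowed
```

Checking the legal hold first is a deliberate design choice: an expired retention period should never override a hold placed for litigation or investigation.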

The time of viewing archival storage as the final stage of existence for data is passing. In some cases the value and utility of data actually increase as data ages, even if accesses to that data decline. Archive data may well be becoming the fastest-growing segment of the storage industry in terms of storage demand. What a surprise! Many of today's storage-intensive applications instantly create fixed content and archive data. Applications including voice, text, graphic images, audio, HDTV, 3-D graphics, and movies all create demand for archival data preservation. New and emerging digital applications will continue to fuel many years of explosive storage growth, as terabyte-plus data warehouses, VCR- to HDTV-quality movies, possibly digital cinema, electronic voice and video mail, digital security systems, and digital photography will all drive major changes in the way we view archival storage. Approximately 10 percent of the digital data produced in the world resides on magnetic disk storage, and an estimated 90 percent resides on removable storage media such as tape, optical (CD, DVD) or small-diameter removable disks.

Given this, the storage industry is beginning to view archival data as a much more meaningful class of storage. Though low cost is important, the new requirements for preserving data and making it accessible on a broad scale are real. Again we revisit the need for, and value of, a tiered storage hierarchy (and HSM functionality) that differentiates between performance, capacity, retrieval capability and now data protection and security. Rigid disk drives, magnetic tape drives, optical disks, flexible disk drives and flash memory will all play bigger roles in storing fixed content for a wide class of users. The unique combination of complex objects, along with the different availability and bandwidth requirements of archival data, poses several new challenges for the storage management industry. The sheer size of fixed content and archive files changes the rules for moving data from place to place, as transmission times are quickly surpassing current architectural limits.
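To put a number on the data-movement problem: transfer time grows linearly with archive size, since time = (size in bits) / (link rate). The Python sketch below assumes decimal units (1 PB = 10^15 bytes, as used in this article) and an ideal, fully utilized link unless an efficiency factor below 1.0 is supplied; the link speeds are illustrative choices, not measurements, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes, decimal units
GIGABIT = 10**9    # bits per second


def transfer_time_days(size_bytes: float, link_bits_per_sec: float,
                       efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link running at the given rate.

    efficiency < 1.0 models protocol overhead or a shared link (an assumption).
    """
    if link_bits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / SECONDS_PER_DAY


for label, bps in [("100 Mb/s", GIGABIT // 10), ("1 Gb/s", GIGABIT), ("10 Gb/s", 10 * GIGABIT)]:
    print(f"1 PB over {label}: {transfer_time_days(PETABYTE, bps):.1f} days")
```

At an ideal 1 Gb/s, moving a single petabyte takes about 92.6 days; even at 10 Gb/s it takes more than nine days, before any protocol overhead is counted.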

How critical is archive data? Can we actually begin to call certain types of archival data mission critical? If mission critical describes data that is required to resume business operations immediately after any type of disaster, archival data is probably not mission critical. In terms of its value to businesses, however, archival data is clearly becoming increasingly critical.

COPYRIGHT 2003 West World Productions, Inc.
COPYRIGHT 2003 Gale Group