| ARTICLES | September 2006 |
New archiving software makes unstructured data less expensive to manage
If you had large amounts of data, most of which was never accessed, would you store it on your most expensive tier-1 storage? If most of this data never changed, would you deploy expensive replication or mirroring technologies to constantly make and update extra copies of it? Would you intentionally slow down the response times of your environment on a daily basis to make updated backup copies of this unchanged data, accumulating up to 27 identical backups on tape or disk volumes before you began to re-use the media? Would you hire and pay people to spend their time collecting these new backups of the same unchanged data in order to send the data to an offsite vault where many other backup copies of the same unchanged data already existed? The answer, if I describe it to you this way, is probably, “No!”
Yet this scenario – embarrassing when you see it from the perspective of un-accessed or static data – is exactly what most IT organisations do. And they shouldn’t be ashamed of how they arrived at this state. The last few years have seen a healthy commitment on the part of many industries to create reliable business continuance and disaster recovery capabilities. Making multiple copies of data on alternate media and locations is the essence of good storage management. Tier-1 RAID-based disk storage, mirroring and replication infrastructure, and frequent backup regimens are all vital strategies for copying data to alternative media and remote locations to ensure recoverability and protection.
Too expensive ... no value
The problem facing IT organisations now is that these technologies are either too expensive or of no value for the portion of data that is un-accessed or static. And, unfortunately, that is a lot of data. A recent independent survey in the UK by BridgeHead Software indicates that IT organisations believe that over 60 per cent of their data is never accessed and will likely never be accessed. Clearly, it is wasteful to dedicate RAID arrays and other technologies for mirroring or replication to this class of data. In the event of failure, this large percentage of the data is not going to be pertinent to recovering operations. Furthermore, static data, once it has been copied a few times, need not be part of the extremely resource-intensive programs for keeping multiple dynamic data stores in sync and up to date.
Many IT organisations that have invested in state-of-the-art storage infrastructure have begun to appreciate this distinction. As the amount of data they manage exceeds their deployed tier-1 storage capacity, they have a choice of either buying more expensive storage or devising a way to get un-accessed and static data onto less costly infrastructure. Considering that the overall cost of high availability storage arrays is not actually falling as fast as the decline in cost of raw disk storage might suggest, option two is becoming more and more attractive.
HSM and Archiving
For unstructured data, two technologies – Hierarchical Storage Management (HSM) and Archiving – provide the best options for identifying un-accessed and static data and re-positioning it on more appropriate secondary storage, using automated rules rather than manual intervention. HSM or ‘infinite disk’ software emerged for Windows and Open Systems platforms in the mid-1990s as a common way to move unstructured file data to secondary storage while maintaining the illusion that the data is still on primary storage. It does this through file header stubbing: a small stub remains in place of the file, and the data is read back from the archive transparently upon access. However, today’s demands are often beyond the scope of what most HSM tools were designed to deliver. For that reason archiving software systems have rapidly become recognised as a better fit, since they provide more sophisticated management of the secondary storage repository, can benefit both unstructured and structured data, and provide options to either fully migrate or simply copy data.
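As a rough illustration of the kind of automated rule described above, the following Python sketch walks a directory tree and lists files that have been neither read nor modified for a given number of days. It is a minimal sketch under stated assumptions, not how any particular HSM or archiving product works: the function name find_archive_candidates, the Candidate record and the 365-day default are all hypothetical, and the rule relies on file-system access times, which some systems are configured not to update.

```python
import os
import time
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class Candidate:
    """A file that looks un-accessed and static (hypothetical record type)."""
    path: str
    size_bytes: int
    days_since_access: float
    days_since_change: float


def find_archive_candidates(root, min_idle_days=365, now=None):
    """Return files under root not read or modified for min_idle_days.

    The threshold is an illustrative assumption, not a recommendation.
    """
    now = time.time() if now is None else now
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            idle_access = (now - st.st_atime) / SECONDS_PER_DAY
            idle_change = (now - st.st_mtime) / SECONDS_PER_DAY
            # Both conditions must hold: un-accessed AND static.
            if idle_access >= min_idle_days and idle_change >= min_idle_days:
                candidates.append(
                    Candidate(path, st.st_size, idle_access, idle_change)
                )
    return candidates
```

Summing size_bytes over the returned list gives a first estimate of how much tier-1 capacity such a rule could free.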
Since full-featured archiving products are relatively new to the market, it is worth outlining the range of functionality that IT users can expect. Archiving software should allow users to create multi-copy/multi-media repositories to ensure that un-accessed and static data can be secured and made available in a number of different failure or loss scenarios while eliminating the significant cost of regular backup or replication of the data.
A typical archiving repository topology might include a copy of data on local tier-2 disk storage for relatively rapid access, a second copy on a removable WORM volume (such as optical or tape) for safe, long-term vault storage, and a third copy on tier-2 disk storage at a remote site where the data could be immediately available as part of the disaster recovery regimen. Archiving software should also be capable of populating newly emerging CAS (content-addressable storage) devices such as EMC’s Centera. Having secured data in a multi-copy archive, the archiving software can then apply automated rules to analyse the attributes of the original data copy and optionally stub it or explicitly migrate it to the archive where it might be accessible in its own ‘my archive places’ folder. Archiving products should even automate other data management tasks such as compression in place or deletion – the ultimate cure for un-accessed files!
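The copy-then-stub sequence in the topology above can be sketched in a few lines of Python. This is an illustration under loud assumptions rather than a description of any product: the repositories are plain directories standing in for local tier-2 disk, a WORM volume and a remote site; the stub is an ordinary text file listing where the copies live, whereas real products use file-system-level stubs that recall data transparently; and the names archive_file and sha256_of are hypothetical. Naming each copy by its SHA-256 digest only loosely echoes the content-addressing idea and is not the interface of any CAS device.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(source, repositories, stub=True):
    """Copy source into every repository, verify each copy, then stub.

    The original is only replaced by a stub after ALL copies have been
    verified, so a failed copy never leaves the data unprotected.
    """
    expected = sha256_of(source)
    copies = []
    for repo in repositories:
        os.makedirs(repo, exist_ok=True)
        dest = os.path.join(repo, expected)  # copy is named by content hash
        shutil.copy2(source, dest)           # copy2 also preserves timestamps
        if sha256_of(dest) != expected:
            raise IOError(f"verification failed for {dest}")
        copies.append(dest)
    if stub:
        # Hypothetical stub format: a marker line plus the copy locations.
        with open(source, "w", encoding="utf-8") as f:
            f.write("ARCHIVED\n" + "\n".join(copies) + "\n")
    return copies
```

Passing stub=False corresponds to the "simply copy" option, leaving the original in place while still building the multi-copy archive.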
Compliance
Of course, the most distinctive advantage of archiving software is its ability to ensure regulatory compliance in how data is stored over the very long term. Compliant archiving software should help ensure that data cannot be modified, enforce access controls, provide audit trails and support encryption. It should also provide management systems for the removable media that contain archived data, overseeing very long-term media migration schedules and tracking usage and maintenance so that the media is likely to survive its full expected lifespan.
The benefits of this technology are significant. Users will be able to avoid additional tier-1 storage acquisitions, re-purpose and better use their existing older storage capacity for secondary storage, drastically reduce the amount of data that is routinely replicated, mirrored or backed up, and gain powerful new compliance functions that could protect against lawsuits or regulatory action. If the “No” you answered to the questions in the first paragraph was wishful thinking, archiving technology may provide you with the biggest savings potential of any decision you make over the next few years.
BridgeHead Software is exhibiting on stand 520 at Storage Expo 2006, the UK's largest and most important event dedicated to data storage. Now in its 6th year, the show features a comprehensive free education programme and over 90 exhibitors in the National Hall, Olympia, London, from 18 to 19 October 2006. www.storage-expo.com
BridgeHead Software. www.bridgeheadsoftware.com/home/


