4 Important Considerations when Designing Data Storage Solutions
Data is at the heart of your Information Technology systems – without it, your computers would just lie dormant and you’d be out of business. For many years, network engineers and IT support staff have simply added more capacity when required, selected faster drives if budget permitted, and implemented backup systems to protect against data loss. With modern technology including Solid State Drives (SSD), cloud backups, real-time data compression appliances, and Automated Tiered Storage (ATS), far more sophisticated, efficient, and performance-oriented solutions are emerging.
So what are the key factors to consider when designing a data storage solution?
Performance
There are a number of issues that affect performance. The goal ultimately is to ensure maximum responsiveness to application and user demands at an appropriate cost. Solid State Drives, for example, have no moving parts, so they avoid the seek and rotational latency that is inherent in conventional hard disk drives and offer very fast access times. However, they cost many times more per gigabyte than HDDs, which in turn cost more per GB than tape storage. Whilst we might dream of having arrays piled to the ceiling with thousands of terabytes of SSD storage, it’s not financially viable to do that, and in their current format SSDs fit less storage into a 3.5″ form factor space than HDDs.
A well-designed storage solution could therefore consist of an array based on a mix of SSD, HDD, optical, and tape media. ATS (Automated Tiered Storage) technology can then assess the requirements of your applications and shuffle data between the various media according to your pre-defined rules and its evaluation of performance metrics. This enables your system to have the most frequently accessed data stored on high-performance SSD storage, whilst data which is in less demand is stored on conventional HDD, and data which is very infrequently used could be migrated on to optical or tape media. The result is a cost-effective solution which ensures that files needing higher performance are, on the whole, available from the SSD media.
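As a rough illustration of the rule-based placement described above, here is a minimal Python sketch. The thresholds, the `FileStats` type, and the function names are all hypothetical and chosen purely for illustration; real ATS products use their own metrics and policies.

```python
from dataclasses import dataclass

# Hypothetical thresholds (accesses per day), not taken from any real product.
SSD_MIN_ACCESSES_PER_DAY = 100
HDD_MIN_ACCESSES_PER_DAY = 1


@dataclass
class FileStats:
    name: str
    accesses_per_day: float


def choose_tier(stats: FileStats) -> str:
    """Pick a tier for one file from its access frequency."""
    if stats.accesses_per_day >= SSD_MIN_ACCESSES_PER_DAY:
        return "ssd"   # hot data: fastest, most expensive media
    if stats.accesses_per_day >= HDD_MIN_ACCESSES_PER_DAY:
        return "hdd"   # warm data: conventional disk
    return "tape"      # cold data: cheapest, slowest media


def plan_placement(files):
    """Map each file name to the tier it should be migrated to."""
    return {f.name: choose_tier(f) for f in files}
```

A real system would re-run this kind of evaluation periodically, so that data moves between tiers as demand for it changes.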
Another solution to enhance performance might be based upon a RAID array which writes a single file in stripes across multiple drives, thereby allowing the read and write operations to occur in parallel via the heads of all the drives at the same time. RAID configurations are described by level, most commonly RAID 0 through RAID 6, and not all RAID levels will enhance performance. Further, the nature of your files can influence which configuration you would select – for example, if you were processing large terabyte-sized video footage files, your objective is high-performance sequential access of data blocks, whereas thousands of random database accesses would be better suited to another form of RAID configuration.
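To make the striping idea concrete, here is a small Python sketch that deals fixed-size chunks of a file round-robin across a set of in-memory "drives" and then reassembles them. The function names and the list-of-lists representation are assumptions for illustration only; a real RAID controller works on disk blocks in hardware or in the operating system.

```python
def stripe(data: bytes, num_drives: int, stripe_size: int):
    """Split data into stripe_size chunks and deal them round-robin across drives."""
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        drives[chunk_index % num_drives].append(data[offset:offset + stripe_size])
    return drives


def unstripe(drives):
    """Reassemble the original data by reading one chunk from each drive in turn."""
    chunks = []
    longest = max(len(d) for d in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)
```

Because consecutive chunks sit on different drives, a large sequential read can be serviced by all of the drives at once, which is where the performance gain comes from. Note that striping on its own offers no protection if a drive fails.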
Cost
As discussed above, performance comes at a cost. Frequently the performance considerations are closely interwoven with the cost drivers. If cost were not a factor, SSD would probably wipe out HDD very quickly! Therefore most of the performance discussion above can be applied equally to the cost arguments.
Redundancy
Having plenty of high-performance storage capacity is all great – until it fails. This is when redundancy becomes important. Redundancy refers to the ability of a system to have a fail-over mode in which it can continue to function in spite of a component failure. This might occur when an entire hard drive fails, or could also occur when a single bit of data becomes corrupted.
One of the most basic systems to ensure storage redundancy is ‘mirroring’. In this situation, two identical storage sub-systems are configured as a master and slave, with the slave receiving an exact replica at all times of the master system. If the master were to fail, the slave could pick up and continue working whilst the master system is repaired.
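The mirroring behaviour described above can be sketched in a few lines of Python. `MirroredStore` and its methods are hypothetical names, and plain dictionaries stand in for the two storage sub-systems; this is a toy model of the idea, not how any particular product implements it.

```python
class MirroredStore:
    """Toy mirror: every write goes to both copies; reads fail over to the mirror."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.primary_failed = False

    def write(self, key, value):
        if not self.primary_failed:
            self.primary[key] = value
        self.mirror[key] = value  # the mirror always receives the write

    def read(self, key):
        source = self.mirror if self.primary_failed else self.primary
        return source[key]

    def fail_primary(self):
        """Simulate the primary sub-system going offline."""
        self.primary_failed = True
        self.primary = {}

    def repair_primary(self):
        """Bring the primary back by copying everything from the mirror."""
        self.primary = dict(self.mirror)
        self.primary_failed = False
```

The cost is visible in the sketch: every item is stored twice, which is why mirroring becomes expensive as capacity grows.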
However, as the volume of storage required by an organisation grows, mirroring every terabyte of storage can become too expensive to be feasible. So storage technology companies developed a range of sophisticated solutions. One of the most common over the past two decades has been RAID 5 hard disk arrays, in which a collection of at least 3 drives, and frequently as many as 5 or 6, are grouped together to form a single virtual device with in-built protection against the failure of any one drive. In a RAID 5 array data is ‘striped’ across the drives in the array, and for each stripe a ‘parity’ block (a mathematical combination, typically an XOR, of the data blocks in that stripe) is also written, with the parity blocks distributed across all of the drives. Whenever data is written the parity is updated, and if a drive fails or a block cannot be read, the controller uses the parity together with the surviving data to recalculate what is missing.
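The parity calculation can be illustrated with a short Python sketch using XOR, the operation commonly used for RAID 5 parity. The `xor_blocks` helper and the sample byte values are made up for illustration, and the sketch covers a single stripe only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks from one stripe (arbitrary sample values).
data_blocks = [b"\x0f\x10", b"\xf0\x01", b"\x33\x22"]

# The parity block stored alongside them.
parity = xor_blocks(data_blocks)

# If the second block is lost, XOR-ing the survivors with the parity recovers it.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

This works because XOR-ing a value with itself cancels it out, so combining the parity with every surviving block leaves only the missing one. It also shows the limit of the scheme: with a single parity block per stripe, only one missing block can be recovered.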
Most RAID 5 arrays are built with hot-swappable hard disk drives, allowing for a faulty disk drive to be removed from the system whilst it’s still powered up, and a replacement unit installed. Whilst the faulty drive is missing the system can calculate the absent data and, with slight performance degradation, allow the system to continue working uninterrupted. Once the replacement drive is inserted back into the system, the RAID controller will automatically rebuild the missing data on the new drive and the system will then return to normal functioning.
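The degraded-mode and rebuild behaviour can be sketched as follows. `TinyParityArray` is a hypothetical toy class holding a single stripe with one block per drive and a dedicated parity slot; a real RAID 5 controller handles many stripes and rotates the parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class TinyParityArray:
    """Toy single-stripe array: N data blocks plus one parity block."""

    def __init__(self, data_blocks):
        self.drives = list(data_blocks) + [xor_blocks(data_blocks)]

    def fail(self, index):
        """Simulate pulling a faulty drive out of the array."""
        self.drives[index] = None

    def read(self, index):
        """Return a block, recalculating it from the survivors if its drive is missing."""
        block = self.drives[index]
        if block is not None:
            return block
        survivors = [b for b in self.drives if b is not None]
        if len(survivors) != len(self.drives) - 1:
            raise IOError("more than one drive missing; data cannot be recovered")
        return xor_blocks(survivors)  # the extra work behind the degraded performance

    def rebuild(self, index):
        """Write the recalculated block onto the replacement drive."""
        self.drives[index] = self.read(index)
```

While a drive is missing, every read of its data costs a read from all the other drives plus a calculation, which is the slight performance degradation mentioned above.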
Other factors to consider in designing a redundant storage solution (yes, it’s bad grammar – but it’s the accepted IT terminology!) are not only the drives but also (in mission critical situations and scenarios like high-end data centres) dual controllers, power supplies, and even redundant infrastructure such as AC power, air-conditioning, and network routing.
Data Security
Again, there are a number of angles to data security. At the most obvious level, you can’t afford to lose your data. Perhaps some more so than others, but as a general principle you need reliable and effective data backups. For decades, the most common form of high-capacity backup has been removable tape media and this continues to be at the heart of most mission-critical data backup solutions. Tape libraries and/or robot tape loaders ensure that there is adequate storage capacity and that tapes are swapped over as required. Backup procedures also provide for the secure handling of these tapes, and off-site storage as required. Using Grandfather-Father-Son (GFS) or other rotation schedules, companies ensure a solid historical record of data going back for months, even years, enabling a company to ‘roll back’ if necessary to a data scenario prior to a given event (e.g. the infiltration of a virus).
However, there is a wide range of other considerations nowadays as well. One of the most important is the loss or theft of data from a company. With the advent of portable/external USB-attached storage, flash media, notebook computers being used outside your office, and even company staff accessing the corporate information systems from their own home computer, the potential for data to be removed or corrupted is increasing dramatically. Your data security considerations may therefore include issues such as data encryption, encrypted pocket drives, policy-driven encryption of company laptops, audit trails and access restrictions on remote access of data, and even policy-based lock-down and disabling of removable media on desktop computers.
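A Grandfather-Father-Son rotation can be sketched in Python as below. The specific choices here (monthly ‘grandfather’ backups on the 1st, weekly ‘father’ backups on Sundays, and the three retention periods) are assumptions made for illustration; actual schedules vary from one organisation to the next.

```python
from datetime import date, timedelta

# Hypothetical retention periods for each backup level.
RETENTION = {
    "grandfather": timedelta(days=365),  # monthly backups kept for a year
    "father": timedelta(weeks=5),        # weekly backups kept for five weeks
    "son": timedelta(days=7),            # daily backups kept for a week
}


def backup_level(day: date) -> str:
    """Classify the backup taken on a given day."""
    if day.day == 1:
        return "grandfather"
    if day.weekday() == 6:  # Monday is 0, so 6 is Sunday
        return "father"
    return "son"


def is_retained(backup_day: date, today: date) -> bool:
    """True if the backup from backup_day should still be kept on today's date."""
    return today - backup_day <= RETENTION[backup_level(backup_day)]
```

The effect is that recent history is kept at daily granularity while older history thins out to weekly and then monthly snapshots, which keeps the number of tapes manageable whilst still allowing a roll-back to many months ago.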
Using cloud-based backups, companies are now finding new and creative solutions to ensure that a backup of their data is held offsite without requiring the physical transport of it or storage at a staff member’s home.
Hopefully the above has helped you recognise and start to evaluate your storage requirements in a more sophisticated manner. Are there issues that I haven’t discussed that you would like me to cover? Are there issues you’d disagree with? Or would you just like some advice on a particular scenario you’ve encountered? Please do comment on this post and I’d be delighted to respond, or email philip.brookes@techeffectiv.com for more specific advice.




