
Tuesday, July 25, 2017

There is nothing like an Enterprise grade solution for a Consumer grade problem

These days there is an ever-increasing separation between the consumer and physical storage media as a product: with the general increase in Internet availability and bandwidth, along with the massive expansion of server-side infrastructure to support storage and other services (e.g. from providers such as Dropbox, Google, Amazon, Microsoft), the user tends to replace physical storage with the convenience of a cloud storage solution.

While it is not entirely clear that this trend will hold, it is apparent that local storage will tend to play a thinner role in the fabric that lies between the user and the preservation of digital data.

However, for now the consumer, or more likely the "prosumer", still lives on the borderline: on one hand he has a great demand for a reliable and dense storage solution, but on the other hand he still cannot embrace cloud storage as the solution, given the constraints it still poses: the cost for large amounts of data, and the bandwidth required to transfer very large files in an acceptable length of time. This is the case for videographers and content creators, who constantly produce large amounts of raw footage and various versions of edited work that have to be archived for later use.

Taking cloud storage out of the equation, this leaves a limited number of choices for the "prosumer":

  • hard disks - have a reasonably large density and bandwidth, but are relatively delicate, and according to several industry studies their reliability tends to drop abruptly after 3 to 4 years of intense use;
  • flash memory - has a high bandwidth and consistently low access times (just as with RAM, individual memory cells can be accessed in nearly constant time). On the other hand, write operations wear out the media, limiting its longevity under intense rewrite use. Also, the integrity of the information is not guaranteed to persist after more than 3 years on the shelf;
  • optical media - has very limited density (by today's standards), e.g. the 25 GB capacity of a single-layer writable Blu-ray disc (BD-R) has long been surpassed by the very accessible 32 GB SD cards. Storage life is good, but depends on the dye chemistry employed. M-Disc type discs are advertised (and some studies appear to support it) to last thousands of years, but these are written using a different method, where the laser beam actually carves pits into a material that is otherwise difficult to modify geometrically;
  • magnetic tape - in spite of being one of the most traditional storage technologies, it is still favoured and seen by the industry as the most important backbone for long term data storage. From small companies to the largest enterprises on the Internet, it is present in every data center. It suffers from long seek times, for the obvious reason that tapes take some time to rewind and fast forward, but it achieves data transfer rates that equal or even exceed those of some performance hard drives. In spite of the high cost of the drive (on par with most enterprise grade hardware), the tape media is very cheap and its capacity competes with the best removable hard drives (e.g. 30 USD for an LTO-6 2.5 TB tape).

Classifying myself as a "junior" prosumer, I stumbled upon the storage challenge described above from the moment I started to accumulate footage and photos, both from leisure and from my modest activity as a content creator. It was not only a matter of having enough physical disk space to keep on producing content, but also of having some redundancy in case of a catastrophic disk failure.

That is when I started to look at what options exist in the market, and realized that there are essentially no consumer oriented tape storage products. There were options in the past, such as the Iomega Ditto tape drive, which in its latest models reached 10 GB per tape.

Left without consumer grade options, I went to see how viable it would be to buy a tape drive of the kind commonly found in data centers, which is the case of LTO drives.

I quickly found that brand new, these devices cost north of 3000 USD, a bit too much for my modest consumer budget (which if it existed could well be used to go to the supermarket and buy 30 x 2 TB hard drives).

Taking the tape drive cost out of the equation, the cost per GB is very appealing, even for smaller LTO-4 tapes, which cost around 25 USD and can store 800 GB of uncompressed data.
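To put the media prices in perspective, here is a small sketch of the cost-per-terabyte arithmetic. The tape prices and native capacities are the ones quoted in this post; the hard drive figure is only what the "30 x 2 TB drives for 3000 USD" remark above implies (about 100 USD per drive), so treat it as a rough assumption rather than a market price.

```python
# Cost per terabyte of storage media, excluding the cost of the drive itself.
# Tape figures are the ones quoted in the post; the hard drive entry is an
# assumption derived from "30 x 2 TB hard drives for 3000 USD".
media = {
    "LTO-4 tape": {"price_usd": 25.0, "capacity_gb": 800},
    "LTO-6 tape": {"price_usd": 30.0, "capacity_gb": 2500},
    "2 TB hard drive (assumed)": {"price_usd": 3000.0 / 30, "capacity_gb": 2000},
}


def cost_per_tb(price_usd: float, capacity_gb: float) -> float:
    """Return the price in USD per decimal terabyte (1000 GB)."""
    return price_usd / (capacity_gb / 1000)


for name, m in media.items():
    usd_per_tb = cost_per_tb(m["price_usd"], m["capacity_gb"])
    print(f"{name}: {usd_per_tb:.2f} USD/TB")
```

With these numbers the LTO-4 media comes out at roughly 31 USD per TB and LTO-6 at 12 USD per TB, against about 50 USD per TB for the assumed hard drive price.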

So instead of buying brand new, I looked at the market for used tape drives, and decided to take the gamble of buying an LTO-4 drive that would be at least two orders of magnitude below the original price. I went for an IBM LTO-4 drive (full height). For 78 USD + shipping, I got from eBay a complete unit that had been taken out of a data center Dell tape library. The unit came enclosed in the aluminium sled from the tape library:

Also, this sled contained the board that converts the standard 48 Volts to the voltages required by the drive (12 V DC and 5 V DC):

Besides the voltages, this board is responsible for interfacing the tape drive's RS-422 port with the tape library's CAN bus, for "out of band" control of the tape drive without intervention of the host. For a seemingly simple task this board has a rather sophisticated combination of hardware: a Freescale MC68HC16Z 16 bit microcontroller, a 1 MByte flash memory chip, and 128 of static RAM, not to mention other smaller components such as a Microchip CAN transceiver.

However, installing a beast of a drive like this in a desktop computer is not exactly a walk in the park. The first challenge: interfacing with the host. This drive features a Fibre Channel interface (a sophisticated optical link standard designed to interconnect storage devices in what is known as a Storage Area Network), which means that the host computer needs a Fibre Channel Host Bus Adapter (HBA).

Again, decommissioned enterprise grade hardware can be surprisingly cheap: I shelled out 10 GBP and got myself a 4 Gbit PCI Express QLogic HBA:

And between the two, of course, a fiber optic jumper cable was necessary. Another 5 euros paid for 2 meters of LC-LC jumper cable.

As it wouldn't be perfect without a third challenge, next came the power supply: I had a dual 5 V @ 2 A / 12 V @ 2 A power supply, but being the beast that it is, this drive would not be happy with this amount of juice, so a larger one (at least 5 V @ 4 A / 12 V @ 2 A) was needed. The only option I found was to buy (for another 32 euros) a switching power supply module at a local electronics store.

As this is only the bare module, I had to build a custom enclosure to safely operate it in a home environment.
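The power budget can be checked rail by rail. The sketch below uses only the ratings mentioned above (the supply I already had, and the minimum the drive needs); the function names are just for illustration.

```python
# Per-rail current ratings in amperes, keyed by rail voltage in volts.
# Both sets of figures come from the text above.
existing_supply = {5.0: 2.0, 12.0: 2.0}
drive_minimum = {5.0: 4.0, 12.0: 2.0}


def total_watts(rails: dict[float, float]) -> float:
    """Sum volts * amps over all rails."""
    return sum(volts * amps for volts, amps in rails.items())


def covers(supply: dict[float, float], required: dict[float, float]) -> bool:
    """True only if every required rail exists and can source enough current."""
    return all(supply.get(volts, 0.0) >= amps for volts, amps in required.items())


print(f"Existing supply: {total_watts(existing_supply):.0f} W")
print(f"Drive minimum:   {total_watts(drive_minimum):.0f} W")
print("Existing supply is sufficient:", covers(existing_supply, drive_minimum))
```

The total wattage alone is not the whole story: the old supply falls short specifically on the 5 V rail, which is why the check is done per rail rather than on the sum.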

Finally, the drive had to be housed in a proper enclosure. Had to modify a CD-ROM drive enclosure I had, to support the full height drive:

Also, I had to be able to cram the original fan into this enclosure (yes, this drive produces quite a bit of heat: powerful reel spinning motors and realtime compression + encryption hardware add to the party). This stuff has to encrypt and compress data on the fly, at a whopping 120 MBytes / second!

A bit of cutting and stitching, and I was able to adapt the back panel from the original sled to the back of this enclosure.

At last came the moment of truth, when I had put it all together and turned the thing on.

One curious aspect is that this does not boot up almost instantly like a hard drive or a CD-ROM drive. This device takes north of 20 seconds to initialize, showing some animation on its single digit display while doing so.

Then, in the first test, the computer simply wouldn't detect the drive. Installed SANsurfer, and nothing. I went through what I thought was a comprehensive checklist: checking that the HBA was operating correctly (the HBA diagnostic tools go as far as reporting the current temperature of the optical transceiver, along with other parameters such as TX and RX power), checking and playing with the configuration of the HBA (this thing even has code that loads on the PC during boot, allowing the board to be configured in a BIOS-type menu system), checking that the fiber optic jumper cable was not broken, etc.

Finally, when my hope was almost gone and after having found that only the HBA was emitting IR light from its transmitter, I looked at the rear of the drive and realized that there was a second slot, for another FC transceiver. With limited hope that it would be of any use, I decided to swap the transceiver to the second slot.

Initially nothing happened. Although the transceiver IR light was now on (a good sign), the drive still was not detected.

That was when I read one of the IBM manuals, and found that these drives only support the Arbitrated Loop and Switched Fabric modes. My HBA was in point-to-point mode. Changed the configuration, and voilà, the drive was detected.

Ran the IBM Tape Diagnostics Tool, and was able to run some tests. All tests passed, in spite of the surprise of finding that the drive had quite a bit of mileage already (literally):

 PAGE                                : 14: Device Statistics

 #NO  PARAM CODE                PARAM VALUE 
 1    Lifetime Media Loads      6215
 2    Lifetime Cleaning Op.     86
 3    Lifetime POH              73322
 4    Lifetime MMH              10170
 5    Lt Meters Tape Processed  76506254

76,506 km of tape means that this drive has seen more tape than my car has seen asphalt.

Another statistic, the POH (Power On Hours), means that this drive has spent over 8 years powered on.

And the MMH (Media Motion Hours) means that the drive has been moving tape for a total of 423 days.
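For anyone who wants to redo the conversions, here is a minimal sketch that parses the statistics rows shown above and turns them into friendlier units. The parsing is an assumption based on how the rows look in this dump (index, name, value separated by whitespace); it is not an official format description.

```python
# Rows copied from the "Device Statistics" log page shown above.
LOG_PAGE = """\
 1    Lifetime Media Loads      6215
 2    Lifetime Cleaning Op.     86
 3    Lifetime POH              73322
 4    Lifetime MMH              10170
 5    Lt Meters Tape Processed  76506254
"""


def parse_stats(text: str) -> dict[str, int]:
    """Map each parameter name to its integer value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected layout: row index, one or more name words, integer value.
        if len(parts) < 3 or not parts[0].isdigit() or not parts[-1].isdigit():
            continue
        stats[" ".join(parts[1:-1])] = int(parts[-1])
    return stats


stats = parse_stats(LOG_PAGE)

poh_years = stats["Lifetime POH"] / (24 * 365)        # power-on hours -> years
mmh_days = stats["Lifetime MMH"] / 24                 # media motion hours -> days
tape_km = stats["Lt Meters Tape Processed"] / 1000    # meters -> kilometers
duty = stats["Lifetime MMH"] / stats["Lifetime POH"]  # share of time moving tape

print(f"Powered on:     {poh_years:.1f} years")
print(f"Moving tape:    {mmh_days:.0f} days")
print(f"Tape processed: {tape_km:,.0f} km")
print(f"Duty cycle:     {duty:.1%}")
```

The last figure is not in the dump itself: dividing MMH by POH suggests the drive was actually moving tape for roughly 14% of the time it was powered on.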

Ran a 600 GB backup (took roughly 2 hours) and it finished without any errors.

In spite of its age, and assuming that the MTBF figure (250 000 hours) is reliable, it should still have many years of life before failing. Let's back up some files and see.
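As a rough sanity check, the average throughput of that backup can be compared with the 120 MBytes / second figure mentioned earlier. This sketch assumes decimal units (1 GB = 10^9 bytes) and takes "roughly 2 hours" as exactly 2 hours, so the result is only approximate.

```python
GB = 10**9  # decimal gigabyte, in bytes (assumption)
MB = 10**6  # decimal megabyte, in bytes (assumption)


def throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average transfer rate in MB/s for a job of size_gb taking the given hours."""
    return (size_gb * GB) / (hours * 3600) / MB


observed = throughput_mb_s(600, 2.0)  # the backup described in the post
drive_rate = 120.0                    # MB/s, the figure quoted for the drive

print(f"Observed average: {observed:.1f} MB/s")
print(f"Fraction of the quoted drive rate: {observed / drive_rate:.0%}")
```

That works out to a bit over 80 MB/s on average, so in this run the drive was likely waiting on the source disks or the host at least part of the time.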

This is just some of the information, as this drive produces a dump with a comprehensive set of statistics and other information (relevant for the tape library and Sys Admins).