Tuesday, July 14, 2009

Re: [BLUG] Large RAID config suggestions?

I would tend to second these recommendations, though I have very
little professional experience and no experience with RAIDs.

However, I have lost lots of data several times. The first time, I
was in high school, and I'd had my father buy me a packaged version of
Mandrake Linux (I'd had no real exposure to Linux before that). The
partitioner in the installer crashed and ate my drive. The second
time, I was in college, and my roommate bumped my power cable where it
was hooked to the power supply. Both hard drives and both media
drives (back in the days when your DVD drive and CD-RW drive were
separate) were fried, along with the power supply (though my
motherboard and everything else were fine).

Since then I've kept three copies of all (well, most of) my data. I
have a live copy on whatever machine I'm primarily using (right now my
Acer Aspire One), a frequently synced copy on my off-site server at my
parents' house, and a copy on an external drive that I sync less often
and leave at home (e.g., when I throw my computer in my backpack and
head out). Keeping all my data on a separate partition helps a lot,
and rsync is a simple but powerful tool for backing it up.
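
My whole backup routine boils down to little more than a couple of
rsync invocations, roughly like the ones below (the paths and hostname
are made-up examples, and --delete is something to think about before
turning it on):

  # push the data partition to the off-site server over ssh
  rsync -avz --delete /home/jonathan/data/ backup@offsite.example.org:data/

  # same thing to the external drive when it's plugged in
  rsync -av --delete /home/jonathan/data/ /media/external/data/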

Everyone's backup strategy is going to be a little different, but
putting all your drives in one box sounds a bit too much like putting
all your eggs in one basket. A brief loss of power isn't supposed to
fry a power supply (and granted, that old power supply wasn't top of
the line), but things happen, especially when you don't prepare for or
expect them. I think there are some corollaries to Murphy's Law about
this.

--
Jonathan

2009/7/14 Steven Black <blacks@indiana.edu>:
> Back in my youth (1996 or so) I administered a RAID5 system. (I won't
> say how large it was. I said it was 1996, right? Things were smaller
> then.)
>
> Hot spares are the best invention *ever*. Back in 1996 they were not the
> norm.
>
> It was a simple RAID5 system. All the drives were from the same
> manufacturer. That manufacturer had a bad batch. Before the
> replacement drive arrived, we had a second drive failure.
>
> It was an otherwise trustworthy drive manufacturer, too. (I think
> Seagate.) Everybody has a bad batch now and then, and these just
> managed to slip through.
>
> More recently, I found out that a drive had failed in one of my
> boxes. It's an ancient nightmare of a Solaris box, and I fully
> expected to need to type some obscure command to get the replacement
> drive up and recognized by the system. A little investigation showed
> that all the data was already on another drive. When I replaced the
> failed drive, the one I added simply became the new hot spare.
>
> Hot spares become much more important as you deal with more data. If
> I have a drive fail at 4am and I have a hot spare, I can show up at
> the office at my normal hour. By the time I come in, much of the data
> has already been copied over to the hot spare. I can then make a
> support call during normal business hours for the replacement drive.
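>
> (For the Linux software-RAID folks: the same idea with mdadm looks
> roughly like the sketch below. The device names are made up, and a
> J4400 behind a hardware controller works differently under the hood,
> so this is only an illustration of the behaviour, not a recipe.)
>
>   # add a drive to an existing, healthy array as a hot spare
>   mdadm /dev/md0 --add /dev/sdf1
>
>   # if a member fails, md rebuilds onto the spare automatically;
>   # watch the rebuild progress with:
>   cat /proc/mdstat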
>
> If you don't have at least one hot spare in your system, you need to
> make sure you have one or two of the required drives on hand. Yeah,
> you could rely on your service contract's same-day service, but it's
> a lot nicer to have at least one drive immediately on hand. If you
> don't have same-day service, you'd better have a pair of spare
> drives, because you might just need them both.
>
> It is important to have off-site backups, though. Not just backups,
> *off-site* backups. You don't want to have to explain what happened
> to the data after a building fire, a flood, or the like. There are
> problems that can take down your whole machine room.
>
> You also need a disaster recovery plan: one that starts from a set of
> documents detailing the process, the backup media, and the money from
> the insurance coverage, and turns those back into what you have in
> your machine room. (And it needs to be doable by your replacement --
> assume you've just been promoted.)
>
> Just my two cents,
> Steven Black
>
> --
> Steven Black <blacks@indiana.edu> / KeyID: 8596FA8E
> Fingerprint: 108C 089C EFA4 832C BF07  78C2 DE71 5433 8596 FA8E
>
> On Mon, Jul 13, 2009 at 05:35:30PM -0400, David Ernst wrote:
>> Well, I don't think I have anything very sophisticated to say, but I'm
>> inclined to agree with you about the 3x RAID6.  By my calculations,
>> you'll get 18T that way vs. 20T in your other proposal.  I don't know
>> what you're storing, but this is a lot of disk space, so probably no
>> one will mind that sacrifice.  Meanwhile, the RAID 6 option does give
>> a slight emphasis to reliability over performance, as you wanted.  So,
>> basically, I think I'm just saying "your reasoning makes sense to
>> me".
>>
>> I hate to bring this up, but twice in my life I've been affected by
>> the failure of entire RAID arrays... Both were high-quality hardware
>> RAID setups, and people said of both failures "this is supposed to
>> never happen."  In short, I recommend some other kind of backup in
>> addition to the RAID, because things happen, and if your organization
>> is concerned enough with reliability to consider RAID 6, I wouldn't
>> assume that something like this would never happen.
>>
>> David
>>
>>
>> On Mon, Jul 13, 2009 at 02:45:54PM -0400, Josh Goodman wrote:
>> >
>> >Hi all,
>> >
>> >I have a 24 x 1 TB RAID array (Sun J4400) that is calling out to be initialized, and I'm going round
>> >and round on possible configurations.  The system attached to this RAID is a RHEL 5.3 box w/ a
>> >hardware RAID controller.  The disk space will be used for NFS and a database server, with a slight
>> >emphasis given to reliability over performance.  We will be using LVM on top of the RAID as well.
>> >
>> >Here are 2 initial configuration ideas:
>> >
>> >* RAID 50 (4x RAID 5 sets of 6 drives)
>> >* RAID 60 (3x RAID 6 sets of 8 drives)
>> >
>> >I'm leaning towards the RAID 60 setup because I'm concerned about the time required to rebuild a
>> >RAID 5 set with 6x 1 TB disks.  Having the cushion of one more disk failure per set seems the better
>> >route to go.  I'm interested in hearing what others have to say, especially if I've overlooked other
>> >possibilities.
>> >
>> >I'm off to start simulating failures and benchmarking various configurations.
>> >
>> >Cheers,
>> >Josh
>> >

_______________________________________________
BLUG mailing list
BLUG@linuxfan.com
http://mailman.cs.indiana.edu/mailman/listinfo/blug
