Tuesday, July 14, 2009

Re: [BLUG] Large RAID config suggestions?

On Tue, Jul 14, 2009 at 03:37:37PM +0000, Mark Krenz wrote:
> The whole recompression of the array that can take hours or days
> sounds VERY risky. I wouldn't do it unless you are experimenting. Some
> of these non-standard raid levels are just companies coming up with
> new combinations to have an extra feature over the competition, they
> aren't necessarily good things.

Yeah, I don't really see the benefit of this compared to having a
straight RAID5 + hotspare.

I mean, with a hotspare, the array is rebuilt onto the hotspare when a
drive fails. That's one major I/O operation.

With RAID 5EE you have a possibly similar major I/O operation during the
compression, then an additional (possibly similar) I/O operation during
decompression once you add the new drive.

What troubles me is that the system is exposed to a second drive
failure during *decompression*. That is, you can lose data if a drive
fails after you've inserted the replacement drive but before the RAID5
finishes becoming RAID5EE again. This means a second drive failure is a
danger for a longer period than it would be under RAID5.

An example: Your RAID5EE has a drive failure. It compresses down to
RAID5. During that time a second drive failure means data loss. (This
is on par with having a RAID5 with a hotspare. You're exposed to a
second drive failure until the hotspare becomes a full member of the
array.) Then you have a happy period where the former RAID5EE array is
now a healthy RAID5 array and can survive another drive failure. Then
you plug in the replacement drive and -- unlike RAID5 with a hotspare --
you get a second period of time where a drive failure means data loss.

Can you be sure that the compression and decompression will take less
total time than a RAID5 with a hotspare? This concerns me, as the total
I/O transferred for RAID5EE will be more than the total I/O for RAID5
with a hotspare.

Since the hotspare is spread across all of the drives, and after
compression it becomes a standard RAID 5 array, you're still
transferring a whole disk's worth of data. It's just that with RAID 5EE
you're doing the full disk's worth of data transfer twice. (Okay, not
quite a full disk's worth of data: a full disk's worth minus the spare
percentage. Even so, twice whatever-it-is is a disk and a half's worth
of data or more.)

I mean, most hotspare systems treat the replaced drive as the new
hotspare. It isn't like slot 10 (or whatever) is always the hotspare
slot. With a traditional hotspare, you transfer data once. With RAID5EE
you transfer data twice.
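
To put rough numbers on that, here is a quick back-of-the-envelope
sketch in Python. The assumptions are mine, not anything from a vendor:
I'm assuming the distributed spare takes up 1/N of each disk in an
N-drive RAID 5EE array, that each pass (compression, then decompression)
moves one disk's worth of data minus that spare fraction, and that 4
drives is the minimum. The function name is made up for illustration.

```python
def rebuild_io_in_disks(drives: int) -> dict:
    """Rough data moved per drive failure, in units of one disk's capacity."""
    # Assumed minimum drive count for RAID 5EE.
    if drives < 4:
        raise ValueError("this sketch assumes at least 4 drives")

    # Assumption: the distributed spare occupies 1/drives of every disk.
    spare_fraction = 1 / drives

    # One pass (compression or decompression) moves a full disk's worth
    # of data minus the spare percentage.
    per_pass = 1 - spare_fraction

    return {
        # Traditional hotspare: one rebuild onto the spare, and the
        # replaced drive simply becomes the new spare.
        "raid5_hotspare": 1.0,
        # RAID 5EE: compress down to RAID 5, then decompress again.
        "raid5ee": 2 * per_pass,
    }


if __name__ == "__main__":
    for n in (4, 6, 10):
        print(n, rebuild_io_in_disks(n))
```

Under those assumptions a 4-drive array moves 1.5 disks' worth of data
per failure with RAID 5EE, and the figure creeps toward 2 as you add
drives. The hotspare case stays at 1 no matter how many drives you have.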

> Also, PostgreSQL is different from MySQL when it comes to number of
> files. MySQL has 2 or 3 files per table, whereas PostgreSQL has many
> more files, but it might not get really high. I have a decent size
> database with lots of data and somewhere over 120 tables, and its only
> 2000 files in /var/lib/pgsql. But if you are doing a large database
> like flybase, you might pay attention to how many files you're using
> on the filesystem like this:

With MySQL it depends on the database engine being used. MyISAM (and
probably Maria -- I've not read much about it) uses three files per
table. InnoDB uses one per table plus three shared data files. (There
are additional database engines at this point, too.)

InnoDB and Falcon (I'm not sure how many files it has per table) should
both provide ACID compliance. (I'm not sure about Maria.)

Maria and Falcon are new in MySQL 6. Prior to MySQL 5 you couldn't get
anything close to full ACID compliance due to missing core features.

InnoDB can run into file-size limits on some systems. This can be
relieved by enabling "innodb_file_per_table" prior to creating the
tables. This results in InnoDB creating 2 files per table plus the
three shared InnoDB files mentioned earlier.
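
If you want a ballpark for the file count, the per-engine numbers above
can be sketched in Python like this. The function name and the engine
strings are just something I made up for illustration, and it only
counts the table files described above (no binary logs, no other
engines).

```python
def estimated_table_files(engine: str, tables: int,
                          file_per_table: bool = False) -> int:
    """Ballpark number of on-disk files for `tables` tables."""
    shared_innodb_files = 3  # the three shared InnoDB files

    if engine == "MyISAM":
        # Three files per table.
        return 3 * tables

    if engine == "InnoDB":
        # One file per table normally, two with innodb_file_per_table,
        # plus the shared files either way.
        per_table = 2 if file_per_table else 1
        return per_table * tables + shared_innodb_files

    raise ValueError(f"no file-count rule for engine {engine!r}")


if __name__ == "__main__":
    print(estimated_table_files("MyISAM", 120))
    print(estimated_table_files("InnoDB", 120))
    print(estimated_table_files("InnoDB", 120, file_per_table=True))
```

For 120 tables that works out to 360 files for MyISAM, 123 for InnoDB
with the shared tablespace, and 243 with innodb_file_per_table enabled.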

Please tell me you're not using MySQL prior to v5. :)

I'm actually a little surprised by the combined PostgreSQL/MySQL
approach. (Unless, of course, it started with MySQL prior to version 5,
at which point it would be understandable. That was before MySQL had the
features, and before PostgreSQL had the speed.) I would have expected a
consistent product with replication off to one or more slaves that only
accept read operations.

> Its nice talking about sysadmin type stuff on the blug list again,
> we don't do it enough.

Indeed.

--
Steven Black <blacks@indiana.edu> / KeyID: 8596FA8E
Fingerprint: 108C 089C EFA4 832C BF07 78C2 DE71 5433 8596 FA8E
