Tuesday, May 18, 2010

Re: [BLUG] new big computer for a lab

Hiya,

I run the computer infrastructure for a psychology lab at IU, and am
actually struggling with related questions right now. But anyway, my
observations are:

Having as many processors/cores available as possible is great. If 4
to 10 people are going to use the server at the same time, this lets
more of them get their work done at once. Right before a paper
deadline, I've seen 3 or 4 people running several MATLAB processes
each, at the same time, and it's great to have the capacity for this.
Of course, our system is really underutilized when that's not
happening, which I guess is an attraction of cloud computing...

We're just starting to run into some difficulties with getting enough
storage space available. If your dataset is going to expand, and of
course it is, it probably makes sense to have some sort of external
disk drive chassis, which connects to your server through iSCSI, SAS,
or something. Many of these systems will allow you to add storage to
your server without any downtime---just stick another terabyte into a
free drive bay. Unfortunately, there are a multitude of ways to do
this, and it seems to be very difficult to figure out using only the
internet. I think I need to consult with an actual expert before I'll
be able to recommend something to my boss.

Are all your users going to be collaborating on the same dataset? If
so, I'd recommend that you keep incremental backups, or better yet use
some kind of version control (svn or git would be best, hg doesn't
handle large files well). We've had a lot of problems with the
granularity of our permissions being too coarse, and not being able to
track changes, on account of lacking a real version control system.
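
To make the incremental-backup idea concrete, here is a minimal Python
sketch. The function names, the manifest dict, and the
one-directory-per-run layout are all assumptions made for illustration;
a real setup would more likely use an existing backup tool. Each run
copies only the files whose content hash differs from the previous run.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_backup(src: Path, run_dir: Path, manifest: dict) -> list:
    """Copy files under src that are new or changed since the last run.

    run_dir is assumed to be a fresh directory for this run (for example
    one named after today's date). manifest maps relative paths to the
    digests seen on the previous run and is updated in place.
    Returns the sorted relative paths that were copied.
    """
    copied = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src).as_posix()
        digest = file_digest(path)
        if manifest.get(rel) != digest:
            target = run_dir / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            manifest[rel] = digest
            copied.append(rel)
    return copied
```

Note that this sketch does not record deletions and keeps the manifest
only in memory; saving it between runs (as JSON, say) is left out.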

> Agreed as well, RAID is not a substitute for backups.  Although a
> question for the group: how do people do backups these days on large
> (multi-TB) data sets?  I assume to hard drive somehow, but when you're
> talking about 10-20TB, that's not necessarily very easy, especially if
> you want to take those backups off-site.

This is not very useful for most people, but I've been using IU's MDSS
--- http://uits.iu.edu/page/aiyi --- for backup storage. It has 4.2
petabytes of storage, on a tape robot system with a hard-drive-based
cache. It has 2 nodes, located in Bloomington and Indy, so it seems
very secure. And it's free for IU researchers. Yay!

I think that other people are using cloud-based storage. For
instance, Amazon S3 charges something like $0.10 / GB to transfer
data, then $0.10/GB/month for storage. So that's about $100/TB per
month for storage and another $100/TB for transfer. Hmm. That's sort of
expensive, but there are services that will manage this for you (Mozy,
Zmanda).
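
The arithmetic above can be sketched in a few lines of Python. The
default prices are just the rough figures quoted in this message, not
current or official Amazon pricing, and the function name and the
1000 GB per TB conversion are assumptions for illustration.

```python
def s3_backup_cost(size_tb, months,
                   storage_per_gb_month=0.10,
                   transfer_per_gb=0.10,
                   gb_per_tb=1000):
    """Return (one-time transfer cost, total storage cost) in dollars.

    Assumes the whole dataset is uploaded once and then stored
    unchanged for the given number of months.
    """
    size_gb = size_tb * gb_per_tb
    transfer = size_gb * transfer_per_gb
    storage = size_gb * storage_per_gb_month * months
    # Round to cents to hide floating-point noise.
    return round(transfer, 2), round(storage, 2)


# A 10 TB dataset kept for a year at the quoted prices:
print(s3_backup_cost(10, 12))  # (1000.0, 12000.0)
```

So for the 10-20 TB case mentioned in the quoted question, storage
alone would come to roughly $1000-2000 per month at those prices.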

Maybe you could find someone who will swap a few rack units of
colocation space with you, and back up to disk servers on each other's
networks?

Um, I guess I've talked a lot. Good luck, Ignasi! Do you have a set
date that you're leaving for Spain? Have you gone already?
-Thomas


>
> At a former job, I once had the pleasure of being able to buy a server
> from these folks:
>
> http://www.asaservers.com
>
> Very nice machine, at an impressive price.  I didn't stay at the job
> long enough to play with it all that much, but I wish I had...  I
> doubt that you'll want to buy from them if the machine is going to
> Spain, but it could give you an idea of what you can get for the
> money.  Of course, you could check Dell, etc., too.
>
> How do people feel about multi-processor machines?  I know little
> about them, but if you've got a big budget to buy a fast machine, I'd
> think that it'd be something you'd consider.
>
> David
>
>
> On Mon, May 17, 2010 at 07:13:57PM -0400, Jeremy L. Gaddis wrote:
>>On Mon, May 17, 2010 at 11:52 AM, Ignasi <ignasilucas@gmail.com> wrote:
>>> I expect between 4 and 10 people to use it at the same time. Most jobs will
>>> be CPU intensive, but I can also envision some sporadic jobs to require
>>> several GB of RAM. He also wants to store quite a bit of data there, and
>>> maybe host a database. I assume that nobody would use it as a desktop, but
>>> it would be accessed remotely. In principle, the main concern is to make it
>>> a fast computer.
>>
>>Have you considered the potential benefits of a number of smaller servers
>>instead of a single large server?  I'm not familiar with exactly what you're
>>doing, but it may be worth investigating.
>>
>>> I've read a little bit about RAID arrays, but never met anybody who used
>>> them. I'm interested in those configurations where data is mirrored, so that
>>> the system can tolerate the failure of one of the disks. And I've been
>>> warned that if all the disks composing the array are of the same brand and
>>> design, more than one may fail at the same time. Do you think RAID is
>>> worthwhile at all, or is it unnecessary with a good backup system? Which
>>> is better, an operating system (software) RAID controller or a hardware one?
>>
>>If the data is considered important, RAID is a must.  The data on my home
>>servers isn't critical and no financial meltdown will occur if I lose it,
>>but it's important enough to me that I use RAID at home.
>>
>>Steven mentioned hot spares.  I'm also a big fan of hot spares, as the
>>failed drive is automatically replaced by another (hopefully good!) drive
>>and rebuilding begins (almost) immediately.  Without a hot spare, someone
>>must physically pull the failed drive and replace it before rebuilding of
>>the array will begin.  During that time, if another drive happens to fail,
>>you will (typically) lose data -- this is dependent on what type of RAID
>>you're using, however.
>>
>>Also, and I can't stress this enough:  RAID IS NOT A SUBSTITUTE FOR BACKUPS.
>>
>>As I said, if a drive in a RAID array fails, you can pull the dead drive
>>and replace it (online) without losing any data.  Heck, I did just that
>>less than an hour ago.  If, however, your data becomes corrupted,
>>accidentally deleted, etc., RAID is not going to help you out a bit, and
>>you're going to wish you had those backups.
>>
>>--
>>Jeremy L. Gaddis
>>http://evilrouters.net/
>>_______________________________________________
>>BLUG mailing list
>>BLUG@linuxfan.com
>>http://mailman.cs.indiana.edu/mailman/listinfo/blug

--
http://resc.smugmug.com/
