Tuesday, September 23, 2008

Re: [BLUG] New OS

True, but what if the kernel already knows that it's a bad memory block? It can mark the bad memory blocks as already allocated so that they are never handed out to userspace programs, effectively disabling them. That's exactly how BadRAM works. Coupled with memtest86+, which locates the faulty addresses, this can be a way around defective RAM modules, albeit not in all cases. The rationale is simple: why throw away a chip with a few faulty bits when the kernel can offer you a way around it?
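To make the idea concrete, here is a minimal userspace sketch of reserving known-bad page frames before anything is handed out. It is not the BadRAM patch itself: the `FrameAllocator` class, the 4 KiB page size, and the fault addresses are all made up for illustration, and the fault list simply stands in for whatever a memory tester reported.

```python
PAGE_SIZE = 4096  # bytes per page frame (assumed for this sketch)


class FrameAllocator:
    """Toy page-frame allocator that never hands out known-bad frames."""

    def __init__(self, total_frames, bad_addresses):
        # Map each faulty byte address to the frame that contains it.
        self.reserved = {addr // PAGE_SIZE for addr in bad_addresses}
        # Only frames without a known fault go on the free list.
        self.free = [pfn for pfn in range(total_frames)
                     if pfn not in self.reserved]

    def alloc(self):
        """Return the next usable frame number."""
        if not self.free:
            raise MemoryError("no usable frames left")
        return self.free.pop(0)

    def release(self, pfn):
        """Give a frame back; reserved frames can never re-enter the pool."""
        if pfn in self.reserved:
            raise ValueError(f"frame {pfn} is reserved as bad")
        self.free.append(pfn)


# Hypothetical fault addresses: two fall in frame 2, one in frame 5.
allocator = FrameAllocator(total_frames=8,
                           bad_addresses=[0x2010, 0x2FF0, 0x5000])
handed_out = [allocator.alloc() for _ in range(6)]
print(handed_out)  # frames 2 and 5 are skipped
```

The point of the design is that the bad frames are taken out of circulation once, up front, so the allocation path never has to think about them again.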

Patches for this are already in the mainline kernel:
http://lwn.net/Articles/274649/

 -- Abhishek

On Tue, Sep 23, 2008 at 6:54 AM, Shei, Shing-Shong <shei@cs.indiana.edu> wrote:
It's safer for the OS to panic than to continue operating.  Here you are
assuming that the memory the OS is sitting on is good.  If this is
indeed the case, then it might be okay to mask out the known-bad
regions, just as is done for disks.  But if unfortunately the bad ones
occur in the area mapped to the kernel (say in the interrupt handling
routines, in the routine that clears the RAM, etc.), ...

Cheers,
Shing-Shong

> Such a task doesn't even seem all that complex in theory. Presumably
> Simón was using parity RAM (or some other type of RAM that can
> signal errors). Such RAM triggers an NMI (non-maskable interrupt) when
> an error is encountered. Traditional OSes panic at this point. However,
> there is no reason why the OS has to panic. You know where the error is
> now, so turn off that RAM and keep going.
>
> The theory of operation is sound. It doesn't require RAM to be scanned
> before allocation. (If you clear the RAM when you allocate it, and catch
> the error then, that should be good enough for the general case.)
>
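The clear-on-allocate idea in the quoted text can be sketched as a small simulation. This is only an illustration under stated assumptions: `FlakyMemory`, `alloc_cleared`, and the stuck-bit table are hypothetical, and the read-back comparison stands in for the error signal (the NMI) that real parity RAM would deliver.

```python
PAGE_WORDS = 4  # words per frame, kept tiny for the simulation


class FlakyMemory:
    """Simulated RAM in which some words have bits stuck at 1."""

    def __init__(self, frames, stuck_bits):
        self.words = [[0xFF] * PAGE_WORDS for _ in range(frames)]
        self.stuck = stuck_bits  # {(frame, word): mask of stuck bits}

    def write(self, frame, word, value):
        # Stuck bits ignore the written value and stay set.
        self.words[frame][word] = value | self.stuck.get((frame, word), 0)

    def read(self, frame, word):
        return self.words[frame][word]


def alloc_cleared(mem, free, retired):
    """Zero a frame on allocation; retire it if the zeros do not stick."""
    while free:
        frame = free.pop(0)
        for w in range(PAGE_WORDS):
            mem.write(frame, w, 0)
        if all(mem.read(frame, w) == 0 for w in range(PAGE_WORDS)):
            return frame
        # Error detected: take the frame out of service and try the next.
        retired.append(frame)
    raise MemoryError("no good frames left")


# Hypothetical faults in frames 0 and 1.
mem = FlakyMemory(frames=4, stuck_bits={(0, 2): 0x10, (1, 0): 0x01})
free, retired = [0, 1, 2, 3], []
first = alloc_cleared(mem, free, retired)
print(first, retired, free)
```

Note that zeroing and reading back only catches bits stuck at 1 in this toy model; it also assumes the allocator's own code and data sit in good memory, which is exactly the concern raised in the reply above.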


