Tuesday, September 23, 2008

Re: [BLUG] New OS

Abhishek,

Like I said, either the OS or memtest86+ needs a known-good area of
memory to start from. Neither can function properly if the memory it
is loaded into is bad.

Shing-Shong


> True, but what if the kernel already knows that it's a bad memory
> block? It can reserve the bad memory blocks so that they are never
> handed out to userspace programs, thus effectively disabling them.
> That's exactly how BadRAM works. Coupled with memtest86+, this can be
> a way around defective RAM modules, albeit not in all cases. The
> rationale: why throw away a chip with a few faulty bits when the
> kernel can offer you a way around it?
>
> Patches for this are already in the mainline kernel
> http://lwn.net/Articles/274649/
>
> -- Abhishek
>
_______________________________________________
BLUG mailing list
BLUG@linuxfan.com
http://mailman.cs.indiana.edu/mailman/listinfo/blug

Re: [BLUG] New OS

Did someone say "Nexuiz?" :)

Simón Ruiz wrote:
> On Mon, Sep 22, 2008 at 3:48 PM, Shei, Shing-Shong <shei@cs.indiana.edu> wrote:
>
>> I am afraid that this is not a correct assumption. Different OSes have
>> different ways of allocating/using memory. Probably it was just luck
>> that Ubuntu had not touched the bad memory block. --SS
>>
>
> I most certainly concede "perhaps", as I definitely don't know what
> I'm talking about with any certainty.
>
> I find "probably" too strong a word, though.
>
> You see, I'm a memory hog and regularly find myself forcing Ubuntu to
> swap if I'm on a system with less than 2GB of memory.
>
> I played memory-intense 3-D games (Nexuiz, anyone?) and routinely ran
> the GIMP on many large photos at once, while keeping Firefox open with
> my customary zillion tabs.
>
> If Ubuntu didn't touch the bad memory blocks, I find design more
> probable than luck.
>
> Though I do concede that I don't really know what I'm talking about.
>
> Simón
>


Re: [BLUG] New OS

True, but what if the kernel already knows that it's a bad memory block? It can reserve the bad memory blocks so that they are never handed out to userspace programs, thus effectively disabling them. That's exactly how BadRAM works. Coupled with memtest86+, this can be a way around defective RAM modules, albeit not in all cases. The rationale: why throw away a chip with a few faulty bits when the kernel can offer you a way around it?

Patches for this are already in the mainline kernel
http://lwn.net/Articles/274649/
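
To make the idea concrete, here is a rough Python sketch of a page-frame
allocator that reserves known-bad frames up front. It is only a toy model
of the concept, not the actual BadRAM patch: the (base, mask) description
of bad regions, the 4 KiB page size, and the names frame_is_bad and
FrameAllocator are all assumptions made up for illustration.

```python
PAGE_SHIFT = 12                          # assumed 4 KiB pages
PAGE_MASK = ~((1 << PAGE_SHIFT) - 1)     # keeps only the frame-number bits


def frame_is_bad(frame, patterns):
    """Return True if any address inside the frame matches a (base, mask) pair.

    An address matches when every bit selected by mask equals the same bit
    in base. The in-page offset bits can take any value, so only the mask
    bits above the page offset decide whether the frame contains a match.
    """
    addr = frame << PAGE_SHIFT
    for base, mask in patterns:
        high = mask & PAGE_MASK
        if (addr & high) == (base & high):
            return True
    return False


class FrameAllocator:
    """Toy allocator: bad frames are reserved at start-up and never handed out."""

    def __init__(self, total_frames, patterns):
        self.reserved = {f for f in range(total_frames)
                         if frame_is_bad(f, patterns)}
        self.free = [f for f in range(total_frames)
                     if f not in self.reserved]

    def alloc(self):
        if not self.free:
            raise MemoryError("out of usable frames")
        return self.free.pop(0)


# One faulty location somewhere in frame 3 of an 8-frame machine.
allocator = FrameAllocator(8, [(0x00003004, 0xFFFFFFFC)])
handed_out = [allocator.alloc() for _ in range(7)]
```

The cost is one whole page per faulty location, which is why this only
pays off when a module has a few bad bits rather than many.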

 -- Abhishek

On Tue, Sep 23, 2008 at 6:54 AM, Shei, Shing-Shong <shei@cs.indiana.edu> wrote:
It's safer for the OS to panic than to continue operating.  Here you are
assuming that the memory the OS itself is sitting on is good.  If this is
indeed the case, then it might be okay to mask out the known-bad areas,
just as is done for disks.  But if unfortunately the bad ones
occur in the area mapped to the kernel (say in the interrupt handling
routines, in the routine that clears the RAM, etc.), ...

Cheers,
Shing-Shong

> Such a task doesn't even seem all that complex in theory. Presumably
> Simón was using parity RAM (or some other type of RAM that can
> signal errors). Such RAM throws an NMI (non-maskable interrupt) when an
> error is encountered. Traditional OSes panic at this point. However,
> there is no reason why it has to panic. You know where the error is now,
> so turn off that RAM and keep going.
>
> The theory of operation is sound. It doesn't require RAM to be scanned
> before allocation. (If you clear the RAM when you allocate it, and catch
> the error then, that should be good enough for the general case.)
>



Re: [BLUG] New OS

It's safer for the OS to panic than to continue operating. Here you are
assuming that the memory the OS itself is sitting on is good. If this is
indeed the case, then it might be okay to mask out the known-bad areas,
just as is done for disks. But if unfortunately the bad ones
occur in the area mapped to the kernel (say in the interrupt handling
routines, in the routine that clears the RAM, etc.), ...

Cheers,
Shing-Shong

> Such a task doesn't even seem all that complex in theory. Presumably
> Simón was using parity RAM (or some other type of RAM that can
> signal errors). Such RAM throws an NMI (non-maskable interrupt) when an
> error is encountered. Traditional OSes panic at this point. However,
> there is no reason why it has to panic. You know where the error is now,
> so turn off that RAM and keep going.
>
> The theory of operation is sound. It doesn't require RAM to be scanned
> before allocation. (If you clear the RAM when you allocate it, and catch
> the error then, that should be good enough for the general case.)
>
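
For reference, the "clear the RAM when you allocate it, and catch the
error then" idea quoted above could be modelled roughly as follows. This
is a Python simulation only, and everything in it is an assumption for
illustration: FaultyMemory fakes a RAM with stuck-at-zero bits, the
256-byte page size is arbitrary, and a failed read-back stands in for the
NMI that real parity RAM would raise. Note that the sketch silently
assumes the checking code itself runs from good memory, which is exactly
the assumption questioned above.

```python
class FaultyMemory:
    """Simulated RAM in which selected bytes have bits stuck at 0."""

    def __init__(self, size, stuck_at_zero):
        self.cells = bytearray(size)
        self.stuck = dict(stuck_at_zero)   # address -> bitmask of stuck bits

    def write(self, addr, value):
        # Stuck bits silently drop to 0, whatever was written.
        self.cells[addr] = value & ~self.stuck.get(addr, 0) & 0xFF

    def read(self, addr):
        return self.cells[addr]


PAGE = 256                       # tiny page size to keep the demo fast
PATTERNS = (0xAA, 0x55, 0x00)    # ends with 0x00, so a good page ends up cleared


def page_ok(mem, page):
    """Write each pattern across the page and read it back."""
    start = page * PAGE
    for pattern in PATTERNS:
        for addr in range(start, start + PAGE):
            mem.write(addr, pattern)
        for addr in range(start, start + PAGE):
            if mem.read(addr) != pattern:
                return False
    return True


def alloc_page(mem, free, retired):
    """Hand out the first free page that passes the check; retire failures."""
    while free:
        page = free.pop(0)
        if page_ok(mem, page):
            return page
        retired.add(page)        # never offered again
    raise MemoryError("no good pages left")


# Four pages, with one stuck bit in page 1.
mem = FaultyMemory(4 * PAGE, {PAGE + 5: 0x01})
free, retired = [0, 1, 2, 3], set()
first = alloc_page(mem, free, retired)
second = alloc_page(mem, free, retired)
```

With these inputs page 1 fails the 0x55 pass and is retired, so the second
allocation skips over it. A check at allocation time cannot help with
memory that goes bad after it has been handed out, or with memory already
holding the kernel.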

