Wednesday, September 24, 2008

Re: [BLUG] New OS

Yes, some initial area of the kernel needs to be on known good RAM.
I do not disagree here.

But how much is really needed?

You talk as if the space the kernel itself sits on doesn't also get
allocated. True, it isn't allocated in the traditional sense: it is
claimed (there is no asking). But it is also true that there is
flexibility in the physical address spaces used.
The kernel itself uses memory mapping.

This means that:
1. Unless you're using an OpenBIOS, there is no way to get your
BIOS to know about your bad RAM. RAM shadowing the BIOS at
bootup needs to be good.
2. The BIOS loads the boot sector from the fixed disk. This
512 bytes needs to land on a good region of RAM.
GRUB calls this "stage1".
3. Logic to read the filesystem and continue booting needs to
find a good region of RAM. GRUB calls this "stage 1.5".
4. The core boot manager logic also needs to land on a good
region of RAM. This probably couldn't be worked around
because it is where the configuration is processed --
by the time you knew that you wanted to skip a region, you
may already be using it.

At this point the kernel hasn't been booted. We're not using any
protected mode features. Memory hasn't been mapped. These are the only
regions that need to be on good RAM, and they all fit in less than 1M.
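
A minimal sketch of that check, assuming a list of known-bad physical
ranges is available (say, from a memtest86+ run). The function name and
the example ranges are hypothetical. The only fixed fact used is that a
PC BIOS conventionally loads the boot sector at 0x7C00.

```python
# Sketch: report which known-bad ranges fall inside the first 1 MiB,
# the region the early boot stages described above have to live in.
LOW_MEMORY_LIMIT = 1 << 20  # 1 MiB, the "less than 1M" figure above


def overlaps_low_memory(bad_ranges, limit=LOW_MEMORY_LIMIT):
    """Return the (start, end) ranges that intersect [0, limit).

    Each range is a pair of physical addresses, end exclusive.
    """
    return [(start, end) for (start, end) in bad_ranges
            if start < end and start < limit]


# Hypothetical example: one bad range right where the boot sector
# lands, and one far above the first megabyte.
bad = [(0x7C00, 0x7E00), (0x40000000, 0x40001000)]
for start, end in overlaps_low_memory(bad):
    print(f"early boot would hit bad RAM at 0x{start:x}-0x{end:x}")
```

If this returns an empty list, none of the boot stages listed above
have to touch a known-bad region.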

I did a quick read of the multiboot standard
( http://www.uruk.org/orig-grub/boot-proposal.html ). Non-ELF kernels
may need to be loaded at specific addresses, but an ELF-based kernel
runs entirely in 32-bit mode and appears to get mapped similarly to
any other application. The boot loader itself could use the same
technique I described earlier, though it would want to pass this
information via the kernel command-line to the kernel's BadRAM module.
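
A rough sketch of what passing that information could look like. It
assumes the address/mask pair convention I understand the BadRAM patch
and memtest86+ to use, where an address is bad if it matches the
pattern on every bit set in the mask. The exact syntax is not stated
in this thread, so treat the "badram=" format and both function names
as assumptions.

```python
# Sketch: build a badram= kernel argument from (address, mask) pairs
# and test whether a given physical address matches any pattern.
# The pair format is an assumption, not taken from this thread.

def badram_argument(patterns):
    """Format (address, mask) pairs as one kernel command-line argument."""
    pairs = ",".join(f"0x{addr:08x},0x{mask:08x}" for addr, mask in patterns)
    return "badram=" + pairs


def is_bad(address, patterns):
    """True if address matches a pattern on all bits set in its mask."""
    return any((address & mask) == (addr & mask) for addr, mask in patterns)


# Hypothetical example: one faulty 4 KiB page.
patterns = [(0x12345000, 0xFFFFF000)]
print(badram_argument(patterns))      # badram=0x12345000,0xfffff000
print(is_bad(0x12345ABC, patterns))   # True
print(is_bad(0x12346000, patterns))   # False
```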

Do I think this happened in Simón's case? No, not really.

In Simón's case, I suspect all the code address space was fine, and the
bad RAM was caught in memory allocation. Whether this was memory
allocation for the kernel or for a user-mode application doesn't
matter. The kernel is up and can deal with it appropriately at that point.
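
To illustrate what "deal with it" can mean, here is a toy model of a
page-frame allocator that reserves known-bad frames up front so they
are never handed out. This is only a sketch of the idea. The class and
its methods are hypothetical and are not how the Linux allocator is
actually written.

```python
# Toy model: an allocator that never hands out known-bad page frames.
from collections import deque


class FrameAllocator:
    def __init__(self, total_frames, bad_frames):
        # Bad frames are claimed once and never returned to the pool.
        self.reserved = set(bad_frames)
        self.free = deque(f for f in range(total_frames)
                          if f not in self.reserved)

    def alloc(self):
        """Hand out the lowest-numbered free frame."""
        if not self.free:
            raise MemoryError("no good frames left")
        return self.free.popleft()


# Hypothetical example: 8 frames, of which frames 2 and 5 are bad.
allocator = FrameAllocator(8, bad_frames={2, 5})
print([allocator.alloc() for _ in range(6)])  # [0, 1, 3, 4, 6, 7]
```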

Think about it. Assuming initrd and the kernel (sans modules) need to
land on safe RAM, that's what? 4M? 16M? We've reached a state where 4G
of RAM isn't uncommon for a laptop machine, let alone a desktop.

It isn't absurd that the core OS landed on good address space. If we're
at a point where it is quite possible the only RAM that needs to be
known good is less than 0.1% of the total, then it is only barely
playing the odds. (That number comes from 4M, with a 4G total. Even if
those numbers are unrealistic, 16M in 1G is only 1.56%.)
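
The arithmetic behind those percentages, as a quick check. The helper
name is just for illustration, and the sizes are the guesses from the
paragraph above, not measurements.

```python
# Fraction of total RAM that has to be known good, as a percentage.
MIB = 1 << 20
GIB = 1 << 30


def percent_needed(needed_bytes, total_bytes):
    return 100.0 * needed_bytes / total_bytes


print(f"{percent_needed(4 * MIB, 4 * GIB):.4f}%")   # 0.0977%
print(f"{percent_needed(16 * MIB, 1 * GIB):.4f}%")  # 1.5625%
```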

Cheers,
Steven Black

On Tue, Sep 23, 2008 at 09:13:33AM -0400, Shei, Shing-Shong wrote:
> Abhishek,
>
> Like I said, either the OS or memtest86+ needs to have a known to be
> good area to start with. They won't even be able to function properly
> if the memory they loaded into are bad.
>
> Shing-Shong
>
>
> > True, but what if the kernel already knows that it's a bad memory
> > block. It can allocate the bad memory blocks so that they are not
> > handed out to the userspace programs, thus effectively disabling them.
> > That's exactly how BadRAM works. Coupled with memtest86+, this can be
> > a way around defective RAM modules, albeit not in all cases. The
> > supporting ideology for this is -- Why throw away a chip with a few
> > faulty bits when the kernel can offer you a way around it?
> >
> > Patches for this are already into the mainline kernel
> > http://lwn.net/Articles/274649/
> >
> > -- Abhishek
> >

--
Steven Black <blacks@indiana.edu> / KeyID: 8596FA8E
Fingerprint: 108C 089C EFA4 832C BF07 78C2 DE71 5433 8596 FA8E