Monday, December 21, 2009

Re: [BLUG] Fwd: possible causes of segfaults

Did you upgrade the system without rebooting it? (That should be
reproducible, though.) Are you using a version of ImageMagick compiled
for a different distribution of Linux? (I know many RPM-based
distributions do not package a lot of programs.)
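If the system was upgraded without a reboot, long-running processes can still have the old, now-deleted copies of shared libraries mapped. Below is a minimal Python sketch that lists such mappings. It assumes a Linux /proc filesystem, where the kernel appends " (deleted)" to the path in /proc/<pid>/maps when the mapped file has been removed or replaced on disk. The function names are hypothetical.

```python
import os

SUFFIX = " (deleted)"

def deleted_mappings(maps_text):
    """Return the file paths in a /proc/<pid>/maps dump marked as deleted."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].endswith(SUFFIX):
            found.add(fields[5][: -len(SUFFIX)])
    return found

def scan_processes(proc_root="/proc"):
    """Map each readable PID to the deleted files it still has mapped."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as f:
                stale = deleted_mappings(f.read())
        except OSError:
            continue  # process exited, or we lack permission to read it
        if stale:
            result[int(entry)] = stale
    return result

if __name__ == "__main__":
    for pid, paths in sorted(scan_processes().items()):
        print(pid, ", ".join(sorted(paths)))
```

Run it as root to see every process. Anything still mapping an old libc or libMagickCore is running code that no longer matches what is on disk.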

If you downloaded an RPM that wasn't compiled specifically for your
distribution/version, I would expect that to be the cause of the
problem. If that's the cause of the problem, building it from source
should clear it up.
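To check where an installed RPM actually came from, rpm's query format can print the vendor, distribution and build host recorded in the package header. Below is a minimal Python sketch. It assumes the package is named ImageMagick, which varies by distribution, and the helper names are hypothetical. A package whose vendor or distribution differs from the rest of the system is a candidate for the mismatch described above.

```python
import subprocess

# Header tags understood by rpm's --qf option; missing tags print "(none)".
# The "\\n" is passed to rpm literally, and rpm expands it to a newline.
QUERY_FORMAT = "%{NAME}|%{VERSION}-%{RELEASE}|%{VENDOR}|%{DISTRIBUTION}|%{BUILDHOST}\\n"
FIELDS = ("name", "version", "vendor", "distribution", "buildhost")

def parse_origin(line):
    """Split one line of the query output into a dict of header fields."""
    return dict(zip(FIELDS, line.rstrip("\n").split("|")))

def package_origin(package):
    """Return header fields for every installed package matching `package`."""
    out = subprocess.run(
        ["rpm", "-q", "--qf", QUERY_FORMAT, package],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_origin(line) for line in out.splitlines() if line]

if __name__ == "__main__":
    for info in package_origin("ImageMagick"):  # package name is an assumption
        print(info)
```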

If you're using the version of ImageMagick that comes with your
distribution and you haven't recently performed an upgrade, memory
corruption seems the most likely candidate. Does that machine have
non-parity memory? Parity or ECC memory would most likely report an
error instead of silently producing bogus data. (That is why I hate
non-parity memory.)
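On Linux, memory errors caught by parity or ECC hardware are normally reported through the kernel's EDAC subsystem. EDAC exposes corrected (ce_count) and uncorrected (ue_count) error counters for each memory controller under /sys/devices/system/edac/mc. Below is a minimal Python sketch that reads them. It assumes an EDAC driver for your chipset is loaded. If no mc* entries exist, either no driver is loaded or the memory has no error detection at all.

```python
import glob
import os

EDAC_ROOT = "/sys/devices/system/edac/mc"

def read_counter(path):
    """Read an integer counter file, returning None if it is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def edac_error_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} for each mc* directory."""
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc*"))):
        counts[os.path.basename(mc)] = (
            read_counter(os.path.join(mc, "ce_count")),
            read_counter(os.path.join(mc, "ue_count")),
        )
    return counts

if __name__ == "__main__":
    counts = edac_error_counts()
    if not counts:
        print("No EDAC memory controllers found")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```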

It could also be a CPU fault.

Hard drive problems tend to show up as I/O errors against the specific
device, and the kernel log will name the device producing the error.
Media errors do not normally produce segfaults unless the application
fails to handle the error case.
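Below is a minimal Python sketch that pulls the device and sector out of block-layer I/O error messages in the kernel log. It assumes the "I/O error, dev sdX, sector N" wording used by 2.6-era end_request messages. Newer kernels use a different prefix but, as far as I know, keep a similar fragment. The sample lines in the test are made up for illustration and are not from this thread.

```python
import re
import subprocess
from collections import defaultdict

# Matches e.g. "end_request: I/O error, dev sdb, sector 123456"
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")

def io_errors_by_device(log_text):
    """Map each device to the sectors the kernel reported I/O errors on."""
    errors = defaultdict(list)
    for dev, sector in IO_ERROR.findall(log_text):
        errors[dev].append(int(sector))
    return dict(errors)

if __name__ == "__main__":
    # Reading the kernel log may require root on some systems.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for dev, sectors in io_errors_by_device(log).items():
        print(f"{dev}: {len(sectors)} error(s), first sector {sectors[0]}")
```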

My recommendation: pick an upcoming weekend and tell your users that
this machine's services will be unavailable. Start a memory checker at
5:15pm on Friday (adjusted for the end of your workday) and let it run
all weekend. Then come in 15 minutes early on Monday to check for errors
and reboot the system.
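For the weekend run itself, use a dedicated tool such as memtest86+ booted from CD, or memtester from userspace. These reach far more of physical RAM than an ordinary program can. The toy Python sketch below only illustrates the fixed-pattern write-and-verify idea those tools build on. The buffer size and patterns are arbitrary choices, and a clean result from it proves very little about the hardware.

```python
# Byte patterns commonly used by memory testers: all zeros, all ones,
# and alternating bits (01010101, 10101010).
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)

def pattern_test(size_bytes, passes=1):
    """Fill a buffer with each pattern and count bytes that read back wrong."""
    buf = bytearray(size_bytes)
    mismatches = 0
    for _ in range(passes):
        for pattern in PATTERNS:
            buf[:] = bytes([pattern]) * size_bytes          # write phase
            mismatches += size_bytes - buf.count(pattern)   # verify phase
    return mismatches

if __name__ == "__main__":
    print("mismatched bytes:", pattern_test(64 * 1024 * 1024, passes=3))
```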

Cheers,
Steven Black

On Mon, Dec 21, 2009 at 03:58:40PM -0500, Thomas Smith wrote:
> Hiya,
>
> I just installed a new file / computation server at my work, and a few
> days into its tenure, I've started noticing some segfaults.
>
> For instance, I was running a bunch of image processing jobs, and
> ImageMagick's "convert" program segfaulted.  I ran it again on the
> same data, and it did fine.  So I'm wondering if I have bad hardware,
> or bad libraries, or what?
>
> Possible causes of segmentation faults that I know of:
> Hardware --- could be very random and difficult to find, might need to
> totally shut down the server and run a memory tester for days in order
> to find.
>
> Filesystem corruption --- should be reproducible, right?  If "convert"
> segfaults once, it should do it again...
>
> Libraries / OS problems --- should be reproducible too, right?
>
>
>
> from dmesg:
> [17217.872070] ld[16027]: segfault at 0 ip 00002ae3b445411b sp 00007fffc39a74e8 error 4 in libc-2.9.so[2ae3b43d0000+168000]
> [17354.753195] ld[20115]: segfault at 0 ip 00002b832843d11b sp 00007fff729f2fa8 error 4 in libc-2.9.so[2b83283b9000+168000]
> [19463.265457] ld[3673]: segfault at 0 ip 00002ad2f7a3a11b sp 00007fffd11023a8 error 4 in libc-2.9.so[2ad2f79b6000+168000]
> [19474.653491] ld[3680]: segfault at 0 ip 00002b7f10f0e11b sp 00007fffbac8a978 error 4 in libc-2.9.so[2b7f10e8a000+168000]
> [19507.935271] ld[3687]: segfault at 0 ip 00002af5c9eb511b sp 00007fff12fe00d8 error 4 in libc-2.9.so[2af5c9e31000+168000]
> [19528.740436] ld[3701]: segfault at 0 ip 00002b265616a11b sp 00007fff2ceb8d98 error 4 in libc-2.9.so[2b26560e6000+168000]
> [19606.865585] ld[3754]: segfault at 0 ip 00002ae3a079811b sp 00007fff6bc4d3c8 error 4 in libc-2.9.so[2ae3a0714000+168000]
> [263529.064795] convert[24941]: segfault at 7fffd973b6b8 ip 00007fffdb1e2ed9 sp 00007fffd973b660 error 7 in libMagickCore.so.1.0.0[7fffdb0fe000+1b5000]
> [268495.776398] convert[28595]: segfault at 7fffc7e2b608 ip 00007fffc9e13ed9 sp 00007fffc7e2b5b0 error 7 in libMagickCore.so.1.0.0[7fffc9d2f000+1b5000]
>
>
>
> Any advice?
> Thanks,
> -Thomas

_______________________________________________
BLUG mailing list
BLUG@linuxfan.com
http://mailman.cs.indiana.edu/mailman/listinfo/blug

[BLUG] Fwd: possible causes of segfaults

Hiya,

I just installed a new file / computation server at my work, and a few
days into its tenure, I've started noticing some segfaults.

For instance, I was running a bunch of image processing jobs, and
ImageMagick's "convert" program segfaulted.  I ran it again on the
same data, and it did fine.  So I'm wondering if I have bad hardware,
or bad libraries, or what?

Possible causes of segmentation faults that I know of:
Hardware --- could be very random and difficult to find, might need to
totally shut down the server and run a memory tester for days in order
to find.

Filesystem corruption --- should be reproducible, right?  If "convert"
segfaults once, it should do it again...

Libraries / OS problems --- should be reproducible too, right?
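The reproducibility question can be tested directly by rerunning the same job several times and counting how often it dies with SIGSEGV. Below is a minimal Python sketch. It assumes a POSIX system, where Python's subprocess module reports a child killed by signal N as return code -N. The convert command line in the usage block is a hypothetical placeholder for the real job.

```python
import signal
import subprocess

def count_segfaults(cmd, runs=10):
    """Run cmd repeatedly and return how many runs were killed by SIGSEGV."""
    segfaults = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a child terminated by signal N has returncode == -N.
        if result.returncode == -signal.SIGSEGV:
            segfaults += 1
    return segfaults

if __name__ == "__main__":
    # Hypothetical job; substitute the command that actually crashed.
    print(count_segfaults(["convert", "input.png", "output.jpg"], runs=20))
```

A crash that happens on every run points toward software or on-disk data. A crash that shows up only occasionally on identical input fits the flaky-hardware theory better.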

from dmesg:
[17217.872070] ld[16027]: segfault at 0 ip 00002ae3b445411b sp 00007fffc39a74e8 error 4 in libc-2.9.so[2ae3b43d0000+168000]
[17354.753195] ld[20115]: segfault at 0 ip 00002b832843d11b sp 00007fff729f2fa8 error 4 in libc-2.9.so[2b83283b9000+168000]
[19463.265457] ld[3673]: segfault at 0 ip 00002ad2f7a3a11b sp 00007fffd11023a8 error 4 in libc-2.9.so[2ad2f79b6000+168000]
[19474.653491] ld[3680]: segfault at 0 ip 00002b7f10f0e11b sp 00007fffbac8a978 error 4 in libc-2.9.so[2b7f10e8a000+168000]
[19507.935271] ld[3687]: segfault at 0 ip 00002af5c9eb511b sp 00007fff12fe00d8 error 4 in libc-2.9.so[2af5c9e31000+168000]
[19528.740436] ld[3701]: segfault at 0 ip 00002b265616a11b sp 00007fff2ceb8d98 error 4 in libc-2.9.so[2b26560e6000+168000]
[19606.865585] ld[3754]: segfault at 0 ip 00002ae3a079811b sp 00007fff6bc4d3c8 error 4 in libc-2.9.so[2ae3a0714000+168000]
[263529.064795] convert[24941]: segfault at 7fffd973b6b8 ip 00007fffdb1e2ed9 sp 00007fffd973b660 error 7 in libMagickCore.so.1.0.0[7fffdb0fe000+1b5000]
[268495.776398] convert[28595]: segfault at 7fffc7e2b608 ip 00007fffc9e13ed9 sp 00007fffc7e2b5b0 error 7 in libMagickCore.so.1.0.0[7fffc9d2f000+1b5000]
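Two fields in these dmesg lines are worth decoding.

First, subtract the library's load address (the number before the "+" in brackets) from "ip". This gives the faulting offset inside the library. Address-space randomization changes the load address from run to run, so identical offsets across crashes mean they hit the same instruction.

Second, on x86 the "error" value is the page-fault error code:
- bit 0 set means a protection violation; clear means the page was not present
- bit 1 set means a write
- bit 2 set means user mode

So error 4 is a user-mode read of a page that is not present (here at address 0), and error 7 is a user-mode write that violated page protections.

Below is a minimal Python sketch of both decodings. The regex assumes exactly the one-entry-per-line layout shown above.

```python
import re

# Layout assumed from the dmesg lines above (one entry per line).
SEGFAULT = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_error(code):
    """Describe the low three bits of an x86 page-fault error code."""
    return ", ".join([
        "protection violation" if code & 1 else "page not present",
        "write" if code & 2 else "read",
        "user mode" if code & 4 else "kernel mode",
    ])

def parse_segfault(line):
    """Return program, library, ip offset inside the library, and cause."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    return {
        "comm": m["comm"],
        "lib": m["lib"],
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "fault_addr": int(m["addr"], 16),
        "cause": decode_error(int(m["err"])),
    }

if __name__ == "__main__":
    import sys
    for line in sys.stdin:
        info = parse_segfault(line)
        if info:
            print(f'{info["comm"]}: {info["lib"]}+{info["offset"]:#x} ({info["cause"]})')
```

Applied to the log above, every ld crash lands at offset 0x8411b in libc-2.9.so, and both convert crashes land at offset 0xe4ed9 in libMagickCore.so.1.0.0.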

Any advice?
Thanks,
-Thomas

Re: [BLUG] power off CD?

On Mon, Dec 21, 2009 at 08:19:44AM -0500, David Ernst wrote:
> On Sun, Dec 20, 2009 at 11:27:21PM -0500, Steven Black wrote:
> >On Fri, Dec 18, 2009 at 01:11:29AM -0500, David Ernst wrote:
> >> So, we put the CD into the CD drive of my Ubuntu (Jaunty)
> >> machine... it spins up, and ... my computer turns off. power off. As
> >> if our power had gone out, but it hadn't.
> >
> >You should check the kernel logs. (/var/log/dmesg.*)
>
> I thought the dmesg logs wrote only boot-up info..? I took a look and
> didn't see anything, any clues on what you think I should be looking
> for?

After boot, the kernel keeps writing messages to its log buffer, and
either syslog-ng or klogd reads them and writes them to a file. That
should cover issues that crop up after boot.

This log gets rotated at boot, so an 'ls -al dmesg*' will show you the
date and time of the last write to each log. Use that to find the
correct file, then look at its last few lines.
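The "find the most recently written log, then read its tail" step can be scripted. Below is a minimal Python sketch. It assumes the rotated logs live in /var/log and match dmesg*, so adjust the pattern for your distribution. It does not handle compressed rotations such as dmesg.1.gz.

```python
import glob
import os
from collections import deque

def newest_log(pattern="/var/log/dmesg*"):
    """Return the matching file with the latest modification time, or None."""
    files = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    return max(files, key=os.path.getmtime, default=None)

def tail(path, n=10):
    """Return the last n lines of a text file (gzip files are not handled)."""
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))

if __name__ == "__main__":
    path = newest_log()
    if path:
        print("==>", path)
        print("".join(tail(path)), end="")
```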

I don't know about your distribution, but mine time-stamps each line
with the time since boot, in seconds. This lets you look for the jump
from boot-time messages to post-boot messages.
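You can automate the search for that jump. Parse the bracketed seconds-since-boot stamp on each line and report the largest gap between consecutive messages. Below is a minimal Python sketch. It assumes the "[ seconds.fraction]" prefix format seen in the dmesg output earlier in this thread. The sample lines in the test are made up.

```python
import re

STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

def largest_gap(lines):
    """Return (gap_seconds, index) where lines[index] follows the largest gap
    between consecutive timestamped messages, or None if fewer than two."""
    stamped = []
    for i, line in enumerate(lines):
        m = STAMP.match(line)
        if m:
            stamped.append((float(m.group(1)), i))
    best = None
    for (t0, _), (t1, i1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, i1)
    return best

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], errors="replace") as f:
        lines = f.read().splitlines()
    result = largest_gap(lines)
    if result:
        gap, i = result
        print(f"{gap:.1f}s gap before line {i + 1}: {lines[i]}")
```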

However, if it was the CMOS/BIOS causing the shutdown (as I have seen
before) it is unlikely there will be any meaningful entries in the log.

> >Powering off is unusual. Now, kernel halting due to some hardware issue,
> >that's a lot more common.
>
> Agreed. But would the computer really turn off its power on a kernel
> panic?

Not normally, no.

I believe there is logic in place to recover from kernel deadlocks that
may reboot the system. I thought it performed a reboot rather than
shutting the machine down, though.

Such an incident would be flagged in the kernel log.
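Whether the kernel reboots after a panic is controlled by the kernel.panic sysctl, readable at /proc/sys/kernel/panic. The values mean:
- 0: stay halted after a panic
- positive: wait that many seconds, then reboot
- negative: reboot immediately

None of these settings powers the machine off. Below is a minimal Python sketch that reports the current setting. The helper names are hypothetical.

```python
PANIC_SYSCTL = "/proc/sys/kernel/panic"

def describe_panic_setting(value):
    """Explain what a kernel.panic value does after a panic."""
    if value == 0:
        return "stay halted after a panic (no automatic reboot)"
    if value < 0:
        return "reboot immediately after a panic"
    return f"reboot {value} seconds after a panic"

def read_panic_setting(path=PANIC_SYSCTL):
    """Read the current kernel.panic value as an integer."""
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    print(describe_panic_setting(read_panic_setting()))
```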

> >I've seen low-level errors power off machines before. If you have logs
> >available in your CMOS/BIOS settings that may shine some light on it.
>
> Next time I reboot, I'll try to take a look. :) I don't remember
> ever seeing such a thing, though.

These logs are fairly common on server hardware, but rare on
desktop/laptop hardware. Look for anything with "Log" in the name. Be
forewarned: sometimes these are not directly human-readable and just
contain time-stamps and hex numbers.

Cheers,
Steven Black

Re: [BLUG] power off CD?

On Sun, Dec 20, 2009 at 11:27:21PM -0500, Steven Black wrote:
>On Fri, Dec 18, 2009 at 01:11:29AM -0500, David Ernst wrote:
>> So, we put the CD into the CD drive of my Ubuntu (Jaunty)
>> machine... it spins up, and ... my computer turns off. power off. As
>> if our power had gone out, but it hadn't.
>
>You should check the kernel logs. (/var/log/dmesg.*)

I thought the dmesg logs wrote only boot-up info..? I took a look and
didn't see anything, any clues on what you think I should be looking
for?

>Powering off is unusual. Now, kernel halting due to some hardware issue,
>that's a lot more common.

Agreed. But would the computer really turn off its power on a kernel
panic?

>I've seen low-level errors power off machines before. If you have logs
>available in your CMOS/BIOS settings that may shine some light on it.

Next time I reboot, I'll try to take a look. :) I don't remember
ever seeing such a thing, though.

David