Re: [LUG] Segmentation fault - how should i fix this?

 

Sam Grabham wrote:
> 
> I have had a few boxes start to degrade by giving me "Segmentation
> fault" errors while trying to use "vi" or "ls". These servers have been
> in constant use for around 3 years. I would like to repair the install
> to minimize downtime, as other command tools seem to work OK.

> I can't see why this is happening, as they all have ECC RAM, are of
> high-end build and are on good dual UPS feeds.

A segfault is nearly always a software problem, not a hardware one.

It is certainly very unlikely that a hardware fault, or random disk
corruption, would hit two boxes in the same way, so you can probably
rule those out and blame the system admin or malware (which is also the
system admin's fault in most places, since it presumably managed to get
root privilege).

Setting certain environment variables can cause this sort of thing, if
you are poking the ones that change important behaviour such as the
dynamic linker (LD_LIBRARY_PATH, LD_PRELOAD) or memory allocation in GNU
libc. But again that is unlikely here, since when people do set these
they usually do it only in the script that runs the affected program.
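
To rule the environment in or out quickly, something like the following
could be run in the shell where the commands fail. It is only a rough
sketch: the prefix list is my own illustrative choice, not a complete
list of everything ld.so or glibc reads, and suspect_variables is a
made-up helper name.

```python
import os

# Name prefixes of variables read by the glibc dynamic linker (ld.so) or
# by glibc's malloc. Illustrative assumption, not an exhaustive list.
SUSPECT_PREFIXES = ("LD_", "MALLOC_", "GLIBC_TUNABLES")


def suspect_variables(env):
    """Return the entries of env whose names start with a suspect prefix."""
    return {
        name: value
        for name, value in env.items()
        if name.startswith(SUSPECT_PREFIXES)
    }


if __name__ == "__main__":
    # Print anything in the current environment worth a second look.
    for name, value in sorted(suspect_variables(os.environ).items()):
        print(f"{name}={value}")
```

If this prints nothing in the failing shell, the environment is probably
not the culprit and the libraries themselves deserve a closer look.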

In Debian, the dependencies of "ls" (from the coreutils package) are:

Pre-Depends: libacl1 (>= 2.2.11-1), libc6 (>= 2.6.1-1), libselinux1 (>=
2.0.15)

and for vim:
Depends: vim-common (= 1:7.1.314-3+lenny2), vim-runtime (=
1:7.1.314-3+lenny2), libacl1 (>= 2.2.11-1), libc6 (>= 2.7-1), libgpm2
(>= 1.20.4), libncurses5 (>= 5.6+20071006-3), libselinux1 (>= 2.0.59)

So if this were Debian, the fault would likely lie in one of the
libraries both commands share (libacl1, libc6 or libselinux1), or in
some system-level problem such as corrupted swap or a corrupted
filesystem (but the filesystem fault would have to affect these files,
and occur on multiple boxes), or in malware (some Linux malware kernel
modules can interfere with system calls).
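
One way to check whether those library files have changed since they
were installed is to compare them with the checksums dpkg records. The
sketch below assumes a Debian-style system where each package has a
/var/lib/dpkg/info/<package>.md5sums file with lines of the form
"<md5>  <path relative to />"; the function names are my own. The
debsums tool does the same job if it is installed. Bear in mind that
anything with root access could have altered the checksum lists too, so
a clean result is not proof of a clean system.

```python
import hashlib
import os


def md5_of(path):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(md5sums_path, root="/"):
    """Compare files under root with a dpkg-style md5sums listing.

    Returns a list of (relative_path, problem) tuples, where problem is
    "missing" or "checksum mismatch".
    """
    problems = []
    with open(md5sums_path, encoding="utf-8", errors="replace") as listing:
        for line in listing:
            line = line.rstrip("\n")
            if not line:
                continue
            # Each line is "<md5 hex>  <relative path>" (two spaces).
            expected, _, relative = line.partition("  ")
            full = os.path.join(root, relative)
            if not os.path.isfile(full):
                problems.append((relative, "missing"))
            elif md5_of(full) != expected:
                problems.append((relative, "checksum mismatch"))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: pass one or more .md5sums files as arguments.
    for listing_path in sys.argv[1:]:
        for relative, problem in changed_files(listing_path):
            print(f"{listing_path}: {relative}: {problem}")
```

Running it against the md5sums files for libc6, libacl1 and libselinux1
on each affected box, and comparing the results between boxes, should
show fairly quickly whether the same files have been altered everywhere.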

Quite likely the problem was caused by an administrative error, such as
installing an incompatible library file. Do you have any idea what
changes might have caused it? I'm thinking of installing software from
source or outside the regular packaging system, forcing incompatible
packages in, or upgrading to a version that was not properly tested. If
the boxes have similar hardware, it is possible a kernel upgrade messed
things up in this way.

If you've no idea what changed, I'd suggest reinstalling from known-good
media and checking that the system is properly secured, ideally after
trying to establish the likely cause (strace and gdb are your friends,
along with some of the tools in binutils).
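
Before reaching for strace or gdb it helps to know exactly which
commands die and which survive. A minimal sketch, assuming Python still
runs on the affected box; died_of_segfault is a hypothetical helper and
the command list at the bottom is just an example to adapt:

```python
import signal
import subprocess


def died_of_segfault(command):
    """Run command (a list of arguments); True if SIGSEGV killed it."""
    result = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX a return code of -N means the child was killed by signal N.
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Example commands only; substitute whatever fails on your boxes.
    for command in (["ls", "/"], ["vi", "--version"], ["cat", "/dev/null"]):
        try:
            verdict = "SEGFAULT" if died_of_segfault(command) else "ok"
        except FileNotFoundError:
            verdict = "not installed"
        print(" ".join(command), "->", verdict)
```

Once you know which binaries fail, running one of them under strace
shows the last system calls made before the crash, and gdb can give a
backtrace pointing at the library involved.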

If you do have some idea, you may get away with reinstalling the
relevant packages (libc6, for example) or backing out the change.

> does any one else get these sort of problems?

Very rarely, and usually only when I'm messing around on test servers,
poking stuff I don't yet understand.

More details would help; "segmentation fault" on its own isn't exactly
revealing.


-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html