[LUG] Linux Mint - Bad Kernel?

 

My parents' PC has just gone mad. It's got a brand new OCZ Agility 4 SSD in it, dual booting Win8 and Mint 15.

ghost@SILO ~ $ uname -a
Linux SILO 3.8.0-21-generic #32-Ubuntu SMP Tue May 14 22:16:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
ghost@SILO ~ $ lsb_release -rc
Release:    15
Codename:    olivia
ghost@SILO ~ $ dmesg | grep ata | grep error | wc -l
72
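The count above comes from grep ata | grep error, which matches any line containing both substrings. That means non-libata lines (an EXT4 message mentioning "metadata", say) can inflate the total. Below is a minimal sketch of a tighter filter. It assumes libata messages carry the usual "ataN:" or "ataN.NN:" prefix after dmesg's bracketed timestamp, and the function name count_ata_errors is hypothetical:

```shell
# count_ata_errors (hypothetical helper): count libata lines in kernel log text
# read from stdin that mention an error, exception or reset (case-insensitive).
# Assumes lines look like "[   12.345678] ata1.00: exception Emask ..." or
# "ata1: hard resetting link" (default dmesg format, not dmesg -T timestamps).
count_ata_errors() {
    grep -ciE '^(\[ *[0-9.]+] )?ata[0-9]+(\.[0-9]+)?: .*(error|exception|reset)'
}

# Usage on a live system:
#   dmesg | count_ata_errors
```

Anchoring on the "ataN" prefix trades a little recall (libata continuation lines without the prefix are skipped) for not counting unrelated filesystem messages.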

The logs are full of ATA errors (sense status warnings, bus resets, the lot). This usually makes mountall fail during boot, which disconnects Plymouth and dumps you into a broken system. On the rare occasions it does boot with no dmesg errors, the system will usually panic 5-15 minutes in, remount / read-only and generally trash everything. The really weird thing is that rebooting into a live USB distro and running fsck.ext4 on the Linux partition returns "file system clean"! WTF?
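One plausible reason for the "clean" result: fsck.ext4 is e2fsck, and without -f it skips the full check whenever the superblock says the filesystem is clean. If the errors were never recorded in the superblock (for example because the writes themselves failed), a plain fsck will not look any further. Below is a minimal sketch for checking from live media without changing anything. It assumes the partition is unmounted. The helper name and the /dev/sdXN placeholder are hypothetical:

```shell
# force_check_ext4 (hypothetical helper): run a full, read-only e2fsck pass.
#   -f  check even if the superblock marks the filesystem as clean
#   -n  open read-only and answer "no" to every repair prompt
# The argument must be an unmounted ext4 device or image file.
force_check_ext4() {
    e2fsck -f -n "$1"
}

# Usage from live media, as root (replace /dev/sdXN with the Mint partition):
#   force_check_ext4 /dev/sdXN
```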

Normally, of course, I'd presume the SSD is faulty, but it's not. Now that Dad's retired he's become a hardcore FPS gamer, and he has happily booted Win8 off the same SSD and is currently thrashing the system at 100% usage for hours on end playing Far Cry 3: Blood Dragon without a glitch. Very strange.

I installed both linux-image-3.8.0-19-generic and today's mainline linux-image-3.10.0-999-generic, with similarly strange results. Initially the 3.10 kernel booted perfectly with no dmesg errors, while the 3.8.0-19 kernel died outright on boot and dropped to BusyBox (it couldn't find /). After crashing into BusyBox twice in a row, booting back into 3.10 is now broken too, even though booting live media in between and fsck'ing the partition once again reported a clean filesystem (which it shouldn't have, given the earlier boot failures). Booting 3.8.0-21-generic, the original problem kernel, still results in filesystem errors and crashed Plymouth sessions. These are all amd64 kernels, incidentally.

So, what on earth is happening here? I can honestly say that, for once, I don't really know. It's a little academic anyway, because one thing is for sure: Mint (following some other recent disappointments with it) has just hit the top of my shit-list, along with its failure of a parent, Ubuntu. What are these people playing at? The system is getting converted to Debian Wheezy tomorrow, and from now on I'm refusing to install or admin any more Ubuntu/Mint systems.

I'm guessing it's a combination of a bad kernel (3.8.0-21-generic specifically; all was well until yesterday) and, given the frequent mounts and unmounts during error-prone reboots, possibly the dreaded ext4 data corruption bug from around the turn of the year as well. I thought that was well and truly fixed in later kernels, though, and definitely in 3.10. I have a feeling that disabling NCQ by adding GRUB_CMDLINE_LINUX="libata.force=noncq" to /etc/default/grub would probably alleviate the problem, but as I just said, quite frankly I don't care about fixing it any more.
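For anyone who does want to try the NCQ idea: on Debian-family systems, edits to /etc/default/grub only take effect after update-grub regenerates /boot/grub/grub.cfg. Below is a minimal sketch. It assumes GNU sed and the stock single-line GRUB_CMDLINE_LINUX="..." entry, and the helper name add_noncq is hypothetical:

```shell
# add_noncq (hypothetical helper): add libata.force=noncq to GRUB_CMDLINE_LINUX
# in a grub defaults file (default /etc/default/grub). Assumes GNU sed (-i) and
# that the variable sits on one line as GRUB_CMDLINE_LINUX="...".
add_noncq() {
    grub_file=${1:-/etc/default/grub}
    # Do nothing if the option is already present.
    grep -q 'libata\.force=noncq' "$grub_file" && return 0
    # Empty value: set it outright; "t" then skips the rest of the script.
    # Non-empty value: append the option after the existing arguments.
    sed -i \
        -e 's/^GRUB_CMDLINE_LINUX=""$/GRUB_CMDLINE_LINUX="libata.force=noncq"/' \
        -e t \
        -e 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/GRUB_CMDLINE_LINUX="\1 libata.force=noncq"/' \
        "$grub_file"
}

# Usage (as root), then reboot:
#   add_noncq && update-grub
# Afterwards "libata.force=noncq" should appear in /proc/cmdline and, assuming
# the SSD is sda, /sys/block/sda/device/queue_depth should read 1.
```

Putting the option in GRUB_CMDLINE_LINUX rather than GRUB_CMDLINE_LINUX_DEFAULT means it also applies to the recovery-mode entries.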

I'd be very interested to know if anyone else bumps into this. I don't know how many of you are running Mint 15 amd64 off an SSD, but if you are, please let me know.

Regards
-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq