Re: [LUG] Failing drives? Backing up/upgrading server?

 

I try not to top-post, but I figured this would save many of you from
reading this whole lot just to find out that... I forgot to talk about
the backing up/upgrading part of the question. /me is stupid.

Anyway, I'm asking if anyone knows what could be causing these errors
(below) and how to fix them (if possible) to save me from resorting to
potentially reinstalling with new drives.

If it does come down to reinstalling on new drives, what's the best
way to find out which packages I already have installed (it's a Debian
Stable server), so I can make a list of what needs installing on the
'new' machine? And would I simply back up /etc (and data 'volumes' such
as /home, obviously), do a fresh install with all the previously
installed packages, and then restore the backup?

Cheers.
Grant.
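
For the package-list part, one possible approach (a sketch, not a
definitive answer): Debian's "dpkg --get-selections" prints one package
per line with its selection state, and "dpkg --set-selections" followed
by "apt-get dselect-upgrade" can replay such a list on a fresh install.
The Python below filters that output down to the packages marked
"install". The function names and the sample text are made up for
illustration.

```python
import subprocess


def parse_selections(text):
    """Return sorted names of packages marked 'install'.

    `text` is expected to look like `dpkg --get-selections` output:
    a package name, whitespace, then a state such as install/deinstall.
    """
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "install":
            names.append(fields[0])
    return sorted(names)


def installed_packages():
    """Run dpkg on the live system and parse its output (Debian only)."""
    result = subprocess.run(
        ["dpkg", "--get-selections"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_selections(result.stdout)


# Hypothetical sample, not taken from the server in question.
SAMPLE = "apache2\t\t\t\tinstall\nexim4\t\t\t\tinstall\noldpkg\t\t\t\tdeinstall\n"
```

On the server itself, installed_packages() would give the list to carry
over; parse_selections(SAMPLE) shows the filtering without needing dpkg.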

On Sat, 26 Jan 2008 17:05:44 +0000
Grant Sewell wrote:

> Hi all,
> 
> I have started to have 'problems' with my file/web/mail server.  I am
> getting the following message several times over in dmesg output:
> 
> hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdd: dma_intr: error=0x84 { DriveStatusError BadCRC }
> ide: failed opcode was: unknown
> 
> Occasionally I will also get:
> 
> ide1: reset: master: error (0x7f?)
> 
> fdisk shows hdd to be the following (which is correct):
> 
> Disk /dev/hdd: 163.9 GB, 163928604672 bytes
> 255 heads, 63 sectors/track, 19929 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/hdd1               1       19929   160079661   83  Linux
> 
> and "smartctl -a /dev/hdd":
> 
> Model Family:     Maxtor DiamondMax Plus 9 family
> Device Model:     Maxtor 6Y160P0
> Serial Number:    Y46CSYAE
> Firmware Version: YAR41BW0
> User Capacity:    163,928,604,672 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
> Local Time is:    Sat Jan 26 16:42:22 2008 GMT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> The above errors don't seem to affect general use of the machine.
> More worrying is that recently I have also been getting these
> whenever I try to access *some* parts of the file-system on hdd
> (mounted as /home):
> 
> end_request: I/O error, dev hdd, sector 202454167
> end_request: I/O error, dev hdd, sector 202454663
> end_request: I/O error, dev hdd, sector 202454671
> end_request: I/O error, dev hdd, sector 202454167
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 127
> Buffer I/O error on device hdd1, logical block 8
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 86507599
> Buffer I/O error on device hdd1, logical block 10813442
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 86507695
> Buffer I/O error on device hdd1, logical block 10813454
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 86507703
> Buffer I/O error on device hdd1, logical block 10813455
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 202454167
> end_request: I/O error, dev hdd, sector 86507695
> EXT3-fs error (device hdd1): ext3_get_inode_loc: unable to read inode
> block - inode=5407135, block=10813454
> Aborting journal on device hdd1.
> end_request: I/O error, dev hdd, sector 4303
> Buffer I/O error on device hdd1, logical block 530
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> EXT3-fs error (device hdd1) in ext3_reserve_inode_write: IO failure
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> EXT3-fs error (device hdd1) in ext3_dirty_inode: IO failure
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> ext3_abort called.
> EXT3-fs error (device hdd1): ext3_journal_start_sb: Detected aborted
> journal
> Remounting filesystem read-only
> 
> Upon dropping to runlevel 1, then performing "umount /home" I
> immediately get:
> 
> end_request: I/O error, dev hdd, sector 4303
> Buffer I/O error on device hdd1, logical block 530
> lost page write to I/O error on hdd1
> 
> (or something like that)
> 
> Then a fsck /dev/hdd1 returns with:
> end_request: I/O error, dev hdd, sector 69
> (repeated lots, different sectors)
> 
> fsck.ext3: Attempt to read block from filesystem resulted in short
> read whilst trying to open /dev/hdd1
> Could this be a zero-length partition?
> 
> Indeed, now an "fdisk -l /dev/hdd" shows:
> end_request: I/O error, dev hdd, sector 0
> printk: 30 messages suppressed.
> Buffer I/O error on device hdd, logical block 0
> (blah blah)
> 
> Reboot and all is fine again, until I try to do this again... then I
> get errors again.
> 
> I'm not really sure where to begin.  I've disabled DMA by adding a
> kernel boot parameter of ide=nodma, but that doesn't seem to affect
> this problem at all.  Booting from another medium and running fsck on
> both hda1 and hdd1 comes back fine.  When the disks are removed and
> attached to another machine via a USB-ATA adapter, all is OK, so I'm
> inclined to think it might be the PATA controller on this motherboard
> (don't ask me what it is, I have no idea).  However, this machine has
> been working fine for ages.  More concerning, I used to get these
> sorts of errors on my "old" server before I retired it and performed
> a hard-drive transplant to this "new" computer, and all was fine for
> a while.
> 
> Thanks for reading.  Any ideas?
> 
> Cheers.
> Grant. 
> 
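
A note on reading the quoted kernel log: each "end_request" line gives
a sector on the whole disk (hdd), while the "Buffer I/O error" line
that follows gives a logical block within the partition (hdd1). The
two agree if the partition starts at sector 63 and the filesystem uses
4096-byte blocks. Both figures are inferred from the log itself
(sector 63 pairs with block 0, sector 127 with block 8), not read from
the filesystem, so treat them as assumptions. A minimal Python sketch
of the arithmetic, with a made-up function name:

```python
SECTOR_SIZE = 512       # bytes per sector, as in the fdisk output
PARTITION_START = 63    # assumed first sector of hdd1 (sector 63 -> block 0)
BLOCK_SIZE = 4096       # assumed ext3 block size (sector 127 -> block 8)


def sector_to_block(sector, start=PARTITION_START,
                    sector_size=SECTOR_SIZE, block_size=BLOCK_SIZE):
    """Map an absolute disk sector to a logical block inside the partition."""
    offset = sector - start
    if offset < 0:
        raise ValueError("sector lies before the start of the partition")
    # Convert the sector offset to bytes, then to whole filesystem blocks.
    return (offset * sector_size) // block_size


# Sector/block pairs copied from the quoted log.
LOG_PAIRS = [(63, 0), (127, 8), (4303, 530), (86507599, 10813442)]
for sector, block in LOG_PAIRS:
    assert sector_to_block(sector) == block
```

Every sector/block pair in the log fits this mapping, including
block 10813454, which the EXT3 error names as the unreadable inode
block.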

-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html