[LUG] Failing drives? Backing up/upgrading server?

 

Hi all,

I have started to have 'problems' with my file/web/mail server.  I am
getting the following message several times over in dmesg output:

hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
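
The status and error bytes in the first two lines can be decoded by hand. A minimal Python sketch, assuming the standard ATA status/error register bit values and the label strings the old Linux IDE driver prints (the function and table names are hypothetical). Note that BadCRC (ICRC) is an interface CRC error on the transfer path, which usually points at the cable, connector or controller rather than the platters.

```python
# Decode the ATA status and error bytes printed by the old Linux IDE driver,
# e.g. "status=0x51 { DriveReady SeekComplete Error }".
# Bit values are the standard ATA status/error register bits; the label
# strings follow the names the IDE driver prints.

STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(status: int) -> list[str]:
    """Return the names of the status bits set in `status`, high bit first."""
    return [name for bit, name in STATUS_BITS if status & bit]

def decode_error(error: int) -> list[str]:
    """Return the names of the error bits set in `error`.

    Bit 7 is reported as BadCRC (interface CRC error) when the abort bit is
    also set, and as BadSector otherwise, mirroring the driver's output.
    """
    names = []
    if error & 0x04:
        names.append("DriveStatusError")  # ABRT: command aborted
    if error & 0x80:
        names.append("BadCRC" if error & 0x04 else "BadSector")
    if error & 0x40:
        names.append("UncorrectableError")
    if error & 0x10:
        names.append("SectorIdNotFound")
    if error & 0x02:
        names.append("TrackZeroNotFound")
    if error & 0x01:
        names.append("AddrMarkNotFound")
    return names

print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode_error(0x84))   # ['DriveStatusError', 'BadCRC']
```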

Occasionally I will also get:

ide1: reset: master: error (0x7f?)

fdisk shows hdd to be the following (which is correct):

Disk /dev/hdd: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdd1               1       19929   160079661   83  Linux
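
The fdisk numbers can be cross-checked to confirm the partition table itself is sane. A sketch, assuming hdd1 begins at sector 63 (the traditional first-track offset for DOS partition tables, and consistent with the later log pairing sector 63 with logical block 0); the helper names are hypothetical.

```python
# Cross-check the fdisk listing using the geometry it reports.
HEADS = 255
SECTORS_PER_TRACK = 63
SECTOR_SIZE = 512

CYLINDER_SECTORS = HEADS * SECTORS_PER_TRACK      # 16065
CYLINDER_BYTES = CYLINDER_SECTORS * SECTOR_SIZE   # 8225280, fdisk's "Units"

def whole_cylinders(disk_bytes: int) -> int:
    """Number of complete cylinders that fit on the disk."""
    return disk_bytes // CYLINDER_BYTES

def partition_kib(start_sector: int, end_cylinder: int) -> int:
    """Size in 1 KiB blocks (fdisk's "Blocks" column) of a partition that
    begins at `start_sector` and ends at the last sector of `end_cylinder`
    (cylinders numbered from 1, as fdisk does)."""
    end_sector_exclusive = end_cylinder * CYLINDER_SECTORS
    return (end_sector_exclusive - start_sector) * SECTOR_SIZE // 1024

print(CYLINDER_BYTES)                  # 8225280
print(whole_cylinders(163928604672))   # 19929
print(partition_kib(63, 19929))        # 160079661
```

All three values match the listing, so the table agrees with the drive's reported capacity.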

and "smartctl -a /dev/hdd":

Model Family:     Maxtor DiamondMax Plus 9 family
Device Model:     Maxtor 6Y160P0
Serial Number:    Y46CSYAE
Firmware Version: YAR41BW0
User Capacity:    163,928,604,672 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Sat Jan 26 16:42:22 2008 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
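
The identity block above says little about drive health; the attribute table (printed by "smartctl -A /dev/hdd", and also included in -a output) is more telling. A sketch that extracts a few raw values from that table, assuming smartctl's usual ten-column layout and its default attribute names (vendors, Maxtor included, may label or interpret attributes differently; the function name is hypothetical). A rising UDMA_CRC_Error_Count would fit the BadCRC messages, while reallocated or pending sectors would point at the media.

```python
# Pull raw values of selected SMART attributes out of `smartctl -A` text.
# Assumes smartctl's ten-column attribute table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

WATCHED = {
    "Reallocated_Sector_Ct",   # sectors already remapped by the drive
    "Current_Pending_Sector",  # sectors waiting to be remapped
    "Offline_Uncorrectable",   # sectors found unreadable in offline scans
    "UDMA_CRC_Error_Count",    # interface (cable/controller) CRC errors
}

def smart_raw_values(smartctl_output: str) -> dict[str, str]:
    """Map each watched attribute name to its RAW_VALUE column."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(fields) == 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = fields[9].strip()
    return values
```

Feed it the captured output of "smartctl -A /dev/hdd" (run as root) and compare the values before and after a burst of errors.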

The above errors don't seem to affect general use of the machine;
however, what is more worrying is that recently I have also been
getting these whenever I try to access *some* parts of the filesystem
on hdd (mounted as /home):

end_request: I/O error, dev hdd, sector 202454167
end_request: I/O error, dev hdd, sector 202454663
end_request: I/O error, dev hdd, sector 202454671
end_request: I/O error, dev hdd, sector 202454167
end_request: I/O error, dev hdd, sector 63
Buffer I/O error on device hdd1, logical block 0
lost page write due to I/O error on hdd1
end_request: I/O error, dev hdd, sector 127
Buffer I/O error on device hdd1, logical block 8
lost page write due to I/O error on hdd1
end_request: I/O error, dev hdd, sector 86507599
Buffer I/O error on device hdd1, logical block 10813442
lost page write due to I/O error on hdd1
end_request: I/O error, dev hdd, sector 86507695
Buffer I/O error on device hdd1, logical block 10813454
lost page write due to I/O error on hdd1
end_request: I/O error, dev hdd, sector 86507703
Buffer I/O error on device hdd1, logical block 10813455
lost page write due to I/O error on hdd1
end_request: I/O error, dev hdd, sector 202454167
end_request: I/O error, dev hdd, sector 86507695
EXT3-fs error (device hdd1): ext3_get_inode_loc: unable to read inode block - inode=5407135, block=10813454
Aborting journal on device hdd1.
end_request: I/O error, dev hdd, sector 4303
Buffer I/O error on device hdd1, logical block 530
lost page write due to I/O error on hdd1
end_request: I/O error, dev hdd, sector 63
Buffer I/O error on device hdd1, logical block 0
lost page write due to I/O error on hdd1
EXT3-fs error (device hdd1) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev hdd, sector 63
Buffer I/O error on device hdd1, logical block 0
lost page write due to I/O error on hdd1
EXT3-fs error (device hdd1) in ext3_dirty_inode: IO failure
end_request: I/O error, dev hdd, sector 63
Buffer I/O error on device hdd1, logical block 0
lost page write due to I/O error on hdd1
ext3_abort called.
EXT3-fs error (device hdd1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
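
The "Buffer I/O error" lines pair absolute disk sectors with hdd1 logical blocks, and every pair fits hdd1 starting at sector 63 with 4 KiB ext3 blocks (8 sectors per block). A sketch of that mapping; the block size is inferred from the log pairs rather than read from the superblock (tune2fs -l would confirm it), and the helper names are hypothetical.

```python
PARTITION_START = 63      # first sector of hdd1 (from the fdisk listing)
SECTORS_PER_BLOCK = 8     # 4096-byte ext3 blocks / 512-byte sectors (inferred)

def block_to_sector(block: int) -> int:
    """Absolute disk sector holding the start of an hdd1 logical block."""
    return PARTITION_START + block * SECTORS_PER_BLOCK

def sector_to_block(sector: int) -> tuple[int, int]:
    """hdd1 logical block containing an absolute disk sector, plus the
    sector's offset within that block."""
    return divmod(sector - PARTITION_START, SECTORS_PER_BLOCK)

# (sector, logical block) pairs copied from the log above
pairs = [(63, 0), (127, 8), (4303, 530),
         (86507599, 10813442), (86507695, 10813454), (86507703, 10813455)]
for sector, block in pairs:
    assert block_to_sector(block) == sector

# The sector-202454167 errors were never given a block number in the log.
print(sector_to_block(202454167))   # (25306763, 0)
```

With a block number in hand, debugfs's icheck (block to inode) and ncheck (inode to path) requests can show which file or metadata a failing sector belongs to.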

After dropping to runlevel 1 and running "umount /home", I
immediately get:

end_request: I/O error, dev hdd, sector 4303
Buffer I/O error on device hdd1, logical block 530
lost page write to I/O error on hdd1

(or something like that)

Then running "fsck /dev/hdd1" returns:

end_request: I/O error, dev hdd, sector 69
(repeated many times, with different sectors)

fsck.ext3: Attempt to read block from filesystem resulted in short read whilst trying to open /dev/hdd1
Could this be a zero-length partition?

Indeed, "fdisk -l /dev/hdd" now shows:
end_request: I/O error, dev hdd, sector 0
printk: 30 messages suppressed.
Buffer I/O error on device hdd, logical block 0
(blah blah)

After a reboot all is fine again, until I try this again, at which
point the errors come back.

I'm not really sure where to begin.  I've disabled DMA by adding the
kernel boot parameter ide=nodma, but that doesn't seem to affect this
problem at all.  Booting from another medium and fscking both hda1 and
hdd1 comes back clean.  When the disks are removed and attached to
another machine via a USB-ATA adapter, all is OK, so I'm inclined to
think it might be the PATA controller on this motherboard (don't ask me
what it is, I have no idea).  However, this machine has been working
fine for ages.  More worryingly, I used to get these same sorts of
errors on my "old" server before I retired it and transplanted its hard
drives into this "new" computer, after which all was fine for a while.
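
One way to separate "disk" from "controller" is to re-read exactly the sectors that failed, first in this machine and then with the drive on the USB-ATA adapter. A minimal read-only sketch (the function name is hypothetical; the device path behind the USB adapter will differ from /dev/hdd). Reads can be served from the page cache, so a sector read successfully once may not touch the disk again on a repeat run.

```python
import errno
import os

SECTOR_SIZE = 512

def probe_sectors(device: str, sectors: list[int]) -> dict[int, str]:
    """Try to read each listed sector from `device` and report "ok",
    "short read", or the errno name (e.g. "EIO"). Opens the device
    read-only, so nothing is written."""
    results = {}
    fd = os.open(device, os.O_RDONLY)
    try:
        for sector in sectors:
            try:
                data = os.pread(fd, SECTOR_SIZE, sector * SECTOR_SIZE)
                results[sector] = "ok" if len(data) == SECTOR_SIZE else "short read"
            except OSError as exc:
                results[sector] = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
    return results
```

For example, as root: probe_sectors("/dev/hdd", [0, 63, 4303, 86507695, 202454167]), using sectors taken from the log above. If the same sectors fail on both machines the drive is suspect; if they only fail here, the controller or cabling is.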

Thanks for reading.  Any ideas?

Cheers.
Grant. 

-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html