Re: [LUG] Failing drives? Backing up/upgrading server?

Grant Sewell wrote:
> Hi all,
> 
> I have started to have 'problems' with my file/web/mail server.  I am
> getting the following message several times over in dmesg output:
> 
> hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdd: dma_intr: error=0x84 { DriveStatusError BadCRC }
> ide: failed opcode was: unknown
> 
> Occasionally I will also get:
> 
> ide1: reset: master: error (0x7f?)
> 
> fdisk shows hdd to be the following (which is correct):
> 
> Disk /dev/hdd: 163.9 GB, 163928604672 bytes
> 255 heads, 63 sectors/track, 19929 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/hdd1               1       19929   160079661   83  Linux
> 
> and "smartctl -a /dev/hdd":
> 
> Model Family:     Maxtor DiamondMax Plus 9 family
> Device Model:     Maxtor 6Y160P0
> Serial Number:    Y46CSYAE
> Firmware Version: YAR41BW0
> User Capacity:    163,928,604,672 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
> Local Time is:    Sat Jan 26 16:42:22 2008 GMT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> The above errors don't seem to affect general use of the machine.
> More worrying, though, is that recently I have also been getting the
> following whenever I try to access *some* parts of the file system on
> hdd (mounted as /home):
> 
> end_request: I/O error, dev hdd, sector 202454167
> end_request: I/O error, dev hdd, sector 202454663
> end_request: I/O error, dev hdd, sector 202454671
> end_request: I/O error, dev hdd, sector 202454167
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 127
> Buffer I/O error on device hdd1, logical block 8
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 86507599
> Buffer I/O error on device hdd1, logical block 10813442
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 86507695
> Buffer I/O error on device hdd1, logical block 10813454
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 86507703
> Buffer I/O error on device hdd1, logical block 10813455
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 202454167
> end_request: I/O error, dev hdd, sector 86507695
> EXT3-fs error (device hdd1): ext3_get_inode_loc: unable to read inode block - inode=5407135, block=10813454
> Aborting journal on device hdd1.
> end_request: I/O error, dev hdd, sector 4303
> Buffer I/O error on device hdd1, logical block 530
> lost page write due to I/O error on hdd1
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> EXT3-fs error (device hdd1) in ext3_reserve_inode_write: IO failure
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> EXT3-fs error (device hdd1) in ext3_dirty_inode: IO failure
> end_request: I/O error, dev hdd, sector 63
> Buffer I/O error on device hdd1, logical block 0
> lost page write due to I/O error on hdd1
> ext3_abort called.
> EXT3-fs error (device hdd1): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> 
> After dropping to runlevel 1 and running "umount /home", I
> immediately get:
> 
> end_request: I/O error, dev hdd, sector 4303
> Buffer I/O error on device hdd1, logical block 530
> lost page write due to I/O error on hdd1
> 
> (or something like that)
> 
> Then "fsck /dev/hdd1" returns:
> end_request: I/O error, dev hdd, sector 69
> (repeated lots, different sectors)
> 
> fsck.ext3: Attempt to read block from filesystem resulted in short read whilst trying to open /dev/hdd1
> Could this be a zero-length partition?
> 
> Indeed, now an "fdisk -l /dev/hdd" shows:
> end_request: I/O error, dev hdd, sector 0
> printk: 30 messages suppressed.
> Buffer I/O error on device hdd, logical block 0
> (blah blah)
> 
> Reboot and all is fine again, until I try to do this again... then I
> get the errors again.
> 
> I'm not really sure where to begin.  I've disabled DMA by adding a
> kernel boot parameter of ide=nodma, but that doesn't seem to affect
> this problem at all.  Booting from another medium and fscking both hda1
> and hdd1 comes back fine.  When the disks are removed and attached to
> another machine via a USB-ATA adapter, all is OK, so I'm inclined to
> think it might be the PATA controller on this motherboard (don't ask me
> what it is, I have no idea).  However, this machine has been working
> fine for ages... and, more concerning, I used to get these sorts of
> errors on my "old" server before I retired it and performed a
> hard-drive transplant to this "new" computer, and all was fine for a
> while.
> 
> Thanks for reading.  Any ideas?
> 
> Cheers.
> Grant. 
> 

Well, for a start I'd get the data off the drive.  Then you could try 
downloading the Maxtor diagnostics software from www.seagate.com 
(Seagate own Maxtor) and running a complete test on the drive (at the 
very least you should be able to run a full read test).  If, as you 
say, you suspect a controller problem, try the drive in two machines 
and compare the results.  It's also worth double-checking the jumper 
settings (is everything consistently set to either Cable Select or 
Master/Slave?), and could the cable be faulty?
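
If the vendor diagnostics aren't to hand, a crude read test can be
sketched in a few lines of Python.  This is only an illustration and
not a replacement for the Maxtor tool or a SMART self-test: the
function name read_scan and the 1 MiB chunk size are my own choices,
and it only reports which chunks the kernel refused to read.

```python
import os

def read_scan(path, chunk_size=1024 * 1024):
    """Read `path` from start to end without writing anything.

    Returns (bytes_read_ok, failed_offsets), where failed_offsets lists
    the byte offsets of chunks that raised an I/O error.
    """
    failed = []
    ok_bytes = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # The kernel reported a read error: note it and skip the chunk.
                failed.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            ok_bytes += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return ok_bytes, failed
```

Run as root, read_scan("/dev/hdd") would walk the whole disk; dividing
a failed offset by 512 gives a sector number to compare against the
end_request lines in dmesg.  Running it on both machines should show
whether the errors follow the drive or stay with the controller.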

Hopefully this gives you something to go on.
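
One more thing that may help when reading the kernel log: the "sector"
in each end_request line counts 512-byte sectors from the start of the
whole disk, while the "logical block" counts filesystem blocks from the
start of hdd1.  The sketch below checks that relationship against the
pairs in the log.  Two values are inferred rather than stated anywhere
in the thread: a partition start of sector 63 (because sector 63 is
logged as logical block 0) and a block size of 4096 bytes (the only
size that puts sector 127 at block 8).

```python
SECTOR_SIZE = 512  # bytes per sector, as in the fdisk output

def sector_to_fs_block(sector, part_start=63, block_size=4096):
    """Map an absolute disk sector to a block number inside the partition.

    part_start and block_size are inferred from the log, not confirmed.
    """
    return (sector - part_start) * SECTOR_SIZE // block_size

# (disk sector, hdd1 logical block) pairs copied from the kernel log
logged = [
    (63, 0), (127, 8), (4303, 530),
    (86507599, 10813442), (86507695, 10813454), (86507703, 10813455),
]

for sector, block in logged:
    assert sector_to_fs_block(sector) == block
```

The same function can be used for sectors such as 202454167 that appear
without a matching "logical block" line.  The failures range from the
first block of the partition to blocks far into it, and fdisk later
fails on sector 0 of the disk itself, so the drive as a whole seems to
stop responding.  That might point at the cable or controller more than
at one patch of bad sectors, though that is only a guess.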

Rob


-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html